ASM for FUN - #3 SUB

Started by frktons, April 22, 2010, 07:59:37 AM

frktons

In this third step we are going to put together the acquired knowledge to compare
three methods we can use in PB to build a new string starting from a given one.

1] We read the screen from a file. The screen is in DOS format: 2000 bytes for the 25*80 characters
    and 2000 bytes for the attributes, 4000 bytes in total. The screen shows all the displayable ASCII
    chars with their ASCII codes.
2] We convert it in 3 different ways [MID$ - PEEK/POKE - INLINE ASM], 100 times each.
3] We display the results, in CPU cycles, on a new screen.
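
As a rough illustration of step 2, here is a minimal C sketch of the conversion, going by the PEEK/POKE loop posted later in this thread: every source byte lands on an even offset of a destination twice as long, and a space is left on the odd offsets. C is used only as a stand-in for the PB code, and the name `expand_bytes` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original program): spread each source
   byte onto every second destination byte, with a space (0x20) in between. */
void expand_bytes(const unsigned char *src, unsigned char *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    memset(dst, 0x20, 2 * n);      /* plays the role of sBuf = SPACE$(8000) */
    for (size_t i = 0; i < n; i++)
        dst[2 * i] = src[i];       /* POKE y1, PEEK(y) : y + 1 : y1 + 2 */
}
```

For the full screen this would be called with n = 4000, filling an 8000-byte destination.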

Stay tuned. I'm just a part-time ASM learner  :8)

Enjoy

Frank
Mind is like a parachute. You know what to do in order to use it :-)

frktons

Here we have the first part:

Read, convert and display a DOS screen on the WIN32 console. The screen displays the whole
ASCII character set.

Only the conversion is done in INLINE ASM; the rest of the code
is just PB.  :8)

---------------------------------------------------------------------------------------------------------------------

SUB GetScreen

    #REGISTER NONE
    LOCAL lpReadRegion AS SMALL_RECT
    LOCAL SIZE AS DWORD

    DIM y AS STRING PTR, y1 AS STRING PTR
    sBuf = SPACE$(8000)

    y = STRPTR(MainStr)
    y1 = STRPTR(sBuf)

'---------------------------------
' Converted routine
'---------------------------------
'    FOR x = 1 TO 8000 STEP 2
'      POKE y1, PEEK(y)
'      y = y + 1
'      y1 = y1 + 2
'    NEXT x
'---------------------------------
    ! push ebx
    ! push esi
    ! push edi
    ! push eax

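    ! mov ebx, 1000        ; 1000 passes * 4 bytes = 4000 source bytes
    ! mov esi, y           ; source pointer (MainStr)
    ! mov edi, y1          ; destination pointer (sBuf)
    Convert:
    ! mov eax, [esi]       ; load 4 source bytes
    ! mov [edi], al        ; byte 0 -> dest offset 0
    ! mov [edi+2], ah      ; byte 1 -> dest offset 2
    ! bswap eax            ; bytes 3 and 2 are now in al and ah
    ! mov [edi+6], al      ; byte 3 -> dest offset 6
    ! mov [edi+4], ah      ; byte 2 -> dest offset 4
    ! add esi, 4           ; next 4 source bytes
    ! add edi, 8           ; next 8 destination bytes
    ! dec ebx
    ! jnz Convert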

    ! pop eax
    ! pop edi
    ! pop esi
    ! pop ebx

    SIZE = MAKDWD(80, 25)
    lpReadRegion.xRight = 79
    lpReadRegion.xBottom = 24
    WriteConsoleOutPut GetStdHandle(%STD_OUTPUT_HANDLE), _
    BYVAL STRPTR(sBuf),BYVAL SIZE, BYVAL 0&, lpReadRegion
END SUB 
-----------------------------------------------------------------------------------------------------------------
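
For anyone who wants to check the byte order of the Convert loop without a PB compiler, here is a small C model of it. It is only a sketch: `bswap32` stands in for the BSWAP instruction, `convert_bswap` is a made-up name, and the local variable `eax` simply mirrors the register. It shows that after the swap AL holds source byte 3 and AH holds source byte 2, so the four bytes land on destination offsets 0, 2, 4 and 6 in their original order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for the x86 BSWAP instruction. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Hypothetical model of the Convert loop: 4 source bytes per pass,
   written to destination offsets 0, 2, 4 and 6. Odd offsets are untouched. */
void convert_bswap(const unsigned char *src, unsigned char *dst, size_t passes)
{
    assert(src != NULL && dst != NULL);
    while (passes--) {
        /* mov eax, [esi] : little-endian 4-byte load */
        uint32_t eax = (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
                       ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
        dst[0] = (unsigned char)(eax & 0xFF);         /* mov [edi], al   */
        dst[2] = (unsigned char)((eax >> 8) & 0xFF);  /* mov [edi+2], ah */
        eax = bswap32(eax);                           /* bswap eax       */
        dst[6] = (unsigned char)(eax & 0xFF);         /* mov [edi+6], al */
        dst[4] = (unsigned char)((eax >> 8) & 0xFF);  /* mov [edi+4], ah */
        src += 4;                                     /* add esi, 4      */
        dst += 8;                                     /* add edi, 8      */
    }
}
```

With `passes = 1000` this covers the 4000 source bytes of the screen. The destination has to be pre-filled (with spaces in the original) because the odd offsets are never written.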


Inside the attached zip file there are:
1) complete source
2) compiled pgm
3) dos screen file

Enjoy

Frank

dedndave

looks good Frank
i didn't see ASCII 0   :P

frktons

Quote from: dedndave on April 22, 2010, 01:24:50 PM
looks good Frank
i didn't see ASCII 0   :P

Neither did I, or anybody else.  :8)

Working on TestScreen to display the results of a chosen number of conversions
with the 3 different algos we talked about in #1.

Enjoy

Frank

frktons

The program is almost complete. It could probably be optimized a little, but it
does what is needed.

If you'd like to check it and suggest some improvements, let me know.

Enjoy

Frank

dedndave

20,000 cycles seems long, Frank

this assignment takes a considerable amount of time
is it something that you really want to include in the measured time ?

       sBuf = SPACE$(8000)

this seems like an unnecessary step, especially inside the timed loop
if you do want to put it there, consider this
       Convert:
       ! mov eax, [esi]
       ! mov [edi], al
       ! mov [edi+2], ah
       ! bswap eax
       ! mov [edi+6], al
       ! mov [edi+4], ah
       ! add esi, 4
       ! add edi, 8
       ! dec ebx
       ! jnz Convert

inside the loop where you write to sBuf, you could just as well write the odd bytes with spaces
       ! mov eax, [esi]
       ! mov edx,eax
       ! mov al,20h        ;space char
       ! bswap eax
       ! mov al,dl
       ! mov ah,20h        ;space char
       ! bswap edx
       ! mov [edi], eax
       ! mov ah,dl
       ! mov al,20h        ;space char
       ! bswap eax
       ! mov al,dh
       ! mov ah,20h        ;space char
       ! mov [edi+4], eax
       ! add esi, 4
       ! add edi, 8
       ! dec ebx
       ! jnz Convert

there are a couple extra BSWAPs, but also fewer memory writes, and they are dword accesses
and it eliminates the sBuf = SPACE$(8000) assignment
(make sure both string buffers are dword-aligned)
it may be faster - try it out  :bg
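
A C sketch of the same idea, for checking the result rather than the speed: each pair of source bytes is combined with two spaces into one 32-bit value, so every pass does two dword stores and the destination needs no pre-fill. The names `convert_dwords`, `store_le32` and `is_dword_aligned` are made up for this example, and the alignment check is just one common way to test a pointer, not something taken from the PB program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order, as x86 does.
   Optimizing compilers can often merge this into a single dword store. */
static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Hypothetical check for "dword-aligned": address is a multiple of 4. */
int is_dword_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Hypothetical model of the dword-writing loop: 4 source bytes in,
   two dwords out, each laid out as char, space, char, space. */
void convert_dwords(const unsigned char *src, unsigned char *dst, size_t passes)
{
    assert(src != NULL && dst != NULL);
    while (passes--) {
        uint32_t first  = 0x20002000u | src[0] | ((uint32_t)src[1] << 16);
        uint32_t second = 0x20002000u | src[2] | ((uint32_t)src[3] << 16);
        store_le32(dst, first);        /* mov [edi], eax   */
        store_le32(dst + 4, second);   /* mov [edi+4], eax */
        src += 4;                      /* add esi, 4       */
        dst += 8;                      /* add edi, 8       */
    }
}
```

The output bytes are the same as with the byte-by-byte loop on a space-filled buffer; whether it is actually faster has to be measured.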

frktons

I tried it, and also moved out of the loop the things that did not need to be inside it,
but the performance is slower than the previous version:

--------------------------------------------------------------------------------------------------------
'--------------------------------------------------------------------------------
' Convert the DOS screen for the number of cycles chosen - ASM version
'-------------------------------------------------------------------------------------------------------------
SUB TestASM(BYVAL cycles AS LONG)

    #REGISTER NONE

    DIM y AS STRING PTR, y1 AS STRING PTR
    DIM P_sBuf AS STRING PTR, P_MainStr AS STRING PTR
    DIM x AS LONG

convert_main:


    ! push ebx
    ! push esi
    ! push edi
    ! push eax
    ! push edx

    sBuf = SPACE$(8000)
    P_sBuf = STRPTR(sBuf)
    P_MainStr = STRPTR(MainStr)

    FOR x = 1 TO cycles
       
       y = P_MainStr
       y1 = P_sBuf
       

       ! mov ebx, 1000
       ! mov esi, y
       ! mov edi, y1
       Convert:
       ! mov eax, [esi]
       ! mov edx,eax
       ! mov al,32        ;space char
       ! bswap eax
       ! mov al,dl
       ! mov ah,32        ;space char
       ! bswap edx
       ! mov [edi], eax
       ! mov ah,dl
       ! mov al,32        ;space char
       ! bswap eax
       ! mov al,dh
       ! mov ah,32        ;space char
       ! mov [edi+4], eax
       ! add esi, 4
       ! add edi, 8
       ! dec ebx
       ! jnz Convert


    NEXT x
   
    ! pop edx
    ! pop eax
    ! pop edi
    ! pop esi
    ! pop ebx

  END SUB   
'-------------------------------------------------------------------------------------------------------------


A question: how do I align the data, as you suggested?

frktons

Here is a better version that moves the unnecessary things out of the loop and
tries to optimize a couple of other minor points.
Re-creating sBuf on every pass is not necessary at all, because the converted bytes are
overwritten correctly each time and the spaces remain where they are.

Have a look at this one

Enjoy

Frank

dedndave

                                                                      1,000
                                                          ┴¶
j◄ ☺•
                                                •                       ♣ ◘¶☻ ♦
⌡¶☺    @☺     ☺▄¶  ┴☺ ÿ↕  ⁿ¶  └@╒@┘@-@9@=@►@¶@↑@►   @ Ç             ☺ ☻ ♥ ♦ ♣ ♠
• ◘ ○ ◙ ♂ ☻ ■ ♣ ♥ ☺ ♦ ☺ ☻ ♦ ◘ ►   @ Ç               ♥á  ☺ ☻ ♥ ♦ ♣ ♠ ►   0 @ Ç 0
► @ ►                    ☺ ☻ ♦ ◘ ►   ♦☼ ≡         ☺ ☻ ◘     Ç   ☺ ☻ ♥     ☺ ☺ ☻
☻ ö ♣ ☺ ( ☻ SriePc    ☺ ☻ ☻ ♦ ♦ ☻ ♦ ◘ ►   ►   0 @ P ` p Ç É   ►   0 @ P ` p Ç ☺
☻     ☼ • ♂ ♥ • ↓       • ◘ ○ ◙ ♂ ♀ ♀ ♪ ♫ ☼ ► ◄ ↕ @ Ç                       ☺ ☺
☻ ♥ ♥ ♦ ♣ ♠ • ► 4@  D@ê D@  ∞¶Φ   ☺  x( Cneso etfrDSt I3 cen  & Hwmn ovrin ots?1
10) ♦ ##  ♫ TsSre.c       ☺ ♦               ÇÇ└└          ☻ ♦ ◘ ► @   Ç   ☺ ☻ ♥
♦ ± ■     ☺ ☻ ☼   ☺   ☺ ☻ ♥ ♦ ♣ ♣•♠•♫•W•♪☻♂☻♫☻◘☻○☻♥☻•☻◙☻♦☻♣☻☺☻♀☻♠☻☼☻☺ ☻ ◘ ►   @
P Ç             ♣ ♦ ♣ ☻ • ☺ ♠ ♥     ☺ ☺ ☻ ♥ ♦ ♣ ♠ • ◘ ○ ♂   @ Ç
      ◘ ☻ • ♣ ♠   ♦ ◙ ☺ ○ ♥ ☺ ☻ ♥ ►   @ Ç             ☺ ☻ ♥ ♦ ♣ ♠ • ◘ ○ ◙ ♂ ♀ ♪
♫ ☼ ► ☺ ☻ ♦ ♦ ◘ ►   @ Ç       ☺ ☺ ☻ ♥ ☺ ☻ ☺ ♥ ♦ ♣ ♠ • ◘ ○ ♂ ♀ ♪ ♫ ☼ ► ◄ ↕ ‼ ¶ ☺
☻ ♥ ♦ ♣ ♠ • ◘ ○ ◙ ♂ ♫ ☼ ☺ ☻ ♦ ◘ ►   @ Ç                 ☺   ☺     ☻   ☺ ☻ ♦ ◘ ►
  @ Ç             ☺ ☻ ♥ ♦ ♣ ♠ • ◘ ○ ◙ ♂ ☻ ■ ♣ ♥ ☺ ♦ ☺ ☻ ♦ ◘ ►   @ Ç
  ♥á  ☺ ☻ ♥ ♦ ♣ ♠ ►   0 @ Ç 0 ► @ ►                    ☺ ☻ ♦ ◘ ►   ♦☼ ≡
☺ ☻ ◘     Ç   ☺ ☻ ♥     ☺ ☺ ☻ ☻ ♥ ♥ ♦ ♣ ♠ ♠ • • ◘   ☺ ☻ ☻ ♦ ♦ ☻ ♦ ◘ ►   ►   0 @
P ` p Ç É   ►   0 @ P ` p Ç ☺ ☻ ♦ ◘ ► ↨   ☺ ☻ ♥ ♦ ♣ ♠ • ◘ ○ ◙ ♂ ♀ ♀ ♪ ♫ ☼ ► ◄ ↕
@ Ç                       ☺ ☺ ☻ ♥ ♥ ♦ ♣ ♠ • ◘ ○ ◙ ♂ ♂   ☺ ☻ ♥ ♦ ♣ ♠ • ◘ ○ ◙ ♂ ♀
♪ ► ◄ ↕ ‼ ¶ § ▬ ↨ ↑ ↓ → ← ∟ ↔ ▲ ▼ @ A B C D E F G H       ♠ ∟ ▼ § ☻ ► ◄ ☺ ☻ + S
¶ ►   ☺ ◘ ♥ ☺85,926,397   í ♪ ú   ◙ ♥ å7,864,551,195ó ⌐ º ¿ á à ª ñ7,369,890N

frktons

I didn't understand the "☻ ♥ ♥ ♦ ♣ ♠ • ◘ ○ ◙ ♂ ♂   ☺ ☻ ♥ ♦ ♣ ♠ • ◘ ○ ◙ ♂ ♀" instruction.  :P

Can you translate or disassemble it for me?  :lol

Are you sure the screen file "TestScreen.scn" is in the same folder as the exe?
I included it only in the first zip; it is always the same. :8)

Here it is again, just the .scn screen file...

Let me know

dedndave

THAT's better - lol
┌──────────────────────────────────────────────────────────────────────────────┐
│                           T e s t S c r e e n          Conversions: 1,000    │
└──────────────────────────────────────────────────────────────────────────────┘
┌────────────────────────┐┌──────────────────────────┐┌────────────────────────┐
│                        ││                          ││                        │
│ Test with the Algo     ││ Test with the Algo       ││ Test with the Algo     │
│                        ││                          ││                        │
│ that uses PEEK to      ││ that uses MID$ to        ││ that uses ASM opcodes  │
│                        ││                          ││                        │
│ retrieve single bytes  ││ retrieve single bytes    ││ retrieving 4 bytes at  │
│                        ││                          ││                        │
│ from source string     ││ from source string       ││ a time into a CPU      │
│                        ││                          ││                        │
│ and POKE to copy the   ││ and concatenates the     ││ register and copy the  │
│                        ││                          ││                        │
│ single bytes into the  ││ single bytes into the    ││ single bytes into the  │
│                        ││                          ││                        │
│ destination string.    ││ destination string.      ││ destination string.    │
│                        ││                          ││                        │
│ Quite good speed.      ││ Quite poor speed.        ││ Very good speed.       │
│                        ││                          ││                        │
│                        ││                          ││                        │
├────────────────────────┤├──────────────────────────┤├────────────────────────┤
│CPU Cycles: 85,674,817  ││CPU Cycles: 24,141,711,180││CPU Cycles: 7,588,778   │

much faster, too   :U

frktons

With my current limited knowledge of ASM I don't think I can do any better.
Maybe in the future I'll be able to use some more advanced tricks with MMX
registers, FPU, SIMD and the like.

Let's get ready for the #4 SUB. During the week-end it's going to rain, maybe
I'll put my hands on something new.  :P

Cheers
