
Strings?

Started by 2-Bit Chip, November 17, 2009, 03:11:36 AM


2-Bit Chip

Why are there mnemonics for strings like: LODS, LODSB, REP, STOS, STOSB?

Can't just a few simple mov's work?
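mov al, byte ptr [esi + ecx]    ; an 8-bit register such as CH cannot be used as an index
... ; edit the character, do work..
mov byte ptr [edi + ecx], al
inc ecx                         ; advance only after the store, so source and destination indexes match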

dedndave

the string operations can be very fast
especially when you want to copy a large section of data or clear out a large area of memory
the ESI register points to the source and EDI points to the destination - they are incremented or decremented automatically for you
the ECX register holds the count if a REP prefix is used (REP repeat, REPZ/REPE repeat while the zero flag is set, REPNZ/REPNE repeat while the zero flag is clear)
the direction flag controls up (CLD) or down (STD)
you can mov/scan/compare/load/store bytes, words, or dwords
there are also string I/O instructions (INS/OUTS) - somewhat useless
for some of the instructions, the AL/AX/EAX register is used for data
from Randy Hyde's Art of Assembly:
http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_6/CH06-4.html#HEADING4-162
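
As a small illustration of the points above (count in ECX, data in AL, pointer in EDI, direction flag), here is a rough sketch of the usual way to get the length of a zero-terminated string with REPNE SCASB. The label my_string is hypothetical, and the sketch assumes the string really is zero-terminated.

```asm
; sketch only: my_string is a hypothetical zero-terminated string
        cld                             ; direction flag clear = scan upward
        mov     edi, offset my_string   ; SCAS reads from [EDI]
        xor     eax, eax                ; AL = 0, the byte to look for
        mov     ecx, -1                 ; largest possible count
        repne   scasb                   ; compare AL with [EDI], stop once the zero byte was scanned
        not     ecx                     ; ECX = bytes scanned, terminator included
        dec     ecx                     ; ECX = length without the terminator
```

For a string of n characters SCASB runs n+1 times, so ECX ends at -(n+2); NOT gives n+1 and DEC gives n.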

2-Bit Chip

Oh! That is neat! one instruction can make up three different ones. (REPNZ)  :dance:

dedndave

you should play with them a little bit - lol
here is an example - i want to make a copy of a string...

        cld
        mov     esi,offset source_string
        mov     edi,offset destination_string
        mov     ecx,number_of_bytes
        rep     movsb

it's a little faster for copying words or dwords, i think - it was faster on an 8088 to copy words, at least
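
A rough sketch of the dword variant of the same copy, reusing the names from the example above. It assumes EDX is free as scratch and that the two buffers do not overlap. Whether it is actually faster depends on the processor.

```asm
; sketch only: copy number_of_bytes bytes, dwords first, then the 0..3 leftover bytes
        cld
        mov     esi,offset source_string
        mov     edi,offset destination_string
        mov     ecx,number_of_bytes
        mov     edx,ecx            ;keep the byte count for the tail
        shr     ecx,2              ;number of whole dwords
        rep     movsd
        mov     ecx,edx
        and     ecx,3              ;leftover bytes (0 to 3)
        rep     movsb              ;REP with ECX = 0 does nothing
```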

here is another example - i want to clear out 32 KB of memory...

        cld
        mov     edi,offset memory_to_clear
        xor     eax,eax
        mov     ecx,8192           ;8192 dwords = 32 KB
        rep     stosd


2-Bit Chip

With just simple mnemonics, I can create this:
UpThree proc uses esi edi edx ecx lpszSrc:DWORD, lpszDest:DWORD, dwCount:DWORD

    mov esi, lpszSrc
    mov edi, lpszDest
    mov edx, dwCount
    xor ecx, ecx
    mov al, 3
@@:
    mov ah, byte ptr [esi + ecx]
    cmp ah, 0
    jz @F
    add ah, al
    mov byte ptr [edi + ecx], ah
    inc ecx
    cmp ecx, edx
    je @F
    jmp @B
@@:
    ret

UpThree endp


I just can't understand how to optimize it with those higher mnemonics (rep, lodsb)

dedndave

i am not sure the string instructions may be applied here - at least, not in a way to make things go faster
you could use lodsb and stosb for single bytes, but without the REP prefix, they are kinda slow
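
For reference, a rough sketch of what that single-byte lodsb/stosb version of the loop might look like. It is meant to sit inside the same proc (same lpszSrc, lpszDest and dwCount arguments), the labels next_char and done are made up for the example, and it assumes dwCount is not zero. It is shown for comparison, not as a speed-up.

```asm
; sketch only: add 3 to each character using LODSB/STOSB without a REP prefix
        cld
        mov     esi, lpszSrc
        mov     edi, lpszDest
        mov     ecx, dwCount        ; assumed nonzero
next_char:
        lodsb                       ; AL = [ESI], then ESI = ESI + 1
        test    al, al
        jz      done                ; stop at the null terminator
        add     al, 3
        stosb                       ; [EDI] = AL, then EDI = EDI + 1
        dec     ecx
        jnz     next_char           ; stop when the count runs out
done:
```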
one thing i see is the way you maintain the loop count and branch at the end of the loop
the ECX register is traditionally used as a count register, so....

        mov     ecx,dwCount
.
.
loop_start:
.
.
        dec     ecx
        jnz     loop_start

that eliminates the need to compare ECX with EDX

the processor is happy when moving data in and out of AL, as opposed to AH
also - the base+index addressing slows you down a little....

    mov esi, lpszSrc
    mov edi, lpszDest
    mov ecx, dwCount
    mov ah, 3
@@:
    mov al,[esi]
    or al,al
    jz @F
    add al,ah
    inc esi
    mov [edi],al
    inc edi
    dec ecx
    jnz @B
@@:
    ret

you could make the thing run faster by accessing all data in 4-aligned dwords
it would take a lot more code, though - you have to sort out the first few bytes until you are 4-aligned
then, load dwords and, in register, sort out if any of the bytes are 0
then, add 3 to 4 bytes at a time and store them as (again, 4-aligned) dwords
you can see where the code gets messy - but it could make the routine run quite a bit faster
it would take 3 loops
one to handle a few bytes at the beginning
one to handle the bulk of the string in dwords
and another to handle any leftover (possibly misaligned) bytes at the end
the fact that you look for a null terminator OR a terminal count really throws a wrench in the works - lol
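
A rough sketch of the core of that middle (dword) loop, under several assumptions: ESI and EDI are already 4-aligned, at least 4 bytes are left in the count, EBX and EDX are free as scratch (a real proc would have to preserve EBX), and tail_bytes is a hypothetical label for the byte-by-byte loop. The zero test is the well-known (x - 01010101h) AND (NOT x) AND 80808080h trick, which is nonzero only when some byte of x is zero. The add is done on the low 7 bits of each byte and the top bits are fixed up with XOR, so a carry cannot spill into the neighbouring byte.

```asm
; sketch only: process 4 characters at once
        mov     eax, [esi]              ; load 4 source bytes
        lea     ebx, [eax-01010101h]    ; subtract 1 from every byte
        mov     edx, eax
        not     edx
        and     ebx, edx
        and     ebx, 80808080h          ; nonzero only if some byte of EAX is 0
        jnz     tail_bytes              ; let the byte loop find the terminator
        mov     edx, eax
        and     eax, 7F7F7F7Fh          ; low 7 bits of each byte
        and     edx, 80808080h          ; top bit of each byte
        add     eax, 03030303h          ; at most 7Fh + 3 = 82h, no carry between bytes
        xor     eax, edx                ; put the top bits back in
        mov     [edi], eax              ; store 4 result bytes
        add     esi, 4
        add     edi, 4
```

The count check (at least 4 bytes remaining before each pass) is left out here.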

2-Bit Chip

Oh! I see that "dec" can set a zero-flag.

Quote: the base+index addressing slows you down a little.
Yeah, it does. :red

Are you packing data (4-aligned) to speed things up? Because that is what it looks like.

dedndave

no - i just simplified your code - i am not packing anything - lol
the loop i posted is about as good as it will get while accessing the data as bytes
you have two different strings - one source - one destination
if one is aligned and the other is not, your underwear will get all bunchy - lol

hutch--

Beware of the old string instructions; unless used in a very limited way, they can be very slow. There is special-case circuitry for REP MOVS? and a few others, but used individually they are way off the pace.

In most instances incremented pointer code is faster, and even the special-case circuitry of REP MOVSD can be beaten by MMX/XMM instructions.
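
To give an idea of what an XMM copy loop looks like, here is a minimal sketch. It assumes an assembler that knows SSE2 with .686 and .xmm enabled, that ESI and EDI are both 16-byte aligned (MOVDQA faults otherwise), and that ECX holds a nonzero number of 16-byte blocks. No timing claim is attached to it.

```asm
; sketch only: copy ECX blocks of 16 bytes from [ESI] to [EDI]
@@:
        movdqa  xmm0, [esi]         ; aligned 16-byte load
        movdqa  [edi], xmm0         ; aligned 16-byte store
        add     esi, 16
        add     edi, 16
        dec     ecx
        jnz     @B
```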
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

ecube


dec ecx
jnz @B


nice trick, does it have to be ecx or can it be any register?

hutch--

At a byte level here is the masm32 library procedure to do it.


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

comment * -----------------------------------------------
        copied length minus terminator is returned in EAX
        ----------------------------------------------- *
align 4

szCopy proc src:DWORD,dst:DWORD

    push ebp
    push esi

    mov edx, [esp+12]
    mov ebp, [esp+16]
    mov eax, -1
    mov esi, 1

  @@:
    add eax, esi
    movzx ecx, BYTE PTR [edx+eax]
    mov [ebp+eax], cl
    test ecx, ecx
    jnz @B

    pop esi
    pop ebp

    ret 8

szCopy endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

jj2007

Quote from: E^cube on November 17, 2009, 06:21:52 AM

dec ecx
jnz @B


nice trick, does it have to be ecx or can it be any register?

It works with any register.
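
A minimal sketch with EDX as the counter (the count of 10 is arbitrary). DEC sets the zero flag whichever register it is applied to. Only the LOOP instruction is tied to ECX.

```asm
; sketch only: the same dec/jnz pattern with a different register
        mov     edx, 10             ; arbitrary iteration count
@@:
        ; ... loop body (must not change EDX) ...
        dec     edx                 ; sets ZF when EDX reaches 0
        jnz     @B
```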

RuiLoureiro

hutch,
Why use PUSH ESI, MOV EAX, -1, MOV ESI, 1 and POP ESI?
It could be:

; note:  copied length minus terminator is returned in EAX
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
szCopy proc src:DWORD,dst:DWORD

    push ebp

    mov edx, [esp+8]                 ; src
    mov ebp, [esp+12]                ; dst
    xor   eax, eax
   @@:
    movzx ecx, BYTE PTR [edx+eax]
    mov [ebp+eax], cl
    add   eax, 1
    test ecx, ecx
    jnz @B
    sub    eax, 1
    pop ebp

    ret 8

szCopy endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

Rui

MichaelW

Rui,

You have an error in the instructions that access the parameters:

mov edx, [esp+12]
mov ebp, [esp+16]

Because you have only one push at the top of the procedure, they should be:

mov edx, [esp+8]
mov ebp, [esp+12]

eschew obfuscation

RuiLoureiro

Hi MichaelW,
                 Yes i know, that should be

mov edx, [esp+8]                ; src
mov ebp, [esp+12]                ; dst

i used copy-paste and forgot to adjust the args
Rui