regarding processing strings (character by character), i was wondering which of these would be more efficient (speed- and/or size-wise): using a reference for comparison or loading to a register first.
MOV esi,pString
.WHILE BYTE PTR [esi] != 0
    .WHILE BYTE PTR [esi] == ' ' ;skip leading spaces
        INC esi
    .ENDW
    .IF BYTE PTR [esi] == '"'
        ;do stuff
    .ELSEIF BYTE PTR [esi] != 0
        ;do more stuff
    .ELSE
        .BREAK
    .ENDIF
    INC esi
.ENDW
VS
MOV esi,pString
CLD
LODSB ;load character into al
.WHILE al != 0
    .WHILE al == ' ' ;skip leading spaces
        LODSB
    .ENDW
    .IF al == '"'
        ;do stuff
    .ELSEIF al != 0
        ;do more stuff
    .ELSE
        .BREAK
    .ENDIF
    LODSB
.ENDW
on my first draft, i used the first, but in an attempt to reduce the code and use the string primitive mnemonics, i came up with the second, since i was pretty much copying a string into a buffer. i'm planning on doing a third and would like to know which approach i should take, or whether to take a bit of both (as i am about to do).
Jeff,
Almost always, an incremented pointer style algo is faster than the old string instructions. The string instructions run reasonably well with a REP prefix, but LODSB on its own is slow on most modern hardware. In many instances you can do the task with fewer registers as well. On the other hand, if it's not really speed critical and it's a simple algo, you get smaller code using the old string instructions.
An algo of the type you are writing is usually faster without the higher level constructs and reasonably simple to code as well. If you have an algo in mind and don't mind having it beaten to death, post a working version in the Laboratory and it will usually get some interesting variations.
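As a rough illustration of the incremented pointer style without the high level constructs, here is a minimal sketch. The procedure name CountQuotes, its pString parameter and the label names are made up for the example; it is not code from this thread. It loads each character into a register once, advances the pointer with ADD instead of LODSB, and then does all comparisons against the register:

```masm
.386
.MODEL FLAT, STDCALL
OPTION CASEMAP:NONE

.CODE
; Hypothetical helper (assumed name): returns in eax the number of '"'
; characters in the zero-terminated string at pString, skipping spaces.
CountQuotes PROC USES esi pString:DWORD
    mov esi, pString
    xor ecx, ecx                ; quote counter
NextChar:
    movzx eax, BYTE PTR [esi]   ; load the character once
    add esi, 1                  ; advance the pointer without LODSB
    cmp al, ' '
    je NextChar                 ; skip spaces
    test al, al
    jz Finished                 ; zero terminator ends the scan
    cmp al, '"'
    jne NextChar                ; the "do more stuff" case would go here
    add ecx, 1                  ; the "do stuff" case: count the quote
    jmp NextChar
Finished:
    mov eax, ecx
    ret
CountQuotes ENDP
END
```

This is a bit of both approaches from the question: the pointer is incremented by hand, but each byte is read from memory only once.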
thanks hutch, i am not following any algorithms in particular, but rather "making it up as i go". i'm not aiming for "the fastest algorithm" nor the smallest, just trying to force myself into writing my code more efficiently: first a working one that makes the most logical sense, then improve on it a bit, then cut out whatever is unnecessary. good practice IMO. the main reason i use the high level constructs is because i don't like making up label names and try to use the @@: wherever i can. i should probably grow out of that one soon. :) (i don't remember who said it but i agree, we need a plain smiley face :green)
i don't have my first draft around since i had to make drastic changes to get to the second and just tossed it. when i was coming up with my second draft, i was trying to keep in mind that using registers instead of memory locations (and references) usually means a byte or two saved in code. here are my second and third drafts respectively.
second draft
OPTION EXPR32
OPTION CASEMAP:NONE
.386
.MODEL FLAT
GetCommandLineA PROTO STDCALL
ExitProcess PROTO STDCALL, :DWORD
.DATA
PUBLIC argc,argv
PUBLIC argv0,argv1,argv2,argv3,argv4,argv5,argv6,argv7,argv8,argv9
argc DWORD 0
argv DWORD OFFSET argv0
argv0 DWORD 0
argv1 DWORD 0
argv2 DWORD 0
argv3 DWORD 0
argv4 DWORD 0
argv5 DWORD 0
argv6 DWORD 0
argv7 DWORD 0
argv8 DWORD 0
argv9 DWORD 0
buffer BYTE 208 DUP(0)
.CODE
_Start:
    CALL _GetArgs
    XOR eax,eax
    PUSH eax
    CALL ExitProcess
_GetArgs:
    CALL GetCommandLineA
    MOV esi,eax
    MOV edi,OFFSET buffer
    MOV ebx,argv
    CLD
    XOR eax,eax
    LODSB
    .WHILE al != 0
        .WHILE al == ' ' ;skip all spaces between arguments
            LODSB
        .ENDW
        .IF al == '"' ;"long" argument
@@:         LODSB
            .IF al == 0
                RET
            .ENDIF
            MOV DWORD PTR [ebx],edi
            INC argc
            .WHILE al != '"'
                .IF al == 0
                    RET
                .ENDIF
                STOSB
                LODSB
            .ENDW
        .ELSEIF al != 0 ;"short" argument
            MOV DWORD PTR [ebx],edi
            INC argc
            STOSB
            LODSB
            .WHILE al != ' '
                .IF al == 0
                    RET
                .ENDIF
                .IF al == '"' ;premature start of a "long" argument
                    INC edi
                    ADD ebx,4
                    JMP @B
                .ENDIF
                STOSB
                LODSB
            .ENDW
        .ELSE
            RET
        .ENDIF
        INC edi
        LODSB
        ADD ebx,4
    .ENDW
    RET
END _Start
third draft
OPTION EXPR32
OPTION CASEMAP:NONE
.386
.MODEL FLAT
GetCommandLineA PROTO STDCALL
ExitProcess PROTO STDCALL, :DWORD
.DATA
PUBLIC argc,argv
PUBLIC argv0,argv1,argv2,argv3,argv4,argv5,argv6,argv7,argv8,argv9
argc DWORD 0
argv DWORD OFFSET argv0
argv0 DWORD 0
argv1 DWORD 0
argv2 DWORD 0
argv3 DWORD 0
argv4 DWORD 0
argv5 DWORD 0
argv6 DWORD 0
argv7 DWORD 0
argv8 DWORD 0
argv9 DWORD 0
buffer BYTE 208 DUP(0)
.CODE
_Start:
    CALL GetCommandLineA
    MOV esi,eax
    MOV edi,OFFSET buffer
    MOV ebx,argv
    CLD
    .WHILE BYTE PTR [esi] != 0
        .WHILE BYTE PTR [esi] == ' ' ;skip all spaces between arguments
            INC esi
        .ENDW
        .IF BYTE PTR [esi] == '"' ;"long" argument
@@:         INC esi
            CMP BYTE PTR [esi],0
            JE @F
            MOV DWORD PTR [ebx],edi
            INC argc
            .WHILE BYTE PTR [esi] != '"'
                CMP BYTE PTR [esi],0
                JE @F
                MOVSB
            .ENDW
        .ELSEIF BYTE PTR [esi] != 0 ;"short" argument
            MOV DWORD PTR [ebx],edi
            INC argc
            .WHILE BYTE PTR [esi] != ' '
                CMP BYTE PTR [esi],0
                JE @F
                .IF BYTE PTR [esi] == '"' ;premature start of a "long" argument
                    INC edi
                    ADD ebx,4
                    JMP @B
                .ENDIF
                MOVSB
            .ENDW
        .ELSE
            JMP @F
        .ENDIF
        INC esi
        INC edi
        ADD ebx,4
    .ENDW
@@: XOR eax,eax
    PUSH eax
    CALL ExitProcess
END _Start
Jeff,
Here is a suggestion for cleaning up the lead of a string.
mov esi, lpstr
sub esi, 1
@@:
add esi, 1
cmp BYTE PTR [esi], 32 ; normal space
je @B
cmp BYTE PTR [esi], 9 ; tab
je @B
By only jumping back on a match you don't have to test for zero and you can test for a TAB as well so you clean up any mess at the beginning of a string.
The gradient range of smilies are from cheeky to stroppy.
1. :P friendly smile with tongue sticking out.
2. :bg big cheesy grin.
3. :toothy cheesy grin with teeth.
4. :green cheesy grin with teeth turning green.
5. :green2 emphasised cheesy grin with teeth and turning green.
6. :cheekygreen: context dependent green cheesy grin with congratulations.
7. :lol smilie with a touch of sarcasm.
8. :bdg smilie with a LARGE touch of sarcasm.
Quote from: hutch-- on June 26, 2005, 06:14:40 AM
Jeff,
Here is a suggestion for cleaning up the lead of a string.
mov esi, lpstr
sub esi, 1
@@:
add esi, 1
cmp BYTE PTR [esi], 32 ; normal space
je @B
cmp BYTE PTR [esi], 9 ; tab
je @B
By only jumping back on a match you don't have to test for zero and you can test for a TAB as well so you clean up any mess at the beginning of a string.
should i really be concerned about finding tab characters? the procedure will only be reading characters from the command line and apparently, tabs cannot be entered there (unless there is an ALT+NumPad combination for tab, but i think it's safe to ignore this case).
Jeff,
I just tested CMD.EXE in win2k and it takes tabs with no problem. They can be entered before or after a command or to separate options.
If you are sure that the user will never use tabs, you can remove,
cmp BYTE PTR [esi], 9 ; tab
je @B
I also have no problem entering tabs under Windows 2000. Even if you could not actually enter them with the keyboard, you would still need to allow for input redirection.
ok it must be XP then. tab on a blank line cycles through all available file/folder names. i'll add the tab character for the sake of completeness.
i'll produce a fourth draft without using the high level constructs to see how that goes although, what i come up with will most likely look like what is generated in the listing file.
ok, just about finished with this (and no comments :P). without the stack frame. man, that made things very complicated but pulled through. :)
in a flat memory model, the .DATA and .FARDATA still work the same, yes? i only used .FARDATA because i could give it a "name" and so far, the program didn't try to throw it back up.
OPTION EXPR32
OPTION CASEMAP:NONE
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
.386
.MODEL FLAT
INCLUDELIB Kernel32.lib
GetCommandLineA PROTO STDCALL
.FARDATA CommandLineToArgvA_DATA
args DWORD 20h DUP(0)
buff BYTE 1800h DUP(0)
.CODE CommandLineToArgvA_TEXT
CommandLineToArgvA PROC STDCALL, pArgv:DWORD
PUSH ecx
PUSH edx
PUSH esi
PUSH edi
CALL GetCommandLineA
MOV esi,eax
MOV edi,OFFSET buff
MOV edx,OFFSET args
MOV eax,DWORD PTR [esp+20]
MOV DWORD PTR [eax],edx
XOR eax,eax
XOR ecx,ecx
CLD
GetArg:
CMP BYTE PTR [esi],0
JE Done
JMP NoWhiteSpace
WhiteSpace:
INC esi
NoWhiteSpace:
CMP BYTE PTR [esi],' '
JE WhiteSpace
CMP BYTE PTR [esi],9
JE WhiteSpace
CMP BYTE PTR [esi],0
JE Done
CMP BYTE PTR [esi],'"'
JNE ShortArg
LongArg:
MOV DWORD PTR [edx],edi
INC ecx
INC esi
LongArgStart:
CMP BYTE PTR [esi],'"'
JE LongArgEnd
CMP BYTE PTR [esi],0
JE Done
MOVSB
JMP LongArgStart
LongArgEnd:
JMP NextArg
ShortArg:
MOV DWORD PTR [edx],edi
INC ecx
ShortArgStart:
CMP BYTE PTR [esi],' '
JE ShortArgEnd
CMP BYTE PTR [esi],9
JE ShortArgEnd
CMP BYTE PTR [esi],0
JE Done
CMP BYTE PTR [esi],'"'
JNE StillShort
STOSB
ADD edx,4
JMP LongArg
StillShort:
MOVSB
JMP ShortArgStart
ShortArgEnd:
NextArg:
INC esi
STOSB
ADD edx,4
JMP GetArg
Done:
MOV eax,ecx
POP edi
POP esi
POP edx
POP ecx
RET 4
CommandLineToArgvA ENDP
END
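For anyone wanting to try the listing above, a caller could look roughly like the sketch below. This is only an illustration under a few assumptions: the variable name pArgs is made up, the module above is assembled separately and linked with this one, and the argument count is passed to ExitProcess purely to have something visible to check.

```masm
.386
.MODEL FLAT
OPTION CASEMAP:NONE
INCLUDELIB Kernel32.lib

ExitProcess        PROTO STDCALL, :DWORD
CommandLineToArgvA PROTO STDCALL, :DWORD ;the procedure from the listing above

.DATA?
pArgs DWORD ?                ;hypothetical name: receives the address of the pointer table

.CODE
_Start:
    PUSH OFFSET pArgs
    CALL CommandLineToArgvA  ;eax = argument count, pArgs = address of the table
    MOV edx,pArgs
    MOV edx,DWORD PTR [edx]  ;edx -> first argument (normally the program name)
    PUSH eax                 ;use the count as the exit code, just to show the result
    CALL ExitProcess
END _Start
```

Since the procedure cleans up its own parameter with RET 4, the caller does not adjust the stack after the call.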