Hi,
I was looking at ASCII-to-integer converters and found atodw in m32lib. It gets the effect of «lea ecx, dword ptr [eax+10*ecx]» (done in two instructions). But:
1. It has 3 instructions that do nothing: push edi, pop edi and xor eax, eax;
2. It uses 2D instead of 2Dh (the minus sign) and does not handle 2Bh (+);
3. We have no check on the result (overflow goes unnoticed);
4. We have no check on the buffer contents;
Here is what I have in the m32lib folder:
................................................................................
atodw proc String:DWORD
push esi
push edi
xor eax, eax
mov esi, [String]
xor ecx, ecx
xor edx, edx
mov al, [esi]
inc esi
cmp al, 2D
jne proceed
mov al, byte ptr [esi]
not edx
inc esi
jmp proceed
@@:
sub al, 30h
lea ecx, dword ptr [ecx+4*ecx]
lea ecx, dword ptr [eax+2*ecx]
mov al, byte ptr [esi]
inc esi
proceed:
or al, al
jne @B
lea eax, dword ptr [edx+ecx]
xor eax, edx
pop edi
pop esi
ret
atodw endp
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
When I saw «lea ecx, dword ptr [ecx+4*ecx]» I thought «good, it can be "cheese"». But I quickly found out it is ... a "trap": we cannot check the result, because LEA does not affect any flags.
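To make the "trap" concrete, here is a small C sketch. C is used only so the arithmetic can be run and checked; none of this is part of m32lib. lea_mul10_add mirrors the two LEA instructions, and checked_mul10_add is a hypothetical helper showing one possible way to detect the overflow that the flag-less LEA pair cannot report. Both function names are assumptions made for this illustration.

```c
#include <stdint.h>

/* Models the two LEA instructions from atodw:
     lea ecx, [ecx+4*ecx]   ; ecx = ecx * 5
     lea ecx, [eax+2*ecx]   ; ecx = eax + ecx * 2
   uint32_t arithmetic wraps modulo 2^32, like the 32-bit registers. */
uint32_t lea_mul10_add(uint32_t ecx, uint32_t eax)
{
    ecx = ecx + 4u * ecx;   /* first LEA: no flags are produced  */
    ecx = eax + 2u * ecx;   /* second LEA: still no flags        */
    return ecx;
}

/* Hypothetical checked variant (not in the library): refuses to
   accumulate when acc*10 + digit would not fit in 32 bits.
   Returns 1 on success, 0 on overflow. Assumes digit is 0..9. */
int checked_mul10_add(uint32_t acc, uint32_t digit, uint32_t *out)
{
    if (acc > (UINT32_MAX - digit) / 10u)
        return 0;                 /* would overflow a DWORD */
    *out = acc * 10u + digit;
    return 1;
}
```

With acc = 429496729, appending the digit 5 still fits in a DWORD, while appending 6 wraps to 0 in the LEA model and is rejected by the checked variant.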
Here is my code
; To call: invoke AtoDW, ADDR String ;[String db "??? ...",0 ]
; Out: clc=> OK; stc=> error
AtoDW proc pString:DWORD
push esi
mov esi, pString ; String pointer
xor ecx, ecx ; the result
xor edx, edx ; the sign to the result
mov al, byte ptr [esi] ; get first byte
cmp al, 2Bh ; plus ?
je _nAtoDW
cmp al, 2Dh ; minus ?
jne _iAtoDW
not edx
je _nAtoDW ; get next
@@: cmp al, " "
je _nAtoDW ; get next
; -----------------------------------------
; we must check that chars are in 30h-39h; if not, error
; -----------------------------------------
; jc @F if chars not between 30h-39h
sub al, 30h
lea ecx, dword ptr [ecx+4*ecx]
lea ecx, dword ptr [eax+2*ecx]
_nAtoDW: inc esi
mov al, byte ptr [esi] ; get next byte
_iAtoDW: or al, al
jne @B
lea eax, dword ptr [edx+ecx]
xor eax, edx
clc
@@: pop esi
ret
AtoDW endp
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I looked at this too.
Starting from chapter 13 of Raymond's Tutorial-fputute, I made these modifications:
(Raymond says on his page that we may - how are you?)
;*******************************************************************************
; atofl
;*******************************************************************************
; lodsb can be substituted by:
; mov al, byte ptr [esi]
; inc esi
atofl:
push ebx ;preserve EBX and ESI
push esi
lea esi,buffer1 ;use ESI as pointer to text buffer
xor eax,eax
xor ebx,ebx ;will be used as an accumulator
xor ecx,ecx ;will be used as a counter
;************************************************
; Skip leading spaces without generating an error
;************************************************
@@:
lodsb ;get next character
cmp al," " ;check if a space character
jz @B ;repeat until a non-space character is found
;*********************************************
; Check 1st non-space character for a +/- sign
;*********************************************
cmp al,"-" ;is it a "-" sign
je atoflerr
; jnz @F
;atoflerr:
; xor eax,eax ;set EAX to error code
; pop esi ;restore the EBX and ESI registers
; pop ebx
; ret ;return with error code
; @@:
cmp al,"+" ;is it a "+" sign
jnz short @F
; jnz nextchar
nextchar:
lodsb ;disregard a "+" sign and get next character
;***********************************************************
; From this point, space and sign characters will be invalid
;***********************************************************
;nextchar:
@@:
cmp al,0 ;check for end-of-string character
jz endinput ;exit the string parsing section
cmp al,"." ;is it the "." decimal delimiter
;other delimiters such as the "," used in some
;countries could also be allowed but would need
;additional coding to make it more generalized
jnz @F
;******************************************************************
; Only one decimal delimiter can be acceptable. The sign bit of ECX
; is used to keep a record of the first delimiter identified.
;******************************************************************
or ecx,ecx ;check if a delimiter has already been identified
js atoflerr ;exit with error code if more than 1 delimiter
stc ;set the carry flag
rcr ecx,1 ;set bit31 of ECX (the sign bit) when
;the 1st delimiter is identified
; lodsb ;get next character
jmp nextchar ;continue parsing
;***********************************************************************
; All ASCII characters other than the numerical ones will now be invalid
;***********************************************************************
@@:
cmp al,"0"
jb atoflerr
cmp al,"9"
ja atoflerr
sub al,"0" ;convert valid ASCII numerical character to binary
xchg eax,ebx ;get the accumulated integer value in EAX
;holding the new digit in EBX
mul factor10 ;multiply the accumulated value by 10
add eax,ebx ; and add the new digit
xchg eax,ebx ;store this new accumulated value back in EBX
or ecx,ecx ;check if a decimal delimiter detected yet
js @F ;jump if decimal digits are being processed
;*************************************
; Integer digits still being processed
;*************************************
cmp ebx,100 ;verify current value of integer portion
jbe nextchar ;continue processing string characters
; ja atoflerr ;abort if input for annual rate is > 100%
atoflerr:
xor eax,eax ;set EAX to error code
pop esi ;restore the EBX and ESI registers
pop ebx
ret ;return with error code
; lodsb ;get next string character
; jmp nextchar ;continue processing string characters
;*******************************************************
; The CL register is used as a counter of decimal digits
; after the decimal delimiter has been identified
;*******************************************************
@@:
inc cl ;increment count of decimal digits
; lodsb ;get next string character
jmp nextchar ;continue processing string characters
;***********************************
; Parsing of the string is completed
;***********************************
endinput:
or ebx,ebx ;check if total input was equal to 0
jz atoflerr ;abort if annual rate input is 0%
finit ;initialize FPU
push ebx ;store value of EBX on stack
fild dword ptr[esp] ;-> st(0)=EBX
add cl,2 ;increment the number of decimal digits
;to convert from % rate to a decimal rate
shl ecx,1 ;get rid of the potential sign "flag"
shr ecx,1 ;restore the count of decimal digits
fild factor10 ;-> st(0)=10, st(1)=EBX
@@:
fdiv st(1),st ;-> st(0)=10, st(1)=EBX/10
dec ecx ;decrement counter of decimal digits
jnz @B ;continue dividing by 10 until count exhausted
fstp st ;get rid of the dividing 10 in st(0)
;-> st(0)=annual rate (as a decimal rate)
pop ebx ;clean CPU stack
pop esi ;restore the EBX and ESI registers
pop ebx
or al,1 ;ensure EAX != 0 (i.e. no error detected)
ret
;*******************************************************************************
stay well
The "xor eax, eax" is to prevent a register stall in the following use of AL. The PUSH/POP of EDI appears to be a leftover from the last time Alex did some work on it; it is not needed, but I doubt it slows anything down much.
Hi Rui,
Just to illustrate the effects that Hutch is referring to:
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.586 ; create 32 bit code
.model flat, stdcall ; 32 bit memory model
option casemap :none ; case sensitive
include \masm32\include\windows.inc
include \masm32\include\masm32.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\masm32.lib
includelib \masm32\lib\kernel32.lib
include \masm32\macros\macros.asm
include timers.asm
atodw_no_xor_eaxeax PROTO :DWORD
atodw_no_pushpop_edi PROTO :DWORD
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
teststr db "123456789",0
.code
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
LOOP_COUNT EQU 10000000
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
invoke atodw, ADDR teststr
counter_end
mov ebx,eax
print chr$("atodw : ")
print ustr$(ebx)
print chr$(" cycles", 13, 10)
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
invoke atodw_no_pushpop_edi, ADDR teststr
counter_end
mov ebx,eax
print chr$("atodw_no_pushpop_edi : ")
print ustr$(ebx)
print chr$(" cycles", 13, 10)
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
invoke atodw_no_xor_eaxeax, ADDR teststr
counter_end
mov ebx,eax
print chr$("atodw_no_xor_eaxeax : ")
print ustr$(ebx)
print chr$(" cycles", 13, 10)
mov eax, input(13, 10, "Press enter to exit...")
exit
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
; Copies to play with.
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
atodw_no_pushpop_edi proc String:DWORD
; ----------------------------------------
; Convert decimal string into dword value
; return value in eax
; ----------------------------------------
push esi
;push edi
xor eax, eax
mov esi, [String]
xor ecx, ecx
xor edx, edx
mov al, [esi]
inc esi
cmp al, 2D
jne proceed
mov al, byte ptr [esi]
not edx
inc esi
jmp proceed
@@:
sub al, 30h
lea ecx, dword ptr [ecx+4*ecx]
lea ecx, dword ptr [eax+2*ecx]
mov al, byte ptr [esi]
inc esi
proceed:
or al, al
jne @B
lea eax, dword ptr [edx+ecx]
xor eax, edx
;pop edi
pop esi
ret
atodw_no_pushpop_edi endp
atodw_no_xor_eaxeax proc String:DWORD
; ----------------------------------------
; Convert decimal string into dword value
; return value in eax
; ----------------------------------------
push esi
push edi
;xor eax, eax
mov esi, [String]
xor ecx, ecx
xor edx, edx
mov al, [esi]
inc esi
cmp al, 2D
jne proceed
mov al, byte ptr [esi]
not edx
inc esi
jmp proceed
@@:
sub al, 30h
lea ecx, dword ptr [ecx+4*ecx]
lea ecx, dword ptr [eax+2*ecx]
mov al, byte ptr [esi]
inc esi
proceed:
or al, al
jne @B
lea eax, dword ptr [edx+ecx]
xor eax, edx
pop edi
pop esi
ret
atodw_no_xor_eaxeax endp
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
Results on my P3:
atodw : 57 cycles
atodw_no_pushpop_edi : 52 cycles
atodw_no_xor_eaxeax : 131 cycles
Hi MichaelW,
Thanks for the demo.
Here are the results on my P4 2.66 GHz:
Quote
atodw : 54 cycles
atodw_no_pushpop_edi : 64 cycles
atodw_no_xor_eaxeax : 60 cycles
Michael, will you modify your timer macros for the P4, or are they OK to use on a P4 as they are?
Thanks,
Erol
AFAIK there are no problems on P4 with those macros.
Your results are probably so different because the P4 handles partial register access differently from the P3.
Erol,
The need to clear the register with XOR reg, reg or alternatively SUB reg, reg is not so noticeable on a PIV, but if you want code that runs properly on everything, it must be there.
The 2D instead of 2Dh is very likely a bug.
The 2D becomes 2 when assembled (MASM reads the trailing D as a decimal radix suffix), which won't match a - (minus sign, 2Dh).
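A rough C model of how such a literal is read may help here. It is only a sketch under stated assumptions: the default radix is 10, only the h (hex) and d (decimal) suffixes are handled, and masm_literal_value is a hypothetical name, not anything from MASM or the library.

```c
#include <ctype.h>
#include <string.h>

/* Simplified model of a MASM numeric literal with default radix 10.
   Only the 'h' (hexadecimal) and 'd' (decimal) suffixes are modelled.
   Returns the value, or -1 if the text is not a valid literal here. */
long masm_literal_value(const char *text)
{
    size_t len = strlen(text);
    int radix = 10;
    long value = 0;
    size_t i;

    if (len == 0)
        return -1;

    /* A trailing h or d is a radix suffix, not a digit. */
    switch (tolower((unsigned char)text[len - 1])) {
    case 'h': radix = 16; len--; break;
    case 'd': radix = 10; len--; break;
    default:  break;
    }
    if (len == 0)
        return -1;

    for (i = 0; i < len; i++) {
        int c = tolower((unsigned char)text[i]);
        int digit;
        if (c >= '0' && c <= '9')
            digit = c - '0';
        else if (c >= 'a' && c <= 'f')
            digit = c - 'a' + 10;
        else
            return -1;
        if (digit >= radix)
            return -1;            /* e.g. a hex digit in a decimal literal */
        value = value * radix + digit;
    }
    return value;
}
```

In this model "2D" evaluates to 2, while "2Dh" evaluates to 45, the code of "-", and "2Bh" to 43, the code of "+".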
MazeGen, Hutch
Thanks for your replies.
Erol,
The macros contain no processor-specific code, so AFAIK the results are equally valid for all of the processor families. Agner Fog states in his Pentium optimization manual that the P4 was designed to store the whole register together, instead of splitting it into separate temporary registers as for the PPro, P2, and P3, to avoid the "serious delay whenever there was a need to join different parts of a register into a single full register." This seems to me to indicate that a large timing difference should be expected.
Hi Michael,
Thanks for the technical info :U
Just a note on atodw, it was designed to handle DWORD rather than LONG values so it was never pointed at negative numbers. For the signed version, there is an algo written by Ray Filiatreault called "atol" that handles signed conversions.
Hi all
Here are the results on my P3:
atodw : 58 cycles [+1 ]
atodw_no_pushpop_edi : 53 cycles [+1 ]
atodw_no_xor_eaxeax : 131 cycles
Hi Hutch,
How are you ? I hope you are fine.
Yes, "xor eax, eax" is needed. So, it must be there [no HomeWork rule]
2D, as noted by dSouza123, is a bug. But if it has 2Dh (-), why not 2Bh (+) ?
I am guessing that when you come to our topics, many people want to see what you say. When we do not have your help, it is sometimes more difficult.
Thank you.
Hi Erol,
I hope you are fine. Thanks for the contribution.
Your result [P4 2.66 GHz] atodw_no_xor_eaxeax: 60 cycles against
Michael's result [P3] atodw_no_xor_eaxeax: 131 cycles is mysterious!
Hi Michael,
How are you getting along? Thanks for your example (I will use it in other cases). In this case it is important because we can have hundreds of strings to convert in one single loop (or task). If the difference is 5 cycles (with push-pop and without), for 100 strings we have 500 cycles, and for 200 the difference is 1000 cycles (best case).
I noticed one strange case: atodw_no_xor_eaxeax gives 131 cycles!!! Why is this? What is the explanation? What I know is that without "xor eax, eax" the procedure is wrong.
Here is the corrected code [I call it BufToInt instead of AtoDW]
; In: pString => string pointer
;
; Out: clc => OK, the result is in EAX (but it can be wrong if it overflows)
;
; stc => char is not valid
;
; Info:
; 1. The string must be terminated by 0;
; 2. The string can contain spaces between digit codes;
; 3. The first char can be «-» or «+»;
; 4. There is no overflow control on the EAX result;
; 5. Destroys the contents of ECX, EDX.
;
; To call: invoke BufToInt, ADDR String ;[String db "??? ...",0 ]
;
BufToInt proc pString:DWORD
push esi
mov esi, pString ; String pointer
xor eax, eax
xor ecx, ecx ; the result
xor edx, edx ; to sign the result
mov al, byte ptr [esi] ; get first byte
cmp al, 2Bh ; plus ?
je _nBufToInt
cmp al, 2Dh ; minus ?
jne _iBufToInt
not edx ; NOT doesn't affect flags, so JE below still sees the CMP result
je _nBufToInt ; get next
@@: cmp al, " "
je _nBufToInt ; get next
cmp al,"9"
jbe _tBufToInt
_rBufToInt: stc
pop esi
ret
_tBufToInt: sub al, 30h ; most signif. byte=0
jc short _rBufToInt
lea ecx, dword ptr [ecx+4*ecx]
lea ecx, dword ptr [eax+2*ecx]
_nBufToInt: inc esi
mov al, byte ptr [esi] ; get next byte
_iBufToInt: or al, al
jne @B
lea eax, dword ptr [edx+ecx]
xor eax, edx
clc
pop esi
ret
BufToInt endp
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
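Two details of BufToInt can be checked in isolation with a short C sketch. apply_sign_mask models the closing pair «lea eax, [edx+ecx]» / «xor eax, edx», where EDX is 0 for a positive number and 0FFFFFFFFh after «not edx». is_decimal_digit is a swapped-in technique, not what BufToInt does: it folds the two tests (cmp al,"9" / sub al,30h + jc) into one unsigned comparison. Both names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Models:  lea eax, [edx+ecx]
            xor eax, edx
   mask must be 0 (keep the value) or 0xFFFFFFFF (negate it).
   (m + 0xFFFFFFFF) ^ 0xFFFFFFFF == ~(m - 1) == -m  (modulo 2^32). */
uint32_t apply_sign_mask(uint32_t magnitude, uint32_t mask)
{
    return (magnitude + mask) ^ mask;
}

/* Alternative digit test (not the one used in BufToInt):
   after subtracting '0', any byte outside '0'..'9' wraps to a
   value above 9 when viewed as an unsigned char. */
int is_decimal_digit(unsigned char c)
{
    return (unsigned char)(c - '0') <= 9;
}
```

For example, a magnitude of 456 with mask 0 stays 456, and with mask 0FFFFFFFFh it becomes the two's complement encoding of -456.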
About «atol», I have this code:
atol proc lpSrc:DWORD
xor eax, eax
xor ecx, ecx
mov edx, lpSrc
sub edx, 1
@@:
add edx, 1
cmp BYTE PTR [edx], 32
je @B
cmp BYTE PTR [edx], 9
je @B ; [ strip spaces and tabs ]
mov al, [edx] ; [ begin ]
add edx, 1
.if al == "-"
add ecx, 1
mov al,[edx]
add edx, 1
.elseif al == "+"
mov al, [edx]
add edx, 1
.endif
push ecx ; keep sign on stack
xor ecx,ecx
@@:
sub al,"0"
jc @F
lea ecx, [ecx+ecx*4]
lea ecx, [eax+ecx*2]
mov al, [edx]
add edx, 1
jmp @B
@@:
mov eax,ecx
pop ecx ; retrieve sign
shr ecx,1
jnc @F
neg eax
@@:
ret
atol endp
Where
.if al == "-"
add ecx, 1
mov al,[edx]
add edx, 1
.elseif al == "+"
mov al,[edx]
add edx, 1
.endif
should be [ HomeWork rule ! ]
.if al == "-" ; if not, the sign is ecx=0
add ecx, 1
.endif
mov al,[edx]
add edx, 1
........................................
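One way to check the proposed shorter form is to model both variants in C and feed them the same strings. This is only a sketch: the function names are made up here, the leading space/tab skip is left out, and results are returned as unsigned 32-bit values (a negative result is its two's complement). Under those assumptions the two forms agree when a sign is present, but the shorter form also steps over one byte when the string starts directly with a digit.

```c
#include <stdint.h>

/* Shared digit loop, modelled on atol: it stops at the first byte
   below '0', which includes the terminating 0. */
static uint32_t digits_then_sign(const char *p, unsigned char al, int negative)
{
    uint32_t acc = 0;
    while (al >= '0') {
        acc = acc * 10u + (uint32_t)(al - '0');
        al = (unsigned char)*p++;
    }
    return negative ? 0u - acc : acc;    /* neg eax */
}

/* Sign handling as in the posted atol: fetch another byte only
   when the current one really is a sign. */
uint32_t atol_original(const char *p)
{
    unsigned char al = (unsigned char)*p++;
    int negative = 0;
    if (al == '-') {
        negative = 1;
        al = (unsigned char)*p++;
    } else if (al == '+') {
        al = (unsigned char)*p++;
    }
    return digits_then_sign(p, al, negative);
}

/* Proposed shorter form: the next byte is always fetched after
   the "-" test, whether or not a sign was seen. */
uint32_t atol_proposed(const char *p)
{
    unsigned char al = (unsigned char)*p++;
    int negative = (al == '-');
    al = (unsigned char)*p++;
    return digits_then_sign(p, al, negative);
}
```

In this model "-123" and "+45" convert identically in both variants, while "123" gives 123 with the original form and 23 with the shorter one, so the shorter form seems safe only if every input is known to carry a sign.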
All best things to all of you
Stay well
RuiLoureiro, this could be the reason why removing the
XOR EAX, EAX causes a slowdown:
Quote from: hutch-- on May 28, 2005, 01:53:50 AM
The "xor eax, eax" is to prevent a register stall in the following use of AL.
Cheers, :U
QvasiModo
hmmmm,
Quote
Hi Hutch,
How are you ? I hope you are fine.
Yes, "xor eax, eax" is needed. So, it must be there [no HomeWork rule]
2D, as noted by dSouza123, is a bug. But if it has 2Dh (-), why not 2Bh (+) ?
I am guessing that when you come to our topics, many people want to see what you say. When we do not have your help, it is sometimes more difficult.
Thank you.
Thanks but I already knew about the Intel optimisation since they published it for the PIII many years ago. The only BUG in the algo is a user bug of trying to use an UNSIGNED algo for signed values. As posted before, use ATOL for signed values.
I am not sure of the point you are trying to make with comments about the forum rules, but they are in place for a reason, which is to protect our members from nonsense, and this will not be changed. Keep it up and the posting WILL be changed.
Normal people don't add + to a number if it is a positive number. It is rare that we add a plus sign in front of a number, and the only case I can think of is the oxidation number of an element.
On the other hand, negative sign tells us that the number is negative.
Therefore I think the cmp with plus sign is useless and slows down the code. Personally I don't like to see too many branches as it would slow down the code.
Hi
Hutch,
Thank you.
1. I never commented on the forum rules in that way. It's a question of interpretation.
It wasn't a comment about that rule. That rule in your topic is a rule (and I agree with it).
If I wanted to comment on it, I would go there to comment.
As you know, I said something there and I am not against the rule. I think this is the rule.
2. Sorry if you have another interpretation.
3. What you are saying is that these 2 instructions
"cmp al, 2D"
"jne proceed"
are related to optimisation. It doesn't look like it, but it may be. I can
say I don't know anything about optimisation questions.
4. Sorry, but I am not sure about this: «Keep it up and the posting WILL be changed.»
QvasiModo,
It seems.
Roticv,
I don't agree. When we want to use it after getting the string from the keyboard, we can type "+245...". Why not?
I say the same as you said: «Personally I don't like to see too many branches», but when we need to compare we should use the instructions, unless we have another algo.
regards
I think humans are by nature lazy. We would prefer to keep things short and simple (K.I.S.S). I really doubt anyone would put a plus sign in front of their numbers. That's probably what The Svin thought too when creating the routine. Anyway, if I remember correctly, the algorithm cannot handle certain numbers; I need to check that.
Maybe there are other approaches to the routine. I will think about it (hopefully) and get back to it.
Hmmmmmmm ?
Quote from: roticv on May 31, 2005, 03:49:38 PM
We would prefer to keep things short and simple (K.I.S.S).
Quote
When somebody uses computer names like «K.I.S.S.», coming from Keep Things Short and Simple, hmmmmmm?
I think we must not use words that can have other meanings out of context. I know that it depends on interpretation, but ....
If anyone doesn't like a word I have used, I will edit my post to delete or correct it.
About the algos, we can expect strings like db " 123" or
db " +234"; db " -456 567"; db "3456999999999999"; etc. Some algos are good for some of these and not for others.
I want to see other approaches.
Personally I feel that it should be 2 different routines: one routine to strip the spaces and stuff (since you really insist on it) and the other for the real part. This gives the programmer a choice, instead of forcing them to use more bloated code with more features, features that they do not really need.
Finally a conversion is a conversion and it is the responsibility of the programmer to point the right data at the algo. Stripping "+ or -" is actually a string function that has nothing to do with the conversion, and it becomes a point of diminishing returns to keep adding junk to an algo because someone may point garbage at it. There will never be an idiot proof algo as there will always be a better idiot, so rather than cater for the idiot level, design whatever you need to ensure the conversion algos get the correct format data.
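The split described above could look roughly like this in C. It is a minimal sketch under assumptions: skip_blanks and digits_to_dword are hypothetical names, "blanks" means spaces and tabs only, and the conversion does no overflow checking, like the original atodw.

```c
#include <stdint.h>

/* String routine: returns a pointer to the first byte that is
   neither a space nor a tab. Does no conversion at all. */
const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Conversion routine: expects to be pointed at digits. It stops at
   the first non-digit and does not look at signs or blanks.
   Overflow wraps modulo 2^32, as in the LEA-based atodw. */
uint32_t digits_to_dword(const char *p)
{
    uint32_t acc = 0;
    while (*p >= '0' && *p <= '9') {
        acc = acc * 10u + (uint32_t)(*p - '0');
        p++;
    }
    return acc;
}
```

A caller that knows its data is clean calls digits_to_dword directly; a caller reading keyboard input can chain the two, as in digits_to_dword(skip_blanks(text)), and pay for the extra scan only there.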
Now the library has 2 versions of atodw, the original and the extended version which is table based, and if you need to convert a signed value, there is the atol algo, so there is enough capacity to do any of these conversions.
atodw original algo
atodw_ex extended version
atol signed version
For anyone who has doubts about accuracy, the extended version is table driven so it is guaranteed accurate within the DWORD range.
The other factor is that the extended version is much faster than the others.
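The source of atodw_ex is not shown in this thread, so the following C sketch is not that routine. It is only one possible reading of "table driven": each digit is weighted by a power of ten looked up from a table, chosen by the digit's position, and the total is checked against the DWORD range. The name table_atodw and the 64-bit accumulator are assumptions made for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Positional weights for up to 10 decimal digits. */
static const uint32_t POW10[10] = {
    1u, 10u, 100u, 1000u, 10000u, 100000u,
    1000000u, 10000000u, 100000000u, 1000000000u
};

/* Hypothetical table-based conversion. Returns 1 and stores the
   value on success; returns 0 if the string is empty, longer than
   10 digits, contains a non-digit, or exceeds the DWORD range. */
int table_atodw(const char *s, uint32_t *out)
{
    size_t len = strlen(s);
    uint64_t sum = 0;     /* wide enough for 9,999,999,999 */
    size_t i;

    if (len == 0 || len > 10)
        return 0;
    for (i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return 0;
        /* weight of this digit comes from the table */
        sum += (uint64_t)(s[i] - '0') * POW10[len - 1 - i];
    }
    if (sum > UINT32_MAX)
        return 0;
    *out = (uint32_t)sum;
    return 1;
}
```

In this sketch "4294967295" converts exactly, while "4294967296" is reported as out of range instead of wrapping to 0.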
Quote from: hutch-- on June 01, 2005, 11:54:27 PM
Now the library has 2 versions of atodw, the original and the extended version
atodw_ex extended version
But I don't have this extended version, atodw_ex, in m32lib, Hutch.
Get a current version, it can be done by getting the latest service pack from the forum web site. The algo was written about a year ago.
Hutch,
I think we need Unicode versions too, no?
Feel free to write it.
MASM question about INCLUDE
Can we include a file FILE.INC, which has 3 sections .data, .data? and .code (all or some of them), at any place inside a main file MAIN.ASM?
MAIN.ASM:
.data
...
.data?
...
.code
start:
...
;..................
INCLUDE /.../FILE.INC
end start
FILE.INC:
.data
...
.data?
...
.code
...
I believe so Rui, have you tried it?
These types of questions are best answered by writing a simple test piece. MASM supports changing the "SECTION" within code, but it is the programmer's responsibility to ensure that the correct data or code goes in the correct sections.
Now for example if you had an include file that was written like,
.data
item dd 0
and included it directly in the .CODE section, then you would get errors if there is code after the insertion point as you would be trying to write code in a data section. This would simply be a programming error.
As usual, try it out and if it goes BANG, you know the answer.
atodw : 68 cycles
atodw_no_pushpop_edi : 63 cycles
atodw_no_xor_eaxeax : 69 cycles
result on my SP4600+<1.83G>.