M32Lib-ATODW

Started by RuiLoureiro, May 27, 2005, 09:59:08 PM

RuiLoureiro

Hi,

   I was looking at ASCII-to-integer converters and found atodw in m32lib. It does the equivalent of «lea   ecx, dword ptr [eax+10*ecx]» in two LEA instructions. But:

   1. It has 3 instructions that seem to do nothing: push edi, pop edi and xor eax, eax;
   2. It compares with 2D instead of 2Dh (the minus sign), and it does not check for 2Bh (+);
   3. We have no control over the result (overflow is not detected);
   4. We have no control over the buffer contents (the characters are not validated);
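The two-instruction multiply mentioned above can be sketched in C (not MASM, so that it compiles anywhere). The function name `step_two_lea` is made up for illustration; the arithmetic follows the two LEA lines in the listing below.

```c
#include <stdint.h>

/* One step of the decimal loop: acc*10 + digit.
   Mirrors  lea ecx,[ecx+4*ecx]  followed by  lea ecx,[eax+2*ecx].
   Unsigned arithmetic wraps modulo 2^32, just like the registers. */
uint32_t step_two_lea(uint32_t acc, uint32_t digit)
{
    acc = acc + 4u * acc;      /* first LEA:  acc * 5             */
    acc = digit + 2u * acc;    /* second LEA: digit + (acc*5) * 2 */
    return acc;
}
```

Feeding it the digits 1, 2, 3 in turn, starting from 0, builds 123. A value that does not fit in 32 bits silently wraps around.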

Here is what I have in the m32lib folder:
................................................................................
atodw proc String:DWORD

    push esi
    push edi

    xor eax, eax
    mov esi, [String]
    xor ecx, ecx
    xor edx, edx
    mov al, [esi]
    inc esi
    cmp al, 2D
    jne proceed
    mov al, byte ptr [esi]
    not edx
    inc esi
    jmp proceed

  @@:
    sub al, 30h
    lea ecx, dword ptr [ecx+4*ecx]
    lea ecx, dword ptr [eax+2*ecx]
    mov al, byte ptr [esi]
    inc esi

  proceed:
    or al, al
    jne @B
    lea eax, dword ptr [edx+ecx]
    xor eax, edx

    pop edi
    pop esi
    ret
atodw endp
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
When I saw «lea   ecx, dword ptr [ecx+4*ecx]» I thought «good, this could be the "cheese"». But I quickly found out it is ... a "trap": we cannot check the result for overflow, because LEA does not affect any flags.
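Since LEA leaves the flags alone, an overflow has to be caught some other way. One possibility, sketched in C under the assumption that each digit is already known to be 0..9, is to compare the accumulator against a limit before multiplying. `add_digit_checked` is a hypothetical helper, not something from m32lib.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: append one decimal digit (0..9) to *acc.
   Returns false and leaves *acc unchanged when acc*10 + digit
   would not fit in 32 bits. */
bool add_digit_checked(uint32_t *acc, uint32_t digit)
{
    /* acc*10 + digit <= UINT32_MAX  <=>  acc <= (UINT32_MAX - digit) / 10 */
    if (*acc > (UINT32_MAX - digit) / 10u)
        return false;
    *acc = *acc * 10u + digit;
    return true;
}
```

In assembly the same effect could be had with MUL and ADD, which do report overflow through the carry flag, at the cost of the speed that made the LEA pair attractive.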

Here is my code

; To call:    invoke AtoDW, ADDR String     ;[String  db "??? ...",0 ]
; Out:  CF clear => OK; CF set => error
AtoDW               proc  pString:DWORD   
                    push  esi

                    mov   esi, pString              ; String pointer
                    xor   ecx, ecx                  ; the result
                    xor   edx, edx                  ; the sign to the result
   
                    mov   al, byte ptr [esi]        ; get first byte
                    cmp   al, 2Bh                   ; plus ?
                    je    _nAtoDW

                    cmp   al, 2Dh                   ; minus ?
                    jne   _iAtoDW
                   
                    not   edx                       ; NOT leaves the flags unchanged,
                    je    _nAtoDW                   ; so ZF from the cmp is still set: get next

  @@:               cmp    al, " "
                    je    _nAtoDW                    ; get next
                    ; ---------------------------------------------
                    ; TODO: check that the char is in 30h-39h;
                    ; if it is not, return with an error (stc)
                    ; ---------------------------------------------
                    ; jc   @F   if the char is not between 30h and 39h
                    sub   al, 30h
                    lea   ecx, dword ptr [ecx+4*ecx]
                    lea   ecx, dword ptr [eax+2*ecx]
                   
_nAtoDW:            inc   esi
                    mov   al, byte ptr [esi]        ; get next byte

_iAtoDW:            or    al, al
                    jne   @B
   
                    lea   eax, dword ptr [edx+ecx]
                    xor   eax, edx
                    clc
                   
@@:                 pop   esi
                    ret
AtoDW               endp
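
The last two instructions before `clc` apply the sign without a branch: EDX is 0 for a positive number and all ones after `not edx`. A small C sketch of that trick, with the made-up name `apply_sign_mask`:

```c
#include <stdint.h>

/* mask is 0 (keep the value) or 0xFFFFFFFF (negate it), like EDX
   after  xor edx,edx  and an optional  not edx. */
uint32_t apply_sign_mask(uint32_t value, uint32_t mask)
{
    /* lea eax,[edx+ecx]  then  xor eax,edx
       With mask = all ones this is ~(value - 1), which equals -value. */
    return (value + mask) ^ mask;
}
```

With a zero mask both steps do nothing, so the value passes through unchanged.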

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I looked at this too.
Starting from chapter 13 of Tutorial-fputute (by Raymond), I made these modifications
(Raymond says on his page that we can. How are you, Raymond?):
;*******************************************************************************
;                            atofl
;*******************************************************************************
; lodsb   can be substituted by:
;                mov    al, byte ptr [esi]
;                inc    esi
atofl:
      push  ebx         ;preserve EBX and ESI
      push  esi

      lea   esi,buffer1 ;use ESI as pointer to text buffer
      xor   eax,eax
      xor   ebx,ebx     ;will be used as an accumulator
      xor   ecx,ecx     ;will be used as a counter
;************************************************
; Skip leading spaces without generating an error
;************************************************

   @@:
      lodsb             ;get next character
      cmp   al," "      ;check if a space character
      jz    @B          ;repeat until a non-space character is found
;*********************************************
; Check 1st non-space character for a +/- sign
;*********************************************
      cmp   al,"-"      ;is it a "-" sign
      je    atoflerr
;      jnz   @F

;atoflerr:
;      xor   eax,eax     ;set EAX to error code
;      pop   esi         ;restore the EBX and ESI registers
;      pop   ebx
;      ret               ;return with error code


;   @@:
      cmp   al,"+"      ;is it a "+" sign
      jnz   short @F
;      jnz   nextchar

nextchar:
      lodsb             ;disregard a "+" sign and get next character
;***********************************************************
; From this point, space and sign characters will be invalid
;***********************************************************
;nextchar:
@@:
      cmp   al,0        ;check for end-of-string character
      jz    endinput    ;exit the string parsing section

      cmp   al,"."      ;is it the "." decimal delimiter
                        ;other delimiters such as the "," used in some
                        ;countries could also be allowed but would need
                        ;additional coding to make it more generalized
      jnz   @F
;******************************************************************
; Only one decimal delimiter can be acceptable. The sign bit of ECX
; is used to keep a record of the first delimiter identified.
;******************************************************************

      or    ecx,ecx     ;check if a delimiter has already been identified
      js    atoflerr    ;exit with error code if more than 1 delimiter
     
      stc               ;set the carry flag
      rcr   ecx,1       ;set bit31 of ECX (the sign bit) when
                        ;the 1st delimiter is identified
;      lodsb             ;get next character
      jmp   nextchar    ;continue parsing
;***********************************************************************
; All ASCII characters other than the numerical ones will now be invalid
;***********************************************************************
   @@:
      cmp   al,"0"
      jb    atoflerr
      cmp   al,"9"
      ja    atoflerr

      sub   al,"0"      ;convert valid ASCII numerical character to binary
      xchg  eax,ebx     ;get the accumulated integer value in EAX
                        ;holding the new digit in EBX
      mul   factor10    ;multiply the accumulated value by 10
      add   eax,ebx     ; and add the new digit
      xchg  eax,ebx     ;store this new accumulated value back in EBX

      or    ecx,ecx     ;check if a decimal delimiter detected yet
      js    @F          ;jump if decimal digits are being processed
;*************************************
; Integer digits still being processed
;*************************************
      cmp   ebx,100     ;verify current value of integer portion
      jbe   nextchar    ;continue processing string characters
;      ja    atoflerr    ;abort if input for annual rate is > 100%

atoflerr:
      xor   eax,eax     ;set EAX to error code
      pop   esi           ;restore the EBX and ESI registers
      pop   ebx
      ret                  ;return with error code


;      lodsb             ;get next string character
;      jmp   nextchar    ;continue processing string characters

;*******************************************************
; The CL register is used as a counter of decimal digits
; after the decimal delimiter has been identified
;*******************************************************
   @@:
      inc   cl          ;increment count of decimal digits
;      lodsb             ;get next string character
      jmp   nextchar    ;continue processing string characters
;***********************************
; Parsing of the string is completed
;***********************************
endinput:
      or    ebx,ebx     ;check if total input was equal to 0
      jz    atoflerr    ;abort if annual rate input is 0%

      finit             ;initialize FPU
      push  ebx         ;store value of EBX on stack
      fild  dword ptr[esp]    ;-> st(0)=EBX
      add   cl,2        ;increment the number of decimal digits
                        ;to convert from % rate to a decimal rate
      shl   ecx,1       ;get rid of the potential sign "flag"
      shr   ecx,1       ;restore the count of decimal digits
      fild  factor10    ;-> st(0)=10, st(1)=EBX
   @@:
      fdiv  st(1),st    ;-> st(0)=10, st(1)=EBX/10
      dec   ecx         ;decrement counter of decimal digits
      jnz   @B          ;continue dividing by 10 until count exhausted
      fstp  st          ;get rid of the dividing 10 in st(0)
                        ;-> st(0)=annual rate (as a decimal rate)
      pop   ebx         ;clean CPU stack

      pop   esi         ;restore the EBX and ESI registers
      pop   ebx
      or    al,1        ;ensure EAX != 0 (i.e. no error detected)
      ret
;*******************************************************************************
stay well

hutch--

The "xor eax, eax" is to prevent a register stall in the following use of AL. The PUSH/POP of EDI appears to be a leftover from the last time Alex did some work on it; it is not needed, but I doubt it slows anything down much.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

MichaelW

Hi Rui,

Just to illustrate the effects that Hutch is referring to:

; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .586                       ; create 32 bit code
    .model flat, stdcall       ; 32 bit memory model
    option casemap :none       ; case sensitive

    include \masm32\include\windows.inc
    include \masm32\include\masm32.inc
    include \masm32\include\kernel32.inc

    includelib \masm32\lib\masm32.lib
    includelib \masm32\lib\kernel32.lib

    include \masm32\macros\macros.asm

    include timers.asm

    atodw_no_xor_eaxeax   PROTO :DWORD
    atodw_no_pushpop_edi  PROTO :DWORD

; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      teststr db "123456789",0
    .code
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    LOOP_COUNT EQU 10000000

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      invoke atodw, ADDR teststr
    counter_end
    mov   ebx,eax
    print chr$("atodw                : ")
    print ustr$(ebx)
    print chr$(" cycles", 13, 10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      invoke atodw_no_pushpop_edi, ADDR teststr
    counter_end
    mov   ebx,eax
    print chr$("atodw_no_pushpop_edi : ")
    print ustr$(ebx)
    print chr$(" cycles", 13, 10)   

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      invoke atodw_no_xor_eaxeax, ADDR teststr
    counter_end
    mov   ebx,eax
    print chr$("atodw_no_xor_eaxeax  : ")
    print ustr$(ebx)
    print chr$(" cycles", 13, 10)
   
    mov   eax, input(13, 10, "Press enter to exit...")
    exit   

; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
; Copies to play with.   
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
atodw_no_pushpop_edi proc String:DWORD

  ; ----------------------------------------
  ; Convert decimal string into dword value
  ; return value in eax
  ; ----------------------------------------

    push esi
    ;push edi

    xor eax, eax
    mov esi, [String]
    xor ecx, ecx
    xor edx, edx
    mov al, [esi]
    inc esi
    cmp al, 2D
    jne proceed
    mov al, byte ptr [esi]
    not edx
    inc esi
    jmp proceed

  @@:
    sub al, 30h
    lea ecx, dword ptr [ecx+4*ecx]
    lea ecx, dword ptr [eax+2*ecx]
    mov al, byte ptr [esi]
    inc esi

  proceed:
    or al, al
    jne @B
    lea eax, dword ptr [edx+ecx]
    xor eax, edx

    ;pop edi
    pop esi

    ret

atodw_no_pushpop_edi endp

atodw_no_xor_eaxeax proc String:DWORD

  ; ----------------------------------------
  ; Convert decimal string into dword value
  ; return value in eax
  ; ----------------------------------------

    push esi
    push edi

    ;xor eax, eax
    mov esi, [String]
    xor ecx, ecx
    xor edx, edx
    mov al, [esi]
    inc esi
    cmp al, 2D
    jne proceed
    mov al, byte ptr [esi]
    not edx
    inc esi
    jmp proceed

  @@:
    sub al, 30h
    lea ecx, dword ptr [ecx+4*ecx]
    lea ecx, dword ptr [eax+2*ecx]
    mov al, byte ptr [esi]
    inc esi

  proceed:
    or al, al
    jne @B
    lea eax, dword ptr [edx+ecx]
    xor eax, edx

    pop edi
    pop esi

    ret

atodw_no_xor_eaxeax endp

; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


Results on my P3:

atodw                : 57 cycles
atodw_no_pushpop_edi : 52 cycles
atodw_no_xor_eaxeax  : 131 cycles




eschew obfuscation

Vortex

Hi MichaelW,

Thanks for the demo.

Here are the results on my P4 2.66 GHz:
Quote
atodw                : 54 cycles
atodw_no_pushpop_edi : 64 cycles
atodw_no_xor_eaxeax  : 60 cycles

Michael, will you modify your timer macros for the P4, or are they OK to use on a P4 as they are?

Thanks,

Erol

MazeGen

AFAIK there are no problems on P4 with those macros.

Your results are probably so different because the P4, unlike the P3, handles partial register access without a large penalty.

hutch--

Erol,

The need to clear the register with XOR reg, reg or alternatively SUB reg, reg is not so noticeable on a PIV, but if you want code that runs properly on everything, it must be there.

dsouza123

The 2D instead of 2Dh is very likely a bug.
MASM reads the trailing D as the decimal radix suffix, so 2D becomes 2 when assembled, which won't match a - (minus sign, 2Dh).

Vortex

MazeGen, Hutch

Thanks for your replies.

MichaelW

Erol,

The macros contain no processor-specific code, so AFAIK the results are equally valid for all of the processor families. Agner Fog states in his Pentium optimization manual that the P4 was designed to store the whole register together, instead of splitting it into separate temporary registers as for the PPro, P2, and P3, to avoid the "serious delay whenever there was a need to join different parts of a register into a single full register." This seems to me to indicate that a large timing difference should be expected.

Vortex

Hi Michael,

Thanks for the technical info :U

hutch--

Just a note on atodw, it was designed to handle DWORD rather than LONG values so it was never pointed at negative numbers. For the signed version, there is an algo written by Ray Filiatreault called "atol" that handles signed conversions.

RuiLoureiro

Hi all

Here are the results on my P3:

atodw                 :  58 cycles    [+1 ]
atodw_no_pushpop_edi  :  53 cycles    [+1 ]
atodw_no_xor_eaxeax   : 131 cycles


Hi Hutch,
   How are you? I hope you are fine.
   Yes, "xor   eax, eax" is needed, so it must be there [no HomeWork rule].
   2D, as noted by dsouza123, is a bug. But if it checks for 2Dh (-), why not for 2Bh (+) as well?
I am guessing that when you come to our topics, many people want to see what you have to say. When we do not have your help, it is sometimes more difficult.
Thank you.

Hi Erol,
   Are you well? I hope so. Thanks for the contribution.
Your case    [P4 2.66 GHz] atodw_no_xor_eaxeax:  60 cycles against
Michael's case [P3]        atodw_no_xor_eaxeax: 131 cycles is mysterious!

Hi Michael,
   How are you getting along? Thanks for your example (I will use it in other cases). In this case it matters because we can have hundreds of strings to convert in a single loop (or task). If the difference with and without push/pop is 5 cycles, then for 100 strings we save 500 cycles, and for 200 strings 1000 cycles (best case).
   I noticed one strange case: atodw_no_xor_eaxeax takes 131 cycles!!! Why is that? What is the explanation? What I do know is that without "xor eax, eax" the procedure is wrong, because the upper 24 bits of EAX are not cleared before EAX is used in the LEA.
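
Why the result is wrong without "xor eax, eax" can be shown with a small C model of the register. This is only a sketch; `load_low_byte` and `digit_step` are illustrative names, and `eax_before` stands for whatever EAX happened to hold on entry.

```c
#include <stdint.h>

/* Model of "mov al, byte": only the low 8 bits of the register change. */
uint32_t load_low_byte(uint32_t reg, uint8_t byte)
{
    return (reg & 0xFFFFFF00u) | byte;
}

/* One digit step as in the loop: sub al,30h then ecx = eax + 10*ecx.
   eax_before is the (possibly stale) content of EAX. */
uint32_t digit_step(uint32_t eax_before, uint32_t ecx, uint8_t ch)
{
    uint32_t eax = load_low_byte(eax_before, (uint8_t)(ch - '0'));
    return eax + 10u * ecx;
}
```

With `eax_before` equal to 0 the step turns 12 and '3' into 123. With stale upper bits, for example 0x12345600, the same call returns 0x12345600 + 123, so the garbage ends up in the result.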

Here is the corrected code [I call it BufToInt instead of AtoDW]

; In:   pString => string pointer
;
; Out:  clc => OK, the result is in EAX (but it can be wrong: overflow is not detected)
;
;       stc => a char is not valid
;
; Info:
;       1. The string must be terminated by 0;
;       2. The string can contain spaces between digit codes;
;       3. The first char can be «-» or «+»;
;       4. We have no overflow control on the EAX result;
;       5. Destroys the contents of ECX and EDX.
;
; To call:      invoke   BufToInt, ADDR String     ;[String  db "??? ...",0 ]
;
BufToInt            proc  pString:DWORD
                    push  esi

                    mov   esi, pString              ; String pointer
                    xor   eax, eax                  ; clear EAX before AL is used
                    xor   ecx, ecx                  ; the result
                    xor   edx, edx                  ; to sign the result

                    mov   al, byte ptr [esi]        ; get first byte

                    cmp   al, 2Bh                   ; plus ?
                    je    _nBufToInt

                    cmp   al, 2Dh                   ; minus ?
                    jne   _iBufToInt

                    not   edx                       ; doesn't affect flags
                    je    _nBufToInt                ; get next

  @@:               cmp   al, " "
                    je    _nBufToInt                ; get next

                    cmp   al, "9"
                    jbe   _tBufToInt

_rBufToInt:         stc
                    pop   esi
                    ret

_tBufToInt:         sub   al, 30h                   ; upper bytes of EAX are 0
                    jc    short _rBufToInt

                    lea   ecx, dword ptr [ecx+4*ecx]
                    lea   ecx, dword ptr [eax+2*ecx]

_nBufToInt:         inc   esi
                    mov   al, byte ptr [esi]        ; get next byte

_iBufToInt:         or    al, al
                    jne   @B

                    lea   eax, dword ptr [edx+ecx]
                    xor   eax, edx
                    clc

                    pop   esi
                    ret
BufToInt            endp
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
About «atol», I have this code:


atol            proc  lpSrc:DWORD
                xor   eax, eax
                xor   ecx, ecx
                mov   edx, lpSrc

                sub   edx, 1
  @@:
                add   edx, 1
                cmp   BYTE PTR [edx], 32
                je    @B
                cmp   BYTE PTR [edx], 9
                je    @B                        ; [ strip spaces and tabs ]

                mov   al, [edx]                 ; [ begin ]
                add   edx, 1

                .if al == "-"
                    add   ecx, 1
                    mov   al, [edx]
                    add   edx, 1
                .elseif al == "+"
                    mov   al, [edx]
                    add   edx, 1
                .endif
                push  ecx                       ; keep sign on stack
                xor   ecx, ecx

  @@:
                sub   al, "0"
                jc    @F

                lea   ecx, [ecx+ecx*4]
                lea   ecx, [eax+ecx*2]
                mov   al, [edx]
                add   edx, 1
                jmp   @B
  @@:
                mov   eax, ecx
                pop   ecx                       ; retrieve sign
                shr   ecx, 1
                jnc   @F
                neg   eax
  @@:
                ret
atol            endp


Where

   .if al == "-"
           add   ecx, 1
           mov   al, [edx]
           add   edx, 1
   .elseif al == "+"
           mov   al, [edx]
           add   edx, 1
   .endif


should be [ HomeWork rule ! ]


   .if al == "-"                 ; if not, the sign is ecx=0
           add   ecx, 1
   .endif

   mov   al, [edx]               ; caution: this also runs when there is no sign,
   add   edx, 1                  ; so the first digit of an unsigned number is skipped
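
A C sketch of the sign-handling step may help when comparing the two variants. It follows the .if/.elseif form, where the pointer moves on only when a sign was actually seen, so an unsigned "123" keeps its first digit. `skip_sign` is a hypothetical name and the input is assumed to start at the first non-blank character.

```c
#include <stdbool.h>

/* Hypothetical helper: s points at the first non-blank character.
   Sets *negative and returns a pointer to the first digit. */
const char *skip_sign(const char *s, bool *negative)
{
    *negative = false;
    if (*s == '-') {
        *negative = true;
        s++;                 /* consume the minus sign */
    } else if (*s == '+') {
        s++;                 /* consume the plus sign  */
    }
    return s;                /* no sign: nothing consumed */
}
```

Fetching the next character unconditionally after the sign test would be shorter, but it would also step over the leading digit whenever the string has no sign.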

........................................

All the best to all of you
Stay well

QvasiModo

RuiLoureiro, this could be the reason why removing the XOR EAX, EAX causes a slowdown:

Quote from: hutch-- on May 28, 2005, 01:53:50 AM
The "xor eax, eax" is to prevent a register stall in the following use of AL.

Cheers, :U
QvasiModo

hutch--

hmmmm,

Quote
Hi Hutch,
   How are you? I hope you are fine.
   Yes, "xor   eax, eax" is needed, so it must be there [no HomeWork rule].
   2D, as noted by dsouza123, is a bug. But if it checks for 2Dh (-), why not for 2Bh (+) as well?
I am guessing that when you come to our topics, many people want to see what you have to say. When we do not have your help, it is sometimes more difficult.
Thank you.

Thanks but I already knew about the Intel optimisation since they published it for the PIII many years ago. The only BUG in the algo is a user bug of trying to use an UNSIGNED algo for signed values, as posted before, use ATOL for signed values.

I am not sure of the point you are trying to make with comments about the forum rules, but they are in place for a reason, which is to protect our members from nonsense, and this will not be changed. Keep it up and the posting WILL be changed.

roticv

Normal people don't put a + in front of a number when it is positive. It is rare to add a plus sign in front of a number; the only case I can think of is the oxidation number of an element.

On the other hand, a negative sign tells us that the number is negative.

Therefore I think the cmp with the plus sign is useless and slows down the code. Personally I don't like to see too many branches, as they slow down the code.