News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Problem with atodw_ex

Started by jj2007, August 15, 2009, 01:46:51 AM

Previous topic - Next topic

jj2007

When testing the MovVal algo, I wanted to check the ascii to integer routines for comparison. To do this, I use 314159265 (instead of 3.141592653589793238 for the ascii to float tests).

That works fine for atodw, and it's pretty fast (about 40 cycles on a Celeron M), but atodw_ex gets very slow:

--- Tests for PI=3.141592653589793238 ---
242     cycles for MovVal
876     cycles for RayLib
7456    cycles for sscanf
43      cycles for atodw (9-byte integer)
72004   cycles for atodw_ex (314159265)


The documentation is rather cryptic, and insofar incorrect as eax does contain 314159265 on return:

Quoteatodw_ex
atodw_ex proc lpstring:DWORD

Description
A high speed ascii decimal string to DWORD conversion for applications that require high speed streaming of conversion data.

Parameters
1.lpstring The address of the ascii decimal string.

Return Value
There is no return value.

Comments
This is an extended function aimed at applications where conversion speed is critical to their operation. For non speed critical string to number conversion the normal procedure should be used to save space.

P.S.: I am aware of this thread on atodw_ex.

dedndave

wow !!!   :eek
that's eatin up some clocks - lol
looking at the file, i don't see anything obvious that says it should be that slow
\masm32\m32lib\a2dw_ex.asm
you didn't misplace a counter_end or something, did you, Jochen ?
that number can't be right - lol

align 4

atodw_ex proc lpstring:DWORD

    mov [ebp+12], edi

  ; ----------------------------------------------------
  ; unrolled StrLen algo to test up to 12 bytes for zero
  ; ----------------------------------------------------
    mov eax,lpstring                    ; get pointer to string
    lea edx,[eax+3]                     ; pointer+3 used in the end

    mov edi,[eax]                       ; read first 4 bytes
    add eax,4                           ; increment pointer
    lea ecx,[edi-01010101h]             ; subtract 1 from each byte
    not edi                             ; invert all bytes
    and ecx,edi                         ; and these two
    and ecx,80808080h
    jnz @F

    mov edi,[eax]                       ; read next 4 bytes
    add eax,4                           ; increment pointer
    lea ecx,[edi-01010101h]             ; subtract 1 from each byte
    not edi                             ; invert all bytes
    and ecx,edi                         ; and these two
    and ecx,80808080h
    jnz @F

    mov edi,[eax]                       ; read last 4 bytes
    add eax,4                           ; increment pointer
    lea ecx,[edi-01010101h]             ; subtract 1 from each byte
    not edi                             ; invert all bytes
    and ecx,edi                         ; and these two
    and ecx,80808080h

  @@:
    test ecx,00008080h                  ; test first two bytes
    jnz @F
    shr ecx,16                          ; not in the first 2 bytes
    add eax,2
  @@:
    shl cl,1                            ; use carry flag to avoid branch
    sbb eax,edx                         ; compute length
  ; -------------------------------------------

    mov edi, eax                        ; length in EDI
    mov edx, lpstring
    sub edx, 1

    xor eax, eax

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+40]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+80]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+120]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+160]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+200]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+240]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+280]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+320]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+360]

  atout:

    mov edi, [ebp+12]

    ret

atodw_ex endp

jj2007

Quote from: dedndave on August 15, 2009, 01:56:58 AM
you didn't misplace a counter_end or something, did you, Jochen ?

Don't think so.... ::)

.. but I solved the mystery:

counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
push ebx
invoke atodw_ex, offset MyPI_Int
pop ebx
counter_end


Works better :bg
--- Tests for PI=3.141592653589793238 ---
236     cycles for MbVal
872     cycles for RayLib
7455    cycles for sscanf
45      cycles for atodw (9-byte integer)
37      cycles for atodw_ex (314159265)

dedndave

#3
ah yes - that is one if the things Michael fixed in counters2.asm

jj2007

Quote from: dedndave on August 15, 2009, 02:30:43 AM
ah yes - that is one if the things Michael fixed in counters2.asm

The problem is not with Michael's routine, it's with the algo:
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
push eax
invoke atodw_ex, offset MyPI_Int
pop eax
counter_end

Works fine with eax. Any reg32 will do, actually, because edi is saved to the wrong location, trashing the stack:

    mov [ebp+12], edi
...
    mov edi, [ebp+12]


Try this:

    SaveEdi equ [ebp+8]
    mov SaveEdi, edi
...
    mov edi, SaveEdi


Olly says this proc has a stack frame - surprising for an extra fast algo. Probably somebody got lost in stack space... :bg

dedndave

oh - i see it, now - lol
maybe trying to pass a parm back on the stack ? (why else would he not just push edi/pop edi ? - lol - makes no sense)
now you know why i prefer to OPTION off the prologue epilogue and make my own frames
it's hard enough to keep em straight without masm trying to help - lol

Z sends a kiss

sinsi

Quote from: jj2007 on August 15, 2009, 02:49:10 AM
    SaveEdi equ [ebp+8]
    mov SaveEdi, edi
...
    mov edi, SaveEdi

Even that wouldn't work, because later comes
    mov edx, lpstring
and lpstring is [ebp+8] isn't it?

Light travels faster than sound, that's why some people seem bright until you hear them.

dedndave

hiya Sinsi - yah - i see it is bass-ackwards, but i don't know why else he wouldn't just push/pop
[ebp+8] should be right, but that makes no sense, either - lol
must be on drugs
if he was trying to save a clock cycle or two by using the stack parm addy to save edi, he could not have accessed lpstring later

atodw_ex proc lpstring:DWORD

    mov [ebp+12], edi

; ----------------------------------------------------
; unrolled StrLen algo to test up to 12 bytes for zero
; ----------------------------------------------------
    mov eax,lpstring                    ; get pointer to string
.
.
.
; -------------------------------------------

    mov edi, eax                        ; length in EDI
    mov edx, lpstring

i bet the routine originally had 2 parms, and he was using the second location to save edi
then, the routine got modified and oops !
that would make sense

hutch--

Since I know this algo works OK, what is the problem with the following code ?


align 4

atodw_ex proc lpstring:DWORD

    mov [ebp+12], edi


ebp + 4 holds the return address.
ebp + 8 holds lpstring
ebp + 12 is not used so its an OK place to copy a register to.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

sinsi


  call myproc
  ...

myproc: invoke atodw_ex,addr buffer
  ret

Doesn't the return address from 'call myproc' sit at [ebp+12] in atodw_ex?
Light travels faster than sound, that's why some people seem bright until you hear them.

MichaelW

Yes, the caller could be using the stack at [ebp+12].

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc

    EXTERNDEF decade :DWORD
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      s1 db "1234567890",0
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

align 4

_atodw_ex proc lpstring:DWORD

    LOCAL edisave:DWORD

    mov edisave, edi

  ; ----------------------------------------------------
  ; unrolled StrLen algo to test up to 12 bytes for zero
  ; ----------------------------------------------------
    mov eax,lpstring                    ; get pointer to string
    lea edx,[eax+3]                     ; pointer+3 used in the end

    mov edi,[eax]                       ; read first 4 bytes
    add eax,4                           ; increment pointer
    lea ecx,[edi-01010101h]             ; subtract 1 from each byte
    not edi                             ; invert all bytes
    and ecx,edi                         ; and these two
    and ecx,80808080h
    jnz @F

    mov edi,[eax]                       ; read next 4 bytes
    add eax,4                           ; increment pointer
    lea ecx,[edi-01010101h]             ; subtract 1 from each byte
    not edi                             ; invert all bytes
    and ecx,edi                         ; and these two
    and ecx,80808080h
    jnz @F

    mov edi,[eax]                       ; read last 4 bytes
    add eax,4                           ; increment pointer
    lea ecx,[edi-01010101h]             ; subtract 1 from each byte
    not edi                             ; invert all bytes
    and ecx,edi                         ; and these two
    and ecx,80808080h

  @@:
    test ecx,00008080h                  ; test first two bytes
    jnz @F
    shr ecx,16                          ; not in the first 2 bytes
    add eax,2
  @@:
    shl cl,1                            ; use carry flag to avoid branch
    sbb eax,edx                         ; compute length
  ; -------------------------------------------

    mov edi, eax                        ; length in EDI
    mov edx, lpstring
    sub edx, 1

    xor eax, eax

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+40]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+80]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+120]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+160]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+200]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+240]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+280]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+320]
    sub edi, 1
    jz atout

    movzx ecx, BYTE PTR [edx+edi]
    add eax, [ecx*4+decade-192+360]

  atout:

    mov edi, edisave

    ret

_atodw_ex endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    push 12345678h
    invoke atodw_ex, ADDR s1
    print ustr$(eax),13,10
    pop eax
    print uhex$(eax),"h",13,10,13,10
    push 12345678h
    invoke _atodw_ex, ADDR s1
    print ustr$(eax),13,10
    pop eax
    print uhex$(eax),"h",13,10,13,10

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


1234567890
00000000h

1234567890
12345678h

eschew obfuscation

jj2007

Quote from: sinsi on August 15, 2009, 03:50:42 AM
Quote from: jj2007 on August 15, 2009, 02:49:10 AM
    SaveEdi equ [ebp+8]
    mov SaveEdi, edi
...
    mov edi, SaveEdi

Even that wouldn't work, because later comes
    mov edx, lpstring
and lpstring is [ebp+8] isn't it?


Sinsi, I had tested that version, and it works fine.
The problem becomes evident when you follow in Olly what happens after the ret: Michael's timer algo pops ebx, and gets the last value of edi...

sinsi

I don't see the logic in using the default prologue and then explicitly using [ebp+12] to save edi. Surely push/pop is better in this case?

jj, lpstring gets accessed twice in the proc, and lpstring is at [ebp+8] isn't it? Overwriting it with edi...I don't know how it could work (unless edi, by some fluke, points to the string as well).
Light travels faster than sound, that's why some people seem bright until you hear them.

jj2007

Quote from: sinsi on August 15, 2009, 07:49:42 AM
jj, lpstring gets accessed twice in the proc, and lpstring is at [ebp+8] isn't it? Overwriting it with edi...I don't know how it could work (unless edi, by some fluke, points to the string as well).

You are right, and I was wrong :8)

But still, the original algo trashes an inexistant arg2, which happens to be the content of what was pushed before invoing the proc. That is why ebx, pushed by the timers, contains the last edi when popped.

The simple solution is a push/pop edi.

sinsi

>which happens to be the content of what was pushed before invoing the proc
That is the killer, since it could be a return address, or using atodw_ex in a callback where you damned well know you saved ebx before you used it, and restored it before you returned, and your program terminates with a c0000005 error in kernel32 or something...
Light travels faster than sound, that's why some people seem bright until you hear them.