The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: MichaelW on January 03, 2005, 09:36:45 AM

Title: Binary string to dword
Post by: MichaelW on January 03, 2005, 09:36:45 AM
I coded a binary string to dword conversion thinking I was going to need it for a project. It turned out I didn't, but I thought the result was worth posting.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    .486                       ; create 32 bit code
    .model flat, stdcall       ; 32 bit memory model
    option casemap :none       ; case sensitive

    include \masm32\include\windows.inc
    include \masm32\include\masm32.inc
    include \masm32\include\kernel32.inc

    includelib \masm32\lib\masm32.lib
    includelib \masm32\lib\kernel32.lib

    include \masm32\macros\macros.asm

    btodw PROTO :DWORD

    bval MACRO lpstring     ; binary string to unsigned 32 bit integer
        invoke btodw, reparg(lpstring)
        EXITM <eax>
    ENDM

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      buffer db "1010101010101010",0
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    print uhex$(bval(chr$("0")))
    print chr$(13,10)

    print uhex$(bval(chr$("00000001")))
    print chr$(13,10)

    print uhex$(bval(chr$("100")))
    print chr$(13,10)

    print uhex$(bval(chr$("10000001")))
    print chr$(13,10)

    print uhex$(bval(chr$("10101010")))
    print chr$(13,10)

    print uhex$(bval(chr$("11111111")))
    print chr$(13,10)

    print uhex$(bval(chr$("1000000000000000")))
    print chr$(13,10)

    print uhex$(bval(chr$("10000000000000000000000000000000")))
    print chr$(13,10)

    print uhex$(bval(chr$("11111111111111111111111111111111")))
    print chr$(13,10)

    count equ 10000000

    call GetTickCount
    push eax
    mov ebx, count
  @@:
    invoke btodw, ADDR buffer
    dec ebx
    jnz @B
    call GetTickCount
    pop ebx
    sub eax, ebx
    print ustr$(eax)   

    print chr$(13,10,"Press enter to exit...",13,10)
    invoke StdIn, ADDR buffer, 1
    exit

btodw proc String:DWORD

  ; ----------------------------------------
  ; Convert binary string into dword value
  ; return value in eax
  ; ----------------------------------------

    mov edx, [String]
    xor eax, eax
    xor ecx, ecx
    jmp @f
  fini:
    mov eax, ecx
    ret
    ALIGN 4
  @@:
    mov al, [edx]
    sub al, 30h
    js  fini
    shl ecx, 1
    add edx, 1
    or  ecx, eax
    jmp @B

btodw endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

Title: Re: Binary string to dword
Post by: Vortex on January 03, 2005, 10:46:17 AM
Hi MichaelW,

Your work looks nice, I will give a try.
Title: Re: Binary string to dword
Post by: Petroizki on January 03, 2005, 11:56:19 AM
Nice. You could replace eax&ecx, so you would get rid of the final 'mov'. ;)

This would be at least little smaller:
btodw proc String:DWORD
mov edx, String
xor eax, eax

@@: movzx ecx, byte ptr [edx]
sub ecx, 30h
js fini

add eax, eax
add edx, 1
or eax, ecx
jmp @B

fini:
ret
btodw endp


Don't know how this would benchmark:
btodw proc String:DWORD
mov edx, String
xor eax, eax

@@: movzx ecx, byte ptr [edx]
add edx, 1

test ecx, ecx
je fini

lea eax, [eax*2 + ecx - 30h]
jmp @B

fini:
ret
btodw endp
Title: Re: Binary string to dword
Post by: Jibz on January 03, 2005, 12:09:58 PM
Unless it was very important that this function be as fast as possible, I would probably exit on any character that was not 0 or 1 -- something like:

    ...
    sub    ecx, 30h
    test   ecx, -2
    jnz    fini
    ...
Title: Re: Binary string to dword
Post by: MichaelW on January 03, 2005, 06:23:06 PM
Petroizki,

I tried swaping eax and ecx and eliminating the mov in my procedure, but doing so slowed it down. Using lea to combine the shift and the add into a single instruction is a good idea that didn't occur to me. Your second procedure is the fastest on my P3 and it would probably be the fastest on the earlier processors, but I wonder how it would do on a P4.

Jibz,

I can see the sense in making all conversion procedures handle invalid characters as the dominant HLLs do, but I have always felt that just ending the conversion is a less than optimum method of handling invalid characters. An invalid character in a numeric string usually indicates an error, and I feel it should be handled as such.  But, the HLL conventions are what they are... I could not think of any way to implement your suggestion in Petroizki's second procedure that did not involve adding two or more instructions to the loop.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    .486                       ; create 32 bit code
    .model flat, stdcall       ; 32 bit memory model
    option casemap :none       ; case sensitive

    include \masm32\include\windows.inc
    include \masm32\include\masm32.inc
    include \masm32\include\kernel32.inc

    includelib \masm32\lib\masm32.lib
    includelib \masm32\lib\kernel32.lib

    include \masm32\macros\macros.asm

    btodw PROTO :DWORD
    btodw0 PROTO :DWORD
    btodw1 PROTO :DWORD
    btodw2 PROTO :DWORD
    btodw3 PROTO :DWORD


    bval MACRO lpstring     ; binary string to unsigned 32 bit integer
        invoke btodw, reparg(lpstring)
        EXITM <eax>
    ENDM

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      buffer db "1010101010101010",0
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    print uhex$(bval(chr$("0")))
    print chr$(13,10)

    print uhex$(bval(chr$("00000001")))
    print chr$(13,10)

    print uhex$(bval(chr$("100")))
    print chr$(13,10)

    print uhex$(bval(chr$("10000000")))
    print chr$(13,10)

    print uhex$(bval(chr$("10101010")))
    print chr$(13,10)

    print uhex$(bval(chr$("11111111")))
    print chr$(13,10)

    print uhex$(bval(chr$("1000000000000000")))
    print chr$(13,10)

    print uhex$(bval(chr$("10000000000000000000000000000000")))
    print chr$(13,10)

    print uhex$(bval(chr$("11111111111111111111111111111111")))
    print chr$(13,10)

    count equ 10000000

    call GetTickCount
    push eax
    mov ebx, count
  @@:
    invoke btodw, ADDR buffer
    dec ebx
    jnz @B
    call GetTickCount
    pop ebx
    sub eax, ebx
    print ustr$(eax)
   
    print chr$(13,10,13,10)
    print uhex$(hval(chr$("12x34")))

    print chr$(13,10)
    print ustr$(val(chr$("12x34")))

    print chr$(13,10,13,10,"Press enter to exit...",13,10)
    invoke StdIn, ADDR buffer, 1
    exit

btodw proc String:DWORD
    mov edx, String
    xor eax, eax
  @@:
    movzx ecx, byte ptr [edx]
    add edx, 1

    test ecx, 0FFFFFFCEh
    jnz fini

    test ecx, ecx
    je fini

    lea eax, [eax*2 + ecx - 30h]

    jmp @B

  fini:
    ret
btodw endp

btodw0 proc String:DWORD

  ; ----------------------------------------
  ; Convert binary string into dword value
  ; return value in eax
  ; ----------------------------------------

    mov edx, [String]
    xor eax, eax
    xor ecx, ecx
    jmp @f
  fini:
    mov eax, ecx
    ret
    ALIGN 4
  @@:
    mov al, [edx]
    sub al, 30h

    js  fini

    shl ecx, 1
    add edx, 1
    or  ecx, eax
    jmp @B

btodw0 endp

btodw1 proc String:DWORD
    mov edx, String
    xor eax, eax

@@: movzx ecx, byte ptr [edx]
    sub ecx, 30h
    js fini

    add eax, eax
    add edx, 1
    or eax, ecx
    jmp @B

  fini:
    ret
btodw1 endp

btodw2 proc String:DWORD
    mov edx, String
    xor eax, eax

@@: movzx ecx, byte ptr [edx]
    add edx, 1

    test ecx, ecx
    je fini

    lea eax, [eax*2 + ecx - 30h]
    jmp @B

  fini:
    ret
btodw2 endp

btodw3 proc String:DWORD
    mov edx, String
    xor ecx, ecx

@@: movzx eax, byte ptr [edx]
    add edx, 1

    test eax, eax
    je fini

    lea ecx, [ecx*2 + eax - 30h]
    jmp @B 

  fini:
    mov eax, ecx
ret
btodw3 endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

As posted (btodw0):
00000000
00000001
00000004
00000081
000000AA
000000FF
00008000
80000000
FFFFFFFF
1512

Petroizki 1 (btodw1):
00000000
00000001
00000004
00000081
000000AA
000000FF
00008000
80000000
FFFFFFFF
1783

Petroizki 2 (btodw2):
00000000
00000001
00000004
00000081
000000AA
000000FF
00008000
80000000
FFFFFFFF
1432

Petroizki 2, eax & ecx swapped, mov added (btodw3):
00000000
00000001
00000004
00000081
000000AA
000000FF
00008000
80000000
FFFFFFFF
1452

Petroizki 2 with my version of Jibz mod (btodw):
00000000
00000001
00000004
00000080
000000AA
000000FF
00008000
80000000
FFFFFFFF
2073

Title: Re: Binary string to dword
Post by: Ratch on January 04, 2005, 05:47:08 AM
     My contribution to this function is below.  Notice that I read 4 bytes at a time.  The loops are unrolled, but it would be easy to roll them up.  By the way, how do you differentiate between '0', ' 00', '000', etc in the EAX value?  Ratch


;*******************************************************************************
; ABS_ABDW:  Converts a ASCII binary string into an arithmetic binary DWORD.   *
;                                                                              *
; Called by: PUSH <address of string>                                          *
;            CALL ABS_ABDW                                                     *
;                                                                              *
; Returns:   EAX=arithmetic binary value                                       *
;                                                                              *
; Notes:     This subroutine conforms with WIN32 conventions regarding         *
;            register preservation and stack balancing. EBX,EBP,ESI,EDI are    *
;            not used.                                                         *
;                                                                              *
;            This subroutine makes the assumption that the string length will  *
;            not exceed 32 bytes, and will be terminated by a zero byte.       *
;            Furthermore, since it is a ASCII binary string, it is assumed     *
;            that the string will contain only ASCII zeros and ones.           *
;                                                                              *
; Coder:     Ratch                                                             *
;*******************************************************************************
ABS_ABDW:                   ;it all begins here
MOV ECX,[ESP+DWORD]        ;ECX=address of string
XOR EAX,EAX                ;fresh beginning

@@:
MOV EDX,[ECX]              ;EDX=4 bytes of string

TEST DL,DL                 ;is zero byte terminator?
JZ  @F
  SHL EAX,1                 ;make room for next bit
  SHR EDX,1                 ;send 1 or 0 into the CF
  ADC EAX,0                 ;tack on CF to LSB of EAX
  SHR EDX,7                 ;move to the next byte

TEST DL,DL                 ;is zero byte terminator?
JZ  @F
  SHL EAX,1                 ;make room for next bit
  SHR EDX,1                 ;send 1 or 0 into the CF
  ADC EAX,0                 ;tack on CF to LSB of EAX
  SHR EDX,7                 ;move to the next byte

TEST DL,DL                 ;is zero byte terminator?
JZ  @F
  SHL EAX,1                 ;make room for next bit
  SHR EDX,1                 ;send 1 or 0 into the CF
  ADC EAX,0                 ;tack on CF to LSB of EAX
  SHR EDX,7                 ;move to the next byte

TEST DL,DL                 ;is zero byte terminator?
JZ  @F
  SHL EAX,1                 ;make room for next bit
  SHR EDX,1                 ;send 1 or 0 into the CF
  ADC EAX,0                 ;tack on CF to LSB of EAX
  SHR EDX,7                 ;move to the next byte

  ADD ECX,DWORD             ;pointer to next word
  JMP @B
@@:
RET DWORD                  ;return to sender

Title: Re: Binary string to dword
Post by: Mark_Larson on January 04, 2005, 09:51:06 PM
these are my timings on my P3 at work.  I used RDTSC, and divided by the total number of loops.  Other than that I just cut and pasted Michael's code.


btodw   : 118
btodw0 : 83
btodw1 : 80
btodw2 : 68
btodw3 : 69

Title: Re: Binary string to dword
Post by: Mark_Larson on January 04, 2005, 10:07:56 PM
  I am still trying other things.  But I got it down to 57 cycles on my P3.  The previous fastest way on my P3 was 68 cycles.  Usually I wait to post my final code until I have the fastest possible version.  But I figured I'd post this now in case it helps anyone see the light, and come up with another way that is a variation on my way.  I just basically broke the inner loop up into handling two bytes at a time.


btodw4 proc String:DWORD
    mov edx, String
    xor eax, eax
align 4
@@:
    movzx ecx, byte ptr [edx]
movzx esi, byte ptr [edx+1]
;    add edx, 1
add edx,2

    test ecx, ecx
    je fini
   
    lea eax, [eax*2 + ecx - 30h]

test esi, esi
je fini

lea eax, [eax*2 + esi - 30h]

    jmp @B

  fini:
    ret
btodw4 endp
Title: Re: Binary string to dword
Post by: Mark_Larson on January 04, 2005, 10:44:01 PM
  Here is something else I am trying.  It still has a bug in it.  I haven't figured out how to fixup it up properly at the end.  But hopefully either someone will spot the fixup ( or I'll have time to do it), or someone will post a faster version using mine as a basis.  I read a whole dword, and deal with 4 binary characters at a time.  It runs in 29 cycles on my P3 at work.


btodw5 proc String:DWORD
    mov edx, String
    xor eax, eax

align 4
@@:
    mov ecx, [edx]
    add edx,4

    sub ecx, 30303030h
   
    shl eax,4           ;we deal with 4 bytes at a time, so we multiply by 16
    or eax,ecx

;adjust for negative number from a 0 at the end
    test ecx,80000000h
    jnz fini

    jmp @B

  fini:
  ;need to do a fixup for EAX here
    ret
btodw5 endp

Title: Re: Binary string to dword
Post by: Ratch on January 05, 2005, 02:15:03 AM
Mark Larson,
Quote
I read a whole dword, and deal with 4 binary characters at a time.

     Isn't that what the program I previously posted does?  Ratch
Title: Re: Binary string to dword
Post by: Petroizki on January 05, 2005, 07:52:34 AM
mark,

Please note that your btodw4 uses esi, without restoring it. :eek
Title: Re: Binary string to dword
Post by: Mirno on January 05, 2005, 11:49:08 AM
Mark, you don't compact the bits. So after your subtract you end up with (01010101h), not (0Fh)...
Easiest way to compact is to multiply by 01020408h, then shift right by 24, but this uses eax & edx :eek

Then you need to mask the unused bits, and shift eax accordingly (i.e. not always 4 if there are only 3 bits left you'll need to shift 3 :tdown ).

Mirno
Title: Re: Binary string to dword
Post by: Mirno on January 05, 2005, 12:02:43 PM
As an aside, when I first saw the problem I mis-read it, and came up with an interesting version for strings that were always 32 characters long (i.e. don't deal with null terminators)...


  mov edx, String          ; EDX is our string pointer
  mov eax, 1               ; Use 1, so we know when we're done

@@:
  cmp BYTE PTR [edx], '1'  ; CARRY flag is set for '0' and not for '1'
  lea edx, [edx + 1]       ; Increment EDX without touching the flags
  rcl eax, 1               ; Add this bit to the result.
  jnc @B                   ; If our marker hasn't fallen off, carry on

  not eax                  ; Our result is inverted, so NOT to correct
  ret                      ; ... And relax!


Mirno
Title: Re: Binary string to dword
Post by: Mark_Larson on January 05, 2005, 04:34:49 PM
Quote from: Ratch on January 05, 2005, 02:15:03 AM
Mark Larson,
Quote
I read a whole dword, and deal with 4 binary characters at a time.

     Isn't that what the program I previously posted does?  Ratch

  What you did was read a whole dword at a time, but then handle it one byte at a time.  I read a dword at a time, and then handle a dword at a time.  See the difference?  Also your code runs in 79 cycles on my P3.  There were a few earlier posts that were faster than that.

Here's a repost of those previous timings:

btodw   : 118
btodw0 : 83
btodw1 : 80
btodw2 : 68
btodw3 : 69


Quote from: Petroizki on January 05, 2005, 07:52:34 AM
mark,

Please note that your btodw4 uses esi, without restoring it. :eek

Thanks.  I was just trying to get something up quick, without worrying about robustiness.  You can add that later.  That's the same reason I don't generally add comments to code while I am optimizing it.

 


Quote from: Mirno on January 05, 2005, 11:49:08 AM
Mark, you don't compact the bits. So after your subtract you end up with (01010101h), not (0Fh)...
Easiest way to compact is to multiply by 01020408h, then shift right by 24, but this uses eax & edx :eek

Then you need to mask the unused bits, and shift eax accordingly (i.e. not always 4 if there are only 3 bits left you'll need to shift 3 :tdown ).

Mirno


Thanks.  I am actually trying something different at the moment.  I had a bit of a brainstorm on the way home from work.  I am hoping it is faster than the read and handle a dword at a time version. 
Title: Re: Binary string to dword
Post by: Mark_Larson on January 05, 2005, 08:41:40 PM
Mirno, I didn't compact until the end.  I didn't have time to take a close look at it to see if that introduced any bugs.  I also didn't do any AND's at the end. The test value came out right.  For the string that Michael provides it prints out a correct value.  I really should do more testing to verify it works with different values.  It runs in 30 cyclces on my P3 with doing the compaction only at the end.  Modifying the compaction to doing it every loop would add a lot of overhead.


btodw5 proc String:DWORD
    mov edx, String
    xor eax, eax

align 4
@@:
    mov ecx, [edx]
    add edx,4

    sub ecx, 30303030h
   
    shl eax,4           ;we deal with 4 bytes at a time, so we multiply by 16
    or eax,ecx

;adjust for negative number from a 0 at the end
    test ecx,80000000h
    jnz fini

    jmp @B

  fini:
  ;need to do a fixup for EAX here
    mov esi,01020408h
    mul esi        ; result in EDX:EAX
    shrd eax,edx,24
    shr eax,24
    ret
btodw5 endp


EDIT:  It doesn't return the correct result.  I made a boo-boo.  I ran the wrong version when printing the result.
Title: Re: Binary string to dword
Post by: MichaelW on January 06, 2005, 12:03:17 AM
This appears to produce the correct results:

btodw5 proc uses ebx esi String:DWORD
    mov   esi, String
    xor   ebx, ebx
  align 4
  @@:
    mov   ecx, [esi]
    sub   ecx, 30303030h
    test  ecx, 80000000h
    jnz   dobytes
    add   esi, 4
    bswap ecx
    mov   eax, 01020408h ; Good stuff Mirno :)
    mul   ecx
    shl   ebx, 4
    shr   eax, 24
    or    ebx, eax
    jmp   @B
  dobytes:
    movzx ecx, byte ptr [esi]
    add   esi, 1
    test  ecx, ecx
    je    fini
    lea   ebx, [ebx*2 + ecx - 30h]
    jmp   dobytes
  fini:
    mov   eax, ebx
    ret
btodw5 endp


But, for the cases where you are not handling a multiple of four bytes, I think it will be slower than this:

btodw4 proc uses ebx esi edi String:DWORD
    mov edx, String
    xor eax, eax
align 4
  @@:
    movzx ecx, byte ptr [edx]
    movzx ebx, byte ptr [edx+1]
    movzx esi, byte ptr [edx+2]
    movzx edi, byte ptr [edx+3]
    add   edx, 4
    test  ecx, ecx
    jz    fini
    lea   eax, [eax*2 + ecx - 30h]
    test  ebx, ebx
    jz    fini
    lea   eax, [eax*2 + ebx - 30h]
    test  esi, esi
    jz    fini
    lea   eax, [eax*2 + esi - 30h]
    test  edi, edi
    jz    fini
    lea   eax, [eax*2 + edi - 30h]
    jmp   @B
  fini:
    ret
btodw4 endp


Mark,

Could you post the P4 clock counts for reference?
Title: Re: Binary string to dword
Post by: Ratch on January 06, 2005, 10:02:00 AM
MichaelW,
     I reworked your code to make it smaller, faster and fixed a bug.  A test of 80808080H is needed to find the zero byte in a string such as BYTE '111111010',0,'111111'.  Ratch


;-------------------------------------------------------------------------------
;*******************************************************************************
; ABS_ABDW:  Converts a ASCII binary string into an arithmetic binary DWORD.   *
;                                                                              *
; Called by: PUSH <address of string>                                          *
;            CALL ABS_ABDW                                                     *
;                                                                              *
; Returns:   EAX=arithmetic binary value                                       *
;                                                                              *
; Notes:     This subroutine conforms with WIN32 conventions regarding         *
;            register preservation and stack balancing. EBX,EBP,ESI,EDI are    *
;            preserved.                                                        *
;                                                                              *
;            This subroutine makes the assumption that the string length will  *
;            not exceed 32 bytes, and will be terminated by a zero byte.       *
;            Furthermore, since it is a ASCII binary string, it is assumed     *
;            that the string will contain only ASCII zeros and ones.           *
;                                                                              *
; Coder:     Ratch                                                             *
;*******************************************************************************
ABS_ABDW:              ;it all begins here
PUSH ESI              ;guard register for Windows
XOR ECX,ECX           ;clear the collector
MOV ESI,[ESP+2*DWORD] ;ESI=address of string

ALIGN 4           ;line it up
@@:
MOV EDX,[ESI]     ;the whole word and nothing but the word
SUB EDX,'0000'    ;convert to binary bytes
TEST EDX,80808080H ;any zero bytes?
JNZ @F            ;yes, do one byte at a time
MOV EAX,08040201H ;compaction constant
MUL EDX           ;compaction operation
SHL ECX,4         ;make room for a nibble
SHR EAX,24        ;line up nibble
ADD ESI,DWORD     ;next DWORD
OR  ECX,EAX       ;tack on nibble
JMP @B            ;do next DWORD

@@:
MOVZX EDX,BYTE PTR [ESI]  ;one byte at a time
TEST EDX,EDX              ;zero byte?
JE @F                     ;yes, all done
INC ESI                   ;advance pointer one byte
LEA   ECX,[ECX*2+EDX-'0'] ;tack on nibble
JMP @B                    ;do next byte

@@:
MOV EAX,ECX ;hand off for Windows
POP ESI     ;restore for Windows

RET DWORD ;return to sender and balance the stack
Title: Re: Binary string to dword
Post by: MichaelW on January 06, 2005, 11:22:56 AM
Ratch,

You are correct about needing to test with 80808080h. I ran some tests before I decided to use 80000000h, but I didn't test all possible combinations. I was not aware that reversing the compaction constant would reverse the order of the bytes at their final destination. Doing so allowed me to remove the bswap from my procedure, which did speed it up somewhat, but even so yours is faster by  ~8%.

:U
Title: Re: Binary string to dword
Post by: MichaelW on January 08, 2005, 08:23:57 PM
The attachment contains the two best implementations (IMO) along with a new test app that uses my clock_ctr macros. I took the liberty of reformatting the code posted by Petroizki and Ratch (I hope neither of you will be offended).


[attachment deleted by admin]
Title: Re: Binary string to dword
Post by: Mark_Larson on January 10, 2005, 04:48:38 PM
Quote from: MichaelW on January 06, 2005, 12:03:17 AM

Mark,

Could you post the P4 clock counts for reference?


I've been bad.  Played WOW ( World of Warcraft) all weekend ;).  So didn't run anyting.

For the new code on my P3 the small version runs in 59 cycles ,and the fast version runs in 33 cycles.


When I cut and paste btodw_fast into the framework I already have for timing it runs in 40 cycles.  My version sets the priority to REALTIME before doing any timings.  I am not sure what the difference are.  If I have time later I will post my code.
Title: Re: Binary string to dword
Post by: MichaelW on January 10, 2005, 07:55:21 PM
If I eliminate the code that compensates for the loop overhead, the cycle count goes from 39 to 45.

In the process of checking the above I noticed that the cycle counts for btodw_small increase from 59 to 72. The loop in the procedure is already aligned, and aligning the procedure, or the invoke that calls it, does not correct the problem. This leads me to suspect that something in the macros needs to be aligned.
Title: Re: Binary string to dword
Post by: Mark_Larson on January 10, 2005, 09:31:29 PM
  Let me cut and paste the timing portion of my code.  To calculate the 6 cycles of overhead that I subtract out below, I simply commented out the call to the procedure.  You automated that part in your start macro. 

  I think I figured out your bug.  Some people who ran your code posted negative times.  You can't have a negative time unless the portion oof the start macro that calculates the loop overhead is calculating too large of a value.

When you get the macro's flushed out, we need to make that post a sticky.  We get asked all the time for how to do timing code.



LOOP_CNT    equ     10000000

    invoke GetCurrentProcess
    invoke SetPriorityClass,eax,REALTIME_PRIORITY_CLASS

    cpuid
    rdtsc
    push edx
    push eax

    mov ecx,LOOP_CNT
align 4
outer_loop:
    push ecx

    invoke btodw_fast,ADDR buffer

    pop  ecx


    dec  ecx
    jnz  outer_loop


    cpuid
    rdtsc
    pop  esi
    sub  eax,esi
    pop  edi
    sbb  edx,edi

    emms

    mov  esi,LOOP_CNT
    div  esi
;accumulates 6 cycles of overhead AFTER the division
   sub eax,6
    invoke wsprintf,ADDR time_msg,ADDR time_fmt,eax
    invoke StdOut,ADDR time_msg
Title: Re: Binary string to dword
Post by: MichaelW on January 10, 2005, 10:29:29 PM
The only reason I can see for the loop overhead being wrong is alignment. And experimenting with adding variable numbers of NOPs at the start of the program, I determined that the result is definitely affected by the overall alignment. So I did essentially as Agner Fog did in his WTEST.ASM, and modified the clockctr_begin macro adding an ALIGN 16 in front of the test data, the reference loop, and the test loop. After doing so the cycle counts stabilized at 81 and 39, independent of the overall alignment. New code in the attachment.



[attachment deleted by admin]
Title: Re: Binary string to dword
Post by: Mark_Larson on January 11, 2005, 06:24:34 AM
Quote from: MichaelW on January 10, 2005, 10:29:29 PM
The only reason I can see for the loop overhead being wrong is alignment. And experimenting with adding variable numbers of NOPs at the start of the program, I determined that the result is definitely affected by the overall alignment. So I did essentially as Agner Fog did in his WTEST.ASM, and modified the clockctr_begin macro adding an ALIGN 16 in front of the test data, the reference loop, and the test loop. After doing so the cycle counts stabilized at 81 and 39, independent of the overall alignment. New code in the attachment.



WOO WOO.   Good job. :)  The 39 is only one cycle off what I got ( which was 40).  So it sounds like you got it tweaked and working :) 
Title: Re: Binary string to dword
Post by: Vortex on January 18, 2005, 10:24:39 PM
Hi MichaelW,

The results on my PIV 2.66 GHz:
btodw_small : 63 clocks
btodw_fast  : 61 clocks
Title: Re: Binary string to dword
Post by: MichaelW on January 19, 2005, 06:14:51 PM
Hi Vortex,

Interesting, a > 2:1 advantage on a P3 and a negligible advantage on a P4.

I had previously tried correcting what appeared to me to be dependency problems with the btodw_fast code (moving one instruction), and altering the alignment, but no improvement on a P3 so I discarded the changes. I have now added a btodw_fast_p4 procedure that includes these two changes. I also added a prompt at the start of the program, because a short delay after startup seems to improve the consistency of the results for the first test. Running on a P3 the clock cycle counts are 81, 39, 39.  New code in the attachment.



[attachment deleted by admin]
Title: Re: Binary string to dword
Post by: Vortex on January 19, 2005, 06:35:11 PM
Hi MichaelW,

Thanks for the new release, here the latest results:

1st  run:61,62,70
2nd run:61,62,74
3rd  run:60,63,68
Title: Re: Binary string to dword
Post by: MichaelW on January 19, 2005, 10:28:18 PM
Quote
1st run:61,62,70
2nd run:61,62,74
3rd run:60,63,68
Obviously, my optimizations weren't :bg I suspect the call to call variation in cycle counts indicates some serious problems with the code. I think the (original) fast version should be faster than the small version on any processor. I would like to know what the problems are, but unfortunately I gave my only P4 system to a nephew early last year.