News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Bin$

Started by jj2007, June 20, 2008, 01:39:06 AM

Previous topic - Next topic

hutch--

JJ,

sinsi's code still uses the bintable so it larger than you have listed. The table adds 2k of data.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Quote from: hutch-- on June 23, 2008, 02:16:50 PM
JJ,

sinsi's code still uses the bintable so it larger than you have listed. The table adds 2k of data.

No, it doesn't.

Sinsi, does the sub eax, eax have a function? I timed it repeatedly with and without but could not find a real difference.

    mov esi,[edi+eax*8+4]
    sub eax, eax  <--
    mov [edx],ebx

NightWare

Quote from: sinsi on June 23, 2008, 09:33:34 AM
[Is there any way of cutting the size of the table down? Maybe somehow using the high bit to cut it down to 128? I can't think now, too pissed... :bg
hmm, my attempt :
.DATA
ALIGN 4
BinaryTable DWORD "0000","1000","0100","1100","0010","1010","0110","1110"
DWORD "0001","1001","0101","1101","0011","1011","0111","1111"

.CODE
ALIGN 16
;
; Syntax :
; mov eax,valeur
; mov esi,String address
; call nwDw2Bin
;
nwDw2Bin PROC
mov BYTE PTR [esi+32],0
mov ecx,28
Label1: mov edx,eax
and edx,00000000Fh
mov edx,DWORD PTR [BinaryTable+edx*4]
shr eax,4
mov DWORD PTR [esi+ecx],edx
sub ecx,DWORD
jns Label1
ret
nwDw2Bin ENDP

jj2007

Welcome on board!

Testing correctness of results for BIN$, pbin2 and nwDw2Bin
11110111000000001111111100000000        BIN$
11110111000000001111111100000000        pbin2
11110111000000001111111100000000        nwDw2Bin
11110111000000001111111100000000        original value

00001000111111110000000011111111
00001000111111110000000011111111
00001000111111110000000011111111        nwDw2Bin
00001000111111110000000011111111

00100011111111000000001111111100
00100011111111000000001111111100
00100011111111000000001111111100        nwDw2Bin
00100011111111000000001111111100

10101010101010101010101010101010
10101010101010101010101010101010
10101010101010101010101010101010        nwDw2Bin
10101010101010101010101010101010

00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000        nwDw2Bin
GetKeyState (VK_CAPITAL), i.e. Caps Lock

40 cycles timing BIN$            139 bytes      472 LAMPs
45 cycles timing pbin2           147 bytes      546 LAMPs
67 cycles timing nwDw2Bin        101 bytes      673 LAMPs
128 cycles timing dw2bin_ex      2140 bytes     5921 LAMPs
351 cycles timing Dword2Bin2     57 bytes       2650 LAMPs

LAMPs = Lean And Mean Points = cycles * sqrt(size)

[attachment deleted by admin]

hutch--

Just as a note, I am buried in the middle of a mounain of work at the moment so I don't have enough time to track this topic in real detail but it would be very useful to get a conclusion that produced two seperate algos, one for streaming like the table based masm32 lib version and a much smaller one for a non "_ex" procedure.

Something I remember is an algorithm of this type that Michael Webster wrote a couple of years ago that would handle byte, word and dword sized binary conversions but I don't know where it is any longer
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

sinsi

Quote from: jj2007 on June 23, 2008, 03:09:17 PM
Quote from: hutch-- on June 23, 2008, 02:16:50 PM
JJ,

sinsi's code still uses the bintable so it larger than you have listed. The table adds 2k of data.

No, it doesn't.

    mov edi,offset bintable

yes it does.

Quote
Sinsi, does the sub eax, eax have a function? I timed it repeatedly with and without but could not find a real difference.

    mov esi,[edi+eax*8+4]
    sub eax, eax  <--
    mov [edx],ebx


    sub eax,eax
    mov [edx],ebx
    mov 4[edx],esi

    mov 32[edx],al   ;<---

I was always told to use a register...old habits die hard.
Light travels faster than sound, that's why some people seem bright until you hear them.

NightWare

stalls reduced  :red :
mov BYTE PTR [esi+32],0
mov ecx,32
Label1: mov edx,eax
and edx,00000000Fh
shr eax,4
mov edx,DWORD PTR [BinaryTable+edx*4]
sub ecx,DWORD
mov DWORD PTR [esi+ecx],edx
jnz Label1

hutch--

NightWare,

I have just had a quick play and formatted your algo into a library module. I wonder how much slower it would be with arguments passed on the stack rather than in registers ?


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    .486                      ; maximum processor model
    .model flat, stdcall      ; memory model & calling convention
    option casemap :none      ; case sensitive

    ; Syntax :
    ; mov eax, DWORD value
    ; mov esi, String address
    ; call nwDw2Bin

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    align 16

  ; ------------------------------------
  ; NightWare's DWORD to bin string algo
  ; ------------------------------------

nwDw2Bin PROC

    push esi

    .data
      BinaryTable DWORD "0000","1000","0100","1100","0010","1010","0110","1110"
                  DWORD "0001","1001","0101","1101","0011","1011","0111","1111"
    .code

    mov BYTE PTR [esi+32], 0
    mov ecx, 32

  lbl0:
    mov edx, eax
    and edx, 00000000Fh
    shr eax, 4
    mov edx, DWORD PTR [BinaryTable+edx*4]
    sub ecx, 4
    mov DWORD PTR [esi+ecx], edx
    jnz lbl0

    pop esi

    ret

nwDw2Bin ENDP

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

hutch--

It got bigger but it also got faster.

Results

00000000000000000000010011010010 sinsi_ex
00000000000000000000010011010010 NightWare
00000000000000000000010011010010 hutch_ex2

656 NightWare
547 sinsi_ex
657 hutch_ex2
656 NightWare
547 sinsi_ex
656 hutch_ex2
656 NightWare
547 sinsi_ex
657 hutch_ex2
656 NightWare
547 sinsi_ex
656 hutch_ex2
656 NightWare
547 sinsi_ex
657 hutch_ex2
672 NightWare
547 sinsi_ex
656 hutch_ex2
656 NightWare
547 sinsi_ex
656 hutch_ex2
656 NightWare
547 sinsi_ex
656 hutch_ex2

NightWare timing average = 658
sinsi_ex  timing average = 547
hutch_ex2 timing average = 656
Press any key to continue ...


Algo

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

    align 16

  ; ---------------------------------------------
  ; Modified NightWare's DWORD to bin string algo
  ; ---------------------------------------------

NightWare PROC value:DWORD,buffer:DWORD

    .data
      align 16
      BinaryTable DWORD "0000","1000","0100","1100","0010","1010","0110","1110"
                  DWORD "0001","1001","0101","1101","0011","1011","0111","1111"
    .code

    push esi
    push edi

    mov eax, [esp+4][8]     ;; value
    mov esi, [esp+8][8]     ;; buffer

    mov BYTE PTR [esi+32], 0
    mov ecx, 32

    mov edx, eax
    and edx, 00000000Fh
    shr eax, 4
    mov edx, DWORD PTR [BinaryTable+edx*4]
    mov DWORD PTR [esi+ecx-4], edx

    mov edx, eax
    and edx, 00000000Fh
    shr eax, 4
    mov edx, DWORD PTR [BinaryTable+edx*4]
    mov DWORD PTR [esi+ecx-8], edx

    mov edx, eax
    and edx, 00000000Fh
    shr eax, 4
    mov edx, DWORD PTR [BinaryTable+edx*4]
    mov DWORD PTR [esi+ecx-12], edx

    mov edx, eax
    and edx, 00000000Fh
    shr eax, 4
    mov edx, DWORD PTR [BinaryTable+edx*4]
    mov DWORD PTR [esi+ecx-16], edx

    mov edx, eax
    and edx, 00000000Fh
    shr eax, 4
    mov edx, DWORD PTR [BinaryTable+edx*4]
    mov DWORD PTR [esi+ecx-20], edx

    mov edx, eax
    and edx, 00000000Fh
    shr eax, 4
    mov edx, DWORD PTR [BinaryTable+edx*4]
    mov DWORD PTR [esi+ecx-24], edx

    mov edx, eax
    and edx, 00000000Fh
    shr eax, 4
    mov edx, DWORD PTR [BinaryTable+edx*4]
    mov DWORD PTR [esi+ecx-28], edx

    mov edx, eax
    and edx, 00000000Fh
    shr eax, 4
    mov edx, DWORD PTR [BinaryTable+edx*4]
    mov DWORD PTR [esi+ecx-32], edx

    pop edi
    pop esi

    ret 8

NightWare ENDP

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

I apologise for the fact that the attachments I supplied have become apparently so confusing that you dare no longer read them. As said above, Sinsi's modified code does no longer need a table, and in the final "BIN$" version it is also by far the fastest of all previous versions (including Nightware's code). So there is no longer a need for separate fast vs compact codes.

Cheers, JJ

hutch--

Here is a tweak of sinsi's version, basically some instruction re-ordering and swapping the names of some registers.


00000000000000000000010011010010 sinsi_ex
00000000000000000000010011010010 NightWare
00000000000000000000010011010010 hutch_ex2

657 NightWare
516 sinsi_ex
656 hutch_ex2
656 NightWare
516 sinsi_ex
657 hutch_ex2
656 NightWare
515 sinsi_ex
656 hutch_ex2
656 NightWare
515 sinsi_ex
656 hutch_ex2
657 NightWare
516 sinsi_ex
656 hutch_ex2
656 NightWare
516 sinsi_ex
657 hutch_ex2
656 NightWare
515 sinsi_ex
656 hutch_ex2
656 NightWare
515 sinsi_ex
656 hutch_ex2

NightWare timing average = 656
sinsi_ex  timing average = 515
hutch_ex2 timing average = 656
Press any key to continue ...


The algo

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 16

sinsi_ex proc var:DWORD,buffer:DWORD

    mov ecx,4[esp]
      push ebx
    mov edx,12[esp]
      push esi
    movzx esi,cl
      push edi

    mov edi,offset bintable

    mov ebx,[edi+esi*8]
    mov eax,[edi+esi*8+4]
      movzx esi,ch
    mov 24[edx],ebx
    mov 28[edx],eax

    mov ebx,[edi+esi*8]
    mov eax,[edi+esi*8+4]
      shr ecx,16
      movzx esi,cl
    mov 16[edx],ebx
    mov 20[edx],eax

    mov ebx,[edi+esi*8]
    mov eax,[edi+esi*8+4]
      movzx esi,ch
      mov BYTE PTR 32[edx], 0
    mov 8[edx],ebx
    mov 12[edx],eax

    mov ecx,[edi+esi*8]
    mov eax,[edi+esi*8+4]
      pop edi
      pop esi
      pop ebx
    mov [edx],ecx
    mov 4[edx],eax

    ret 8

sinsi_ex endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤


JJ, would you just post your fast algo so I can put it into a test piece ?
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Cycles            Code             size including tables etc.
47 cycles timing BIN$            139 bytes
56 cycles timing pbin2           147 bytes
107 cycles timing nwDw2Bin       101 bytes
107 cycles timing dw2bin_ex      2140 bytes
402 cycles timing Dword2Bin2     57 bytes

Usage:
invoke MessageBox, 0, BIN$(11111000001111100000111111110000b), chr$("A binary string:"), MB_OK
or
mov xxx, BIN$(11111000001111100000111111110000b)
or
mov edx, offset Dw2BinBuffer
mov eax, 11111000001111100000111111110000b
call Dword2Bin
invoke MessageBox, 0, addr Dw2BinBuffer, chr$("A binary string:"), MB_OK

The macro:
BIN$ MACRO dwArg:REQ, tgt:=<0>
  cc INSTR <dwArg>, <eax> ;; to avoid a mov eax, eax
  if cc eq 0
mov eax, dwArg ;; eax passes value to translate
  endif
  if @SizeStr(tgt) gt 1
mov edx, tgt ;; dest buffer - if 0, use Dw2BinBuffer
  else
mov edx, offset Dw2BinBuffer
  endif
  call Dword2Bin
  EXITM <edx>
ENDM

.data?
Dw2BinBuffer dd 9 dup (?) ; 32 are needed, we pad 4 for alignment
Dw2BinTable dd 2048/4 dup (?) ; former bintable now in .data? section


The proc
align 4
Dword2Bin proc
  ifndef Dw2BinBuffer ; credits to Sinsi of the Masm32 Forum
.data?
Dw2BinBuffer dd 36/4 dup (?) ; 32 are needed, we pad 4 for alignment
Dw2BinTable dd 2048/4 dup (?) ; old bintable
.code
  endif
  push ebx
  push esi
  push edi

  mov edi, offset Dw2BinTable ; old \masm32\m32lib\bintbl.asm
  mov ecx, [edi]
  .if ecx==0 ; seems you have to initialise your bintable...
push eax ; save value
push edx ; save destination
mov edx, offset Dw2BinTable+2048 ; destination bintable
mov esi, 63 ; outer loop counter
mov edi, 0FCFDFEFFh ; seed for creating table
btInit:
mov eax, edi
sub edi, 04040404h
mov ecx, 31 ; inner loop counter

@@: xor ebx, ebx
dec edx ; on exit, edx will point to the string
sar eax, 1 ; we work from right to left
adc ebx, 48 ; +48: Ascii 0 or, with carry, +49 : Ascii 1
mov [edx], bl
dec ecx ; decrement inner counter
jge @B ; dec sets only the sign flag

dec esi ; decrement outer counter
jge btInit ; dec sets only the sign flag
mov edi, edx
pop edx ; get destination
pop eax ; get value
  .endif

  movzx ecx,  al ; the value to translate
  mov ebx, [edi+ecx*8]
  mov esi, [edi+ecx*8+4]
  mov 24[edx], ebx
  mov 28[edx], esi

  movzx ecx,  ah
  mov ebx, [edi+ecx*8]
  mov esi, [edi+ecx*8+4]
  shr eax, 16
  mov 16[edx], ebx
  mov 20[edx], esi

  movzx ecx,  al
  mov ebx, [edi+ecx*8]
  mov esi, [edi+ecx*8+4]
  mov 8[edx], ebx
  mov 12[edx], esi

  movzx ecx,  ah
  mov ebx, [edi+ecx*8]
  mov esi, [edi+ecx*8+4]
  mov [edx], ebx
  mov 4[edx], esi
  mov 32[edx], ch ; null terminator
  pop edi
  pop esi
  pop ebx

  ret ; not: 8
Dword2Bin endp

jj2007

Quote from: sinsi on June 23, 2008, 11:59:48 PM

    mov edi,offset bintable

yes it does.
Sorry, I forgot to say that I ruthlessly eliminated the table from your code ;-)


    sub eax,eax
    mov [edx],ebx
    mov 4[edx],esi

    mov 32[edx],al   ;<---

I was always told to use a register...old habits die hard.
Quote

    movzx eax,ch
    mov ebx,[edi+eax*8]
    mov esi,[edi+eax*8+4]
    ; sub eax, eax
    mov [edx],ebx
    mov 4[edx],esi

    mov 32[edx],ah  ; al is non-zero

If you agree...  :bg

jj2007

Quote from: NightWare on June 24, 2008, 01:29:06 AM
stalls reduced  :red :

Thanks. The effect is not very clear, maybe the increased counter compensates part of the gain:
mov ecx, 28+4   ; +4 because sub ecx was shifted up??

EDIT:
I just saw that you changed the jns to jnz at the end of the loop.
If I use mov ecx, 32 and jnz, as in your last post, I get garbage.
If I use mov ecx, 28 and jnz, code produces correct results and is faster.
EDIT (2):
Forget Edit (1) and see next post below.

nwDw2Bin PROC
mov BYTE PTR [esi+32],0
mov ecx, 28
Label1:
mov edx, eax
and edx, 00000000Fh
shr eax,4
mov edx, DWORD PTR [BinaryTable+edx*4]
sub ecx, DWORD
; shr eax,4 moved up to reduce stalls
mov DWORD PTR [esi+ecx+4], edx
; sub ecx,DWORD moved up to reduce stalls
jnz Label1 ; ex jns Label1
ret

sinsi

Quote from: jj2007 on June 24, 2008, 08:28:27 AM
Sorry, I forgot to say that I ruthlessly eliminated the table from your code ;-)
Being ruthless is what asm is all about...

Quote from: jj2007 on June 24, 2008, 08:28:27 AM
If you agree...  :bg
Everyone hates a smartarse. :bdg
Anyway, it's hard to see through beer goggles. :dazzled:
Light travels faster than sound, that's why some people seem bright until you hear them.