bin2byte_ex

Started by jj2007, November 16, 2009, 09:13:31 PM

jj2007

Just for fun, and because I haven't posted in the Lab for a long time, here is a bin2byte_ex variant...

Timings for Celeron M:
259     cycles for BinVal1
55      cycles for BinVal2
54      cycles for BinVal3
68      cycles for bin2byte_exLib

269     cycles for BinVal1
56      cycles for BinVal2
53      cycles for BinVal3
85      cycles for bin2byte_exLib

224     cycles for BinVal1
55      cycles for BinVal2
53      cycles for BinVal3
85      cycles for bin2byte_exLib

0
3
12
48
192
255

Code size:
40 bytes P1
89 bytes P2
85 bytes P3
8760 bytes Lib


dedndave

prescott, Jochen   :U
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

229     cycles for BinVal1
79      cycles for BinVal2
69      cycles for BinVal3
109     cycles for bin2byte_exLib

221     cycles for BinVal1
65      cycles for BinVal2
72      cycles for BinVal3
109     cycles for bin2byte_exLib

225     cycles for BinVal1
85      cycles for BinVal2
84      cycles for BinVal3
109     cycles for bin2byte_exLib

0
3
12
48
192
255

Code size:
40 bytes P1
89 bytes P2
85 bytes P3
8760 bytes Lib

oex

 :bg Awesome. If I had had more time on my hands I would have preempted you. I've been pretty impressed with your time trials, jj; you introduced me to SSE :) I dare not go head to head with you now you've started :lol

One thing I would note, though: I rarely want to convert just 1 byte. I've just downloaded your code and haven't looked yet, but I would think variable-length binbyte (and vice versa) would be better.

AMD Sempron(tm) Processor 3100+ (SSE3)

162     cycles for BinVal1
74      cycles for BinVal2
52      cycles for BinVal3
65      cycles for bin2byte_exLib

165     cycles for BinVal1
73      cycles for BinVal2
52      cycles for BinVal3
65      cycles for bin2byte_exLib

166     cycles for BinVal1
74      cycles for BinVal2
52      cycles for BinVal3
64      cycles for bin2byte_exLib

0
3
12
48
192
255

Code size:
40 bytes P1
89 bytes P2
85 bytes P3
8760 bytes Lib

I've only been coding ASM properly for 6 months, you guys make me feel slow :D
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

dedndave

don't feel bad - that nice processor you have makes us feel the same way

jj2007

Quote from: oex on November 16, 2009, 10:34:31 PM
I dare not go head to head with you now you've started :lol

Don't be shy :bg

Quote
I rarely want to convert just 1 byte. I've just downloaded your code and haven't looked yet, but I would think variable-length binbyte

Yes indeed. This is just for fun, and the library version does exactly that: convert one byte of text. In MasmBasic I have the Val() macro which autodetects decimal and hex, but adding binary would require two more tests (for 1111b and 1111y), so I am not eager to implement binary autodetection.
If you really want variable length, check the BinVal1 algo, and replace bvpError with bvpDone :wink
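
To make the "two more tests" concrete, here is a rough C sketch of suffix-based binary detection. It is not MasmBasic's Val() and not any algo from the attachment; the name parse_bin_suffixed, the lowercase-only suffix check and the 32-digit limit are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not MasmBasic's Val()): returns 1 and stores the
   value if s is a run of '0'/'1' digits followed by a 'b' or 'y' suffix,
   otherwise returns 0 and leaves *out untouched. */
int parse_bin_suffixed(const char *s, uint32_t *out)
{
    size_t len = strlen(s);
    if (len < 2 || len > 33)              /* assumed limit: 1..32 digits plus suffix */
        return 0;

    char suffix = s[len - 1];
    if (suffix != 'b' && suffix != 'y')   /* the two extra tests */
        return 0;

    uint32_t v = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (s[i] != '0' && s[i] != '1')   /* anything else is not binary */
            return 0;
        v = (v << 1) | (uint32_t)(s[i] - '0');
    }
    *out = v;
    return 1;
}
```

Called with "1111b" or "1111y" it stores 15; a string without a suffix, or with a digit other than 0 and 1, is rejected so that a decimal or hex path could take over.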

oex

If only it were mine Dave but work owns it  :lol

Also yes, jj, my implementation was 5-bit; I only mentioned it because your post brought to mind your sse2_copy function from another post.

I'm having a look at the code now but don't hold your breath :lol

Only just realised what [esp+4] is, doh  ::)  :red

qWord

just for addition: an sse2 version  :green
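OPTION PROLOGUE : NONE
OPTION EPILOGUE : NONE
align 16
simd_bin2byte proc psz:DWORD

mov eax,[esp+4]                 ; psz (no stack frame, so the argument sits at esp+4)
movsd xmm0,QWORD ptr [eax]      ; load 8 ASCII chars
pxor xmm1,xmm1                  ; xmm1 = 0
psllq xmm0,7                    ; bit 0 of each char -> bit 7 of its byte
punpcklbw xmm0,xmm1             ; widen the 8 bytes to 8 words
pshuflw xmm0,xmm0,00011011y     ; reverse words 0..3
pshufhw xmm0,xmm0,00011011y     ; reverse words 4..7
packuswb xmm0,xmm0              ; narrow back to bytes
pshufd xmm0,xmm0,011100001y     ; swap the two low dwords: the 8 bytes are now fully reversed
pmovmskb eax,xmm0               ; collect bit 7 of every byte
and eax,0ffh                    ; keep the low 8 bits
ret 4

simd_bin2byte endp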
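
For anyone following the shuffles, here is a scalar C model of what the instruction sequence above appears to compute, read off the instructions one by one. The name bin8_model is made up for illustration, and the model says nothing about speed.

```c
#include <stdint.h>

/* Scalar model of the SSE2 routine above; bin8_model is a made-up name.
   It reads exactly 8 characters and looks only at bit 0 of each one,
   so '0' (30h) contributes 0 and '1' (31h) contributes 1. */
uint32_t bin8_model(const char *s)
{
    uint32_t result = 0;
    for (int i = 0; i < 8; i++) {
        /* psllq 7 moves bit 0 of each byte up to bit 7; after the byte
           reversal, pmovmskb gathers those bits with s[0] ending up as
           the most significant bit of the result. */
        result |= (uint32_t)(s[i] & 1) << (7 - i);
    }
    return result;
}
```

Two things fall out of the model: the input is never validated (any character with an odd ASCII code counts as a 1), and exactly 8 bytes are always read, so the routine only fits fixed 8-digit input.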


regards, qWord
FPU in a trice: SmplMath
It's that simple!

oex

lol

I don't think I can make it any quicker. I'm not completely sure of the rules and don't want to plagiarise your work, so my only suggestion for #3 is to remove shl eax, 4 and use ebx in place of the second edx (might work lol)

jj2007

Quote from: qWord on November 16, 2009, 11:58:07 PM
just for addition: an sse2 version  :green
OPTION PROLOGUE : NONE
OPTION EPILOGUE : NONE
align 16
simd_bin2byte proc psz:DWORD

mov eax,[esp+4]
movsd xmm0,QWORD ptr [eax]
pxor xmm1,xmm1
psllq xmm0,7
punpcklbw xmm0,xmm1
pshuflw xmm0,xmm0,00011011y
pshufhw xmm0,xmm0,00011011y
packuswb xmm0,xmm0
pshufd xmm0,xmm0,011100001y
pmovmskb eax,xmm0
and eax,0ffh
ret 4

simd_bin2byte endp


regards, qWord

Cute but unnecessarily bloated, young friend :green

   and eax,0ffh      ; 5 bytes, drop this
   movzx eax, al     ; 3 bytes, same result

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

245     cycles for BinVal1
55      cycles for BinVal2
55      cycles for BinVal3
38      cycles for simd_bin2byte
72      cycles for bin2byte_exLib

219     cycles for BinVal1
56      cycles for BinVal2
54      cycles for BinVal3
38      cycles for simd_bin2byte
72      cycles for bin2byte_exLib

243     cycles for BinVal1
55      cycles for BinVal2
54      cycles for BinVal3
38      cycles for simd_bin2byte
76      cycles for bin2byte_exLib

0
3
12
48
192
255
1431655765=1431655765

Code size:
27 bytes P1
89 bytes P2
85 bytes P3
50 bytes SSE2 qWord


EDIT: I forgot to mention that the new BinVal1 is slow but small and flexible: it eats up to 31 chars.
push chr$("01010101010101010101010101010101y")
call BinVal1

oex

qWord

Quote from: jj2007 on November 17, 2009, 12:11:50 AM
Cute but unnecessarily bloated, young friend :green

You've got me :lol

drizz

The truth cannot be learned ... it can only be recognized.

jj2007

Quote from: drizz on November 17, 2009, 03:51:26 AM
Hello,

http://www.masm32.com/board/index.php?topic=4709.0

mine is at the bottom of page 1

I should have known that this topic was already covered...
Your algo and Lingo's "fast" algo are included now. Note that only one algo (BinVal1) is able to interpret 1010y or 10101010101010b properly; all the others require either a trailing null byte or a fixed 8-bit value.

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

445     cycles for BinVal1
104     cycles for BinVal2
85      cycles for BinVal3
121     cycles for BinToDw (drizz)
119     cycles for BinToDw2 (drizz+JJ)
116     cycles for b2dw1 (Lingo)
79      cycles for simd_bin2byte
110     cycles for bin2byte_exLib

461     cycles for BinVal1
73      cycles for BinVal2
85      cycles for BinVal3
120     cycles for BinToDw (drizz)
170     cycles for BinToDw2 (drizz+JJ)
113     cycles for b2dw1 (Lingo)
77      cycles for simd_bin2byte
107     cycles for bin2byte_exLib

453     cycles for BinVal1
88      cycles for BinVal2
65      cycles for BinVal3
121     cycles for BinToDw (drizz)
119     cycles for BinToDw2 (drizz+JJ)
113     cycles for b2dw1 (Lingo)
83      cycles for simd_bin2byte
167     cycles for bin2byte_exLib


Testing b2dw1 (2 lines must match):
0, 3, 12, 48, 192, 255, 1431655765=1431655765
0, 3, 12, 48, 192, 255, -1431655765=1431655765

Code size:
27 bytes BinVal1
89 bytes BinVal2
85 bytes BinVal3
50 bytes SSE2 qWord
25 bytes BinToDw
25 bytes BinToDw2
133 bytes b2dw1
8760 bytes bin2byte_ex (Masm32 library)

hutch--

This is on a quad.


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)

279     cycles for BinVal1
37      cycles for BinVal2
38      cycles for BinVal3
100     cycles for BinToDw (drizz)
67      cycles for BinToDw2 (drizz+JJ)
44      cycles for b2dw1 (Lingo)
27      cycles for simd_bin2byte
50      cycles for bin2byte_exLib

265     cycles for BinVal1
37      cycles for BinVal2
37      cycles for BinVal3
70      cycles for BinToDw (drizz)
70      cycles for BinToDw2 (drizz+JJ)
44      cycles for b2dw1 (Lingo)
27      cycles for simd_bin2byte
50      cycles for bin2byte_exLib

267     cycles for BinVal1
37      cycles for BinVal2
37      cycles for BinVal3
100     cycles for BinToDw (drizz)
67      cycles for BinToDw2 (drizz+JJ)
44      cycles for b2dw1 (Lingo)
27      cycles for simd_bin2byte
49      cycles for bin2byte_exLib


Testing b2dw1 (2 lines must match):
0, 3, 12, 48, 192, 255, 1431655765=1431655765
0, 3, 12, 48, 192, 255, -1431655765=1431655765

Code size:
27 bytes BinVal1
89 bytes BinVal2
85 bytes BinVal3
50 bytes SSE2 qWord
25 bytes BinToDw
25 bytes BinToDw2
133 bytes b2dw1
8760 bytes bin2byte_ex (Masm32 library)

Hit any key to get outta here

Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Brand new BinValSSE eats variable size Bin$ and stops at the first byte that is neither 0 nor 1. Isn't that cute?

Timings are for 8-bit input. The BinValSSE algo uses the bitswap algo shown here.
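
As a plain reference for the behaviour described (variable size, stop at the first byte that is neither 0 nor 1), here is a scalar C sketch. It is not the BinValSSE code and does not use the bitswap trick; the name bin_prefix_value, the consumed out-parameter and the 32-digit cap are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not BinValSSE itself: converts the
   leading run of '0'/'1' characters and stops at the first byte that
   is neither. If consumed is non-NULL it receives the digit count. */
uint32_t bin_prefix_value(const char *s, size_t *consumed)
{
    uint32_t v = 0;
    size_t n = 0;

    /* assumed cap of 32 digits so the result fits a 32-bit register */
    while (n < 32 && (s[n] == '0' || s[n] == '1')) {
        v = (v << 1) | (uint32_t)(s[n] - '0');
        n++;
    }
    if (consumed)
        *consumed = n;
    return v;
}
```

With this, "1010y" gives 10 with 4 digits consumed, and the 32-digit 0101...01y string from earlier in the thread gives 1431655765; the trailing y, a b or a null byte all end the loop in the same way.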

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
88      cycles for BinValSSE
208     cycles for BinVal0
56      cycles for BinVal3
91      cycles for BinToDw (drizz)
100     cycles for BinToDw2 (drizz+JJ)
73      cycles for b2dw1 (Lingo)
47      cycles for simd_bin2byte
67      cycles for bin2byte_exLib