Hi,
I was thinking about doing a binary to decimal display routine
using a multiply rather than dividing by ten as is usually done.
I have read discussions of this, but never seen any code for the
X86. So I hope this is also relatively new to some of you.
The idea is simple in decimal: the number is treated as a
fraction, and multiplying by ten pulls off the leading digit.
123 => .123, .123 * 10 = 1.230, pull off the integer and repeat.
So on a decimal computer it probably could make sense.
So I coded up a routine to do the equivalent with a binary
number to see how good it would be. I used byte numbers to do a proof
of concept. And basically it works, but the conventional divide
algorithm is better. Thus no effort to write a word or larger
routine.
It requires word multiplies to process a byte (actually three
digit) number, whereas a divide works with a byte divide. And the
suppression of leading zeroes looks to be more involved. With the
change in number of digits to print out, the first multiplication
would require a different multiplier, and the divide routines do
not require changes in the divisor. The first multiplier has a bit
of tolerance with a byte, but for usage up to 999 only 290H worked.
So a word version should be doable, though there seems no point.
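The remark about the first multiplier's tolerance can be checked by brute force. What follows is only a sketch in Python, not part of the original post: it models each word MUL as integer arithmetic with a 16-bit fraction, and the names digits_8_8 and multipliers_that_work are made up for illustration.

```python
def digits_8_8(n, multiplier):
    """Model one MUL by `multiplier` followed by two MULs by 10.

    After each MUL the integer part is in DX (product >> 16) and the
    fraction that is carried forward is in AX (product & 0xFFFF).
    """
    digits = []
    frac, factor = n, multiplier
    for _ in range(3):
        product = frac * factor
        digits.append(product >> 16)   # DX: the digit
        frac = product & 0xFFFF        # AX: remaining fraction
        factor = 10
    return digits


def multipliers_that_work(limit):
    """Every 16-bit first multiplier that is right for all n in 0..limit."""
    expected = [[n // 100, n // 10 % 10, n % 10] for n in range(limit + 1)]
    return [m for m in range(1, 0x10000)
            if all(digits_8_8(n, m) == expected[n] for n in range(limit + 1))]


print([hex(m) for m in multipliers_that_work(255)])
print([hex(m) for m in multipliers_that_work(999)])
```

Under this model a byte input leaves two usable multipliers, while the full 0..999 range leaves only 290H.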
Examples using the numbers 255 and 999:
Thinking in terms of fixed point arithmetic, 0FFH is .99609375;
.5 + .25 + .125 + .0625 + .03125 + .015625 + .0078125 + .00390625.
And 0290H (656) is then 2.5625. 2.5625 * 0.99609375 = 2.55249,
giving the first digit and the proper fraction so that multiplying
by ten can get the next digits. And 999 => 3E7H, 3E7 * 290 = 9FFF0,
which is decimal 9.99+. You can see that the math is not exact, a
minor nit that the divide algorithm avoids.
Code follows, the SCALL macro was written for DOS, so replace for
another environment. I can attach the macro file if it is allowed,
but it was not created by me. Dated 1984 by ZDS, with a boilerplate
message. Though I see no real proprietary content.
Comments?
Steve N.
TITLE - BINary to DECimal conversion, test inverted logic.
COMMENT *
Do a binary to decimal routine trying to use a left to right
output order using a multiply, rather than a divide.
Do with bytes at first to simplify debugging.
26 June 2008 by SRN.
*
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.XCREF
.XLIST
INCLUDE DEFMS.ASM ; MACROs and MS-DOS definitions from Heath/Zenith software.
.LIST
.CREF
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Example of usage.
MOV AL,BYTE PTR [Dividend]
CALL Bin2DecM ; Prints AL with min 3 Digits.
CALL Newline
SCALL EXIT ; .EXE exit
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; BINary to DECimal conversion a different way. Assume unsigned numbers.
; The number in AL is printed to the console as three digits. Actually
; lightly tested up to 999.
; 26 June 2008, 19 September 2008 cleaned up for posting.
; INPUT: AL contains the number to be converted to decimal ASCII.
; OUTPUT: No registers changed.
Bin2DecM:
PUSH AX ; Save register used. Change to EAX as needed.
PUSH BX
PUSH DX
XOR AH,AH ; Optional to ensure AH is clear.
MOV BX,290H ; Multiplier to get first digit into DL.
MUL BX ; Do a fixed point 8:8 x 8:8 multiply to get a 16:16.
PUSH AX ; Calling DOS functions destroy AX.
ADD DL,'0' ; Convert binary to ASCII.
SCALL CONOUT ; Print leading decimal digit.
POP AX ; And restore fraction.
MOV BX,10 ; Multiplier to get remaining digits.
MUL BX ; And repeat as necessary.
PUSH AX
ADD DL,'0'
SCALL CONOUT
POP AX
MUL BX
ADD DL,'0' ; The last digit, so no need to preserve AX.
SCALL CONOUT
POP DX ; Restore and return.
POP BX
POP AX
RET
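To follow the routine above by hand, it may help to see the DX:AX pairs it produces. This is a small Python trace, not part of the original post; it assumes the same constants as the listing (290H, then 10) and bin2decm_trace is a hypothetical name.

```python
def bin2decm_trace(n):
    """Return the (DX, AX) pair after each of the three MULs in Bin2DecM."""
    steps = []
    product = n * 0x290                  # MOV BX,290H / MUL BX
    for _ in range(3):
        dx, ax = product >> 16, product & 0xFFFF
        steps.append((dx, ax))           # DL is printed, AX is kept
        product = ax * 10                # MOV BX,10 / MUL BX
    return steps


for value in (255, 999):
    print(value, [(dx, hex(ax)) for dx, ax in bin2decm_trace(value)])
```

For 999 the first step gives DX = 9 and AX = 0FFF0H, which is the 9FFF0 result quoted in the post.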
And of course this morning, figured out how to do it all with byte
arithmetic. This is because the first multiplier has its low nybble
set to zero. One can shift 290H down to 29H and not lose precision.
This then places the integer in the high nybble of AH, where it is in the
middle of bit soup, and must be shifted up into DL or down into the
low nybble of AH.
The first is equivalent to two multiplies and an implicit use of
word size logic. Not to mention getting the four bits out of AH/AX
and into DL/DX. Not much of a gain over explicit use of a single word
multiply with the integer ending up in DL for free.
The second is then equivalent to a multiply followed by a divide
through the use of a word shift (or byte shifts, and byte rotates
through carry), and then requires a move of AH to DL for each digit.
And on checking, it produces the wrong answer due to a loss of
precision.
Ugly. But it implies a word (dword) size routine might be possible
using only word (dword) logic, if the multiplier gets really lucky,
and you cheat a little.
Phooey, I thought this was dead. I guess I'll have to get a
stake to finish it off (calculate the numbers for those cases).
It figures that I only figure these things out after posting.
Oops,
Steve N.
I am going to analyze your code. I know you are going to be posting new code. Make sure you don't write any 16-bit code for Windows on Intel processors. It is very slow. Use a byte or dword instead.
Quote from: FORTRANS on September 19, 2008, 04:40:14 PM
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; BINary to DECimal conversion a different way. Assume unsigned numbers.
; The number in AL is printed to the console as three digits. Actually
; lightly tested up to 999.
; 26 June 2008, 19 September 2008 cleaned up for posting.
; INPUT: AL contains the number to be converted to decimal ASCII.
; OUTPUT: No registers changed.
Bin2DecM:
; With Windows you don't need to preserve ax, or dx ( eax, or edx).
; you also don't have to preserve ecx, so you should use it in place of bx.
; pushing AX shouldn't work under windows, the stack is 32-bit
PUSH AX ; Save register used. Change to EAX as needed.
PUSH BX
PUSH DX
;this causes a stall since it only updates part of the
; register, use xor eax,eax
XOR AH,AH ; Optional to ensure AH is clear.
MOV BX,290H ; Multiplier to get first digit into DL.
Quote from: Mark_Larson on September 20, 2008, 06:27:00 PM
I am going to analyze your code.
Good. Nice to see some interest.
Quote
I know you are going to be posting new code. Make sure you don't write any 16-bit code for Windows on Intel processors. It is very slow. Use a byte or dword instead.
This is just "proof of concept" and/or algorithm tweaking. If it
proves to be useful I will follow your guidelines.
; With Windows you don't need to preserve ax, or dx ( eax, or edx).
; you also don't have to preserve ecx, so you should use it in place of bx.
; pushing AX shouldn't work under windows, the stack is 32-bit
Noted. I am developing using DOS as I am more familiar with it,
and not currently set up for windows. I _am_ trying to get a Windows
environment set up. I am preserving registers due to the debug
style environment. And of course if the word version pans out, I
will look at a double word version, and will try to follow Windows
coding conventions.
PUSH AX ; Save register used. Change to EAX as needed.
;this causes a stall since it only updates part of the
; register, use xor eax,eax
XOR AH,AH ; Optional to ensure AH is clear.
Um, no can do, the input is in AL. Is
AND AX,00FFH
better than
XOR AH,AH
? Is
AND EAX,000000FFH
again better?
Thank you for your inputs.
Regards,
Steve N.
Steve,
If you can make the conversion in your direct coding from 16 bit to full 32 bit, it will be like getting out of a Model T Ford into a Formula One car; the difference is that great. The full FLAT memory model gives you gigabytes of address space and more instructions that are a lot faster, and the addressing is cleaner and simpler.
Hi Hutch,
I'm working on it. It would be nice to not run out of memory.
But about half (or a bit more) of my projects targets an 80186
with DOS 5.0. Hmm, more instructions to misuse.
And I procrastinate. I've downloaded your MASM32 package, and
I'll put it on a computer when I find the disk space on the proper
one of them. And I got tied up on the current project.
Needed to code up a fixed point arithmetic decoder. Kept making
errors trying to calculate the constants.
Best regards,
Steve N.
I rarely do "windows" programs. Drives me nuts. I always do "console" programs under Windows, which is exactly like running under DOS, except you can't use DOS interrupts. But you get the full 32-bit mode. It's also easier to program ( less coding) than doing Windows programming. Your programs are really similar to DOS programs ( I was a big DOS programmer, go figure).
I'd highly recommend you try it. You can still call the Win32 API, which is really important, and you don't have to worry about a Window.
The one exception is if I am doing 3D programming, I have to use a Window, but some 3D APIs allow for window creation through the API, and you don't have to do much.
SDL does that, and that is what I use. Supports Windows and Linux, so you can write one set of code and use both OSes. Supports threads, audio and other cool stuff, that also ports back and forth. Windows and Linux use really different threading models.
You have to do VERY little in order to do a Window in SDL as compared to Windows.
It also allows you to look for events (keyboard, mouse), other than that, there is no other Windows stuff that you have to handle.
It supports, at the low level, both DirectX and OpenGL, or just a software renderer. I use the software renderer when I do my frame buffer for raytracing. If you want hardware acceleration you can pick DirectX or OpenGL; the API under SDL is the same for both. You just have to specify which one you want.
www.libsdl.org
I have been using it for quite a few years, and I love it :)
Hi,
Coded up the 16-bit binary to decimal conversion using 32-bit code.
Tried to use word logic, but when I finally got the correct multiplier
it demonstrated that a word has insufficient precision. And I got to
write up a fixed point number display routine to speed up the search
for the best multiplier. I tried to follow Mark_Larson's suggestions
for 32-bit code. Replace the macro to output DL and it should work in
Windows.
Summary:
I coded up an algorithm to convert a binary number to ASCII decimal
using multiplies rather than the usual divide based routines.
Pros;
Uses multiplies rather than divides. I do not know how much that
matters on current processors, but it sounded good at the beginning.
It outputs the digits in a left to right fashion. This means that
no temporary storage is needed, unlike most divide based algorithms.
Those tend to generate the digits right to left, saving them, to then
display them left to right.
Cons;
It uses five multiplies even if the number is small. A divide based
routine can check for zero after each digit is created, and exit early.
Of course that then requires testing the number, conditional jumps, and
other logic.
The multiply algorithm requires more precision than the divide based
routine. To convert a word, a number larger than a word is used. That
alone is enough to end this investigation for me. The byte sized
routine could work around that to a certain extent, but the word routine
can't.
If you do not want to display the leading zeroes, the divide routine
is probably easier to use.
If you want to print out a different number of digits, the initial
multiplier must be recalculated. The divide routine just uses ten for
any size number.
Regards,
Steve.
P.S.
Hi Mark_Larson,
Just saw your post. Hey, that looks interesting.
SRN
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; BINary to DECimal conversion using multiplies. Assume unsigned numbers.
; The number in AX is printed to the console as five digits. Actually,
; works with all five digit numbers.
; 21 Sep Start word logic. Which didn't work.
; 23 September 2008, start double word version.
; INPUT: AX or low half of EAX contains the 16 bit number to be
; converted to decimal ASCII.
; Uses: EAX, ECX, EDX
; Calls: SCALL CONOUT, macro to write DL to standard output.
Bin2DecM:
AND EAX,0000FFFFH ; Optional safety check, limit to 16 bits.
MOV EDX,00068DB9H ; Multiplier to get leading decimal digit
; into low byte of EDX (DL).
MUL EDX ; Do a fixed point 16:16 x 16:16 multiply to get
; a 32:32 result.
PUSH EAX
ADD DL,'0' ; Convert binary to ASCII.
SCALL CONOUT ; Print leading decimal digit.
POP EAX ; And restore fraction.
MOV ECX,10 ; Multiplier to get remaining digits.
MUL ECX ; Second digit.
PUSH EAX
ADD DL,'0'
SCALL CONOUT
POP EAX
MUL ECX ; And repeat as necessary.
PUSH EAX
ADD DL,'0'
SCALL CONOUT
POP EAX
MUL ECX
PUSH EAX
ADD DL,'0'
SCALL CONOUT
POP EAX
MUL ECX
ADD DL,'0'
SCALL CONOUT ; Final digit
RET
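The precision point made in the summary can be checked with a short search. This is a Python sketch, not the original test program. It assumes the simplest layout, where the first MUL leaves the leading digit in the high half and a fraction of frac_bits bits in the low half; the word attempt itself was not posted, so the 16-bit case here is a guess at its shape. The function names are hypothetical.

```python
def digits_by_multiply(n, first_multiplier, frac_bits, count):
    """Digits taken from the high part after each MUL (first one, then by 10)."""
    mask = (1 << frac_bits) - 1
    digits, frac, factor = [], n, first_multiplier
    for _ in range(count):
        product = frac * factor
        digits.append(product >> frac_bits)   # high half: the digit
        frac = product & mask                 # low half: the fraction
        factor = 10
    return digits


def works_for_all_words(first_multiplier, frac_bits):
    """True if every 16-bit input prints as its correct five digits."""
    return all(
        digits_by_multiply(n, first_multiplier, frac_bits, 5)
        == [int(c) for c in f"{n:05d}"]
        for n in range(0x10000)
    )


print(works_for_all_words(0x68DB9, 32))                              # dword logic
print(any(works_for_all_words(m, 16) for m in range(1, 0x10000)))    # word logic
```

With a 32-bit fraction the multiplier 68DB9H covers the whole word range; with a 16-bit fraction no 16-bit multiplier does, since 2^16/10000 is about 6.55 and neither 6 nor 7 is close enough.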
hi, fortrans,
this is a very interesting idea! I've tested your code and there were no problems for all values (0-0ffffh).
Because I'm not familiar with fixed-point arithmetic, I want to ask the following:
Is it possible to obtain more than one digit in one step (multiplication by an FP constant)? For example, from the DWORD value 0123456789, the leading 5 digits (01234)?
regards, qWord
EDIT: an example to my question:
mov eax,Fpconstant
mov edx,075BCD15h ; = 0123456789
mul edx
now edx = 04d2h = 1234
is this doable?
I modified the code to copy the digits to a buffer, so I could compare cycle counts with the MASM32 dwtoa procedure. Running on my P3 the modified code is more than twice as fast as dwtoa. Typical results:
39 cycles, Bin2DecM
95 cycles, dwtoa
If the code were modified to handle the full 32-bit range, and optimized, it might still be faster than dwtoa, even if the parameters were passed on the stack.
[attachment deleted by admin]
Quote from: qWord on September 23, 2008, 10:01:34 PM
EDIT: an example to my question:
mov eax,Fpconstant
mov edx,075BCD15h ; = 0123456789
mul edx
now edx = 04d2h = 1234
is this doable?
Yes it's doable!
you basically multiply the number (rounded) by (2^32/10^(something))
for example:
mov eax,1234567890; **
mov edx,01ADH; 2^32/10000000
mul edx; you'll get 123
mov eax,1234567890
mov edx,010C7H; 2^32/1000000
mul edx; you'll get 1234
mov eax,1234567890
mov edx,0A7C6H; 2^32/100000
mul edx; you'll get 12345
mov eax,1234567890
mov edx,68DB9H; 2^32/10000
mul edx; you'll get 123456
mov eax,1234567890
mov edx,418937h; 2^32/1000
mul edx; you'll get 1234567
you can add some extra precision by multiplying the magic number by 2^x (so you can adjust by shifting right)
mov eax,1234567890; **
mov edx,1AD7F2Ah; (2^32/10000000)*2.0^16
mul edx
shr edx,16; you'll get 123
Of course these constants have to be carefully tested, as it may happen that you lose precision on some numbers and get wrong results.
The longer the integer part of the division 2^x/10^y is, the smaller the chance of error (a large enough magic number).
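The constants above, and the warning about testing them, can be tried out quickly. This is a Python sketch rather than anything from the thread; mul_high is a made-up helper that stands for "EDX after mul edx", optionally followed by shr edx.

```python
def mul_high(n, magic, extra_shift=0):
    """High 32 bits of the 64-bit product n * magic, then an optional SHR."""
    return ((n * magic) >> 32) >> extra_shift


n = 1234567890
for magic in (0x1AD, 0x10C7, 0xA7C6, 0x68DB9, 0x418937):
    print(hex(magic), mul_high(n, magic))

# The short constant 1ADh is 429, a little below 2^32/10^7 = 429.4967...
print(mul_high(10_000_000, 0x1AD))            # 0, although 10000000/10^7 is 1
# The extended constant fixes that case ...
print(mul_high(10_000_000, 0x1AD7F2A, 16))    # 1
# ... but it is rounded up, so it overshoots near the top of the dword range.
print(mul_high(4_289_999_999, 0x1AD7F2A, 16), 4_289_999_999 // 10_000_000)
```

So a constant that is right for the sample value can still fail elsewhere, which is why each magic number needs to be tested over the range it is meant to cover.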
thx drizz,
I've just got the idea that it could be possible to obtain the quotient and remainder of a division by 100000 with only two multiplications (particularly with regard to SIMD instructions), but afaics the precision is the problem.
regards qWord
Guys, read this topic: http://www.masm32.com/board/index.php?topic=8974.0 . You have recreated the michaelw/pdixon algo...
Quote from: qWord on September 24, 2008, 02:24:43 AM
but afaics the precision is the problem
it's OK as long as you don't exceed 100000...
qWord check this out:
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
DwToStr2 proc dwValue,pBuffer
push edi
push ebx
mov edi,[esp+2*4+8];buf
mov ebx,[esp+1*4+8];val
;; split the value to two five digit numbers
mov edx,0A7C5AC47h; 1/100000
lea eax,[ebx+1]
mul edx
shr edx,16
mov eax,edx
imul edx,100000
sub ebx,edx
; first five
mov ecx,68DB9h
mul ecx
add dl,'0'
mov [edi+0],dl
xi = 1
rept 4
mov edx,10
mul edx
add dl,'0'
mov [edi+xi],dl
xi = xi + 1
endm
; next five
mov eax,ebx
mul ecx
add dl,'0'
mov [edi+5],dl
xi = 6
rept 4
mov edx,10
mul edx
add dl,'0'
mov [edi+xi],dl
xi = xi + 1
endm
mov byte ptr [edi+10],0
pop ebx
pop edi
ret 2*4
DwToStr2 endp
OPTION PROLOGUE:PROLOGUEDEF
OPTION EPILOGUE:EPILOGUEDEF
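For readers who want to step through DwToStr2 without an assembler, here is a rough Python model. It is a sketch under the assumption that the listing behaves exactly as written (32-bit wraparound on the lea, DL taken after each mul); the names five_digits and dw_to_str2 are made up.

```python
MASK32 = 0xFFFFFFFF


def five_digits(n):
    """mul 68DB9h, then four times mul 10; DL + '0' is stored after each."""
    out = []
    product = n * 0x68DB9
    for _ in range(5):
        out.append(chr(((product >> 32) + 0x30) & 0xFF))   # add dl,'0'
        product = (product & MASK32) * 10                  # keep EAX, mul 10
    return "".join(out)


def dw_to_str2(value):
    plus_one = (value + 1) & MASK32                  # lea eax,[ebx+1]
    high = ((plus_one * 0xA7C5AC47) >> 32) >> 16     # mul edx / shr edx,16
    low = (value - high * 100000) & MASK32           # imul edx,100000 / sub
    return five_digits(high) + five_digits(low)


print(dw_to_str2(1234567890))
print(dw_to_str2(100000))
print(dw_to_str2(0xFFFFFFFF) == "4294967295")
```

In this model the value 0FFFFFFFFh comes out wrong, because value + 1 wraps to zero before the multiply. That is presumably why the later routines in the thread treat that value as a special case.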
Hi MichaelW,
Thanks for the test case.
23 cycles, Bin2DecM
53 cycles, dwtoa
AMD 2000 MHz
However, you can comment out the push and pop of EAX, as that was used
because the macro destroys AX.
; PUSH EAX
ADD DL,'0'
;SCALL CONOUT
mov [ebx+1], dl
; POP EAX
NightWare,
Thanks for the link. Now that I wrote my own code, that makes more
sense than it did on first reading. My code is way simpler than that!
Now I see how that thread was compensating for the loss of precision.
It's food for thought. I just tried adjusting my code to print another
digit. It is good up to 900,000, and fails somewhere greater than that.
qWord,
I see your question was answered. But I wrote a small DOS program
to play with 16:16 bit fixed point arithmetic in binary and decimal,
and I could post the binary if it interests you. I used it to find
the multiplier for my code. It was coded up quickly, so it may still
have a bug, though it seems alright. It looks something like this:
Fixed Point to Decimal calculator.
Esc = Quit, 1 = Set, 0 = Reset, Move = Cursor
0 0 0 0 : 0 0 0 0 : 0 0 0 0 : 1 0 1 0 x 0 1 1 1 : 1 0 0 0 : 0 0 0 0 : 0 0 0 0
000A:7800
00010.4687500000000000
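The last two lines of that display can be reproduced with the same multiply-by-ten trick. This is a Python sketch, not the DOS program itself; the function name and the choice of 16 fraction digits are assumptions taken from the sample output.

```python
def fixed_16_16_to_decimal(integer_part, fraction_part, frac_digits=16):
    """Format a 16:16 fixed point value as five integer digits and a fraction."""
    digits = []
    frac = fraction_part
    for _ in range(frac_digits):
        frac *= 10
        digits.append(str(frac >> 16))   # integer part of fraction * 10
        frac &= 0xFFFF                   # keep the 16-bit fraction
    return f"{integer_part:05d}." + "".join(digits)


print(fixed_16_16_to_decimal(0x000A, 0x7800))
```

For 000A:7800 this prints 00010.4687500000000000, matching the sample.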
Regards,
Steve N.
Edit: The 900,000 is a bit wrong. The multiplier does not cover the full range.
The multiplier for 1 - 100,000 is too big for, say, 400,000 - 500,000. You
would need to test which range the number was in and adjust for it.
like this
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
DwToStr5 proc dwValue,pBuffer
mov eax,[esp+1*4];val
mov edx,89705F41h; bits(4) == 3 == b, 2^(64-b)/1000000000
add eax,1
mov ecx,10
jz @F
mul edx
shrd eax,edx,(64-3) and 31
shr edx,(64-3) and 31
for mmx,<0,1,2,3,4,5,6>
%movd mm&mmx&,edx
mul ecx
endm
movd mm7,edx
punpcklbw mm0,mm1
punpcklbw mm2,mm3
punpcklbw mm4,mm5
punpcklbw mm6,mm7
punpcklwd mm0,mm2
punpcklwd mm4,mm6
punpckldq mm0,mm4
mul ecx
movd mm1,edx
mul ecx
movd mm2,edx
punpcklbw mm1,mm2
mov edx,'00'
mov eax,'0000'
mov ecx,[esp+2*4];buf
movd mm6,edx
movd mm7,eax
punpckldq mm7,mm7
paddb mm1,mm6
paddb mm0,mm7
movq [ecx+8],mm1
movq [ecx+0],mm0
ret 2*4
@@:
mov edx,[esp+2*4];buf
mov dword ptr [edx+0],'4924'
mov dword ptr [edx+4],'2769'
mov dword ptr [edx+8],'59'
ret 2*4
DwToStr5 endp
OPTION PROLOGUE:PROLOGUEDEF
OPTION EPILOGUE:EPILOGUEDEF
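The scalar part of DwToStr5 (the scaling by 2^61/10^9 and the chain of multiplies by ten) can be modelled in a few lines. This Python sketch ignores the MMX packing entirely and only reproduces the digit values; dw_to_str5_digits is a hypothetical name, and the shift count 29 is (64-3) and 31 from the listing.

```python
def dw_to_str5_digits(value):
    """Digits for value in 0..0FFFFFFFEh; the listing special-cases 0FFFFFFFFh."""
    product = (value + 1) * 0x89705F41     # add eax,1 / mul edx -> EDX:EAX
    scaled = product >> 29                 # shrd eax,edx,29 / shr edx,29
    digits = [scaled >> 32]                # EDX: leading digit
    frac = scaled & 0xFFFFFFFF             # EAX: 32-bit fraction
    for _ in range(9):
        frac *= 10                         # mul ecx
        digits.append(frac >> 32)          # EDX: next digit
        frac &= 0xFFFFFFFF
    return "".join(str(d) for d in digits)


print(dw_to_str5_digits(1234567890))
print(dw_to_str5_digits(999999999))
```

Unlike the 5+5 split, a single scaling multiply covers all ten digits here, at the cost of needing the full 64-bit product.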
Quote from: drizz on September 24, 2008, 04:53:41 PM
it's fun reinventing the wheel :bg
yep, especially this algo, it's a very interesting one: stable (only a small difference between large and small values), very fast (it appears slower than a LUT in speed tests, but in real use it's another thing :wink) and, more importantly, it's very adaptable to many cases/digit counts...
Quote from: FORTRANS on September 24, 2008, 04:33:46 PM
... I could post the binary if it interests you.
Looks interesting, please post. TIA
-----
Quote from: drizz on September 24, 2008, 08:20:54 PM
like this
On my Core 2 Duo it takes ~39 clocks.
I've also written a function using SSE2:
;RETURN: eax == pointer in buffer
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
d2a proc uDword:DWORD,lpBuffer:DWORD
.data
align 16
fp_100k_div OWORD 0A7C5AC47h
fp_100k_mul OWORD 100000
fp_const QWORD 068DB9h, 068DB9h
fp_10 QWORD 10 , 10
fp_asc db 4 dup(030h)
db 030h,0,0,0
db 4 dup(030h)
db 030h,0,0,0
; fp_cmp db 10 dup (030h)
; db 6 dup (0)
.code
mov eax,DWORD ptr [esp+4] ; uDword
test eax,eax
.if !ZERO?
.if eax != -1
lea eax,[eax+1]
movd xmm0,eax
pmuludq xmm0,fp_100k_div
psrlq xmm0,16
movdqa xmm1,xmm0
pmuludq xmm1,fp_100k_mul
psrlq xmm0,32
psrlq xmm1,32
punpcklqdq xmm0,xmm1
pmuludq xmm0,OWORD ptr fp_const
movdqa xmm1,xmm0
pmuludq xmm1,OWORD ptr fp_10
movdqa xmm2,xmm1
pslld xmm1,8
pmuludq xmm2,OWORD ptr fp_10
movdqa xmm3,xmm2
pslld xmm2,16
pmuludq xmm3,OWORD ptr fp_10
movdqa xmm4,xmm3
pslld xmm3,24
por xmm0,xmm1
pmuludq xmm4,OWORD ptr fp_10
por xmm2,xmm3
psrlq xmm4,32
psllq xmm4,32
por xmm0,xmm2
mov eax,DWORD ptr [esp+8] ; lpBuffer
pxor xmm2,xmm2
psrlq xmm0,32
por xmm0,xmm4
paddb xmm0,OWORD ptr fp_asc
movdqa xmm1,xmm0
punpcklqdq xmm1,xmm2
punpckhqdq xmm2,xmm0
psrldq xmm2,3
por xmm1,xmm2
;movdqa xmm7,xmm1 ; suppress leading zeros
;pcmpeqb xmm7,OWORD ptr fp_cmp ;
;pmovmskb edx,xmm7 ;
;not edx ;
;bsf edx,edx ;
movdqa OWORD ptr [eax],xmm1
lea eax,[eax+edx] ; NOTE: edx is only defined when the suppression block above is enabled
ret 8
.else
mov eax,DWORD ptr [esp+8] ; lpBuffer
mov DWORD ptr [eax],034393234h
mov DWORD ptr [eax+4],032373639h
mov DWORD ptr [eax+8],03539h
ret 8
.endif
.else
mov eax,DWORD ptr [esp+8] ;lpBuffer
mov DWORD ptr [eax],030h
ret 8
.endif
d2a endp
OPTION PROLOGUE:PROLOGUEDEF
OPTION EPILOGUE:EPILOGUEDEF
It's a bit faster than drizz's:
(test-value = 1234567890)
without suppressing leading zeros: ~ 29 clocks :green
with suppressing: ~ 37 clocks
regards qWord
EDIT: there were some senseless instructions (movdqa xmm3,xmm0 and psubd xmm3,xmm1) in the code. I've deleted them.
Quote from: qWord on September 24, 2008, 11:59:40 PM
Quote from: FORTRANS on September 24, 2008, 04:33:46 PM
... I could post the binary if it interests you.
Looks interesting, please post. TIA
Here it is.
I should note that the thread that NightWare pointed out also shows that
leading zero suppression is easily done using the fact that a MULtiply sets the
carry and overflow flags if the high byte, word, or double word is nonzero. I
stated earlier that that looked messy. Oops. And using a constant in memory,
rather than loading it in a register, would save an instruction.
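The carry-flag idea can be sketched as follows. This is Python, not code from either thread: it models "MUL sets CF/OF when the high half is nonzero" as a simple test on the digit, and word_to_decimal_suppressed is a made-up name.

```python
def word_to_decimal_suppressed(n):
    """Five-digit multiply conversion of a 16-bit value, leading zeros dropped."""
    out = []
    started = False
    product = n * 0x68DB9
    for position in range(5):
        digit = product >> 32            # EDX after the MUL
        carry = digit != 0               # CF/OF as set by MUL
        # Start printing at the first carry; always print the last digit.
        if carry or started or position == 4:
            out.append(chr(0x30 + digit))
            started = True
        product = (product & 0xFFFFFFFF) * 10
    return "".join(out)


for value in (0, 7, 42, 1005, 65535):
    print(value, word_to_decimal_suppressed(value))
```

In assembly the started flag would not be a variable; it would be the point in the code that a JC jumps into.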
Regards,
Steve N.
[attachment deleted by admin]
Quote from: qWord on September 24, 2008, 11:59:40 PM
I've also written a function using SSE2:
Looks interesting, especially since it does not use the FPU registers. Can you extend it to qwords?
258 cycles for
float$ REAL8 1.234568 (http://www.masm32.com/board/index.php?topic=9756.msg72641#msg72641)
and what do you guys think of this one :)
; no frame
align 16
DwToStr7 proc dwValue,pBuffer
mov edx,089705F41H
mov eax,[esp+1*4];val
mul edx
mov [esp+1*4],ebx
add eax,070000000H
adc edx,0
movd mm0,edx
psrld mm0,1
and edx,01FFFFFFFH
mov ecx,eax
mov ebx,edx
shld edx,eax,2;mul by 5
shl eax,2
add eax,ecx
adc edx,ebx
mov ecx,00FFFFFFFH
movd mm1,edx
and edx,ecx
add edx,edx
lea edx,[edx*4+edx]
movd mm2,edx
and edx,ecx
add edx,edx
lea edx,[edx*4+edx]
movd mm3,edx
and edx,ecx
add edx,edx
lea edx,[edx*4+edx]
movd mm4,edx
and edx,ecx
add edx,edx
lea edx,[edx*4+edx]
movd mm5,edx
and edx,ecx
add edx,edx
lea edx,[edx*4+edx]
movd mm6,edx
and edx,ecx
add edx,edx
lea edx,[edx*4+edx]
movd mm7,edx
punpckldq mm0,mm1
punpckldq mm2,mm3
punpckldq mm4,mm5
punpckldq mm6,mm7
psrld mm0,32-4
psrld mm2,32-4
psrld mm4,32-4
psrld mm6,32-4
mov eax,'0000'
movd mm7,eax
packssdw mm0,mm2
packssdw mm4,mm6
punpckldq mm7,mm7
packsswb mm0,mm4
paddb mm0,mm7
and edx,ecx
add edx,edx
lea edx,[edx*4+edx]
and ecx,edx
shr edx,28
add ecx,ecx
lea ecx,[ecx*4+ecx]
shr ecx,28-8
mov ebx,[esp+1*4]
mov eax,[esp+2*4];buf
and ecx,0FF00h
lea edx,[edx+ecx+'00']
movq [eax+0],mm0
mov [eax+8],edx
ret 2*4
DwToStr7 endp
Quote from: jj2007 on September 25, 2008, 09:52:54 PM
Looks interesting, especially since it does not use the FPU registers. Can you extend it to qwords?
jj, I've no time for a real8 version, but this real4 version may help you:
.DATA
ALIGN 16
_TCA_Simd_Multiplicateur_Decimales_ REAL4 100000000.0f
.CODE
ALIGN 16
;
; convert an IEEE value to text in signed decimal format (format: -x xxx xxx xxx . xxx xxx xx, i.e. 20 characters
; counting the sign...).
; note: XMM0 and XMM1 are modified
;
; syntax:
; mov eax,{the real4 value}
; mov esi,{OFFSET of the string to create}
; call Real4ToString
;
; Return :
; eax = length of the string
;
;
Real4ToString PROC
push ebx ;; save ebx
push ecx ;; save ecx
push edx ;; save edx
push esi ;; save esi
push edi ;; save edi
; start by taking the absolute value, testing whether the value is negative
btr eax,31 ;; test the sign bit, and keep the absolute value in eax
jnc Label00 ;; if the bit was not set, go to Label00
mov BYTE PTR [esi],"-" ;; otherwise, put the - sign in the first byte of the string
inc esi ;; and increment the address in esi
; next, separate the integer part and the fractional part
Label00: movd XMM1,eax ;; put the value in XMM1
cvttss2si eax,XMM1 ;; put the integer part (truncated, no rounding) of XMM1 in eax
cvtsi2ss XMM0,eax ;; put the integer in eax into XMM0 in real4 format
subss XMM1,XMM0 ;; subtract XMM0 from XMM1
mulss XMM1,_TCA_Simd_Multiplicateur_Decimales_ ;; multiply the remaining fraction by the multiplier (XMM1 now holds the decimals in its integer part)
; here, test whether the fraction rounds up to the next integer
cvtss2si edi,XMM1 ;; put the integer (the decimals multiplied by our multiplier) from XMM1 into edi
cmp edi,99999999 ;; test whether the decimals in edi overflow eight digits (after the virtual decimal processing)
jnae Label01 ;; if it is not above or equal, there is no rounding, so go to Label01
inc eax ;; otherwise the value must be rounded up, so increase eax by 1
Label01: mov ecx,eax ;; copy eax to ecx
; inc eax ;; so that the division that follows gives the exact result (unnecessary since the values are signed)
; jnz Label02 ;; if FFFFFFFFh+1 <> 0, go to Label02
;; SpecialCase:
; mov DWORD PTR [esi],"4924" ;; ) store the corresponding value directly
; mov DWORD PTR [esi+4],"2769" ;; )
; mov WORD PTR [esi+8],"59" ;; )
; jmp Label14 ;; go to Label14
;ALIGN 4
; divide eax by 100000 the optimized way
Label02: mov edx,2814749768 ;; put 2814749768 (replaces 2814749767 and eax+1) in edx
; mov edx,2814749767 ;; put 2814749767 in edx
mul edx ;; multiply eax by 2814749768
shr edx,16 ;; shift edx right by 16 bits
mov eax,100000 ;; put 100000 in eax
mov ebx,edx ;; copy edx to ebx
mul edx ;; multiply edx by 100000
sub ecx,eax ;; subtract the result in eax from ecx
test ebx,ebx ;; set the flags from ebx
mov edx,ebx ;; put ebx back in edx
mov ebx,10 ;; put 10 (our decimal multiplier) in ebx
jz Label03 ;; if it is equal to 0, go to Label03
; otherwise, handle the upper part of the number
mov eax,429497 ;; replace eax with 429497
mul edx ;; multiply edx by 429497 (to extract the upper 5 digits correctly)
jc Label04 ;; if there is a carry, go to Label04
dec esi ;; decrement the address
mul ebx ;; multiply eax by 10
jc Label05 ;; if there is a carry, go to Label05
dec esi ;; decrement the address
mul ebx ;; multiply eax by 10
jc Label06 ;; if there is a carry, go to Label06
dec esi ;; decrement the address
mul ebx ;; multiply eax by 10
jc Label07 ;; if there is a carry, go to Label07
dec esi ;; decrement the address
jmp Label08 ;; go to Label08
;ALIGN 4
; here, handle the lower part of the number
Label03: mov eax,429497 ;; replace eax with 429497
sub esi,5 ;; take 5 characters off esi
mul ecx ;; multiply ecx by 429497 (to extract the upper 5 digits correctly)
jc Label09 ;; if there is a carry, go to Label09
dec esi ;; decrement the address
mul ebx ;; multiply eax by 10
jc Label10 ;; if there is a carry, go to Label10
dec esi ;; decrement the address
mul ebx ;; multiply eax by 10
jc Label11 ;; if there is a carry, go to Label11
dec esi ;; decrement the address
mul ebx ;; multiply eax by 10
jc Label12 ;; if there is a carry, go to Label12
dec esi ;; decrement the address
jmp Label13 ;; go to Label13
;ALIGN 4
; here, store XXXXXXXXXX and the terminating zero
Label04: add dl,"0" ;; add the base character "0" to the value in edx
mov BYTE PTR [esi],dl ;; store the byte in dl at the address in esi
mul ebx ;; multiply eax by 10
Label05: add dl,"0" ;; add the base character "0" to the value in edx
mov BYTE PTR [esi+1],dl ;; store the byte in dl at the address in esi
mul ebx ;; multiply eax by 10
Label06: add dl,"0" ;; add the base character "0" to the value in edx
mov BYTE PTR [esi+2],dl ;; store the byte in dl at the address in esi
mul ebx ;; multiply eax by 10
Label07: add dl,"0" ;; add the base character "0" to the value in edx
mov BYTE PTR [esi+3],dl ;; store the byte in dl at the address in esi
Label08: mul ebx ;; multiply eax by 10
add dl,"0" ;; add the base character "0" to the value in edx
mov BYTE PTR [esi+4],dl ;; store the byte in dl at the address in esi
mov eax,429497 ;; replace eax with 429497
mul ecx ;; multiply ecx by 429497 (to extract the upper 5 digits correctly)
Label09: add dl,"0" ;; add the base character "0" to the value in edx
mov BYTE PTR [esi+5],dl ;; store the byte in dl at the address in esi
mul ebx ;; multiply eax by 10
Label10: add dl,"0" ;; add the base character "0" to the value in edx
mov BYTE PTR [esi+6],dl ;; store the byte in dl at the address in esi
mul ebx ;; multiply eax by 10
Label11: add dl,"0" ;; add the base character "0" to the value in edx
mov BYTE PTR [esi+7],dl ;; store the byte in dl at the address in esi
mul ebx ;; multiply eax by 10
Label12: add dl,"0" ;; add the base character "0" to the value in edx
mov BYTE PTR [esi+8],dl ;; store the byte in dl at the address in esi
Label13: mul ebx ;; multiply eax by 10
add dl,"0" ;; add the base character "0" to the value in edx
mov BYTE PTR [esi+9],dl ;; store the byte in dl at the address in esi (the terminating 0 is written at Label18)
; next, test whether there is a fractional part
Label14: add esi,10 ;; (this addition cannot be merged with the one that follows, because of the special case...)
test edi,edi ;; ) no decimals? then go to Label18
jz Label18 ;; )
cmp edi,99999999 ;; ) number already rounded up? then go to Label18
ja Label18 ;; )
; continue by storing the decimal point and the decimals
mov eax,edi ;; put the decimals in eax
lea edi,[esi+1] ;; save the address (of the start of the decimals) in edi
mov BYTE PTR [esi],"." ;; otherwise, write the decimal point
add esi,9 ;; put the end of the string (start address of the decimals + the maximum number of decimals) in esi
mov ecx,3435973837 ;; put our magic multiplier in ecx
Label15: dec esi ;; decrement the address to write to
mov ebx,eax ;; save eax in ebx
mul ecx ;; ) divide eax by 10
shr edx,3 ;; )
mov eax,edx ;; copy the quotient obtained in edx to eax
lea edx,[edx*4+eax] ;; ) multiply eax by 10, and put the result in edx
add edx,edx ;; )
sub ebx,edx ;; subtract edx from ebx
jz Label15 ;; if ebx is equal to 0 (nothing needs to be written), go to Label15
add bl,"0" ;; otherwise, add the base character (and move on to the next loop...)
mov BYTE PTR [esi],bl ;; store the character
cmp esi,edi ;; ) test needed in case there is only one character
jbe Label17 ;; )
push esi ;; save the address of the last character
; from here on, trailing zeros no longer need attention
Label16: dec esi ;; decrement the address to write to
mov ebx,eax ;; save eax in ebx
mul ecx ;; ) divide eax by 10
shr edx,3 ;; )
mov eax,edx ;; copy the quotient obtained in edx to eax
lea edx,[edx*4+edx] ;; ) multiply edx by 10
add edx,edx ;; )
sub ebx,edx ;; subtract edx from ebx
add bl,"0" ;; add the base character
mov BYTE PTR [esi],bl ;; store the character
cmp esi,edi ;; compare the address with the start of the fractional part
ja Label16 ;; while it is above, go to Label16
; finally, exit
pop esi ;; restore the address of the last character
Label17: inc esi ;; increment the address to store the terminating zero
; alternative exit... (when there is no fractional part)
Label18: mov BYTE PTR [esi],0 ;; store the terminating 0
mov eax,esi ;; copy esi (the current address) to eax
pop edi ;; restore edi
pop esi ;; restore esi
pop edx ;; restore edx
pop ecx ;; restore ecx
pop ebx ;; restore ebx
sub eax,esi ;; to get the length of the created string in eax
ret ;; return (leave the procedure)
Real4ToString ENDP
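The two reciprocal constants used in Real4ToString can be checked on their own. This Python sketch is not from the thread; div10 and div100000 are made-up names that stand for "mul by the constant, then shr edx" in the listing.

```python
def div10(n):
    """mul 3435973837 / shr edx,3: high dword shifted 3, i.e. product >> 35."""
    return (n * 3435973837) >> 35


def div100000(n):
    """mul 2814749768 / shr edx,16: high dword shifted 16, i.e. product >> 48."""
    return (n * 2814749768) >> 48


for n in (0, 9, 10, 99999999, 0xFFFFFFFF):
    print(n, div10(n), n // 10)

for n in (99999, 100000, 2147399999, 2147483647, 3150499999):
    print(n, div100000(n), n // 100000)
```

The constant 2814749768 is rounded up, so it drifts high for large inputs: the last sample, which is above 2^31, comes out one too large. Inputs below 2^31 stay exact, which fits the remark in the listing that the eax+1 adjustment is unnecessary for signed values.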
Quote from: NightWare on September 26, 2008, 02:13:07 AM
jj, i've no time for a real8 version, but this real4 version may help you :
Thanks a lot, I'll see whether it speeds up the r4 conversions.
Here are 2 uDw2A algos; instead of 5+5 digits they use 2+4+4 and 4+4+2 (easier to convert to mmx/sse2). Both algos are a bit slower for large values, but a bit faster for small (most used) values. Besides, here (due to the "divisions" used) there is no need to take care of FFFFFFFFh.
ALIGN 16
;
; convert a dword to ascii string (2+4+4 digits)
;
; syntax :
; mov eax,{the value}
; mov esi,{OFFSET of the string to create}
; call uDw2A
;
; Return :
; eax = length of the string
;
uDw2A PROC
push ebx ;; save ebx
push ecx ;; save ecx
push edx ;; save edx
push esi ;; save esi
push edi ;; save edi
; divide eax by 10000 with a reciprocal multiply
mov edx,3518437209 ;; edx = 3518437209
mov ecx,eax ;; copy eax into ecx
mul edx ;; multiply eax by 3518437209
shr edx,13 ;; shift edx right by 13 bits (edx = value / 10000)
mov eax,10000 ;; eax = 10000
mov edi,edx ;; copy edx into edi
mul edx ;; multiply edx by 10000
sub ecx,eax ;; subtract the product in eax from ecx (ecx = low 4 digits)
test edi,edi ;; set the flags from edi
mov edx,edi ;; put edi back into edx
mov edi,10 ;; edi = 10 (our decimal multiplier)
jz Label01 ;; if the quotient is 0, go to Label01
; divide the quotient in edx by 10000 the same way
mov eax,3518437209 ;; eax = 3518437209
mov ebx,edx ;; copy edx into ebx
mul edx ;; multiply edx by 3518437209
shr edx,13 ;; shift edx right by 13 bits
mov eax,10000 ;; eax = 10000
mov edi,edx ;; copy edx into edi
mul edx ;; multiply edx by 10000
sub ebx,eax ;; subtract the product in eax from ebx (ebx = middle 4 digits)
test edi,edi ;; set the flags from edi
mov edx,edi ;; put edi back into edx
mov edi,10 ;; edi = 10 (our decimal multiplier)
jz Label00 ;; if the quotient is 0, go to Label00
;ALIGN 4
; otherwise, handle the XX-------- part of the number
mov eax,429496730 ;; eax = 429496730
mul edx ;; multiply edx by 429496730 (the tens digit lands in edx)
jc Label02 ;; if the high dword is non-zero (carry set), go to Label02
dec esi ;; decrement the address
jmp Label03 ;; go to Label03
;ALIGN 4
; otherwise, handle the --XXXX---- part of the number
Label00: sub esi,2 ;; move esi back 2 characters (XX--------)
mov eax,4294968 ;; eax = 4294968
mul ebx ;; multiply ebx by 4294968 (so the 4 digits can be extracted from the top down)
jc Label04 ;; if carry is set, go to Label04
dec esi ;; decrement the address
mul edi ;; multiply eax by 10
jc Label05 ;; if carry is set, go to Label05
dec esi ;; decrement the address
mul edi ;; multiply eax by 10
jc Label06 ;; if carry is set, go to Label06
dec esi ;; decrement the address
jmp Label07 ;; go to Label07
;ALIGN 4
; here, handle the ------XXXX part of the number
Label01: sub esi,6 ;; move esi back 6 characters (XXXXXX----)
mov eax,4294968 ;; eax = 4294968
mul ecx ;; multiply ecx by 4294968 (so the 4 digits can be extracted from the top down)
jc Label08 ;; if carry is set, go to Label08
dec esi ;; decrement the address
mul edi ;; multiply eax by 10
jc Label09 ;; if carry is set, go to Label09
dec esi ;; decrement the address
mul edi ;; multiply eax by 10
jc Label10 ;; if carry is set, go to Label10
dec esi ;; decrement the address
jmp Label11 ;; go to Label11
;ALIGN 4
; here we write XXXXXXXXXX and the terminating zero
Label02: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi],dl ;; store the byte in dl at esi
Label03: mul edi ;; multiply eax by 10
add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+1],dl ;; store the byte in dl at esi+1
mov eax,4294968 ;; eax = 4294968
mul ebx ;; multiply ebx by 4294968 (so the 4 digits can be extracted from the top down)
Label04: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+2],dl ;; store the byte in dl at esi+2
mul edi ;; multiply eax by 10
Label05: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+3],dl ;; store the byte in dl at esi+3
mul edi ;; multiply eax by 10
Label06: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+4],dl ;; store the byte in dl at esi+4
Label07: mul edi ;; multiply eax by 10
add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+5],dl ;; store the byte in dl at esi+5
mov eax,4294968 ;; eax = 4294968
mul ecx ;; multiply ecx by 4294968 (so the 4 digits can be extracted from the top down)
Label08: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+6],dl ;; store the byte in dl at esi+6
mul edi ;; multiply eax by 10
Label09: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+7],dl ;; store the byte in dl at esi+7
mul edi ;; multiply eax by 10
Label10: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+8],dl ;; store the byte in dl at esi+8
Label11: mul edi ;; multiply eax by 10
add dx,30h ;; add the base character "0" to the value in dx
mov WORD PTR [esi+9],dx ;; store the digit in dl and the terminating 0 in dh at esi+9
lea eax,[esi+10] ;; eax = esi+10 (address of the terminating zero)
pop edi ;; restore edi
pop esi ;; restore esi
pop edx ;; restore edx
pop ecx ;; restore ecx
pop ebx ;; restore ebx
sub eax,esi ;; eax = length of the string created
ret ;; return (leave the procedure)
uDw2A ENDP
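In case the asm is hard to follow, here is a rough C sketch of the arithmetic behind the 2+4+4 routine above: two reciprocal divides by 10000, then each group is multiplied by a scaled constant so that the digits fall out of the high dword one at a time. It is only a model under my own assumptions: the helper names are made up, and it always writes 10 digits instead of skipping leading zeros the way the dec esi / jc chain does.

```c
#include <stdint.h>

/* n / 10000: mul by 3518437209, keep the high dword, shr 13. */
static uint32_t div10000(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 3518437209u) >> 45);
}

/* Emit `count` digits of x, most significant first.
   scale is 2^32 / 10^(count-1) rounded up: 429496730 for 2 digits,
   4294968 for 4 digits, as in the listing. */
static void emit_digits(uint32_t x, uint32_t scale, int count, char *out)
{
    uint64_t p = (uint64_t)x * scale;            /* edx:eax after the first mul */
    for (int i = 0; i < count; i++) {
        out[i] = (char)('0' + (uint32_t)(p >> 32)); /* digit sits in edx */
        p = (p & 0xFFFFFFFFu) * 10u;             /* mul edi on the fraction in eax */
    }
}

/* Fixed-width conversion: always 10 digits plus a terminating 0. */
static void u32_to_dec10(uint32_t n, char out[11])
{
    uint32_t q1  = div10000(n);
    uint32_t lo  = n - q1 * 10000u;    /* ------XXXX (ecx in the asm) */
    uint32_t hi  = div10000(q1);       /* XX-------- (edx) */
    uint32_t mid = q1 - hi * 10000u;   /* --XXXX---- (ebx) */
    emit_digits(hi, 429496730u, 2, out);
    emit_digits(mid, 4294968u, 4, out + 2);
    emit_digits(lo, 4294968u, 4, out + 6);
    out[10] = '\0';
}
```

The rounding-up in the scale constants leaves a small positive error in the fraction, but it stays well below one unit over four digits, which is why the groups are kept this short.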
ALIGN 16
;
; convert a dword to ascii string (4+4+2 digits)
;
; syntax :
; mov eax,{the value}
; mov esi,{OFFSET of the string to create}
; call uDw2A
;
; Return :
; eax = length of the string
;
uDw2A PROC
push ebx ;; save ebx
push ecx ;; save ecx
push edx ;; save edx
push esi ;; save esi
push edi ;; save edi
; divide eax by 100 with a reciprocal multiply
mov edx,2748779070 ;; edx = 2748779070 (2748779069+1)
mov ecx,eax ;; copy eax into ecx
mul edx ;; multiply eax by 2748779070
shr edx,6 ;; shift edx right by 6 bits (edx = value / 100)
mov eax,100 ;; eax = 100
mov edi,edx ;; copy edx into edi
mul edx ;; multiply edx by 100
sub ecx,eax ;; subtract the product in eax from ecx (ecx = low 2 digits)
test edi,edi ;; set the flags from edi
mov edx,edi ;; put edi back into edx
mov edi,10 ;; edi = 10 (our decimal multiplier)
jz Label01 ;; if the quotient is 0, go to Label01
; divide the quotient in edx by 10000 the same way
mov eax,3518437209 ;; eax = 3518437209
mov ebx,edx ;; copy edx into ebx
mul edx ;; multiply edx by 3518437209
shr edx,13 ;; shift edx right by 13 bits
mov eax,10000 ;; eax = 10000
mov edi,edx ;; copy edx into edi
mul edx ;; multiply edx by 10000
sub ebx,eax ;; subtract the product in eax from ebx (ebx = middle 4 digits)
test edi,edi ;; set the flags from edi
mov edx,edi ;; put edi back into edx
mov edi,10 ;; edi = 10 (our decimal multiplier)
jz Label00 ;; if the quotient is 0, go to Label00
;ALIGN 4
; otherwise, handle the XXXX------ part of the number
mov eax,4294968 ;; eax = 4294968 (so the 4 digits can be extracted from the top down)
mul edx ;; multiply edx by 4294968
jc Label02 ;; if carry is set, go to Label02
dec esi ;; decrement the address
mul edi ;; multiply eax by 10
jc Label03 ;; if carry is set, go to Label03
dec esi ;; decrement the address
mul edi ;; multiply eax by 10
jc Label04 ;; if carry is set, go to Label04
dec esi ;; decrement the address
jmp Label05 ;; go to Label05
;ALIGN 4
; otherwise, handle the ----XXXX-- part of the number
Label00: mov eax,4294968 ;; eax = 4294968 (so the 4 digits can be extracted from the top down)
sub esi,4 ;; move esi back 4 characters (XXXX------)
mul ebx ;; multiply ebx by 4294968
jc Label06 ;; if carry is set, go to Label06
dec esi ;; decrement the address
mul edi ;; multiply eax by 10
jc Label07 ;; if carry is set, go to Label07
dec esi ;; decrement the address
mul edi ;; multiply eax by 10
jc Label08 ;; if carry is set, go to Label08
dec esi ;; decrement the address
jmp Label09 ;; go to Label09
;ALIGN 4
; here, handle the --------XX part of the number
Label01: mov eax,429496730 ;; eax = 429496730
sub esi,8 ;; move esi back 8 characters (XXXXXXXX--)
mul ecx ;; multiply ecx by 429496730 (so the 2 digits can be extracted from the top down)
jc Label10 ;; if carry is set, go to Label10
dec esi ;; decrement the address
jmp Label11 ;; go to Label11
;ALIGN 4
; here we write XXXXXXXXXX and the terminating zero
Label02: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi],dl ;; store the byte in dl at esi
mul edi ;; multiply eax by 10
Label03: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+1],dl ;; store the byte in dl at esi+1
mul edi ;; multiply eax by 10
Label04: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+2],dl ;; store the byte in dl at esi+2
Label05: mul edi ;; multiply eax by 10
add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+3],dl ;; store the byte in dl at esi+3
mov eax,4294968 ;; eax = 4294968
mul ebx ;; multiply ebx by 4294968 (so the 4 digits can be extracted from the top down)
Label06: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+4],dl ;; store the byte in dl at esi+4
mul edi ;; multiply eax by 10
Label07: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+5],dl ;; store the byte in dl at esi+5
mul edi ;; multiply eax by 10
Label08: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+6],dl ;; store the byte in dl at esi+6
Label09: mul edi ;; multiply eax by 10
add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+7],dl ;; store the byte in dl at esi+7
mov eax,429496730 ;; eax = 429496730
mul ecx ;; multiply ecx by 429496730 (so the 2 digits can be extracted from the top down)
Label10: add dl,30h ;; add the base character "0" to the digit in edx
mov BYTE PTR [esi+8],dl ;; store the byte in dl at esi+8
Label11: mul edi ;; multiply eax by 10
add dx,30h ;; add the base character "0" to the value in dx
mov WORD PTR [esi+9],dx ;; store the digit in dl and the terminating 0 in dh at esi+9
lea eax,[esi+10] ;; eax = esi+10 (address of the terminating zero)
pop edi ;; restore edi
pop esi ;; restore esi
pop edx ;; restore edx
pop ecx ;; restore ecx
pop ebx ;; restore ebx
sub eax,esi ;; eax = length of the string created
ret ;; return (leave the procedure)
uDw2A ENDP
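If you want to convince yourself that the two magic numbers in the 4+4+2 version really behave like divisions, a quick C check along these lines should do. This is only a sketch: the struct, the function names and the sampling stride are my own choices, and it samples the 32-bit range rather than walking all of it.

```c
#include <stdint.h>

/* n / 100: mul by 2748779070, keep the high dword, shr 6. */
static uint32_t div100(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 2748779070u) >> 38);
}

/* n / 10000: mul by 3518437209, keep the high dword, shr 13. */
static uint32_t div10000(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 3518437209u) >> 45);
}

typedef struct { uint32_t hi, mid, lo; } Split442;  /* XXXX, XXXX, XX */

static Split442 split_4_4_2(uint32_t n)
{
    Split442 s;
    uint32_t q = div100(n);
    s.lo  = n - q * 100u;          /* --------XX (ecx in the asm) */
    s.hi  = div10000(q);           /* XXXX------ (edx) */
    s.mid = q - s.hi * 10000u;     /* ----XXXX-- (ebx) */
    return s;
}

/* Compare both reciprocal divides with the C '/' operator on every
   step-th value, plus the very top of the range. Returns the number
   of mismatches found. */
static uint32_t count_mismatches(uint32_t step)
{
    uint32_t bad = 0;
    uint32_t n = 0;
    if (step == 0)
        step = 1;
    for (;;) {
        if (div100(n) != n / 100u || div10000(n) != n / 10000u)
            bad++;
        if (n > UINT32_MAX - step)
            break;
        n += step;
    }
    if (div100(UINT32_MAX) != UINT32_MAX / 100u ||
        div10000(UINT32_MAX) != UINT32_MAX / 10000u)
        bad++;
    return bad;
}
```

A prime stride such as 65521 hits a good mix of remainders; a stride of 1 gives the full sweep if you have the patience.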
And here is an SSE2 test, 21 cycles on my computer:
.DATA
ALIGN 16
Div100 DWORD 0A3D70A3Eh,000000000h,000000000h,000000000h ;; 0,0,0,2748779070 (follow with a shr by 6)
Div10000 DWORD 0D1B71759h,000000000h,000000000h,000000000h ;; 0,0,0,3518437209 (follow with a shr by 13)
Mul10x2 DWORD 00000000Ah,000000000h,00000000Ah,000000000h ;; 0,10,0,10
Mul100 DWORD 000000064h,000000000h,000000000h,000000000h ;; 0,0,0,100
Mul10000 DWORD 000002710h,000000000h,000000000h,000000000h ;; 0,0,0,10000
PushBits100 DWORD 01999999Ah,000000000h,000000000h,000000000h ;; 0,0,0,429496730
PushBits10000x2 DWORD 000418938h,000000000h,000418938h,000000000h ;; 0,4294968,0,4294968
BaseDecValues DWORD 030303030h,030303030h,000003030h,000000000h ;; "0000000000"
.CODE
ALIGN 16
;
; convert a dword to ascii string, sse2 version (keeps leading zeros)
;
; syntax :
; mov eax,{the value}
; mov esi,{OFFSET of the string to create}
; call uDw2A_Sse2
;
; Return :
; nothing
;
uDw2A_Sse2 PROC
movd XMM3,eax ;; XMM3 = 0,0,0,Val
; split off the XXXX, XXXX and XX parts
movss XMM6,DWORD PTR Div100 ;; XMM6 = 0,0,0,2748779070
movss XMM7,DWORD PTR Mul100 ;; XMM7 = 0,0,0,100
movdqa XMM4,XMM3 ;; XMM4 = _,_,_,Val
pmuludq XMM6,XMM3 ;; XMM6 = _,_,Hi+Mi,_
psrlq XMM6,32+6 ;; XMM6 = 0,0,0,Hi+Mi/100
movss XMM3,XMM6 ;; XMM3 = 0,0,0,Hi+Mi/100
pmuludq XMM7,XMM6 ;; XMM7 = 0,0,0,Hi+Mi
psubd XMM4,XMM7 ;; XMM4 = 0,0,0,Lo
; split the XXXX and XXXX parts
movss XMM6,DWORD PTR Div10000 ;; XMM6 = 0,0,0,3518437209
movss XMM7,DWORD PTR Mul10000 ;; XMM7 = 0,0,0,10000
movdqa XMM1,XMM3 ;; XMM1 = _,_,_,Hi+Mi
pmuludq XMM6,XMM3 ;; XMM6 = _,_,Hi,_
psrlq XMM6,32+13 ;; XMM6 = 0,0,0,Hi/10000
movss XMM0,XMM6 ;; XMM0 = 0,0,0,Hi/10000
pmuludq XMM7,XMM6 ;; XMM7 = 0,0,0,Hi
psubd XMM1,XMM7 ;; XMM1 = 0,0,0,Mi
; here we compute and separate the character values
movdqa XMM6,OWORD PTR Mul10x2 ;; XMM6 = 0,10,0,10
movlhps XMM0,XMM1 ;; XMM0 = 0,Mi,0,Hi
pmuludq XMM0,OWORD PTR PushBits10000x2 ;; XMM0 = _,Mi*4294968 (D4),_,Hi*4294968 (D0)
pmuludq XMM4,OWORD PTR PushBits100 ;; XMM4 = _,_,_,Lo*429496730 (D8)
movdqa XMM1,XMM0 ;; XMM1 = _,Mi,_,Hi
pmuludq XMM1,XMM6 ;; XMM1 = _,Mi*10 (D5),_,Hi*10 (D1)
movdqa XMM5,XMM4 ;; XMM5 = _,_,_,Lo
movdqa XMM2,XMM1 ;; XMM2 = _,Mi,_,Hi
pmuludq XMM2,XMM6 ;; XMM2 = _,Mi*10 (D6),_,Hi*10 (D2)
pmuludq XMM5,XMM6 ;; XMM5 = _,_,_,Lo*10 (D9)
movdqa XMM3,XMM2 ;; XMM3 = _,Mi,_,Hi
pmuludq XMM3,XMM6 ;; XMM3 = _,Mi*10 (D7),_,Hi*10 (D3)
; here we merge the characters
punpcklbw XMM4,XMM5 ;; XMM4 = _,D9+D8,_,_
pslld XMM1,8 ;; XMM1 = D5,_,D1,_
pslld XMM2,16 ;; XMM2 = D6,_,D2,_
pslld XMM3,24 ;; XMM3 = D7,_,D3,_
por XMM0,XMM1 ;; ) XMM0 = Mi,_,Hi,_
por XMM0,XMM2 ;; )
por XMM0,XMM3 ;; )
shufps XMM0,XMM4,0EDh ;; XMM0 = _,Lo,Mi,Hi
; here we add the base character to the characters obtained, and store the result
paddb XMM0,OWORD PTR BaseDecValues ;; XMM0 = _,Lo,Mi,Hi + "0000000000"
movdqa OWORD PTR [esi],XMM0 ;; store the value (movdqa needs esi 16-byte aligned)
; movdqa XMM1,XMM0 ;; ) suppress leading zeros
; pcmpeqb XMM1,OWORD PTR BaseDecValues ;; )
; pmovmskb edx,XMM1 ;; )
; not edx ;; )
; bsf edx,edx ;; )
; add esi,edx ;; )
ret ;; return (leave the procedure)
uDw2A_Sse2 ENDP
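About the commented-out "suppress leading zeros" lines in the SSE2 routine: here is a plain C model of what pcmpeqb / pmovmskb / not / bsf would compute, in case someone wants to enable them. It is only a sketch with names of my own. One thing it suggests is that for an input of 0 every byte matches BaseDecValues (the six padding bytes included), so the bit scan lands on bit 16 and esi would move past the whole string; the clamp to index 9 below is my assumption about what a caller would want, not something in the original.

```c
#include <stdint.h>

/* Same layout as BaseDecValues: ten '0' characters, then six zero bytes. */
static const unsigned char base_dec[16] = {
    '0','0','0','0','0','0','0','0','0','0', 0, 0, 0, 0, 0, 0
};

/* Returns the offset of the first digit to keep in the 16-byte buffer
   written by the SSE2 routine (ten digits followed by zero bytes). */
static unsigned skip_leading_zeros(const unsigned char buf[16])
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < 16; i++)    /* pcmpeqb + pmovmskb */
        if (buf[i] == base_dec[i])
            mask |= 1u << i;
    mask = ~mask;                        /* not edx: bits 16..31 become 1 */
    unsigned idx = 0;                    /* bsf edx,edx */
    while (!(mask & 1u)) {
        mask >>= 1;
        idx++;
    }
    if (idx > 9)                         /* assumption: keep one '0' for the value 0 */
        idx = 9;
    return idx;
}
```

Note that after skipping, the string no longer starts at the aligned address, so the caller has to use esi plus the returned offset as the start of the text.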
Quote from: qWord on September 24, 2008, 02:24:43 AM
thx drizz,
I've just had the idea that it might be possible to obtain the quotient and remainder of a division by 100000 with only two multiplications (particularly with regard to SIMD instructions), but afaics the precision is the problem.
regards qWord
Actually, there is something to what you're saying.
Look here: http://www.masm32.com/board/index.php?topic=10039.0
Check the lastDigit, thirdDigit... section there.
The only difference is that it's a hex-to-decimal conversion and it is built on 16-bit DOS asm.
As a matter of fact, it is possible. From my experience in 16-bit programming, a 16-bit DIV divides DX:AX by its operand and leaves the quotient in AX and the remainder in DX.
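On the multiply-only side of qWord's question, here is one possible C sketch of a quotient and remainder by 100000 using two multiplications. By my arithmetic, a single reciprocal that is exact over the full 32-bit range does not fit in 32 bits for 100000, which matches the precision worry. The workaround shown here (my own choice, not something from the posts above) is to shift out the factor 32 first, since 100000 = 32 * 3125, and then use a reciprocal of 3125. The constant 2814749768 is 2^43 / 3125 rounded up; the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint32_t quot, rem; } DivMod;

/* Quotient and remainder of n / 100000 with two multiplies and shifts. */
static DivMod divmod_100000(uint32_t n)
{
    DivMod r;
    /* multiply 1: (n / 32) / 3125 via the scaled reciprocal */
    r.quot = (uint32_t)(((uint64_t)(n >> 5) * 2814749768u) >> 43);
    /* multiply 2: back-multiply to get the remainder */
    r.rem = n - r.quot * 100000u;
    return r;
}
```

The pre-shift drops the input to 27 bits, which leaves enough headroom for the rounding error of the reciprocal; the same pattern maps onto pmuludq / psrlq in the SSE2 style used earlier in the thread.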