There are two old conversion routines in the masm32 library that I would like to modernise: the signed version "dwtoa" and an unsigned version by Comrade that performs a very similar conversion.
They both perform the conversion, then reverse the string to get it in the right order. I would be interested in seeing if anyone has a better or faster way to do either of these conversions.
What's the actual specification?
So people don't have to go looking at the masm32 source, the current lib routines are
dwtoa proc dwValue:DWORD, lpBuffer:DWORD
; -------------------------------------------------------------
; convert DWORD to ascii string
; dwValue is value to be converted
; lpBuffer is the address of the receiving buffer
; EXAMPLE:
; invoke dwtoa,edx,ADDR buffer
;
; Uses: eax, ecx, edx.
; -------------------------------------------------------------
push ebx
push esi
push edi
mov eax, dwValue
mov edi, [lpBuffer]
test eax,eax
jnz sign
zero:
mov word ptr [edi],30h
jmp dtaexit
sign:
jns pos
mov byte ptr [edi],'-'
neg eax
add edi, 1
pos:
mov ecx, 3435973837
mov esi, edi
.while (eax > 0)
mov ebx,eax
mul ecx
shr edx, 3
mov eax,edx
lea edx,[edx*4+edx]
add edx,edx
sub ebx,edx
add bl,'0'
mov [edi],bl
add edi, 1
.endw
mov byte ptr [edi], 0 ; terminate the string
; We now have all the digits, but in reverse order.
.while (esi < edi)
sub edi, 1
mov al, [esi]
mov ah, [edi]
mov [edi], al
mov [esi], ah
add esi, 1
.endw
dtaexit:
pop edi
pop esi
pop ebx
ret
dwtoa endp
and
udw2str proc dwNumber:DWORD, pszString:DWORD
push ebx
push esi
push edi
mov eax, [dwNumber]
mov esi, [pszString]
mov edi, [pszString]
mov ecx,429496730
@@redo:
mov ebx,eax
mul ecx
mov eax,edx
lea edx,[edx*4+edx]
add edx,edx
sub ebx,edx
add bl,'0'
mov [esi],bl
inc esi
test eax, eax
jnz @@redo
jmp @@chks
@@invs:
dec esi
mov al, [edi]
xchg [esi], al
mov [edi], al
inc edi
@@chks:
cmp edi, esi
jb @@invs
pop edi
pop esi
pop ebx
ret
udw2str endp
The large magic numbers (3435973837 and 429496730) are 0.32 fixed-point reciprocals (0.8 and 0.1, respectively), so that divisions don't have to be performed.
I'm not sure why 0.9 wasn't used in the first case.. maybe a precision issue. Later I will post a reliability study of these two fixed point reciprocals to prove that they work for all 32-bit input values.
Testing the 0.1 reciprocal...
elapsed time = 10.4776606000014 seconds
errors = 644245094
first error = 1073741829
1073741829 / 10 = 107374182.9
(1073741829 * 429496730) >> 32 = 107374183
done
udw2str() is broken!!
Testing the 0.8 reciprocal
elapsed time = 4.49280790000194 seconds
errors = 0
done
The 0.8 reciprocal is good for all unsigned 32-bit inputs (needs unsigned multiplication!)
The first routine works by multiplying by 0.8, and then dividing by 8 (via shift) to get the 0.1 reciprocal actually needed.
The reciprocals for 0.9, 0.4 and 0.2 also do not work; all of them error out before reaching 2^31.
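For anyone who wants to poke at the two reciprocals without running a full 2^32 sweep, here is a minimal Python sketch. It is not from the posted study; the function names and the sample list are made up for illustration, and it only spot-checks boundary values rather than proving anything over the whole range.

```python
def div10_via_0_8(n):
    """dwtoa's method: mul by 3435973837 (0CCCCCCCDh), keep EDX, then shr edx, 3."""
    return ((n * 3435973837) >> 32) >> 3


def div10_via_0_1(n):
    """udw2str's method: mul by 429496730 and use EDX as the quotient directly."""
    return (n * 429496730) >> 32


def first_mismatch(div10, candidates):
    """Return the first candidate where div10 disagrees with n // 10, else None."""
    for n in candidates:
        if div10(n) != n // 10:
            return n
    return None


# Hypothetical spot-check list; the posted study walked every 32-bit value.
SAMPLE = [0, 9, 10, 99, 12345, 1073741828, 1073741829,
          2**31 - 1, 2**31, 2**32 - 1]

if __name__ == "__main__":
    print("0.8 reciprocal:", first_mismatch(div10_via_0_8, SAMPLE))
    print("0.1 reciprocal:", first_mismatch(div10_via_0_1, SAMPLE))
```

On this sample the 0.8 version agrees with plain integer division everywhere, while the 0.1 version first disagrees at 1073741829, the same value reported above.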
Quote from: hutch-- on August 17, 2010, 04:40:43 PM
They both perform the conversion then reverse the string to get it in the right order. I would be interested in seeing if anyone has a better or faster way to do either of these conversions.
Sounds familiar (http://www.masm32.com/board/index.php?topic=11781.msg89037#msg89037)... Str$() does the job, and is a bit faster than wsprintf but slower than dwtoa. It avoids the reversal by anticipating the overall length and working backwards. But I would not recommend that road for a general purpose dword-only routine. Reversing itself may cost around 16 cycles with Lingo's routine (http://www.masm32.com/board/index.php?topic=14041.msg111377#msg111377).
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
xmm0=12345.68
237 cycles for unsigned Str$(xmm0)
231 cycles for unsigned Str$(12345)
460 cycles for wsprintf 12345
65 cycles for dwtoa
If anyone is interested in comparisons, the umdtoa (and smdtoa for signed dwords) in the latest version of the fpulib is quite similar to the dwtoa routine. The minor differences are:
- no string reversal, but the address of the first character of the null-terminated string is returned in EAX
- the number of characters of the string (excluding the terminating 0) is returned in ECX
Similar procedures for qwords (umqtoa and smqtoa) are also included in the same fpulib.
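To make the "no string reversal" idea concrete, here is a rough Python sketch of filling a buffer from the end and handing back the start position and length. This is only an illustration of the calling convention described above (start address in EAX, character count in ECX); it is not Ray's code, and `utoa_backwards` and its buffer layout are assumptions of mine.

```python
def utoa_backwards(n, buf):
    """Write the decimal digits of n at the END of buf, last digit first.

    Returns (start_index, length), standing in for the EAX/ECX pair
    described for umdtoa. buf must hold at least 11 bytes for a dword.
    """
    pos = len(buf) - 1
    buf[pos] = 0                      # the terminating zero goes in first
    while True:
        n, r = divmod(n, 10)          # the real routines use a reciprocal multiply
        pos -= 1
        buf[pos] = 0x30 + r           # store the ASCII digit, moving backwards
        if n == 0:
            break
    return pos, len(buf) - 1 - pos


if __name__ == "__main__":
    buf = bytearray(12)
    start, length = utoa_backwards(4294967295, buf)
    print(bytes(buf[start:start + length]).decode("ascii"), length)
```

The trade-off is that the string no longer begins at the start of the caller's buffer, so the caller has to use the returned address instead of the one it passed in.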
Thanks Ray,
It's good to get someone whose maths is better developed than mine. :U (eenie meanie minie moe technology) :bg
Ray,
I am getting a 404 page not found error on the link on your page.
No matter, I cheated and FTP downloaded them.
Quote from: raymond on August 18, 2010, 02:19:38 AM
If anyone is interested in comparisons, the umdtoa (and smdtoa for signed dwords) in the latest version of the fpulib is quite similar to the dwtoa routine. The minor differences are:
- no string reversal, but the address of the first character of the null-terminated string is returned in EAX
- the number of characters of the string (excluding the terminating 0) is returned in ECX
Same speed as dwtoa, at least on my P4.
MasmBasic Str$() returns a zero-delimited string in eax and the number of used chars in edx - useful for parsing & replacing. It also converts qwords and similar stuff.
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
xmm0=12345.68
12345
12345
568 cycles for unsigned Str$(xmm0)
587 cycles for signed Str$(xmm0)
529 cycles for unsigned Str$(12345)
1106 cycles for wsprintf 12345
104 cycles for umdtoa 12345
105 cycles for dwtoa 12345
547 cycles for unsigned Str$(xmm0)
592 cycles for signed Str$(xmm0)
532 cycles for unsigned Str$(12345)
1105 cycles for wsprintf 12345
106 cycles for umdtoa 12345
106 cycles for dwtoa 12345
JJ, did you upload the right file? Doesn't look like it.
Quote from: Rockoon on August 18, 2010, 05:34:19 PM
JJ, did you upload the right file? Doesn't look like it.
It will assemble fine if you either set useMB = 0 or install MasmBasic (http://www.masm32.com/board/index.php?topic=12460.msg95703#msg95703) (the version of today ::) - I hesitate to advertise it because it's not thoroughly tested*),
and install Ray's lib posted by Hutch above.
* Test if Str$() returns correct results:
include \masm32\MasmBasic\MasmBasic.inc
Init
xor ebx, ebx
xor edi, edi ; error count
.Repeat
.if !bx
Print Str$("ebx=%i\n", ebx) ; do some animation, the whole proggie takes ages...
.endif
.if ebx!=Val(Str$(ebx))
deb 2, "We got a problem", eax, ebx ; didn't see this one, though ;-)
inc edi
.endif
dec ebx
.Until Zero?
deb 2, "All done, thanks", ebx, edi
Exit
end start
Quote
I am getting a 404 page not found error on the link on your page.
Fixed it. The source code had the file name in uppercase letters while the zip file had its name mostly in lower case letters. :red
Seems to work OK now. Thanks for reporting it. I'll try to be a bit more careful in the future.
if that's the worst thing you do this week, Raymond, you're ok in my book :U
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
1119 cycles for wsprintf 12345
115 cycles for umdtoa 12345
107 cycles for dwtoa 12345
1148 cycles for wsprintf 12345
106 cycles for umdtoa 12345
105 cycles for dwtoa 12345
1133 cycles for wsprintf 12345
108 cycles for umdtoa 12345
104 cycles for dwtoa 12345
1174 cycles for wsprintf 12345
105 cycles for umdtoa 12345
104 cycles for dwtoa 12345
Hi,
PIII, dwtoa_vs_Str$ results.
Regards,
Steve N.
pre-P4 (SSE1)
12345
12345
579 cycles for wsprintf 12345
82 cycles for umdtoa 12345
96 cycles for dwtoa 12345
579 cycles for wsprintf 12345
82 cycles for umdtoa 12345
96 cycles for dwtoa 12345
--- ok ---
How about a modification of this method:
http://www.codeproject.com/Tips/103025/Converting-numbers-to-the-word-equivalent.aspx
http://www.powerbasic.com/support/pbforums/showpost.php?p=106821&postcount=30
I have modified Ray's unsigned DWORD version so that it can be used from either the buffer address or the return value. I have not timed it but I have tested it over the full unsigned DWORD range and it is producing the correct results for the whole range.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
utoa32 proc valu:DWORD,dst:DWORD
mov eax, [esp+4] ; valu
mov ecx, [esp+8] ; dst ; buffer address
push ebx
push esi
push edi
mov esi, 0CCCCCCCDh ; "magic number" multiplier for division by 10
mov ebx, 10
sub ecx, 1
@@:
add ecx, 1 ; increment to next digit in buffer
mul esi ; multiply by magic number
shrd eax, edx, 3 ; binary fractional "remainder" back in EAX
shr edx, 3 ; EDX = quotient
add eax, 1 ; precaution against occasional "underflow"
mov edi, edx ; save current quotient for next division
mul ebx ; x10 gets "decimal" remainder into EDX
add dl, 48 ; convert remainder to ascii
mov [ecx], dl ; insert it into buffer
mov eax, edi ; retrieve current quotient
test edi, edi ; test if done
jnz @B ; continue if not done
mov BYTE PTR [ecx+1], 0 ; terminate string
mov eax, [esp+20] ; get 1st byte address in destination
REPEAT 5 ; reverse the string
movzx edx, BYTE PTR [eax]
movzx ebx, BYTE PTR [ecx]
mov [eax], bl
mov [ecx], dl
add eax, 1
sub ecx, 1
cmp eax, ecx
jge @F
ENDM
@@:
pop edi
pop esi
pop ebx
mov eax, [esp+8] ; return the buffer address
ret 8
utoa32 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
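The way the loop above gets the remainder out of the fractional bits is easy to miss, so here is a small Python model of one pass through it. It is only a sketch: `divmod10_fraction` and `utoa32_model` are names I made up, and Python's big integers stand in for the EDX:EAX pair.

```python
MAGIC = 0xCCCCCCCD          # same constant as in ESI; 5 * MAGIC == 2**34 + 1


def divmod10_fraction(n):
    """Model one loop pass: quotient and remainder of n / 10 without DIV."""
    product = n * MAGIC                       # mul esi      -> EDX:EAX
    fraction = (product >> 3) & 0xFFFFFFFF    # shrd eax,edx,3 (bits 3..34)
    quotient = product >> 35                  # shr edx, 3
    fraction = (fraction + 1) & 0xFFFFFFFF    # add eax, 1 (the "underflow" guard)
    remainder = (fraction * 10) >> 32         # mul ebx      -> remainder in EDX
    return quotient, remainder


def utoa32_model(n):
    """Collect digits least significant first, then reverse, as the routine does."""
    digits = []
    while True:
        n, r = divmod10_fraction(n)
        digits.append(chr(48 + r))            # add dl, 48
        if n == 0:
            break
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(utoa32_model(4294967295))
```

The fraction left after the divide is roughly remainder/10 as a 32-bit binary fraction, so multiplying it by 10 pushes the remainder digit into the high dword. The added 1 covers the bits lost in the shift.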
and..
http://www.masm32.com/board/index.php?topic=3051.0
Thanks Paul, it looks like an interesting algo using the character table.
Just for fun:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
1.6 GHz: 200 cycles == 8 million loops per second
210 cycles for signed Str$(12345)
210 cycles for unsigned Str$(12345)
468 cycles for wsprintf 12345
69 cycles for umdtoa 12345
65 cycles for dwtoa 12345
207 cycles for signed Str$(12345)
210 cycles for unsigned Str$(12345)
453 cycles for wsprintf 12345
69 cycles for umdtoa 12345
65 cycles for dwtoa 12345
Some fun with Str$():
On FPU, ST(6): 6000.00
On FPU, ST(1): 1000.00
PI in ST(0): 3.141592653589793238
Qword: 1234567890123456789
Real4: 12345.67773437500000
Real8: 1.234567889999999925e+37
as xmm: 1.234567889999999925e+37
Real10: -1.234567890123456789e-111
ecx=-1, uns.: 4294967295
ecx=-1, signed: -1
ecx=-1, cx: 65535
ecx=-1, cl: 255
Simple arithmetics (eax=100, xmm0=200, ecx=1000):
-eax*xmm0/cx+100=80.00000
You need good arguments to get me back to invoke dwtoa, reg32, addr buffer ;-)
Print "Some fun with Str$():", CrLf$
push 1000*7
REPEAT 7
fild dword ptr [esp]
sub dword ptr [esp], 1000
ENDM
fldpi
Print Str$("On FPU, ST(6):\t%6f\n", ST(6)) ; prints but 6+7 are trashed afterwards
Print Str$("On FPU, ST(1):\t%5f\n", ST(1))
Print Str$("PI in ST(0):\t%Jf\n", ST(0))
Print Str$("Qword: \t%i\n", MyQword)
Print Str$("Real4: \t%Jf\n", MyR4)
Print Str$("Real8: \t%Jf\n", MyR8)
movq xmm1, MyR8
Print Str$("as xmm: \t%Jf\n", f:xmm1)
Print Str$("Real10: \t%Jf\n", MyR10)
mov ecx, -1
Print Str$("ecx=-1, uns.:\t%u\n", ecx)
Print Str$("ecx=-1, signed:\t%i\n", ecx)
Print Str$("ecx=-1, cx:\t%i\n", cx)
Print Str$("ecx=-1, cl:\t%i\n", cl)
Print "Simple arithmetics (eax=100, xmm0=200, ecx=1000):", CrLf$
mov eax, 200
movd xmm0, eax
mov eax, 100
mov ecx, 1000
Print Str$("-eax*xmm0/cx+100=%f\n", -eax*xmm0/cx+100)
pop eax
well - speed is what they are going for :P
however, not very often do you need to convert thousands of dwords to ascii in a big hurry - lol
taking that into consideration, the routine should be small, flexible and, of course, reliable
The main factor in using Ray's algo is that it delivers the correct results over the whole DWORD range. There may be minor speedups but it's reasonably fast as it is, allowing that I put back the string reverse so that the result begins at the start of the allocated buffer.
I need to have a look at Paul Dixon's table based algo as it has the potential to be faster again; the main problem is I will have to have a good play with it to comprehend how it's done.
Hutch,
it'll not take long to verify if it works on all possible input values, signed and unsigned. If it does work then there's no need to understand why!
Paul.
Hutch,
here's an attempt to explain the code, I don't know if it'll help you or confuse you even more:
Step 1. The sign (if needed)
There are 2 entry points, sdword: will calculate the signed result and udword will calculate the unsigned result.
Calling sdword sorts out the sign and then falls into udword as the routine is the same for both once the sign is sorted.
Step 2. Multiply the number by 2^45 \ 10000.
Imagine if the number was multiplied by 2^32. That would result in the number being transferred from eax to edx, just like a 32 bit shift.
To multiply by 2^32 \ 10000 will calculate the number\10000 to edx.
2^45 = 2^32 x 2^ 13 so to multiply the number by 2^45 \ 10000 will calculate the number\10000 shifted 13 bits left to edx
The 2^32 is hopefully clear.
The \10000 is splitting the 10 digit (max) number into 2 parts, the high 6 digits and the low 4 digits. \10000 is leaving 4 digits off.
The extra 2^13 is needed because the accuracy of the multiply causes rounding errors and those errors are reduced to < half a bit (i.e. no error) because of the extra 13 bits of accuracy.
Step 3. Shift edx right 13 bits
This corrects for the 13 bit extra shift explained above and leaves just the top 6 digits in edx.
If this results in zero then the calculation of the first 6 digits can be skipped completely to save time.
Step 4. Calculate low 4 digits
We have the top 6 digits so multiply by 10000 will add 4 zeros to the end and subtract from the original number to give the lower 4 digits.
At this point we have the original number split into 2 parts, the top 6 digits in edx and the bottom 4 digits in edi.
We repeat a very similar process on each of those sets of digits to break them down into pairs of digits to look up in the table.
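Steps 2 to 4 can be modelled in a few lines of Python. This is just a sketch with a made-up function name; the constant is the 0D1B71759h used in the MASM version, which works out to 2^45\10000 rounded up.

```python
RECIP_10000_SHIFT45 = 0xD1B71759      # 2**45 // 10000 + 1


def split_high6_low4(n):
    """Split a 32-bit unsigned n into (top 6 digits, bottom 4 digits)."""
    high = (n * RECIP_10000_SHIFT45) >> 45    # mul edx, then shr edx, 13
    low = n - high * 10000                    # imul ecx,10000 / sub edi,ecx
    return high, low


if __name__ == "__main__":
    print(split_high6_low4(1234567890))       # top 6 digits and bottom 4 digits
    print(split_high6_low4(4294967295))
```

In the assembler the first 32 bits of the shift come for free by reading EDX, which is why only the extra 13 bits show up as an explicit shr.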
Step 5. Extract digits
Multiply a 6 digit number by 2^32\10000 + 1 and edx will receive the top 2 digits. The plus 1 is a rounding correction to make sure the result always rounds the correct way.
Subsequent multiplies by 100 will shift the next 2 digits into edx.
Each pair of digits is checked to see if it's no digits (00), 1 digit (0d) or 2 digits (dd).
No digits is ignored if it's at the start of a number. 1 digit is treated as a special case the first time it's found to account for an odd number of digits in the number, as the leading zero needs to be suppressed.
Once a digit is found, subsequent digits don't need to be checked, so a faster "ZeroSupressed" code path does the same thing but passes all digits to the result as they're all relevant once the first digit is found.
e.g. if the number is 000100 then the first pair of zeroes is ignored. The next pair of digits (01) is treated as a special case and checked to see if it's 1 digit or 2 and treated accordingly. In this case it's 1 digit (the 1). Now that a valid digit is found, we must pass all following digits to the result as they're all relevant. The next 2 digits 00 are not suppressed like the first pair or we'd get a result of 1 instead of 100.
There are 2 blocks of code which are very similar. The first block (labels next1, next2 etc.) searches for the first digit and as soon as the first significant digit is found it jumps to the corresponding position in the second block (labels ZS1, ZS2 etc.) where the leading zero suppression checks are no longer needed so can be coded more efficiently.
When it comes to processing the lower 4 digits we multiply by 2^32\100 + 1. The plus 1 is again the correction to ensure rounding goes the right way. This puts the first 2 digits in edx, as before, and a subsequent multiply of eax by 100 gives the next 2 digits.
Since the digits are extracted in order from sign, most significant character to least significant character, there is no awkward reversing of the strings.
3 divisions are needed but they're all integrated with required multiplications so they take no additional time.
Worst case there are only 7 multiplies and no divides so the routine is quite fast.
Paul.
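A rough Python model of Step 5 may help when reading the MASM. It is a sketch under a couple of stated simplifications: the names are hypothetical, a plain list of strings stands in for chartab, and the two blocks of zero-suppression branches are replaced by stripping leading zeros at the end, which gives the same text but is not how the real routine does it.

```python
PAIRS = [f"{i:02d}" for i in range(100)]      # stands in for chartab


def pairs_of(value, reciprocal, count):
    """Yield `count` two-digit groups of value, most significant first."""
    acc = value * reciprocal                  # first mul: top pair lands in EDX
    for _ in range(count):
        yield acc >> 32                       # EDX = current pair (0..99)
        acc = (acc & 0xFFFFFFFF) * 100        # mul ecx (100): next pair into EDX


def to_decimal(n):
    """Unsigned 32-bit n to decimal text via digit pairs."""
    high, low = divmod(n, 10000)              # the split described in steps 2 to 4
    groups = list(pairs_of(high, 0x68DB9, 3))       # 2^32\10000 + 1, top 6 digits
    groups += list(pairs_of(low, 0x28F5C29, 2))     # 2^32\100 + 1, low 4 digits
    text = "".join(PAIRS[g] for g in groups)
    return text.lstrip("0") or "0"            # simplification of the next*/ZS* blocks


if __name__ == "__main__":
    for n in (0, 100, 12345, 4294967295):
        print(n, to_decimal(n))
```

Each multiply by 100 works on the fractional part left in EAX, so the next pair of digits falls out in EDX with no division.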
I don't know if this one is of interest but it's a QWORD to ASCII version based on the same code. It has entry points for signed and unsigned DWORD and QWORD to ASCII.
It's a PowerBasic program so you'll need to extract the routine and put it in your favourite test program to check it out.
Paul.
Paul,
I am tied up for the next day or so due to the election here in Australia but I will come back as soon as I can as this will be useful stuff.
Quote from: hutch-- on August 20, 2010, 06:37:46 PM
I am tied up for the next day or so due to the election here in Australia
We're all rooting for you Hutch :bg
Hutch,
you have elections in Australia? I thought we just sent down a Governor to keep you in line.
Paul.
:bg
Paul,
I think we lost that lurk in a round of budget cutbacks in 1901. Still, we get the Royal family for free and it keeps a politician out of the job of President.
i thought they hashed it out at the watering hole with a nice cold foster's - lol
Hi,
Did the elections go as you wanted?
Regards,
Steve N.
:bg
Nah,
It's a fiasco, looks like a hung parliament, neither side has won so far and the balance of power will be held by a collection of independents including 1 or 2 greens. The Greens have won the balance of power in the senate which was predictable.
using the value 12345 as a test value is likely to result with a slower-than-optimal selection
i tried 7FFFFFFFh with dwtoa and it took 3 times as long
if you all optimize your routines for 12345, then select a routine based on those times..... :red
Quote from: dedndave on August 23, 2010, 02:26:48 PM
i tried 7FFFFFFFh with dwtoa and it took 3 times as long
Strange. For Str$() it's about 20% more, probably because the resulting string is longer; but 3 times as long?? ::)
well - that was my result on a P4 - not representative of modern CPU's
but - that's not the point, really
the point is that you guys have been using a crappy method of selecting a routine
i just didn't want to say it that way :lol
i suggest writing the testbed to take times from several values and find the typical
the same testbed should probably be used for optimizing
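One possible way to pick the test values, sketched in Python. This is only a suggestion of mine: `sample_values` and `typical` are hypothetical helpers, and taking the median as the "typical" figure is an assumption, not something anyone in the thread specified.

```python
def sample_values():
    """Smallest and largest unsigned 32-bit value for each decimal length 1..10,
    plus zero and the two values already used in this thread."""
    values = [0, 12345, 0x7FFFFFFF]
    for digits in range(1, 11):
        lo = 10 ** (digits - 1)
        hi = min(10 ** digits - 1, 0xFFFFFFFF)
        values.extend([lo, hi])
    return sorted(set(values))


def typical(cycle_counts):
    """Median of the measured cycle counts (assumed meaning of 'typical')."""
    ordered = sorted(cycle_counts)
    return ordered[len(ordered) // 2]


if __name__ == "__main__":
    print(sample_values())
```

The list could be pasted into the testbed as a dd table, with each routine timed once per entry and the typical figure taken from the resulting counts.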
JJ - Z sends a K :bg
This post is mainly for Paul Dixon as it is a modified version of his conversion algo that Ian_B modified.
I put it into a test piece after doing minor mods so it took normal stack arguments and a remote buffer address and did the exhaustive range test for unsigned DWORD values from 0 to 4 gig. The algo works on the 1st 3 gig correctly but fails in the last gig with a non numeric character in the result. I have tested it against the conversion in MSVCRT which runs the test by itself over the full range with no errors.
.................................................
3150499999 1st error
3150499999 MSVCRT
31505Φ9998 algo error result
Press any key to continue ...
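Tracing the posted constants in Python suggests where the stray character comes from. This is my own reading of the listing below, not something from the original posts: at 3150499999 the 2^48 reciprocal of 100000 rounds the top five digits up to 31505, so the subtraction for the low five digits wraps to 0FFFFFFFFh. The function names are made up, and the Φ assumes a console code page such as 437 where byte E8h displays that way.

```python
def top5_via_reciprocal(n):
    """mul ecx (2814749768) followed by shr edx, 16."""
    return (n * 2814749768) >> 48


def low5_after_subtract(n):
    """sub [esp], eax: what is left for the 'lower 5 digits', as a 32-bit value."""
    return (n - top5_via_reciprocal(n) * 100000) & 0xFFFFFFFF


def sixth_digit_byte(n):
    """mul ecx (429497), add edx, 30h, then only DL is stored."""
    edx = (low5_after_subtract(n) * 429497) >> 32
    return (edx + 0x30) & 0xFF


if __name__ == "__main__":
    for n in (3150499998, 3150499999):
        print(n, top5_via_reciprocal(n), hex(low5_after_subtract(n)),
              hex(sixth_digit_byte(n)))
```

For 3150499998 the split is still fine, so by this trace the reciprocal needs more precision (or a correction step) before the routine can cover the top of the unsigned range.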
This is the test piece.
IF 0 ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
utoa_ex PROTO :DWORD,:DWORD
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
LOCAL pbuf :DWORD
LOCAL buffer[32]:BYTE
push ebx
push esi
push edi
xor edi, edi
xor esi, esi ; counter
mov esi, 3100500000
stlp:
mov pbuf, ptr$(buffer)
invoke utoa_ex,esi,pbuf ; call the procedure
fn szCmp,ustr$(esi),pbuf ; compare its results to MSVCRT
test eax, eax ; if identical string continue loop
jnz @F
print chr$(13,10) ; else display error and exit
print ustr$(esi)," 1st error",13,10
print ustr$(esi)," MSVCRT",13,10
print pbuf," algo error result",13,10
ret
@@:
add edi, 1
cmp edi, 1000000
jb nxt
print "."
xor edi, edi
nxt:
add esi, 1
cmp esi, -1
jne stlp
pop edi
pop esi
pop ebx
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
utoa_ex proc value:DWORD,buffer:DWORD
comment * ------------------------------------------------------------
Convert unsigned DWORD to ASCII string (no sign) by Paul Dixon
with specific references/optimisations added by IanB.
On entry:
value = value to convert
buffer = address to write result
On exit:
EAX = address of result buffer
buffer contains the zero terminated result
Modified to pass arguments on the stack and to use an
external buffer for the result - hutch
------------------------------------------------------------ *
mov eax, [esp+4]
push ebx
push edi
push eax ; save absolute value of number
; call it 1234567890
mov ecx, 2814749768
mov edi, [esp+8+12] ; buffer
mul ecx ; fast div by 100000
mov ecx, 100000 ; prepare to multiply up the top 5 digits
shr edx, 16 ; shift top 5 digits into place, EDX=12345
mov eax, edx ; copy to eax to multiply up to real size again
mov ebx, edx ; save a copy of top 5 digits
mul ecx ; EAX=1234500000
sub [esp], eax ; sub from original number leaves 0000067890
mov eax, ebx ; get back high digits
mov ecx, 429497 ; about to do div 10000 by reciprocal multiply
mov ebx, 10
or eax, eax ; if top 5 digit = 0 then skip that part
jz SkipTop5
mul ecx ; div top 5 digits by 10000
jc digit1 ; if digit is not zero then process it
mul ebx ; else multiply by 10 to get next digit into EDX
jc digit2
mul ebx
jc digit3
mul ebx
jc digit4
mul ebx
jc digit5
SkipTop5:
pop eax ; retrieve lower 5 digits
mul ecx ; div 10000
jc digit6
mul ebx ; multiply by 10 to get next digit in EDX
jc digit7
mul ebx
jc digit8
mul ebx
jc digit9
mul ebx
jmp digit10
digit1:
add edx, 30h ; top digit is left in EDX, convert to ascii
mov [edi], dl ; store top digit
mul ebx ; multiply by 10 to get next digit into EDX
add edi, 1
digit2:
add edx, 30h
mov [edi], dl
mul ebx
add edi, 1
digit3:
add edx, 30h
mov [edi], dl
mul ebx
add edi, 1
digit4:
add edx, 30h
mov [edi], dl
mul ebx
add edi, 1
digit5:
add edx, 30h
mov [edi], dl
pop eax ; retrieve lower 5 digits
mul ecx ; div 10000
add edi, 1
digit6:
add edx, 30h
mov [edi], dl
mul ebx
add edi, 1
digit7:
add edx, 30h
mov [edi], dl
mul ebx
add edi, 1
digit8:
add edx, 30h
mov [edi], dl
mul ebx
add edi, 1
digit9:
add edx, 30h
mov [edi], dl
mul ebx
add edi, 1
digit10:
add edx, 30h
mov eax, [esp+8+8] ; return buffer address (only EBX and EDI are still pushed here)
mov [edi], dx ; last digit, store DX not DL to give a
; zero termination for the result string
pop edi
pop ebx
ret 8
utoa_ex endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
This is Paul's original algorithm converted from PowerBASIC to MASM and this one is correct across the full signed range. The test piece is only doing the negative range but the algo is sound and produces the correct results.
IF 0 ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
ltoa_ex PROTO :DWORD,:DWORD
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
LOCAL pbuf :DWORD
LOCAL buffer[64]:BYTE
push ebx
push esi
push edi
mov pbuf, ptr$(buffer)
mov esi, -1
shr esi, 1
print ustr$(esi),13,10
xor ebx, ebx
@@:
mov edi, ustr$(esi) ; MSVCRT
invoke ltoa_ex,esi,pbuf ; algo
invoke szCmp,pbuf,edi ; compare strings
test eax, eax
jnz forward
print ustr$(esi),13,10 ; show error on count and exit
ret
forward:
add ebx, 1
cmp ebx, 10000000
jl nxt
print "."
xor ebx, ebx
nxt:
sub esi, 1
cmp esi, 0 ; full negative signed range
jne @B
pop edi
pop esi
pop ebx
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
ltoa_ex proc LongVar:DWORD,answer:DWORD
; --------------------------------------------------------------------------------
; this algorithm was written by Paul Dixon and has been converted to MASM notation
; --------------------------------------------------------------------------------
push esi
push edi
mov eax, LongVar ; get number
mov ecx, answer ; get pointer to answer string
jmp over
chartab:
dd "00","10","20","30","40","50","60","70","80","90"
dd "01","11","21","31","41","51","61","71","81","91"
dd "02","12","22","32","42","52","62","72","82","92"
dd "03","13","23","33","43","53","63","73","83","93"
dd "04","14","24","34","44","54","64","74","84","94"
dd "05","15","25","35","45","55","65","75","85","95"
dd "06","16","26","36","46","56","66","76","86","96"
dd "07","17","27","37","47","57","67","77","87","97"
dd "08","18","28","38","48","58","68","78","88","98"
dd "09","19","29","39","49","59","69","79","89","99"
over:
; on entry eax=number to convert, ecx=pointer to answer buffer (minimum 12 bytes)
; on exit, eax,ecx,edx are undefined, all other registers are preserved.
; answer is in location pointed to by ecx on entry
signed:
; do a signed DWORD to ASCII
or eax,eax ; test sign
jns udword ; if +ve, continue as for unsigned
neg eax ; else, make number positive
mov byte ptr [ecx],"-" ; include the - sign
inc ecx ; update the pointer
udword:
; unsigned DWORD to ASCII
mov esi,ecx ; get pointer to answer
mov edi,eax ; save a copy of the number
mov edx, 0D1B71759h ; =2^45\10000 13 bit extra shift
mul edx ; gives 6 high digits in edx
mov eax, 068DB9h ; =2^32\10000+1
shr edx,13 ; correct for multiplier offset used to give better accuracy
jz short skiphighdigits ; if zero then don't need to process the top 6 digits
mov ecx,edx ; get a copy of high digits
imul ecx,10000 ; scale up high digits
sub edi,ecx ; subtract high digits from original. EDI now = lower 4 digits
mul edx ; get first 2 digits in edx
mov ecx,100 ; load ready for later
jnc short next1 ; if zero, supress them by ignoring
cmp edx,9 ; 1 digit or 2?
ja short ZeroSupressed ; 2 digits, just continue with pairs of digits to the end
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dh ; but only write the 1 we need, supress the leading zero
inc esi ; update pointer by 1
jmp short ZS1 ; continue with pairs of digits to the end
next1:
mul ecx ; get next 2 digits
jnc short next2 ; if zero, supress them by ignoring
cmp edx,9 ; 1 digit or 2?
ja short ZS1a ; 2 digits, just continue with pairs of digits to the end
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dh ; but only write the 1 we need, supress the leading zero
inc esi ; update pointer by 1
jmp short ZS2 ; continue with pairs of digits to the end
next2:
mul ecx ; get next 2 digits
jnc short next3 ; if zero, supress them by ignoring
cmp edx,9 ; 1 digit or 2?
ja short ZS2a ; 2 digits, just continue with pairs of digits to the end
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dh ; but only write the 1 we need, supress the leading zero
inc esi ; update pointer by 1
jmp short ZS3 ; continue with pairs of digits to the end
next3:
skiphighdigits:
mov eax,edi ; get lower 4 digits
mov ecx,100
mov edx,28F5C29h ; 2^32\100 +1
mul edx
jnc short next4 ; if zero, supress them by ignoring
cmp edx,9 ; 1 digit or 2?
ja short ZS3a ; 2 digits, just continue with pairs of digits to the end
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dh ; but only write the 1 we need, supress the leading zero
inc esi ; update pointer by 1
jmp short ZS4 ; continue with pairs of digits to the end
next4:
mul ecx ; this is the last pair so don't suppress a single zero
cmp edx,9 ; 1 digit or 2?
ja short ZS4a ; 2 digits, just continue with pairs of digits to the end
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dh ; but only write the 1 we need, supress the leading zero
mov byte ptr [esi+1],0 ; zero terminate string
jmp short xit ; all done
ZeroSupressed:
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dx
add esi,2 ; write them to answer
ZS1:
mul ecx ; get next 2 digits
ZS1a:
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dx ; write them to answer
add esi,2
ZS2:
mul ecx ; get next 2 digits
ZS2a:
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dx ; write them to answer
add esi,2
ZS3:
mov eax,edi ; get lower 4 digits
mov edx,28F5C29h ; 2^32\100 +1
mul edx ; edx= top pair
ZS3a:
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dx ; write to answer
add esi,2 ; update pointer
ZS4:
mul ecx ; get final 2 digits
ZS4a:
mov edx,chartab[edx*4] ; look them up
mov [esi],dx ; write to answer
mov byte ptr [esi+2],0 ; zero terminate string
xit:
sdwordend:
pop edi
pop esi
ret
ltoa_ex endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
This is the same algo from Paul Dixon but with the stack frame removed, table aligned and algo aligned. It picked up about 10% improvement, and against the old dwtoa() it's about 3 times faster.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
ltoa_ex proc LongVar:DWORD,answer:DWORD
; --------------------------------------------------------------------------------
; this algorithm was written by Paul Dixon and has been converted to MASM notation
; --------------------------------------------------------------------------------
push esi
push edi
mov eax, [esp+4+8] ; LongVar ; get number
mov ecx, [esp+8+8] ; answer ; get pointer to answer string
jmp over
align 16
chartab:
dd "00","10","20","30","40","50","60","70","80","90"
dd "01","11","21","31","41","51","61","71","81","91"
dd "02","12","22","32","42","52","62","72","82","92"
dd "03","13","23","33","43","53","63","73","83","93"
dd "04","14","24","34","44","54","64","74","84","94"
dd "05","15","25","35","45","55","65","75","85","95"
dd "06","16","26","36","46","56","66","76","86","96"
dd "07","17","27","37","47","57","67","77","87","97"
dd "08","18","28","38","48","58","68","78","88","98"
dd "09","19","29","39","49","59","69","79","89","99"
over:
; on entry eax=number to convert, ecx=pointer to answer buffer (minimum 12 bytes)
; on exit, eax,ecx,edx are undefined, all other registers are preserved.
; answer is in location pointed to by ecx on entry
signed:
; do a signed DWORD to ASCII
or eax,eax ; test sign
jns udword ; if +ve, continue as for unsigned
neg eax ; else, make number positive
mov byte ptr [ecx],"-" ; include the - sign
add ecx, 1 ; update the pointer
udword:
; unsigned DWORD to ASCII
mov esi,ecx ; get pointer to answer
mov edi,eax ; save a copy of the number
mov edx, 0D1B71759h ; =2^45\10000 13 bit extra shift
mul edx ; gives 6 high digits in edx
mov eax, 068DB9h ; =2^32\10000+1
shr edx,13 ; correct for multiplier offset used to give better accuracy
jz skiphighdigits ; if zero then don't need to process the top 6 digits
mov ecx,edx ; get a copy of high digits
imul ecx,10000 ; scale up high digits
sub edi,ecx ; subtract high digits from original. EDI now = lower 4 digits
mul edx ; get first 2 digits in edx
mov ecx,100 ; load ready for later
jnc next1 ; if zero, supress them by ignoring
cmp edx,9 ; 1 digit or 2?
ja ZeroSupressed ; 2 digits, just continue with pairs of digits to the end
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dh ; but only write the 1 we need, supress the leading zero
add esi, 1
jmp ZS1 ; continue with pairs of digits to the end
next1:
mul ecx ; get next 2 digits
jnc next2 ; if zero, supress them by ignoring
cmp edx,9 ; 1 digit or 2?
ja ZS1a ; 2 digits, just continue with pairs of digits to the end
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dh ; but only write the 1 we need, supress the leading zero
add esi, 1
jmp ZS2 ; continue with pairs of digits to the end
next2:
mul ecx ; get next 2 digits
jnc short next3 ; if zero, suppress them by ignoring
cmp edx,9 ; 1 digit or 2?
ja ZS2a ; 2 digits, just continue with pairs of digits to the end
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dh ; but only write the 1 we need, suppress the leading zero
add esi, 1
jmp ZS3 ; continue with pairs of digits to the end
next3:
skiphighdigits:
mov eax,edi ; get lower 4 digits
mov ecx,100
mov edx,28F5C29h ; 2^32\100 +1
mul edx
jnc next4 ; if zero, suppress them by ignoring
cmp edx,9 ; 1 digit or 2?
ja ZS3a ; 2 digits, just continue with pairs of digits to the end
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dh ; but only write the 1 we need, suppress the leading zero
add esi, 1
jmp ZS4 ; continue with pairs of digits to the end
next4:
mul ecx ; this is the last pair so don't suppress a single zero
cmp edx,9 ; 1 digit or 2?
ja ZS4a ; 2 digits, just write the pair and finish
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dh ; but only write the 1 we need, suppress the leading zero
mov byte ptr [esi+1],0 ; zero terminate string
jmp xit ; all done
ZeroSupressed:
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dx
add esi,2 ; write them to answer
ZS1:
mul ecx ; get next 2 digits
ZS1a:
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dx ; write them to answer
add esi,2
ZS2:
mul ecx ; get next 2 digits
ZS2a:
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dx ; write them to answer
add esi,2
ZS3:
mov eax,edi ; get lower 4 digits
mov edx,28F5C29h ; 2^32\100 +1
mul edx ; edx= top pair
ZS3a:
mov edx,chartab[edx*4] ; look up 2 digits
mov [esi],dx ; write to answer
add esi,2 ; update pointer
ZS4:
mul ecx ; get final 2 digits
ZS4a:
mov edx,chartab[edx*4] ; look them up
mov [esi],dx ; write to answer
mov byte ptr [esi+2],0 ; zero terminate string
xit:
sdwordend:
pop edi
pop esi
ret 8
ltoa_ex endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
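For anyone who wants to check the magic numbers without stepping through the asm, here is a rough Python model of the arithmetic in the routine above. It is only a sketch: the names mul32, digit_pairs and utoa are made up for illustration, the table lookup is replaced by plain formatting, and the sign handling and zero suppression branches are left out. It shows how one multiply by 0D1B71759h plus a 13 bit shift gives n\10000, and how each following mul leaves the next pair of digits in edx.

```python
MASK32 = 0xFFFFFFFF

def mul32(a, b):
    """Model x86 MUL: return (edx, eax), the high and low 32 bits of a*b."""
    prod = a * b
    return prod >> 32, prod & MASK32

def digit_pairs(n):
    """Split an unsigned 32-bit n into five two-digit groups, most significant first."""
    edx, _ = mul32(n, 0xD1B71759)       # about n * 2^45 / 10000
    high = edx >> 13                    # n \ 10000, the top 6 digits
    low = n - high * 10000              # the bottom 4 digits
    pairs = []
    edx, eax = mul32(high, 0x68DB9)     # 2^32 \ 10000 + 1
    pairs.append(edx)                   # first pair, eax holds the fraction
    for _ in range(2):
        edx, eax = mul32(eax, 100)      # fraction * 100 pushes the next pair into edx
        pairs.append(edx)
    edx, eax = mul32(low, 0x28F5C29)    # 2^32 \ 100 + 1
    pairs.append(edx)
    edx, eax = mul32(eax, 100)
    pairs.append(edx)
    return pairs

def utoa(n):
    """Join the pairs and strip leading zeros (the asm does this with branches)."""
    text = "".join("%02d" % p for p in digit_pairs(n))
    return text.lstrip("0") or "0"

if __name__ == "__main__":
    for n in (0, 7, 12345, 900000000, 4294967295):
        print(n, utoa(n))
```

The "+1" on the reciprocals is what keeps the truncated fraction from landing just below a digit boundary; the error it adds stays well under one unit in the last pair for every 32 bit input.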
This is the same algo from Paul Dixon but with 2 times smaller table and better time... :lol.
.data
align 2
chartabL dw "00","10","20","30","40","50","60","70","80","90"
dw "01","11","21","31","41","51","61","71","81","91"
dw "02","12","22","32","42","52","62","72","82","92"
dw "03","13","23","33","43","53","63","73","83","93"
dw "04","14","24","34","44","54","64","74","84","94"
dw "05","15","25","35","45","55","65","75","85","95"
dw "06","16","26","36","46","56","66","76","86","96"
dw "07","17","27","37","47","57","67","77","87","97"
dw "08","18","28","38","48","58","68","78","88","98"
dw "09","19","29","39","49","59","69","79","89","99"
.code
OPTION PROLOGUE:None
OPTION EPILOGUE:None
align 16
ltoa_exLingo proc LongVar:DWORD,answer:DWORD
mov eax, [esp+1*4] ; eax->number
mov edx, 0D1B71759h
mov ecx, [esp+2*4] ; ecx-> lpResult
test eax, eax
jns @f
mov byte ptr [ecx], "-"
inc ecx
neg eax
@@:
mov [esp+1*4], edi
mov edi, eax
mov [esp+2*4], esi
mul edx
shr edx, 13
mov eax, 68DB9h
je LoNext3
imul esi, edx, 10000
sub edi, esi
mul edx
mov esi, 100
jnc LoNext1
cmp edx, 9
jbe Lo0 ; 1..9 is a single digit, write it without a leading zero
movzx edx, word ptr chartabL[edx+edx]
add ecx, 8
mov [ecx-8], edx
Lo1:
mul esi
Lo1a:
movzx edx, word ptr chartabL[edx+edx]
mov [ecx-6], edx
Lo2:
mul esi
Lo2a:
movzx edx, word ptr chartabL[edx+edx]
mov [ecx-4], edx
Lo3:
mov eax, 28F5C29h
mul edi
Lo3a:
movzx edx, word ptr chartabL[edx+edx]
mov [ecx-2], edx
Lo4:
mul esi
Lo4a:
movzx edx, word ptr chartabL[edx+edx]
pop eax
lea eax, [ecx+2]
pop edi
mov [ecx], edx
pop esi
jmp dword ptr [esp-3*4]
Lo0:
add ecx, 7
add edx, 30h
mov [ecx-7], edx
jne Lo1
LoNext1:
mul esi
jnc @f
add ecx, 6
cmp edx, 9
ja Lo1a
add edx, 30h
sub ecx, 1
mov [ecx-5], edx
jnz Lo2
@@:
mul esi
jnc LoNext3
add ecx, 4
cmp edx, 9
ja Lo2a
add edx, 30h
sub ecx, 1
mov [ecx-3], edx
jnz Lo3
LoNext3:
mov eax, 28F5C29h
mul edi
mov esi, 100
jnc @f
add ecx, 2
cmp edx, 9
ja Lo3a
add edx, 30h
sub ecx, 1
mov [ecx-1], edx
jnz Lo4
@@:
mul esi
cmp edx, 9
ja Lo4a
add edx, 30h
pop eax
lea eax, [ecx+1]
pop edi
mov [ecx], edx
pop esi
jmp dword ptr [esp-3*4]
ltoa_exLingo endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
and results:
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz (SSE4)
10 cycles for ltoa_exLingo
12 cycles for ltoa_exHutch
13 cycles for ltoa_ex
10 cycles for ltoa_exLingo
12 cycles for ltoa_exHutch
14 cycles for ltoa_ex
10 cycles for ltoa_exLingo
12 cycles for ltoa_exHutch
13 cycles for ltoa_ex
10 cycles for ltoa_exLingo
12 cycles for ltoa_exHutch
13 cycles for ltoa_ex
--- ok ---
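To make the "2 times smaller table" point concrete, here is a small Python sketch of the two layouts. It assumes, as the reversed pairs in the listings suggest, that the assembler puts the first character of "10" in the high byte, so the entry reads "01" in memory. build_table and lookup are made-up names, not part of either routine.

```python
def build_table(entry_size):
    """Lay out the pairs "00".."99" in memory order, padded to entry_size bytes each."""
    table = bytearray()
    for value in range(100):
        pair = b"%02d" % value                  # tens digit first, as it must appear in the string
        table += pair.ljust(entry_size, b"\0")  # dd entries carry two unused bytes
    return bytes(table)

def lookup(table, entry_size, value):
    """Fetch the two ASCII digits for value, like chartab[edx*4] or chartabL[edx+edx]."""
    offset = value * entry_size
    return table[offset:offset + 2].decode("ascii")

table_dd = build_table(4)   # dd table, indexed with edx*4
table_dw = build_table(2)   # dw table, indexed with edx+edx

if __name__ == "__main__":
    print(len(table_dd), len(table_dw))
    print(lookup(table_dd, 4, 42), lookup(table_dw, 2, 42))
```

Both tables hold the same 100 pairs, the word version just drops the padding, so it covers half as many cache lines.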
Congrats, Lingo, that's roughly 45% faster (http://www.masm32.com/board/index.php?topic=14626.msg119221#msg119221) :U
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
30 cycles for ltoa_exLingo
31 cycles for ltoa_exHutch
31 cycles for ltoa_ex
Quote from: lingo on August 25, 2010, 09:11:04 PM
This is the same algo from Paul Dixon but with 2 times smaller table and better time... :lol.
Wow, GrandLamer lingo theif and "optimize" algos of other members? Impossible! :bdg
Alex
these guys with their LUT's take all the fun out of writing math functions - lol
(just kidding, Paul :bg )
but, seriously, it seems like everything boils down to a collection of huge tables
and the fastest routine is the one that addresses the table faster - lol
maybe what we want is a routine that handles signed OR unsigned and left-justifies
gotta make some kind of challenge out of it :P
oh - and.....
don't tell me we switched from a single test of 12345 to a single test of -1 :red
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
25 cycles for ltoa_exLingo
30 cycles for ltoa_exHutch
30 cycles for ltoa_ex
25 cycles for ltoa_exLingo
29 cycles for ltoa_exHutch
30 cycles for ltoa_ex
25 cycles for ltoa_exLingo
30 cycles for ltoa_exHutch
30 cycles for ltoa_ex
25 cycles for ltoa_exLingo
30 cycles for ltoa_exHutch
30 cycles for ltoa_ex
"Congrats, Lingo, that's roughly 45% faster"
For me it is enough so far, because P.Dixon is not so stupid like you and tubeteikin.
So, for tubeteikin I understand because it not easy to learn something from the aboriginal people
with archaic computers and living in the asian post communist ignorance and stupidity with 80% radical Muslims, etc...,
but you live in one of the most civilized country.... :lol
I have my doubts about the validity of the test piece in this context. If an algorithm of this type has a place over the shorter versions for ordinary usage, it is for streaming output of different numbers across the signed range, and to test the algo over that range you need numbers that run in a gradient from single characters up to 10 characters.
When I tested and timed the second version of Paul's algo I ran it for about 6 seconds over a wider numeric range. A short test may deliver cute numbers that game the test piece, but the test needs to match the usage.
I agree, so feel free to change the test as you want. I just have no time for more...Sorry. :wink
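One way to build the kind of test data described above, sketched in Python rather than in the test bed itself. make_test_values, per_length and seed are made-up names, and equal weighting per digit length is an assumption, not something anyone in the thread specified.

```python
import random

def make_test_values(per_length=1000, seed=1):
    """Return signed 32-bit test values covering every digit length from 1 to 10."""
    rng = random.Random(seed)           # fixed seed so every run times the same data
    values = []
    for digits in range(1, 11):
        lowest = 0 if digits == 1 else 10 ** (digits - 1)
        highest = min(10 ** digits - 1, 2 ** 31 - 1)
        for _ in range(per_length):
            v = rng.randint(lowest, highest)
            values.append(v)            # positive case
            values.append(-v)           # same magnitude with a leading '-'
    values.append(-2 ** 31)             # the one value that neg eax leaves unchanged
    return values

if __name__ == "__main__":
    data = make_test_values(per_length=3)
    print(len(data), data[:6])
```

The resulting list could be dumped as a dd table and walked by each routine in turn, so that all the branch paths get exercised instead of a single constant.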
maybe these tables will help
as you can see, the longest strings occur more often
although, it depends on the application to some degree
if you only evaluate longer integers, a routine that is fast on short ones gets slighted
frequency of occurrence
32 bit integers converted to decimal strings
unsigned (number of characters = number of digits)
1 digit 10
2 digits 90
3 digits 900
4 digits 9000
5 digits 90000
6 digits 900000
7 digits 9000000
8 digits 90000000
9 digits 900000000
10 digits 3294967296
signed positive (number of characters = number of digits)
1 digit 10
2 digits 90
3 digits 900
4 digits 9000
5 digits 90000
6 digits 900000
7 digits 9000000
8 digits 90000000
9 digits 900000000
10 digits 1147483648
signed negative (number of characters = number of digits plus 1)
1 digit 9
2 digits 90
3 digits 900
4 digits 9000
5 digits 90000
6 digits 900000
7 digits 9000000
8 digits 90000000
9 digits 900000000
10 digits 1147483649
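The counts in these tables can be reproduced with a few lines of Python, in case anyone wants to check them or extend them to 64 bit. digit_length_counts is a made-up helper name; the negative case is computed on magnitudes 1 through 2^31.

```python
def digit_length_counts(lo, hi):
    """Count the integers in [lo, hi] (with lo >= 0) by number of decimal digits."""
    counts = {}
    for digits in range(1, 11):
        first = 0 if digits == 1 else 10 ** (digits - 1)
        last = 10 ** digits - 1
        a, b = max(first, lo), min(last, hi)
        if a <= b:
            counts[digits] = b - a + 1
    return counts

unsigned = digit_length_counts(0, 2 ** 32 - 1)
signed_positive = digit_length_counts(0, 2 ** 31 - 1)
signed_negative = digit_length_counts(1, 2 ** 31)   # magnitudes of -1 .. -2147483648

if __name__ == "__main__":
    for name, table in (("unsigned", unsigned),
                        ("signed positive", signed_positive),
                        ("signed negative", signed_negative)):
        print(name)
        for digits, count in table.items():
            print("%2d digits %d" % (digits, count))
```

For uniformly distributed unsigned input roughly 77% of all values are 10 digits long, which is why a routine tuned for short numbers gains little on random data.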
Dave,
Quote
Posted by: dedndave
...
but, seriously, it seems like everything boils down to a collection of huge tables
and the fastest routine is the one that addresses the table faster - lol
That is what I jokingly was referring to at the beginning of this thread:
http://www.masm32.com/board/index.php?topic=14642.msg118665#msg118665
Dave.