Show file size in bytes that's over 4 gigs

donkey · April 30, 2010, 04:24:12 AM

Quote from: E^cube on April 30, 2010, 04:06:13 AM
on quick testing of the optimization of xor, it actually seems to have made it slower...

You're right, I wonder where I tested it that I found it faster?

ecube · April 30, 2010, 04:35:03 AM

Code:

AMD Athlon(tm) 64 Processor 3000+
1630 cycles for I64toDecimalASCII       18067432769859648
1636 cycles for I64toDecimalASCII Modified      18067432769859648
988 cycles for Asc64    18067432769859648
998 cycles for Asc64 Modified   18067432769859648
341 cycles for dq2ascii Modified        18067432769859648
Press any key to continue ...

didn't mean to say "Modified" for your alg, donkey, oh well

sinsi · April 30, 2010, 05:03:54 AM

If you don't want such a long number, have a look at http://www.masm32.com/board/index.php?topic=9585.0
With 64-bit Windows you can use wsprintf with the "%I64" size prefix (for example "%I64u") as well.
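For readers following along in C, here is a minimal sketch of the same idea: join the two 32-bit halves of a file size into one 64-bit value and let a formatted-print call produce the decimal text. It uses the standard `snprintf` with `PRIu64` rather than `wsprintf`, and the helper names `make_size64` and `format_size64` are made up for illustration; they are not from this thread or from any Windows API.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Combine the high and low 32-bit halves of a file size (such as the
   pair Windows reports for files over 4 GB) into one 64-bit value. */
static uint64_t make_size64(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Hypothetical helper: write the size as decimal text into buf.
   Returns the number of characters written, not counting the NUL. */
static int format_size64(char *buf, size_t buflen, uint64_t size)
{
    /* A 64-bit unsigned value needs at most 20 digits plus the NUL. */
    assert(buflen >= 21);
    return snprintf(buf, buflen, "%" PRIu64, size);
}
```

With `high = 1` and `low = 0` the combined value is exactly 4 GB (2^32 bytes), which is the first size that no longer fits in a single dword.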

jj2007 · April 30, 2010, 09:55:48 AM

It seems Str$() is not so competitive :(

Code:

	counter_begin LoopCount, HIGH_PRIORITY_CLASS
		mov edi, Str$(q:My64a)
	counter_end

Code:

Intel(R) Pentium(R) 4 CPU 3.40GHz
2917 cycles for I64toDecimalASCII       18067432769859648
2764 cycles for I64toDecimalASCII Modified      18067432769859648
1973 cycles for Asc64   18067432769859648
1991 cycles for Asc64 Modified  18067432769859648
765 cycles for dq2ascii Modified        18067432769859648
1015 cycles for MasmBasic Str$  18067432769859648

clive · April 30, 2010, 12:13:46 PM

Quote from: donkey
You're right, I wonder where I tested it that I found it faster?

XOR reg,reg 386:2, 486:1
vs
XCHG reg,reg 386:3, 486:3

It would be heaps faster on memory, but you'd break the atomicity.

dedndave · April 30, 2010, 02:39:27 PM

i was curious how my ling long kai fang BigNum routine would fare...
results on a prescott

Code:

526 clock cycles
528 clock cycles
526 clock cycles
525 clock cycles
526 clock cycles

i think Drizz's routine is under 200 clock cycles and Lingo's modified Dixon SSE/LUT routine is under 100
but, the LLKF9 routines may be used for larger integers - signed and/or unsigned
if you had other requirements elsewhere in the program, it could take care of all of em :P

maybe i will write a special 64-bit LLKF routine someday, just to see how it compares

dioxin · April 30, 2010, 04:18:39 PM

Wasn't QWord to ASCII done a while ago?
Check out reply #25 in this thread: http://www.masm32.com/board/index.php?topic=3051.msg24570#msg24570

The attachment there does qword to ascii quite quickly.

Paul.

dedndave · April 30, 2010, 05:37:56 PM

yes Paul - i think that is the one of yours that Lingo later optimized - which is the fastest as far as i know
i don't remember ever seeing timings for your original version, though

64-bit integer to decimal is a fun algo to work on
we will probably see it pop up again and again :P
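As a baseline for what the routines timed in this thread are doing, here is a minimal C sketch of the plain divide-by-10 approach to 64-bit integer to decimal conversion. It is not any of the posted algorithms (those are hand-tuned assembly), and the name `u64_to_dec` is hypothetical; the faster routines mentioned above get their speed by avoiding exactly this kind of repeated 64-bit division.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert an unsigned 64-bit value to decimal text.
   buf must hold at least 21 bytes. Returns the number of digits written. */
static size_t u64_to_dec(uint64_t value, char *buf, size_t buflen)
{
    char tmp[20];   /* UINT64_MAX has 20 decimal digits */
    size_t n = 0;

    assert(buflen >= 21);

    /* Peel off the least significant digit until nothing is left.
       The do/while form makes sure that 0 still yields one digit. */
    do {
        tmp[n++] = (char)('0' + (value % 10));
        value /= 10;
    } while (value != 0);

    /* The digits came out in reverse order, so copy them back to front. */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

Each loop iteration costs one 64-bit division, so the work grows with the number of digits, which matches the observation elsewhere in the thread that shorter numbers convert faster.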

jj2007 · April 30, 2010, 06:22:47 PM

I use a modified version of drizz' algo in MasmBasic. It is pretty fast, too, although it obviously suffers a bit from the overhead of an all-purpose Str$:

Code:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz
854 cycles for I64toDecimalASCII        18067432769859648
1121 cycles for I64toDecimalASCII2 mod  18067432769859648
562 cycles for Asc64    18067432769859648
688 cycles for Asc642 mod       18067432769859648
294 cycles for dq2ascii Modified        18067432769859648
484 cycles for MasmBasic Str$   18067432769859648

54 bytes for I64toDecimalASCII
69 bytes for I64toDecimalASCII2
52 bytes for Asc64
82 bytes for Asc642
284 bytes for dq2ascii
1540 bytes for Str$

dedndave · April 30, 2010, 06:30:38 PM

Quote: ... although it obviously suffers a bit from the overhead of an all-purpose Str$

what you need is a nice ling long kai fang BigNum routine (signed-unsigned mode selectable)
i know where there is one already written :bg

dioxin · April 30, 2010, 06:52:25 PM

dedndave,

Quote: i think that is the one of yours that Lingo later optimized - which is the fastest as far as i know

Someone made it faster? I might have to take another look!

Quote: i don't remember ever seeing timings for your original version, though

It originally ran on an Athlon XP in 100 clks for a signed 19-digit quad, faster for fewer digits or unsigned.
The same code now runs on a Phenom II in about 60 clks, again faster for fewer digits.

Paul.

dedndave · April 30, 2010, 07:07:38 PM

yah - someplace Lingo had optimized it :P

as a side note, i read a post of yours in some other forum from long ago that had a 16-bit multiple-precision divide routine
my first one (posted earlier in this thread) was inspired by that one :bg
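To illustrate the general idea behind a multiple-precision divide built from 16-bit words, here is a rough C sketch that divides a big number by a single small divisor and uses that to produce decimal digits. This is an assumption about the simplest form of the technique, not the routine from that other forum or the LLKF code, and the names `bignum_div_small` and `bignum_to_dec` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Divide a big number in place by a small divisor and return the remainder.
   limbs[0] is the most significant 16-bit word (an assumed layout). */
static uint16_t bignum_div_small(uint16_t *limbs, size_t count, uint16_t divisor)
{
    uint32_t rem = 0;

    assert(divisor != 0);
    for (size_t i = 0; i < count; i++) {
        /* rem < divisor <= 0xFFFF, so rem:limb always fits in 32 bits
           and the quotient always fits back into one 16-bit word. */
        uint32_t cur = (rem << 16) | limbs[i];
        limbs[i] = (uint16_t)(cur / divisor);
        rem = cur % divisor;
    }
    return (uint16_t)rem;
}

/* Convert the big number to decimal text by repeated division by 10.
   The input limbs are destroyed (left as zero). Returns the digit count. */
static size_t bignum_to_dec(uint16_t *limbs, size_t count, char *buf, size_t buflen)
{
    size_t n = 0;
    int nonzero;

    do {
        assert(n + 1 < buflen);   /* keep room for the digit and the NUL */
        buf[n++] = (char)('0' + bignum_div_small(limbs, count, 10));
        nonzero = 0;
        for (size_t i = 0; i < count; i++)
            if (limbs[i] != 0)
                nonzero = 1;
    } while (nonzero);

    /* Digits were generated least significant first, so reverse them. */
    for (size_t i = 0; i < n / 2; i++) {
        char t = buf[i];
        buf[i] = buf[n - 1 - i];
        buf[n - 1 - i] = t;
    }
    buf[n] = '\0';
    return n;
}
```

Because the word count is a parameter, the same two functions handle integers wider than 64 bits, at the price of one full pass over the number per output digit.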
