
Show file size in bytes that's over 4 gigs

Started by ecube, April 30, 2010, 01:28:00 AM


donkey

Quote from: E^cube on April 30, 2010, 04:06:13 AM
On quick testing of the XOR optimization, it actually seems to have made it slower...

You're right. I wonder where I tested it and found it faster?

ecube


AMD Athlon(tm) 64 Processor 3000+
1630 cycles for I64toDecimalASCII       18067432769859648
1636 cycles for I64toDecimalASCII Modified      18067432769859648
988 cycles for Asc64    18067432769859648
998 cycles for Asc64 Modified   18067432769859648
341 cycles for dq2ascii Modified        18067432769859648
Press any key to continue ...


I didn't mean to label your algorithm "Modified", donkey. Oh well.

sinsi

If you don't want such a long number, have a look at http://www.masm32.com/board/index.php?topic=9585.0
With 64-bit Windows you can use wsprintf with "%I64" as well.

jj2007

It seems Str$() is not so competitive :(
counter_begin LoopCount, HIGH_PRIORITY_CLASS
mov edi, Str$(q:My64a)
counter_end

Intel(R) Pentium(R) 4 CPU 3.40GHz
2917 cycles for I64toDecimalASCII       18067432769859648
2764 cycles for I64toDecimalASCII Modified      18067432769859648
1973 cycles for Asc64   18067432769859648
1991 cycles for Asc64 Modified  18067432769859648
765 cycles for dq2ascii Modified        18067432769859648
1015 cycles for MasmBasic Str$  18067432769859648

clive

Quote from: donkey
You're right, I wonder where I tested it that I found it faster ?

XOR reg,reg    386: 2 clocks, 486: 1 clock
vs
XCHG reg,reg   386: 3 clocks, 486: 3 clocks

It would be heaps faster on memory, but you'd break the atomicity (XCHG with a memory operand is implicitly locked).

It could be a random act of randomness. Those happen a lot as well.

dedndave

I was curious how my Ling Long Kai Fang BigNum routine would fare...
Results on a Prescott:

526 clock cycles
528 clock cycles
526 clock cycles
525 clock cycles
526 clock cycles

I think Drizz's routine is under 200 clock cycles, and Lingo's modified Dixon SSE/LUT routine is under 100.
But the LLKF9 routines may be used for larger integers, signed and/or unsigned.
If you had other requirements elsewhere in the program, it could take care of all of 'em   :P

Maybe I will write a special 64-bit LLKF routine someday, just to see how it compares.

dioxin

Wasn't QWord to ASCII done a while ago?
Check out reply #25 in this thread: http://www.masm32.com/board/index.php?topic=3051.msg24570#msg24570

The attachment there does qword to ASCII quite quickly.

Paul.

dedndave

Yes Paul, I think that is the one of yours that Lingo later optimized, which is the fastest as far as I know.
I don't remember ever seeing timings for your original version, though.

64-bit integer to decimal is a fun algo to work on.
We will probably see it pop up again and again   :P

jj2007

I use a modified version of drizz' algo in MasmBasic. It is pretty fast, too, although it obviously suffers a bit from the overhead of an all-purpose Str$:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz
854 cycles for I64toDecimalASCII        18067432769859648
1121 cycles for I64toDecimalASCII2 mod  18067432769859648
562 cycles for Asc64    18067432769859648
688 cycles for Asc642 mod       18067432769859648
294 cycles for dq2ascii Modified        18067432769859648
484 cycles for MasmBasic Str$   18067432769859648

54 bytes for I64toDecimalASCII
69 bytes for I64toDecimalASCII2
52 bytes for Asc64
82 bytes for Asc642
284 bytes for dq2ascii
1540 bytes for Str$

dedndave

Quote: ... although it obviously suffers a bit from the overhead of an all-purpose Str$
What you need is a nice Ling Long Kai Fang BigNum routine (signed/unsigned mode selectable).
I know where there is one already written   :bg

dioxin

dedndave,
Quote: I think that is the one of yours that Lingo later optimized - which is the fastest as far as I know
Someone made it faster? I might have to take another look!


Quote: I don't remember ever seeing timings for your original version, though
It originally ran on an Athlon XP in 100 clks for a signed 19-digit quad, faster for fewer digits or unsigned.
The same code now runs on a Phenom II in about 60 clks, again faster for fewer digits.

Paul.

dedndave

yah - someplace Lingo had optimized it   :P

As a side note, I read a post of yours in some other forum, from long ago, that had a 16-bit multiple-precision divide routine.
My first one (posted earlier in this thread) was inspired by that one   :bg