Decimal as KiB, MiB, GiB etc.

Started by sinsi, July 29, 2008, 12:38:02 PM

GregL

I worked on getting Mark's code working earlier, and I did get it working. Then I took what I learned from that and modified my previous code. I'm not sure if it's any faster; it's just another take on it.

Edit: This code is slower than my previous code.



[attachment deleted by admin]

jj2007

Looks OK, Greg. Re speed: crt_printf seems very slow and can be replaced by my "dwtoa hack". The same applies to finit, a very slow instruction that in this case can be replaced by ffree.

jj2007

Quote from: Greg on August 07, 2008, 01:05:02 AM
      lodword:
        xor eax, eax    ;if the source operand [edx] is 0, BSR leaves the destination register undefined
        bsr eax, [edx]
        jz done         ;it's zero

Interesting indeed, and good to know. The zero flag is set correctly though.
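The pattern in that quote can be sketched in C to make the logic explicit: test for zero first (what the jz after BSR does), then find the index of the highest set bit. `highest_set_bit` is a hypothetical name, and the shift loop is a portable stand-in for BSR, not Mark's code.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the BSR pattern quoted above.
   Returns 0 (like ZF=1) when x is zero, because BSR's destination is
   not defined for a zero source. Otherwise stores the index of the
   highest set bit in *index and returns 1. */
static int highest_set_bit(uint32_t x, unsigned *index)
{
    unsigned i = 0;

    if (x == 0)
        return 0;          /* the "jz done" path; *index is left untouched */

    /* Portable stand-in for BSR: count shifts until the value is gone. */
    while (x >>= 1)
        i++;

    *index = i;
    return 1;
}
```

With GCC or Clang, `31 - __builtin_clz(x)` gives the same index for nonzero `x` (`__builtin_clz` is undefined for 0); MSVC's `_BitScanReverse` intrinsic reports a zero input through its return value, much like ZF.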

Mark_Larson

Quote from: jj2007 on August 07, 2008, 11:11:05 AM
Quote from: Greg on August 07, 2008, 01:05:02 AM
      lodword:
        xor eax, eax    ;if the source operand [edx] is 0, BSR leaves the destination register undefined
        bsr eax, [edx]
        jz done         ;it's zero

Interesting indeed, and good to know. The zero flag is set correctly though.

  That's actually from my code :)
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

Mark_Larson

Quote from: Greg on August 07, 2008, 01:05:02 AM
I worked on getting Mark's code working earlier, I did get it working. Then I took what I learned from that and modified my previous code. Not sure if it's any faster, just another take on it.

Did you time it at all? I re-posted my new code at the end of page 4. It came out to 20 cycles on my Core 2 Duo, not counting the printf.

Mark_Larson

Quote from: jj2007 on August 07, 2008, 07:01:33 AM
Looks OK, Greg. Re speed: crt_printf seems very slow and can be replaced by my "dwtoa hack". The same applies to finit, a very slow instruction that in this case can be replaced by ffree.

Yeah, I picked up your ffree trick in my code. Thanks.

Most of the time when I print to the screen, it doesn't need to be fast: I print out debugging statistics and other stuff. In general I don't print at all if the code needs to be fast, or, if you have to print, try to move the printf calls outside the part that needs to be fast. For instance, I collected statistics on the data I was looking at, saved them in memory, and printed them out later. So in my opinion you should never call printf in time-critical code.

The same goes for malloc() and free(). I've seen otherwise fast code bog down from constantly having to allocate small pieces of memory. Allocate one big buffer at the beginning and break off chunks as you need them.
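A minimal sketch of "allocate one big buffer and break off chunks" in C, assuming a bump (arena) allocator is enough, i.e. chunks are never freed one by one and everything is released at once. `Arena`, `arena_init`, `arena_alloc` and `arena_release` are hypothetical names, not from any library.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump ("arena") allocator: one malloc up front, then chunks
   are carved off by advancing an offset. Chunks are never freed one by one. */
typedef struct {
    unsigned char *base;  /* the one big buffer */
    size_t size;          /* its capacity in bytes */
    size_t used;          /* bytes handed out so far */
} Arena;

/* Allocates the big buffer once. Returns 1 on success, 0 on failure. */
static int arena_init(Arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Breaks off n bytes aligned to 'align' (a power of two).
   Returns NULL when the buffer has no room left. */
static void *arena_alloc(Arena *a, size_t n, size_t align)
{
    size_t start = (a->used + (align - 1)) & ~(align - 1);

    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Releases every chunk at once by freeing the big buffer. */
static void arena_release(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

Alignment is computed as an offset from the malloc'd base, which malloc already aligns for any standard type, so `align` values up to `_Alignof(max_align_t)` are safe. The hot loop then costs one add and one compare per allocation instead of a trip into the heap manager.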

EDIT: Pbrennick was asking about optimization tricks in a separate thread, so I posted a link to my own website with 60 optimization tricks. The funny thing is that I specifically say to use BSR to get the highest power of 2 (tip #17 on the webpage).

http://www.mark.masmcode.com/
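Tip #17 applied to this thread's topic: the index of the highest set bit tells you directly which binary unit a byte count belongs in, since each step from B to KiB to MiB is a factor of 1024 = 2^10. A sketch in C, assuming 64-bit counts and GCC/Clang's `__builtin_clzll` as the stand-in for BSR; `binary_unit_index` and `BINARY_UNIT_NAMES` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical unit table, smallest to largest; each step is a factor of 1024. */
static const char *const BINARY_UNIT_NAMES[] = {
    "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
};

/* Hypothetical helper: returns the index into BINARY_UNIT_NAMES for a byte
   count. Since 1024 = 2^10, the index is (highest set bit) / 10.
   __builtin_clzll (GCC/Clang) stands in for BSR; like BSR it is not
   defined for 0, so zero is handled first. */
static unsigned binary_unit_index(uint64_t bytes)
{
    unsigned top_bit;

    if (bytes == 0)
        return 0;

    top_bit = 63u - (unsigned)__builtin_clzll(bytes);
    return top_bit / 10u;
}
```

This only picks the unit; the value still has to be scaled by 1024^index for display. On MSVC, the `_BitScanReverse64` intrinsic would play the role of `__builtin_clzll`.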

GregL

Quote from: jj2007
Looks OK, Greg. Re speed: crt_printf seems very slow and can be replaced by my "dwtoa hack". The same applies to finit, a very slow instruction that in this case can be replaced by ffree.

If the code was too slow for the application, then I would worry about it. Regarding finit, I think the benefits of using it far outweigh any speed considerations.

xor eax, eax    ;if the source operand [edx] is 0, BSR leaves the destination register undefined
Quote from: jj2007
Interesting indeed, and good to know. The zero flag is set correctly though.

Yes, that was Mark's idea. I left the comment in my code as a reminder of what it was.


GregL

Quote from: Mark Larson
did you time it at all?

No, I took your word for it. I imagine the div in the last code I posted is not helping the speed any.
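If the div in that code divides by a power of two such as 1024 (an assumption here, since the attachment is gone), it can be replaced by a shift for the quotient and a mask for the remainder. A C sketch of that arithmetic; `scaled_hundredths` is a hypothetical name, and it truncates the decimals rather than rounding them.

```c
#include <stdint.h>

/* Hypothetical helper: expresses 'bytes' in units of 1024^unit as a whole
   part plus two truncated decimal digits, using a shift and a mask in
   place of DIV. Assumes unit <= 5, so the shift stays below 64 and the
   remainder times 100 cannot overflow 64 bits. */
static void scaled_hundredths(uint64_t bytes, unsigned unit,
                              uint64_t *whole, unsigned *hundredths)
{
    unsigned shift = 10u * unit;                   /* 1024^unit == 2^shift */
    uint64_t mask  = ((uint64_t)1 << shift) - 1;   /* low 'shift' bits */
    uint64_t rem   = bytes & mask;                 /* bytes % 2^shift */

    *whole      = bytes >> shift;                  /* bytes / 2^shift */
    *hundredths = (unsigned)((rem * 100u) >> shift); /* 0..99, truncated */
}
```

For example, 1536 bytes with `unit = 1` gives a whole part of 1 and 50 hundredths, i.e. "1.50 KiB".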


GregL

Mark,

Your Assembly Optimization Tips are very good; I have a link to them in my favorites. :U


jj2007

Quote from: Greg on August 07, 2008, 05:59:44 PM
Regarding finit, I think the benefits of using it far outweigh any speed considerations.

finit has some advantages, such as resetting everything to standard values, e.g. the precision control to 64 bits (80-bit extended). But it is sufficient to use it once at the start of the code, and I suspect Windows does it for you when you launch the program. Inside a loop, ffree st(7) is absolutely sufficient, AFAIK.

GregL

jj2007,

Quote from: jj2007
But it is sufficient to use it once at code start;

I agree, that's the way to do it.

Don't count on Windows doing it for you when you launch your program; it doesn't. In later versions of Windows, some API calls leave the FPU set to 53-bit precision. Also, some API calls use MMX and don't clean up with EMMS.
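One way to detect the 53-bit case is to store the x87 control word with FNSTCW and look at its precision-control (PC) field, bits 8 and 9. The C sketch below only decodes a control word value you already have; reading the live register needs inline asm or a compiler intrinsic, which is left out, and `x87_precision_bits` is a hypothetical name. For reference, FINIT loads 037Fh, whose PC field selects 64-bit precision.

```c
#include <stdint.h>

/* Hypothetical decoder for the precision-control (PC) field of the x87
   control word, bits 8-9:
     00 -> 24-bit significand (single)
     01 -> reserved
     10 -> 53-bit significand (double)
     11 -> 64-bit significand (double extended)
   Returns the significand width, or 0 for the reserved encoding. */
static unsigned x87_precision_bits(uint16_t control_word)
{
    switch ((control_word >> 8) & 3u) {
    case 0:  return 24;
    case 2:  return 53;
    case 3:  return 64;
    default: return 0;   /* 01 is reserved */
    }
}
```

A program that depends on 64-bit precision could check this after a suspect API call and reload its own control word with FLDCW if the field has changed.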


jj2007

Quote from: Greg on August 07, 2008, 07:36:20 PM
In later versions of Windows, some API calls leave the FPU in 53-bit precision. Also, some API calls use MMX and don't clean up with emms.

Cute. Do you have any links or other references saying which ones misbehave?

Mark_Larson

Quote from: Greg on August 07, 2008, 06:02:36 PM
Quote from: Mark Larson
did you time it at all?

No, I took your word for it. I imagine the div in the last code I posted is not helping the speed any.



Correct. My latest code will be the fastest. I was thinking we could combine this with Michael's macro and add a routine that prints the correct unit of time based on the returned value in clocks.

Clocks are in edx:eax.

Use 1000.0 instead of 1024.0.

Assume a 1 GHz processor (it makes the arithmetic easier). 1 GHz = 1000 MHz (we need the speed in MHz).

To get the time in microseconds, you take the clocks and divide by the CPU speed in MHz; multiply by 1000 to get nanoseconds.

If the clock count is 10000, divide by 1000 (the CPU speed in MHz):

  10000
-----------  = 10 microseconds = 10,000 nanoseconds
  1000

So we do the same thing as with KiB, MiB and GiB: depending on which range the value falls in, we use a different unit of time.

nanoseconds
microseconds
milliseconds
seconds
minutes
hours

What do you think?
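A sketch of the proposed routine in C, assuming the clock count arrives as one 64-bit value (edx:eax combined) and the CPU speed is known in MHz. Clocks divided by MHz gives microseconds (10000 clocks at 1000 MHz is 10 microseconds), so the sketch multiplies by 1000 to start from nanoseconds and then steps up by 1000, 1000, 1000, 60 and 60. `clocks_to_unit`, `TIME_UNIT_NAMES` and `TIME_UNIT_STEP` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical unit table, smallest to largest. */
static const char *const TIME_UNIT_NAMES[] = {
    "nanoseconds", "microseconds", "milliseconds", "seconds", "minutes", "hours"
};

/* Factor that converts TIME_UNIT_NAMES[i] into TIME_UNIT_NAMES[i + 1]. */
static const double TIME_UNIT_STEP[] = { 1000.0, 1000.0, 1000.0, 60.0, 60.0 };

/* Hypothetical routine: converts a clock count to the largest unit that
   keeps the value at or above 1 (anything under 1 ns stays in ns), given
   the CPU speed in MHz. clocks / MHz = microseconds, so
   clocks * 1000 / MHz = nanoseconds. Stores the unit index in *unit and
   returns the value in that unit. */
static double clocks_to_unit(uint64_t clocks, double cpu_mhz, unsigned *unit)
{
    double value = (double)clocks * 1000.0 / cpu_mhz;   /* nanoseconds */
    unsigned i = 0;

    while (i < 5 && value >= TIME_UNIT_STEP[i]) {
        value /= TIME_UNIT_STEP[i];
        i++;
    }
    *unit = i;
    return value;
}
```

For example, `unsigned u; double v = clocks_to_unit(10000, 1000.0, &u);` gives `v == 10.0` with `TIME_UNIT_NAMES[u]` being "microseconds", which could then be printed once, outside any timed loop, with `printf("%.2f %s\n", v, TIME_UNIT_NAMES[u]);`.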

GregL

jj2007,

I don't have a list of specific APIs. You won't find anything about it on MSDN either.

The MMX issue is mentioned by Raymond in SimplyFPU, and I have seen it discussed elsewhere. The precision issue has come up in the PowerBASIC forums. PowerBASIC has an 80-bit extended precision data type (EXT), so if you are using EXT variables you want the FPU to always use 64-bit precision. PowerBASIC has found that some Windows APIs leave the precision set to 53 bits. It's Microsoft's problem, but PowerBASIC is working on workarounds.

Microsoft has pretty much decided that 80-bit extended precision variables don't exist. They removed them from their compilers, for compatibility with other CPUs (PowerPC etc.), they say. Big mistake if you ask me. They could have kept them for x86, but they took the easy way out. Other C++ compilers, such as Borland and Intel, support them.
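Whether a given C compiler still exposes the 80-bit format can be checked from `<float.h>`: `LDBL_MANT_DIG` is 64 when `long double` is the x87 extended type and 53 when it is just `double`, as with Microsoft's compilers. The sketch below also tests it numerically by adding 2^-60 to 1, which needs a significand of at least 61 bits to survive. `long_double_is_extended` and `long_double_bits` are hypothetical names.

```c
#include <float.h>

/* Hypothetical check: returns 1 if long double can represent 1 + 2^-60,
   which needs a significand of at least 61 bits (the x87 80-bit format has
   64; a long double that is really a 64-bit double has only 53).
   volatile keeps the compiler from folding the sum at compile time. */
static int long_double_is_extended(void)
{
    volatile long double one  = 1.0L;
    volatile long double tiny = 0x1p-60L;   /* 2^-60, a C99 hex literal */
    volatile long double sum  = one + tiny;

    return sum != one;
}

/* Significand width of long double as reported by the compiler. */
static int long_double_bits(void)
{
    return LDBL_MANT_DIG;
}
```

Note that even with a 64-bit `long double`, the result is only as precise as the FPU's precision-control setting allows at run time.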


GregL

Mark,

MichaelW's macros will display milliseconds. Adding different units of time sounds like a good idea.

Quote
My latest code will be the fastest.

I did some timing on my code. The last code I posted here is pretty slow (but it is short and concise). The previous code I posted here does pretty well; I get about 19 cycles.