
Don't need P2 stuff

Started by Magnum, May 28, 2010, 01:08:35 AM


Magnum

I figure that no one would run this on anything lower than a P-3, so what part of this can I get rid of?
I think it's the first 4 lines.


monitor_init:

  IFDEF PProPII
          pushfd
          pushad
 
          mov     ecx,3
  getcpuidtime:
          cpuid

Have a great day,
                         Andy

MichaelW

For the Pentium 4, Xeon, and Pentium M the maximum input value for the basic information is 2, like it is for the PPro, P2, etc
eschew obfuscation

Magnum

Quote from: MichaelW on May 28, 2010, 01:26:26 AM
For the Pentium 4, Xeon, and Pentium M the maximum input value for the basic information is 2, like it is for the PPro, P2, etc

You lost me.


dedndave

the output of CPUID is controlled by the input value in EAX (and sometimes ECX also)
the results are returned in EAX, EBX, ECX, and EDX
if you execute CPUID with a value of 0 in EAX, the value returned in EAX is the maximum standard leaf number
so - if it returns a value of 2, it means the highest supported standard leaf is 2
if you execute CPUID with a value of 80000000h in EAX, the value returned in EAX is the maximum extended leaf number
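A small portable C sketch of the check described above. Leaf 0 reports the highest standard leaf and leaf 80000000h reports the highest extended leaf, so before asking for a leaf you compare it against the matching maximum. The function name `cpuid_leaf_supported` and its parameters are hypothetical. The two maxima would come from your own CPUID calls, which are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `leaf` is within the range the CPU reported.
   max_standard: EAX returned by CPUID with EAX = 0.
   max_extended: EAX returned by CPUID with EAX = 80000000h.
   (Hypothetical helper for illustration.) */
bool cpuid_leaf_supported(uint32_t leaf, uint32_t max_standard,
                          uint32_t max_extended)
{
    if (leaf >= 0x80000000u)          /* extended leaf range */
        return leaf <= max_extended;  /* false whenever max_extended < 80000000h */
    return leaf <= max_standard;      /* standard leaf range */
}
```

For example, with a maximum standard leaf of 2 (as on the PPro/P2), asking for leaf 4 returns false.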

http://www.intel.com/assets/pdf/appnote/241618.pdf
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25481.pdf

be sure and have a bottle of aspirin handy when reading these documents

btw - all Pentiums support CPUID to some degree or another
even a few of the later 486s provide limited support

sinsi

The trouble is that you have posted a code fragment that itself is incomplete (IFDEF without an ENDIF) so we are all guessing...
Light travels faster than sound, that's why some people seem bright until you hear them.

Magnum

Quote from: sinsi on May 28, 2010, 06:45:35 AM
The trouble is that you have posted a code fragment that itself is incomplete (IFDEF without an ENDIF) so we are all guessing...

Sorry, here is the whole thing.


IFDEF PProPII
          pushfd
          pushad
 
          mov     ecx,3
  getcpuidtime:
          cpuid

          ;rdtsc Not supported with Tasm 3.1 or 4.1, so has to be done "manually"
          db      0fh,031h

          mov     [cycle],eax
          cpuid

          ;rdtsc
          db      0fh,031h

          sub     eax,[cycle]
          mov     [cpuid_cycle],eax
          dec     ecx
          jnz     getcpuidtime
 
          popad
          popf
  ENDIF


dedndave

i understand that you may not be interested in the values returned by CPUID
however, in order for the instruction to consume a consistent amount of time, it may be a good idea to set EAX to 0 each time
CPUID is not very fast at all (something like 80+ cycles, depending on which leaf is called)
if you look at MichaelW's macros, you will see that he measures CPUID and subtracts that time from the total
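The general idea of timing the CPUID/RDTSC overhead separately and subtracting it can be sketched in C as follows. This is not MichaelW's macro code. Keeping the smallest sample from each set is an assumption made here as one simple way to ignore runs disturbed by interrupts or cache misses. The names `min_cycles` and `net_cycles` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest value in a set of cycle counts (assumed here to be the
   least-disturbed sample). Hypothetical helper. */
uint64_t min_cycles(const uint64_t *samples, size_t n)
{
    assert(n > 0);
    uint64_t m = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < m)
            m = samples[i];
    return m;
}

/* Net cost of the code under test: best total minus best overhead
   (the CPUID/RDTSC pair timed on its own), clamped at zero so noise
   can never produce a negative result. Hypothetical helper. */
uint64_t net_cycles(const uint64_t *totals, size_t nt,
                    const uint64_t *overheads, size_t no)
{
    uint64_t t = min_cycles(totals, nt);
    uint64_t o = min_cycles(overheads, no);
    return t > o ? t - o : 0;
}
```

With totals of 1200, 1105 and 1150 cycles and overheads of 100, 95 and 130 cycles, this reports 1105 - 95 = 1010 cycles.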

the CPUID instruction destroys the contents of EAX, EBX, ECX, and EDX - need to push/pop ECX across the timing function

also, RDTSC returns a 64-bit value in EDX:EAX - good idea to use the whole value
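
IFDEF PProPII
          pushfd                         ; save EFLAGS and all general registers
          pushad

          mov     ecx,3                  ; 3 passes; only the last result is kept
  getcpuidtime:
          push    ecx                    ; CPUID overwrites ECX, so save the loop count
          xor     eax,eax                ; always leaf 0, so CPUID takes a consistent time
          cpuid

          ;rdtsc Not supported with Tasm 3.1 or 4.1, so has to be done "manually"
          db      0fh,031h

          mov     [cycle],eax            ; save the full 64-bit start count
          mov     [cycle_hi],edx
          xor     eax,eax
          cpuid

          ;rdtsc
          db      0fh,031h

          sub     eax,[cycle]            ; 64-bit subtract: SBB takes the borrow
          sbb     edx,[cycle_hi]         ; from the low dword into the high dword
          mov     [cpuid_cycle],eax      ; EDX:EAX = ticks between the two reads
          mov     [cpuid_cycle_hi],edx
          pop     ecx
          dec     ecx
          jnz     getcpuidtime

          popad
          popfd                          ; POPFD to match the PUSHFD above
  ENDIF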
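For anyone following the SUB/SBB pair: the C sketch below (hypothetical function `tsc_delta_32`) does the same 64-bit subtraction with 32-bit halves, so you can see the borrow being carried from the low dword into the high dword.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors SUB EAX,[cycle] / SBB EDX,[cycle_hi].
   end_* and start_* are the EDX:EAX values from the second and
   first RDTSC. Hypothetical helper for illustration. */
uint64_t tsc_delta_32(uint32_t end_lo, uint32_t end_hi,
                      uint32_t start_lo, uint32_t start_hi)
{
    uint32_t lo = end_lo - start_lo;                  /* SUB: wraps modulo 2^32 */
    uint32_t borrow = (end_lo < start_lo) ? 1u : 0u;  /* carry flag left by SUB */
    uint32_t hi = end_hi - start_hi - borrow;         /* SBB: subtract with borrow */
    return ((uint64_t)hi << 32) | lo;
}
```

With a start count of 00000001:FFFFFF00h and an end count of 00000002:00000100h the low dword wraps, and it is the borrow that makes the result come out as 200h instead of 1:00000200h.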

clive

Quote from: dedndave
also, RDTSC returns a 64-bit value in EDX:EAX - good idea to use the whole value

Or at least understand the magnitude of the number being measured/timed. A 32-bit count would be quite sufficient to time something approaching a second (@4GHz). Where it becomes particularly important is knowing whether the count overflowed during the test. However, I do think that timing instructions over intervals longer than a few hundred microseconds is useless. For it to be remotely useful, the measurement needs to be done in DOS (the real thing, not a virtual box), with interrupts disabled, on a single CPU.
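The arithmetic behind the "approaching a second" figure, as a small C sketch (function names hypothetical): a 32-bit counter wraps after 2^32 ticks, which at 4 GHz is about 1.07 seconds. Unsigned 32-bit subtraction is modulo 2^32, so end minus start is still correct across a wrap, provided fewer than 2^32 ticks elapsed.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a 32-bit counter ticking at `hz` wraps: 2^32 / hz.
   Hypothetical helper. */
double wrap_seconds_32(double hz)
{
    assert(hz > 0.0);
    return 4294967296.0 / hz;
}

/* Elapsed ticks from two 32-bit readings. Unsigned subtraction wraps
   modulo 2^32, so this is correct even if the counter rolled over,
   as long as fewer than 2^32 ticks passed. Hypothetical helper. */
uint32_t elapsed_32(uint32_t start, uint32_t end)
{
    return end - start;
}
```

For example, a start reading of FFFFFFF0h and an end reading of 10h give 20h ticks.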

There is a propensity here for timing over excessively long periods and averaging, on code that is highly artificial.
It could be a random act of randomness. Those happen a lot as well.

jj2007

Quote from: clive on May 28, 2010, 04:39:19 PM
There is a propensity here for timing over excessively long periods and averaging, on code that is highly artificial.
We also produce a great quantity of Organically Grown Code™ here. But you are right that we need to keep an eye on time slices and interrupts, cache-line splits and other nasty factors affecting the accuracy of timings. For the P4, there are the cyct_* macros here, which extend MichaelW's counter_* macros in such a way that outliers are eliminated before calculating the average. They need lower cycle counts and are a lot more reliable on the P4.
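One way to eliminate outliers before averaging is a trimmed mean. The sketch below is not the cyct_* implementation; dropping the same number of samples from each end of the sorted list is an assumption, and `trimmed_mean` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for 32-bit cycle counts. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sorts the samples in place, drops `trim` values from each end,
   and averages what is left. Hypothetical helper. */
double trimmed_mean(uint32_t *samples, size_t n, size_t trim)
{
    assert(n > 2 * trim);
    qsort(samples, n, sizeof samples[0], cmp_u32);
    uint64_t sum = 0;
    for (size_t i = trim; i < n - trim; i++)
        sum += samples[i];
    return (double)sum / (double)(n - 2 * trim);
}
```

With the samples 100, 102, 98, 5000, 101, 99, 97 and one value trimmed from each end, the 5000-cycle outlier (and the lowest value, 97) is discarded and the mean comes out at 100.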

clive

Yes, certainly a lot of good code gets grown here, no doubt about that.

Things like paging, page sizes, TLB caching, memory width (DIMM pairing), memory timing (i.e. the DIMMs' EEPROM settings), and cached, non-cached or write-combined memory go into the hard-to-control category.

Actually, I'd probably toss comparing the same two strings 10 million times into the artificial category. A figure of much more merit would be obtained by comparing 10 million unique strings, but it would obviously be more difficult to build and distribute such a data set.

To expand on RDTSC a little more: it can act as a very good timebase. It has very fine granularity, but you'd have to calibrate it against an accurate reference frequency to determine exactly how fast it is ticking. You also have to make sure the readings come from a single CPU, and that the CPU isn't modulating or throttling it.
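The calibration step amounts to dividing the TSC ticks counted by the time a trusted reference clock says has passed. A minimal C sketch under that assumption; how the reference interval is measured is left out, and `tsc_hz` and `ticks_to_ns` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated TSC rate in ticks per second, from the ticks counted while
   a trusted reference clock advanced by ref_ns nanoseconds.
   Hypothetical helper. */
double tsc_hz(uint64_t tsc_ticks, uint64_t ref_ns)
{
    assert(ref_ns > 0);
    return (double)tsc_ticks * 1e9 / (double)ref_ns;
}

/* Converts a TSC tick count to nanoseconds using the calibrated rate.
   Hypothetical helper. */
double ticks_to_ns(uint64_t ticks, double hz)
{
    assert(hz > 0.0);
    return (double)ticks * 1e9 / hz;
}
```

For example, 300,000,000 ticks over a 100 ms reference interval gives 3 GHz, at which 3,000 ticks correspond to 1,000 ns. The result is only as good as the reference and only valid while the TSC rate stays fixed.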

Magnum


Quote from: clive on May 28, 2010, 04:39:19 PM

Or at least understand the magnitude of the number being measured/timed. A 32-bit count would be quite sufficient to time something approaching a second (@4GHz). Where it becomes particularly important is knowing whether the count overflowed during the test. However, I do think that timing instructions over intervals longer than a few hundred microseconds is useless. For it to be remotely useful, the measurement needs to be done in DOS (the real thing, not a virtual box), with interrupts disabled, on a single CPU.

There is a propensity here for timing over excessively long periods and averaging, on code that is highly artificial.

You are right.
I tested it on a boot floppy and got more believable results.

My code just won't work well when run under cmd.


