Title: Don't need P2 stuff
Post by: Magnum on May 28, 2010, 01:08:35 AM

I figure that no one would run this on anything lower than a P-3, so what part of this can I get rid of?
I think it's the first 4 lines.

Code Select


monitor_init:

  IFDEF PProPII
          pushfd
          pushad
  
          mov     ecx,3
  getcpuidtime:
          cpuid

Title: Re: Don't need P2 stuff
Post by: MichaelW on May 28, 2010, 01:26:26 AM

For the Pentium 4, Xeon, and Pentium M the maximum input value for the basic information is 2, like it is for the PPro, P2, etc

Title: Re: Don't need P2 stuff
Post by: Magnum on May 28, 2010, 03:35:01 AM

Quote from: MichaelW on May 28, 2010, 01:26:26 AM
For the Pentium 4, Xeon, and Pentium M the maximum input value for the basic information is 2, like it is for the PPro, P2, etc

You lost me.

Title: Re: Don't need P2 stuff
Post by: dedndave on May 28, 2010, 06:23:39 AM

the output of CPUID is controlled by the input value in EAX (and sometimes ECX also)
the results are returned in EAX, EBX, ECX, and EDX
if you execute CPUID with a value of 0 in EAX, the value returned in EAX is the maximum standard leaf number
so - if it returns a value of 2, it means the highest supported standard leaf is 2
if you execute CPUID with a value of 80000000h in EAX, the value returned in EAX is the maximum extended leaf number
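
To make the leaf ranges concrete, here is a small Python sketch. It does not execute CPUID; it only models the range check a program would make before requesting a leaf. The function name and the sample maximums are hypothetical: 2 is the basic maximum MichaelW quotes above, and 80000004h is just an illustrative extended maximum. As a simplification, it treats an extended maximum below 80000000h as "no extended leaves" and ignores any other leaf ranges.

```python
EXT_BASE = 0x80000000  # first extended leaf number


def leaf_supported(leaf, max_basic, max_extended):
    """Return True if `leaf` falls inside the ranges reported by CPUID.

    max_basic    -- EAX returned by CPUID with EAX=0
    max_extended -- EAX returned by CPUID with EAX=80000000h
    (both are plain integers here; nothing is read from the CPU)
    """
    if leaf < EXT_BASE:
        # Standard leaves run from 0 up to max_basic inclusive.
        return leaf <= max_basic
    # Extended leaves exist only if the reported maximum is itself
    # in the extended range (assumption stated in the text above).
    return max_extended >= EXT_BASE and leaf <= max_extended


# Hypothetical CPU: basic maximum 2, extended maximum 80000004h.
MAX_BASIC = 2
MAX_EXTENDED = 0x80000004
```

With a basic maximum of 2, leaves 0 through 2 are valid and leaf 4 is not.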

http://www.intel.com/assets/pdf/appnote/241618.pdf
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25481.pdf

be sure and have a bottle of aspirin handy when reading these documents

btw - all pentiums support CPUID to some degree or another
even a few of the later 486's provide limited support

Title: Re: Don't need P2 stuff
Post by: sinsi on May 28, 2010, 06:45:35 AM

The trouble is that you have posted a code fragment that itself is incomplete (IFDEF without an ENDIF) so we are all guessing...

Title: Re: Don't need P2 stuff
Post by: Magnum on May 28, 2010, 01:33:06 PM

Quote from: sinsi on May 28, 2010, 06:45:35 AM
The trouble is that you have posted a code fragment that itself is incomplete (IFDEF without an ENDIF) so we are all guessing...

Sorry, here is the whole thing.

Code Select


IFDEF PProPII
          pushfd
          pushad
  
          mov     ecx,3
  getcpuidtime:
          cpuid

          ;rdtsc Not supported with Tasm 3.1 or 4.1, so has to be done "manually"
          db      0fh,031h

          mov     [cycle],eax
          cpuid

          ;rdtsc
          db      0fh,031h

          sub     eax,[cycle]
          mov     [cpuid_cycle],eax
          dec     ecx
          jnz     getcpuidtime
  
          popad
          popf
  ENDIF

Title: Re: Don't need P2 stuff
Post by: dedndave on May 28, 2010, 03:37:41 PM

i understand that you may not be interested in the values returned by CPUID
however, in order for the instruction to consume a consistent amount of time, it may be a good idea to set EAX to 0 each time
CPUID is not very fast at all (something like 80+ cycles, depending on which leaf is called)
if you look at MichaelW's macros, you will see that he measures CPUID and subtracts that time from the total

the CPUID instruction destroys the contents of EAX, EBX, ECX, and EDX - need to push/pop ECX across the timing function

also, RDTSC returns a 64-bit value in EDX:EAX - good idea to use the whole value

Code Select

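IFDEF PProPII
          pushfd
          pushad
 
          mov     ecx,3                 ;three passes; only the last result is kept
  getcpuidtime:
          push    ecx                   ;CPUID overwrites ECX, so save the loop counter
          xor     eax,eax               ;leaf 0 every time, for a consistent CPUID cost
          cpuid                         ;serialize before the first timestamp

          ;rdtsc Not supported with Tasm 3.1 or 4.1, so has to be done "manually"
          db      0fh,031h

          mov     [cycle],eax           ;start count, low dword
          mov     [cycle_hi],edx        ;start count, high dword
          xor     eax,eax
          cpuid                         ;the instruction being timed

          ;rdtsc
          db      0fh,031h

          sub     eax,[cycle]           ;64-bit subtract: low dwords first
          sbb     edx,[cycle_hi]        ;then high dwords, minus the borrow
          mov     [cpuid_cycle],eax
          mov     [cpuid_cycle_hi],edx
          pop     ecx                   ;restore the loop counter
          dec     ecx
          jnz     getcpuidtime
 
          popad
          popfd                         ;must match PUSHFD; POPF pops only 16 bits
  ENDIF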
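
The SUB/SBB pair in the listing above is a 64-bit subtraction done in two 32-bit halves. A minimal Python model of that arithmetic shows why the borrow matters when the low dword wraps between the two readings. The function names and the counter values are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def sub_sbb(end_lo, end_hi, start_lo, start_hi):
    """Model `sub eax,[cycle]` followed by `sbb edx,[cycle_hi]`.

    Returns (low dword, high dword) of end - start.
    """
    lo = end_lo - start_lo
    borrow = 1 if lo < 0 else 0      # carry flag after SUB
    lo &= MASK32                     # result wraps to 32 bits
    hi = (end_hi - start_hi - borrow) & MASK32  # SBB subtracts the borrow too
    return lo, hi


def combine(lo, hi):
    """Join EDX:EAX style halves into one 64-bit integer."""
    return (hi << 32) | lo


# Hypothetical readings: the low dword wrapped between start and end.
start = (0xFFFFFFF0, 0x00000001)   # (low, high)
end = (0x00000010, 0x00000002)
elapsed = combine(*sub_sbb(end[0], end[1], start[0], start[1]))
```

Here `elapsed` is 32 cycles. Using a plain SUB on the high dwords instead of SBB would leave a stray 1 in the high half.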

Title: Re: Don't need P2 stuff
Post by: clive on May 28, 2010, 04:39:19 PM

Quote from: dedndave
also, RDTSC returns a 64-bit value in EDX:EAX - good idea to use the whole value

Or at least understand the magnitude of the number being measured/timed. 32-bit would be quite sufficient to time something approaching a second (@4GHz). Where it becomes particularly important is understanding if the number overflowed over the period of the test. However, I do think that using instruction timing over a few hundred microseconds is useless. For it to be remotely useful the measurement needs to be done in DOS (the real thing, not a virtual box), with interrupts disabled, on a single CPU.
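
The "approaching a second (@4GHz)" figure can be checked with a few lines of arithmetic. This Python sketch uses hypothetical helper names; it computes how long a counter of a given width runs before it wraps, and shows that a 32-bit difference is still correct across a single wrap as long as the real interval is shorter than 2^32 ticks.

```python
MASK32 = 0xFFFFFFFF


def seconds_until_wrap(bits, hz):
    """Time for a `bits`-wide counter ticking at `hz` to overflow."""
    return (1 << bits) / hz


def low32_delta(start_lo, end_lo):
    """Elapsed ticks from the low dwords only (what plain SUB gives).

    Correct only if the true interval is under 2**32 ticks.
    """
    return (end_lo - start_lo) & MASK32


wrap_32 = seconds_until_wrap(32, 4_000_000_000)        # about 1.07 seconds
wrap_64_years = seconds_until_wrap(64, 4_000_000_000) / (365.25 * 86400)

# Made-up readings on either side of a wrap of the low dword.
delta = low32_delta(0xFFFFFF00, 0x00000100)
```

At 4 GHz the low dword wraps after roughly 1.07 seconds, while the full 64-bit count would take well over a century.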

There is a propensity here for timing over excessively long periods and averaging, on code that is highly artificial.

Title: Re: Don't need P2 stuff
Post by: jj2007 on May 28, 2010, 05:42:12 PM

Quote from: clive on May 28, 2010, 04:39:19 PM
There is a propensity here for timing over excessively long periods and averaging, on code that is highly artificial.

We also produce a great quantity of Organically Grown Code^TM here. But you are right that we need to keep an eye on time slices and interrupts (http://www.masm32.com/board/index.php?topic=11454.msg90807#msg90807), cache line splits (http://www.masm32.com/board/index.php?topic=11454.msg87781#msg87781) and other nasty factors affecting the accuracy of timings. For the P4, there are the cyct_* macros here (http://www.masm32.com/board/index.php?topic=11454.msg87781#msg87781), which extend MichaelW's counter_* macros in such a way that outliers are eliminated before calculating the average. They need lower cycle counts and are a lot more reliable on the P4.
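
The cyct_* macros themselves are not shown in this thread, so the following is not their algorithm. It is a generic Python sketch of the idea of discarding outliers before averaging, under the assumption that the outliers are the slowest readings (for example, runs hit by an interrupt). The function name, the drop fraction, and the sample cycle counts are all invented.

```python
def trimmed_mean(samples, drop_fraction=0.25):
    """Average the samples after discarding the slowest ones.

    drop_fraction is the share of readings (the largest values)
    thrown away before the mean is taken.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 <= drop_fraction < 1:
        raise ValueError("drop_fraction must be in [0, 1)")
    ordered = sorted(samples)
    drop = int(len(ordered) * drop_fraction)
    kept = ordered[:len(ordered) - drop] if drop else ordered
    return sum(kept) / len(kept)


# Invented cycle counts: two runs were disturbed and came out far too high.
readings = [80, 82, 81, 80, 950, 83, 81, 4000]
plain = sum(readings) / len(readings)
trimmed = trimmed_mean(readings)
```

The plain average is dominated by the two disturbed runs, while the trimmed one stays close to the typical reading.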

Title: Re: Don't need P2 stuff
Post by: clive on May 28, 2010, 10:10:59 PM

Yes, certainly a lot of good code gets grown here, no doubt about that.

Things like paging, page sizes, TLB caching, memory width (DIMM pairing), memory timing (i.e. the DIMM's EEPROM settings), cached/non-cached, and write-combined memory go into the hard-to-control category.

Actually I'd probably toss comparing the same two strings 10 million times into the artificial category. A figure of much more merit would be obtained by comparing 10 million unique strings, but it would obviously be more difficult to build and distribute a data set.

To expand on the RDTSC a little more, it can act as a very good timebase. It has a very fine granularity, but you'd have to calibrate it against an accurate reference frequency to determine exactly how fast it is ticking. You have to watch that it's coming from a single CPU, and that the CPU isn't modulating or throttling it.
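
The calibration step can be sketched as simple arithmetic: count ticks across an interval whose length is known from a trusted reference, then use the resulting rate to convert cycle counts to time. This Python sketch uses hypothetical function names and invented readings; it does not read the TSC or any real clock.

```python
def ticks_per_second(tsc_start, tsc_end, ref_seconds):
    """Estimate the counter rate from two readings taken
    `ref_seconds` apart according to the reference clock."""
    if ref_seconds <= 0:
        raise ValueError("reference interval must be positive")
    return (tsc_end - tsc_start) / ref_seconds


def cycles_to_microseconds(cycles, hz):
    """Convert a cycle count to microseconds at the calibrated rate."""
    return cycles / hz * 1e6


# Invented example: 2,400,000,000 ticks counted over a 1.0 s reference interval.
rate = ticks_per_second(1_000, 2_400_001_000, 1.0)
short_run_us = cycles_to_microseconds(240, rate)
```

In this made-up case the counter ticks at 2.4 GHz, so a 240-cycle measurement corresponds to about 0.1 microseconds. The result is only as good as the reference interval, and it assumes the rate did not change during calibration.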

Title: Re: Don't need P2 stuff
Post by: Magnum on May 29, 2010, 12:26:24 AM

You are right.
I tested it on a boot floppy and got more believable results.

My code just won't work well when run under cmd.

The MASM Forum Archive 2004 to 2012

Miscellaneous Forums => 16 bit DOS Programming => Topic started by: Magnum on May 28, 2010, 01:08:35 AM