I figure that no one would run this on anything lower than a P-3, so what part of this can I get rid of?
I think it's the first 4 lines.
monitor_init:
IFDEF PProPII
pushfd
pushad
mov ecx,3
getcpuidtime:
cpuid
For the Pentium 4, Xeon, and Pentium M the maximum input value for the basic information is 2, like it is for the PPro, P2, etc
Quote from: MichaelW on May 28, 2010, 01:26:26 AM
For the Pentium 4, Xeon, and Pentium M the maximum input value for the basic information is 2, like it is for the PPro, P2, etc
You lost me.
the output of CPUID is controlled by the input value in EAX (and sometimes ECX also)
the results are returned in EAX, EBX, ECX, and EDX
if you execute CPUID with a value of 0 in EAX, the value returned in EAX is the maximum standard leaf number
so - if it returns a value of 2, it means the highest supported standard leaf is 2
if you execute CPUID with a value of 80000000h in EAX, the value returned in EAX is the maximum extended leaf number
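To make the two leaf ranges concrete, here is a minimal C sketch of the check a program might do before asking for a particular leaf. The helper name `leaf_supported` is made up for illustration. It takes the two EAX values described above as plain numbers, so it does not execute CPUID itself. It also assumes that a CPU without extended leaves returns something below 80000000h for the extended query, which is a simplification.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper, not from any library.
   max_std: EAX returned by CPUID with EAX = 0
   max_ext: EAX returned by CPUID with EAX = 80000000h */
bool leaf_supported(uint32_t max_std, uint32_t max_ext, uint32_t leaf)
{
    if (leaf >= 0x80000000u) {
        /* Assumption: a value below 80000000h means the CPU has
           no extended leaves at all. */
        return max_ext >= 0x80000000u && leaf <= max_ext;
    }
    /* Standard range: anything up to the reported maximum. */
    return leaf <= max_std;
}
```

With `max_std` = 2, as quoted for the P4 and Pentium M, leaves 0 through 2 pass the check and leaf 3 does not.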
http://www.intel.com/assets/pdf/appnote/241618.pdf
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25481.pdf
be sure and have a bottle of aspirin handy when reading these documents
btw - all pentiums support CPUID to some degree or another
even a few of the later 486's provide limited support
The trouble is that you have posted a code fragment that itself is incomplete (IFDEF without an ENDIF) so we are all guessing...
Quote from: sinsi on May 28, 2010, 06:45:35 AM
The trouble is that you have posted a code fragment that itself is incomplete (IFDEF without an ENDIF) so we are all guessing...
Sorry, here is the whole thing.
IFDEF PProPII
pushfd
pushad
mov ecx,3
getcpuidtime:
cpuid
;rdtsc Not supported with Tasm 3.1 or 4.1, so has to be done "manually"
db 0fh,031h
mov [cycle],eax
cpuid
;rdtsc
db 0fh,031h
sub eax,[cycle]
mov [cpuid_cycle],eax
dec ecx
jnz getcpuidtime
popad
popfd ; pairs with pushfd above (a plain popf pops only 16 bits in a 16-bit segment)
ENDIF
i understand that you may not be interested in the values returned by CPUID
however, in order for the instruction to consume a consistent amount of time, it may be a good idea to set EAX to 0 each time
CPUID is not very fast at all (something like 80+ cycles, depending on which leaf is called)
if you look at MichaelW's macros, you will see that he measures CPUID and subtracts that time from the total
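The subtract-the-overhead idea can be written out as plain arithmetic. The C sketch below is not MichaelW's macro code. The function name and the three parameters are assumptions for illustration: one count for the scaffolding plus the code under test, one for the scaffolding alone, and the number of passes in the timed region.

```c
#include <stdint.h>

/* Hypothetical helper: cycles attributable to one pass of the code
   under test.
   total    = cycles measured with the code present
   overhead = cycles measured with the same CPUID/RDTSC scaffolding
              but no code in between
   loops    = passes executed inside the timed region */
uint64_t net_cycles_per_pass(uint64_t total, uint64_t overhead, uint32_t loops)
{
    /* Measurement noise can make total <= overhead; report 0 then
       instead of wrapping around to a huge unsigned value. */
    if (loops == 0 || total <= overhead)
        return 0;
    return (total - overhead) / loops;
}
```

For example, a total of 1080 cycles with 80 cycles of overhead over 10 passes works out to 100 cycles per pass.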
the CPUID instruction destroys the contents of EAX, EBX, ECX, and EDX - need to push/pop ECX across the timing function
also, RDTSC returns a 64-bit value in EDX:EAX - good idea to use the whole value
IFDEF PProPII
pushfd
pushad
mov ecx,3
getcpuidtime:
push ecx
xor eax,eax
cpuid
;rdtsc Not supported with Tasm 3.1 or 4.1, so has to be done "manually"
db 0fh,031h
mov [cycle],eax
mov [cycle_hi],edx
xor eax,eax
cpuid
;rdtsc
db 0fh,031h
sub eax,[cycle]
sbb edx,[cycle_hi]
mov [cpuid_cycle],eax
mov [cpuid_cycle_hi],edx
pop ecx
dec ecx
jnz getcpuidtime
popad
popfd ; pairs with pushfd above (a plain popf pops only 16 bits in a 16-bit segment)
ENDIF
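For anyone checking the sub/sbb pair on paper, here is a small C model of it. The struct and function names are invented for this sketch. It mimics the borrow that SUB leaves in the carry flag and that SBB then consumes, and the result can be compared against an ordinary 64-bit subtraction.

```c
#include <stdint.h>

/* Hypothetical pair type standing in for EDX:EAX. */
typedef struct { uint32_t lo, hi; } u64_pair;

/* Model of: sub eax,[cycle] / sbb edx,[cycle_hi] */
u64_pair sub_sbb(u64_pair end, u64_pair start)
{
    u64_pair d;
    uint32_t borrow = end.lo < start.lo;   /* carry flag after SUB */
    d.lo = end.lo - start.lo;              /* low half, wraps mod 2^32 */
    d.hi = end.hi - start.hi - borrow;     /* SBB also subtracts the borrow */
    return d;
}

/* Glue the two halves back into one 64-bit value. */
uint64_t join(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}
```

With a start of 1:FFFFFFF0h and an end of 2:00000010h the low halves borrow, and the model gives 20h, the same as the direct 64-bit difference.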
Quote from: dedndave
also, RDTSC returns a 64-bit value in EDX:EAX - good idea to use the whole value
Or at least understand the magnitude of the number being measured/timed. 32-bit would be quite sufficient to time something approaching a second (@4GHz). Where it becomes particularly important is understanding if the number overflowed over the period of the test. However, I do think that using instruction timing over a few hundred microseconds is useless. For it to be remotely useful the measurement needs to be done in DOS (the real thing, not a virtual box), with interrupts disabled, on a single CPU.
There is a propensity here for timing over excessively long periods and averaging, on code that is highly artificial.
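The "approaching a second (@4GHz)" figure follows from 2^32 cycles divided by the clock rate. The C sketch below (function names are illustrative only) computes that limit. It also shows that a 32-bit subtraction of the low words still gives the right elapsed count when the low word wraps once, provided fewer than 2^32 cycles actually passed.

```c
#include <stdint.h>

/* Seconds until a 32-bit cycle counter wraps at the given clock rate. */
double wrap_seconds(double clock_hz)
{
    return 4294967296.0 / clock_hz;   /* 2^32 cycles */
}

/* Elapsed cycles from the low 32 bits only. Unsigned arithmetic is
   modulo 2^32, so the result is valid as long as the real interval
   was shorter than 2^32 cycles. */
uint32_t delta32(uint32_t start, uint32_t end)
{
    return end - start;
}
```

At 4 GHz the limit comes out a little over one second, which matches the estimate in the post.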
Quote from: clive on May 28, 2010, 04:39:19 PM
There is a propensity here for timing over excessively long periods and averaging, on code that is highly artificial.
We also produce a great quantity of Organically Grown Code™ here. But you are right that we need to keep an eye on time slices and interrupts (http://www.masm32.com/board/index.php?topic=11454.msg90807#msg90807), cache line splits (http://www.masm32.com/board/index.php?topic=11454.msg87781#msg87781) and other nasty factors affecting the accuracy of timings. For the P4, there are the cyct_* macros here (http://www.masm32.com/board/index.php?topic=11454.msg87781#msg87781), which extend MichaelW's counter_* macros so that outliers are eliminated before the average is calculated. They need lower cycle counts and are a lot more reliable on the P4.
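As a rough illustration of eliminating outliers before averaging, here is one common approach in C. This is not how the cyct_* macros are implemented. It is a sketch that assumes interrupts and time slices only ever inflate a cycle count, so it sorts the samples and throws away a chosen number of the largest ones. The function name and the `drop` parameter are made up.

```c
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for unsigned 32-bit cycle counts. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sorts samples in place, discards the `drop` largest values and
   returns the integer average of the rest (0 if nothing is left). */
uint32_t trimmed_average(uint32_t *samples, size_t n, size_t drop)
{
    if (drop >= n)
        return 0;
    qsort(samples, n, sizeof samples[0], cmp_u32);

    uint64_t sum = 0;
    for (size_t i = 0; i < n - drop; i++)
        sum += samples[i];
    return (uint32_t)(sum / (n - drop));
}
```

With the samples 100, 102, 5000, 101, 99 and `drop` set to 1, the 5000 reading is discarded and the integer average of the rest is 100.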
Yes, certainly a lot of good code gets grown here, no doubt about that.
Things like paging, page sizes, TLB caching, memory width (DIMM pairing), memory timing (i.e. the DIMM's EEPROM settings), cached/non-cached, write-combined, go into the hard to control category.
Actually I'd probably toss comparing the same two strings 10 million times into the artificial category. A figure of much more merit would be obtained by comparing 10 million unique strings, but it would obviously be more difficult to build and distribute a data set.
To expand on the RDTSC a little more, it can act as a very good timebase. It has a very fine granularity, but you'd have to calibrate it against an accurate reference frequency to determine exactly how fast it is ticking. You have to watch that it's coming from a single CPU, and that the CPU isn't modulating or throttling it.
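The calibration step is just a ratio, sketched here in C. The function names are invented. It assumes you already have a tick count taken across an interval whose length in seconds is known from some trusted reference, and that the counter ran at a constant rate on one CPU for the whole interval.

```c
#include <stdint.h>

/* Estimated TSC frequency: ticks counted over a known reference
   interval. Returns 0 for a non-positive interval. */
double tsc_hz(uint64_t ticks, double ref_seconds)
{
    return ref_seconds > 0.0 ? (double)ticks / ref_seconds : 0.0;
}

/* Convert a tick count to microseconds using a calibrated frequency. */
double ticks_to_us(uint64_t ticks, double hz)
{
    return hz > 0.0 ? (double)ticks * 1e6 / hz : 0.0;
}
```

For instance, 3,000,000,000 ticks over a one second reference gives 3 GHz, and 3000 ticks at that rate is one microsecond.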
Quote
Or at least understand the magnitude of the number being measured/timed. 32-bit would be quite sufficient to time something approaching a second (@4GHz). Where it becomes particularly important is understanding if the number overflowed over the period of the test. However, I do think that using instruction timing over a few hundred microseconds is useless. For it to be remotely useful the measurement needs to be done in DOS (the real thing, not a virtual box), with interrupts disabled, on a single CPU.
There is a propensity here for timing over excessively long periods and averaging, on code that is highly artificial.
You are right.
I tested it on a boot floppy and got more believable results.
My code just won't work well when run under cmd.