I was playing around with getting the current system time (UTC) as fast as is reasonably possible; the format had to be a FILETIME structure. Luckily, if you're looking for a FILETIME format you can simply avoid using any API calls at all by using a little-known shared area of memory. Address 0x7FFE0000 contains an area of memory shared between kernel and user mode that contains the current clock in 100ns intervals since January 1, 1601 (FILETIME). You can also obtain the current tick count without an API call. Here's a sample of how to get the UTC system time:
/*
<link removed at Microsoft's request>
*/
#define MAX_WOW64_SHARED_ENTRIES 16
#define PROCESSOR_FEATURE_MAX 64
#define MM_SHARED_USER_DATA_VA 0x7FFE0000
// enum NT_PRODUCT_TYPE
NtProductWinNt = 1
NtProductLanManNt = 2
NtProductServer = 3
// enum ALTERNATIVE_ARCHITECTURE_TYPE
StandardDesign = 0
NEC98x86 = 1
EndAlternatives = 2
#define MAXIMUM_XSTATE_FEATURES 64
#define XSTATE_LEGACY_FLOATING_POINT 0
#define XSTATE_LEGACY_SSE 1
#define XSTATE_GSSE 2
#define XSTATE_MASK_LEGACY_FLOATING_POINT (1 << (XSTATE_LEGACY_FLOATING_POINT))
#define XSTATE_MASK_LEGACY_SSE (1 << (XSTATE_LEGACY_SSE))
#define XSTATE_MASK_LEGACY (XSTATE_MASK_LEGACY_FLOATING_POINT | XSTATE_MASK_LEGACY_SSE)
#define XSTATE_MASK_GSSE (1 << (XSTATE_GSSE))
#define NX_SUPPORT_POLICY_ALWAYSOFF 0
#define NX_SUPPORT_POLICY_ALWAYSON 1
#define NX_SUPPORT_POLICY_OPTIN 2
#define NX_SUPPORT_POLICY_OPTOUT 3
// Processor features
#define PF_FLOATING_POINT_PRECISION_ERRATA 0
#define PF_FLOATING_POINT_EMULATED 1
#define PF_COMPARE_EXCHANGE_DOUBLE 2
#define PF_MMX_INSTRUCTIONS_AVAILABLE 3
#define PF_PPC_MOVEMEM_64BIT_OK 4
#define PF_ALPHA_BYTE_INSTRUCTIONS 5
#define PF_XMMI_INSTRUCTIONS_AVAILABLE 6
#define PF_3DNOW_INSTRUCTIONS_AVAILABLE 7
#define PF_RDTSC_INSTRUCTION_AVAILABLE 8
#define PF_PAE_ENABLED 9
#define PF_XMMI64_INSTRUCTIONS_AVAILABLE 10
#define PF_SSE_DAZ_MODE_AVAILABLE 11
#define PF_NX_ENABLED 12
#define PF_SSE3_INSTRUCTIONS_AVAILABLE 13
#define PF_COMPARE_EXCHANGE128 14
#define PF_COMPARE64_EXCHANGE128 15
#define PF_CHANNELS_ENABLED 16
#define PF_XSAVE_ENABLED 17
XSTATE_FEATURE STRUCT
Offset LONG
Size LONG
ENDS
XSTATE_CONFIGURATION STRUCT
EnabledFeatures LONG64
Size LONG
OptimizedSave LONG
Features XSTATE_FEATURE MAXIMUM_XSTATE_FEATURES DUP
ENDS
KSYSTEM_TIME STRUCT
LowPart LONG
High1Time LONG
High2Time LONG
ENDS
KUSER_SHARED_DATA STRUCT
//
// WARNING: This structure must have exactly the same layout for 32- and
// 64-bit systems. The layout of this structure cannot change and new
// fields can only be added at the end of the structure (unless a gap
// can be exploited). Deprecated fields cannot be deleted. Platform
// specific fields are included on all systems.
//
// Layout exactness is required for Wow64 support of 32-bit applications
// on Win64 systems.
//
// The layout itself cannot change since this structure has been exported
// in ntddk, ntifs.h, and nthal.h for some time.
TickCountLowDeprecated LONG
TickCountMultiplier LONG
InterruptTime KSYSTEM_TIME
SystemTime KSYSTEM_TIME
TimeZoneBias KSYSTEM_TIME
ImageNumberLow SHORT
ImageNumberHigh SHORT
NtSystemRoot SHORT 260 DUP
MaxStackTraceDepth LONG
CryptoExponent LONG
TimeZoneId LONG
LargePageMinimum LONG
Reserved2 LONG 7 DUP
NtProductType ENUM // NT_PRODUCT_TYPE
ProductTypeIsValid BOOLEAN
Padding0 CHAR 3 DUP
NtMajorVersion LONG
NtMinorVersion LONG
ProcessorFeatures BOOLEAN PROCESSOR_FEATURE_MAX DUP
Reserved1 LONG
Reserved3 LONG
TimeSlip LONG
AlternativeArchitecture ENUM // ALTERNATIVE_ARCHITECTURE_TYPE
AltArchitecturePad LONG
SystemExpirationDate LARGE_INTEGER
SuiteMask LONG
KdDebuggerEnabled BOOLEAN
NXSupportPolicy CHAR
Padding CHAR 2 DUP
ActiveConsoleId LONG
DismountCount LONG
ComPlusPackage LONG
LastSystemRITEventTickCount LONG
NumberOfPhysicalPages LONG
SafeBootMode BOOLEAN
TscQpcData CHAR
TscQpcPad CHAR 2 DUP
// > Vista only
TraceLogging LONG
; SharedDataFlags LONG
DataFlagsPad LONG
TestRetInstruction LONGLONG
SystemCall LONG
SystemCallReturn LONG
SystemCallPad LONGLONG 3 DUP
UNION
TickCount KSYSTEM_TIME
TickCountQuad LONG64
ENDUNION
// The following padding is documented in the above union
// it is added separately to bypass a bug in GoAsm - Do not change !
TickCountPad DD
Cookie LONG
CookiePad LONG
ConsoleSessionForegroundProcessId LONGLONG
Wow64SharedInformation LONG MAX_WOW64_SHARED_ENTRIES DUP
UserModeGlobalLogger SHORT 16 DUP
ImageFileExecutionOptions LONG
// Pre vista 4 bytes padding instead of LangGenerationCount
LangGenerationCount LONG
Reserved5 LONGLONG
InterruptTimeBias LONG64
TscQpcBias LONG64
ActiveProcessorCount LONG
ActiveGroupCount SHORT
Reserved4 SHORT
AitSamplingValue LONG
AppCompatFlag LONG
SystemDllNativeRelocation LONGLONG
SystemDllWowRelocation LONG
XStatePad LONG
XState XSTATE_CONFIGURATION
ENDS
DATA SECTION
ksystime KSYSTEM_TIME <>
CODE SECTION
mov eax,MM_SHARED_USER_DATA_VA ; base of the shared page
add eax,KUSER_SHARED_DATA.SystemTime ; eax -> SystemTime (KSYSTEM_TIME)
mov ecx,[eax] ; LowPart
mov [ksystime.LowPart], ecx
add eax,4
mov ecx,[eax] ; High1Time
mov [ksystime.High1Time], ecx
add eax,4
mov ecx,[eax] ; High2Time
mov [ksystime.High2Time], ecx
You can use ksystime directly with any function that requires the current UTC time in FILETIME format. I am not sure what High2Time is but it seems to be equivalent to High1Time on my system.
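To sanity-check a value copied out this way, here is a minimal sketch (Python rather than GoAsm, so it can be run anywhere) that joins LowPart and High1Time into the 64-bit FILETIME count and turns it into a calendar date. The function name and the sample value are mine, not something from the DDK; the sample is the FILETIME for the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100 ns intervals since January 1, 1601 (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def ksystem_time_to_datetime(low_part, high1_time):
    """Join the two dwords and convert the 100 ns count to a datetime.

    Hypothetical helper for illustration; High2Time is not needed here.
    """
    ticks = (high1_time << 32) | low_part
    # timedelta has microsecond resolution, so drop the last decimal digit.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# Assumed sample: the FILETIME value of 1970-01-01 00:00:00 UTC.
UNIX_EPOCH_AS_FILETIME = 116444736000000000
print(ksystem_time_to_datetime(UNIX_EPOCH_AS_FILETIME & 0xFFFFFFFF,
                               UNIX_EPOCH_AS_FILETIME >> 32))
```

The same arithmetic applies to whatever two dwords end up in ksystime.LowPart and ksystime.High1Time.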
Hope someone will find this interesting. I'm not sure which OSes support it; I tested it in a 32-bit process on Win7-x64 and it works perfectly. According to some sources the shared area of memory is available from Win2K on; others say WinXP.
Edgar
EDIT:
The structures have been changed to a final version current as of Windows 8. Offsets have been checked to be sure they match the ASSERTs in ntddk.h.
Quote from: donkey on January 05, 2012, 07:19:07 AM
Address 0x7FFE0000 contains an area of memory shared between kernel and user mode that contains the current clock in 100ns intervals since January 1, 1601 (FILETIME). You can also obtain the current tick count without an API call.
Yeah, GetTick is probably the fastest API you can find ;-)
GetTickCount BA 0000FE7F mov edx, 7FFE0000 ; INT kernel32.GetTickCount(void)
7C8092BD 8B02 mov eax, [edx]
7C8092BF F762 04 mul dword ptr [edx+4]
7C8092C2 0FACD0 18 shrd eax, edx, 18
7C8092C6 C3 retn
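A rough model of what those four instructions compute, written in Python so the arithmetic can be checked by hand. The multiplier value is an assumption for illustration (a 15.625 ms clock tick stored as fixed point with 24 fractional bits); a real system may hold a different value at [7FFE0004h].

```python
def get_tick_count(tick_count_low, tick_count_multiplier):
    """Mirror of the listing: mul dword ptr [edx+4], then shrd eax, edx, 18h."""
    product = tick_count_low * tick_count_multiplier  # mul leaves 64 bits in edx:eax
    return (product >> 24) & 0xFFFFFFFF               # shrd by 24 keeps the low 32 bits


# Assumed example value: 15.625 ms per tick, scaled by 2^24 (0x0FA00000).
ASSUMED_MULTIPLIER = int(15.625 * (1 << 24))

# 64 raw ticks at 15.625 ms each come out as 1000 ms.
print(get_tick_count(64, ASSUMED_MULTIPLIER))
```

So the raw dword at 7FFE0000h counts clock ticks, and the multiplier and shift only rescale it to milliseconds.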
However, by getting the raw tick count directly you can avoid the load of the multiplier and the shift and get an even higher resolution. I'm not sure about the granularity (probably 100ns), but it might be a useful high resolution timer for testing code execution, processor speed etc... Just a thought.
Hi Dave,
The best place to find this sort of thing is the DDK headers, something I was checking when I got the idea for the code above.
BTW for my purposes I wanted to save the UTC FILETIME structure. If you just need to use it once, you can avoid moving it into a structure. For example:
mov eax,MM_SHARED_USER_DATA_VA
add eax,KUSER_SHARED_DATA.SystemTime
invoke FileTimeToSystemTime,eax,offset systime
Quote from: donkey on January 05, 2012, 07:34:17 AM
However, getting the raw tick count directly you can avoid the load of the multiplier and the shift and get an even higher resolution
I made a quick test and it always yields 64 ticks/second - same for a non-FPU solution. Probably there is an error in my logic ::)
Quote
include \masm32\MasmBasic\MasmBasic.inc
Ticks2Stack MACRO
mov edx, 7FFE0000h
fild dword ptr [edx]
ENDM
TicksPop MACRO
mov edx, 7FFE0000h
fisub dword ptr [edx]
fchs
ENDM
Init
Ticks2Stack
invoke Sleep, 1000
TicksPop
Inkey Str$("Ticks=%i", ST(0))
Exit
end start
Hi Jochen,
Yeah, it's a weird one, I just get 0 in the field. Have to play with it a bit, obviously GetTickCount is getting useful data...
I played a bit with the 7FFE0008h and 7FFE0014h dwords, they get updated much more frequently. But there is nonetheless some weird granularity involved, both for the Sleep and the .Repeat test. The latter yields
Ticks=156265 1 loops
Ticks=312530 2 loops
Ticks=312530 3 loops
Ticks=625060 4 loops
Ticks=625060 5 loops
Ticks=781325 6 loops
Ticks=781325 7 loops
... which doesn't make sense to me. I'd also like to know why commenting out the useless mov eax below takes away the granularity:
Quote
.Repeat
; mov eax, 123
imul eax, eax, 123
Quote
include \masm32\MasmBasic\MasmBasic.inc
Ticks2Stack MACRO
mov edx, 7FFE0014h ; or 7FFE0008h
fild dword ptr [edx]
ENDM
TicksPop MACRO
mov edx, 7FFE0014h
fisub dword ptr [edx]
fchs
ENDM
Init
For_ n=0 To 20
Ticks2Stack
imul eax, n, 10 ; increase delay proportionally to n
push eax
if 0
invoke Sleep, eax
else
mov ecx, eax
.Repeat
mov edx, 1000000
.Repeat
mov eax, 123
imul eax, eax, 123
dec edx
.Until Sign?
dec ecx
.Until Sign?
endif
TicksPop
pop ecx
Print Str$("Ticks=%i\t", ST(0)), Str$("%i loops\n", n)
Next
Exit
end start
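One way to look at the Ticks= values from the .Repeat test: every reading is a whole multiple of the first one. The sketch below (Python, with the seven readings typed in from the output above) just does that division and converts the step to milliseconds. My reading that the step corresponds to one update of the shared clock is an interpretation, not something the numbers prove.

```python
# Readings copied from the Ticks= output of the .Repeat test (100 ns units).
readings = [156265, 312530, 312530, 625060, 625060, 781325, 781325]


def as_multiples(values):
    """Return (whole multiple of the first reading, remainder) for each value."""
    step = values[0]
    return [divmod(value, step) for value in values]


step_ms = readings[0] * 100e-9 * 1000   # 100 ns units -> milliseconds
print("step in ms:", step_ms)
for loops, (multiple, remainder) in enumerate(as_multiples(readings), start=1):
    print(loops, "loops:", multiple, "steps, remainder", remainder)
```

If the clock only advances in steps of about 15.6 ms, a delay loop shorter than one step will show either zero or one full step, which would give exactly this kind of staircase.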
Hi Jochen,
The tick count is not useful; it is marked as deprecated in later versions of the DDK. I downloaded and installed the Win7 DDK and it is now called TickCountLowDeprecated. The tick count is now derived from the TickCount union in the structure, still calculating the offset...
#define KeQueryTickCount(CurrentCount) { \
KSYSTEM_TIME volatile *_TickCount = *((PKSYSTEM_TIME *)(&KeTickCount)); \
for (;;) { \
(CurrentCount)->HighPart = _TickCount->High1Time; \
(CurrentCount)->LowPart = _TickCount->LowPart; \
if ((CurrentCount)->HighPart == _TickCount->High2Time) break; \
YieldProcessor(); \
} \
}
Mmmmm, hard to calculate it by simply adding, I'll have to translate the whole structure to GoAsm format; the tick count is now pretty deep inside it. Oh well, it will be a good addition to the header project.
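The KeQueryTickCount macro quoted above also hints at what High2Time is for: the reader takes High1Time, then LowPart, and only accepts the pair if High2Time still equals the high part it read. Here is a small simulation of that loop in Python. The writer is hypothetical; in particular the write order (High2Time, LowPart, High1Time) is my assumption about the kernel side and is not shown in the macro.

```python
def read_consistent(mem, between_reads=lambda: None):
    """Reader side of the macro: High1Time, LowPart, compare with High2Time, retry."""
    while True:
        high = mem["High1Time"]
        between_reads()              # stands in for a clock update hitting mid-read
        low = mem["LowPart"]
        if high == mem["High2Time"]:
            return (high << 32) | low


def make_writer(mem, new_value):
    """Hypothetical writer that fires once, in the assumed order High2, Low, High1."""
    pending = [new_value]

    def writer():
        if pending:
            value = pending.pop()
            mem["High2Time"] = value >> 32
            mem["LowPart"] = value & 0xFFFFFFFF
            mem["High1Time"] = value >> 32

    return writer


# Clock about to carry from 0x1_FFFFFFFF into 0x2_00000000 while being read.
clock = {"LowPart": 0xFFFFFFFF, "High1Time": 1, "High2Time": 1}
value = read_consistent(clock, make_writer(clock, 0x2_0000_0000))
print(hex(value))
```

Without the comparison the first pass would have paired the old high part with the new low part and returned 0x100000000, which is a value the clock never held.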
I got it, Edgar. Open this page (http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/) and search for consistent.
On my Windows 2000 system I get 100 ticks per second, and on my newly acquired XP system I get 64.
Further testing still shows odd things:
Loop 0 0.0
Loop 1 1.0000000000000000
Loop 2 2.0000000000000000
Loop 3 3.0000000000000000
Loop 4 4.0000000000000000
Loop 5 5.0000000000000000
Loop 6 6.0000000000000000
Loop 7 6.5000000000000000
Loop 8 8.0000000000000000
Loop 9 8.5000000000000000
Loop 10 10.000000000000000 ::)
And no, it's not a rounding problem... Str$() does have 18.5 digits precision, and it takes them directly from the FPU.
The second branch of the switch below (Interrupt time) reliably produces "round" figures, but with an occasional x.5; the first one (System time) sticks a while to round figures but then the OS spontaneously decides that 1.000023423423 looks better. Adjust Magic until you find round numbers again. Setting the affinity mask doesn't help.
Quote
include \masm32\MasmBasic\MasmBasic.inc
if 0
UsedTime = 7FFE0014h ; System time
Magic=312588 ; may differ
Magic=312562 ; according to
Magic=312558 ; the processor's mood
else
UsedTime = 7FFE0008h ; Interrupt time
Magic=312500 ; always the same value
endif
KSYSTEM_TIME STRUCT
LowPart dd ? ; UsedTime points here
High1Time dd ?
High2Time dd ?
KSYSTEM_TIME ENDS
Ticks2Stack MACRO
mov edx, UsedTime ; LowPart of the selected KSYSTEM_TIME in KUSER_SHARED_DATA (http://www.nirsoft.net/kernel_struct/vista/KUSER_SHARED_DATA.html)
; @@:
; mov eax, [edx] ; fails miserably, see...
; cmp eax, [edx+4] ; Windows Research Kernel @ HPI (http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/)
; jne @B ; ... and look for consistent
fild qword ptr [edx]
ENDM
TicksPop MACRO
mov edx, UsedTime
fild qword ptr [edx]
fsub
fchs
ENDM
Init
; invoke SetProcessAffinityMask, rv(GetCurrentProcess), 1
PrintLine "Magic=", Hex$(Magic), CrLf$
For_ n=0 To 10
Ticks2Stack
imul eax, n, 10
push eax
mov ecx, eax
.Repeat
mov edx, 1000000
.Repeat
; mov eax, 123
imul eax, eax, 123
dec edx
.Until Sign?
dec ecx
.Until Sign?
TicksPop
push Magic
fidiv dword ptr [esp] ; divide ticks by magic number
pop eax
pop ecx
Print Str$("Loop %i \t", n), Str$("%Hf\n", ST(0))
fstp st
Next
Exit
end start
C:\Users\John\Desktop>NanoTicks.exe
Magic=0004C4B4
Loop 0 0.0
Loop 1 0.32000000000000000
Loop 2 0.64000000000000000
Loop 3 0.64000000000000000
Loop 4 1.2800032000000000
Loop 5 1.2800032000000000
Loop 6 1.6000000000000000
Loop 7 1.9200032000000000
Loop 8 1.9200032000000000
Loop 9 2.5600032000000000
Loop 10 2.5600032000000000
fwiw, I also got 0 in the original, not 64
I was going to suggest an affinity thingie but saw you had put it in, is the asm in the attachment the exe? (without the commented-out SetProcessAffinityMask).
I wonder how long it takes to update every process's structure too, and if it matters which cpu does it.
Looking at YieldProcessor, it's just a macro for a pause instruction (or rep nop for older cpus), usually used in spinlocks?
So if they match it returns, if they don't does it loop again? (The "for (;;)" I am not sure about.)
Quote from: sinsi on January 05, 2012, 01:16:16 PM
I was going to suggest an affinity thingie but saw you had put it in, is the asm in the attachment the exe? (without the commented-out SetProcessAffinityMask).
Yes it is, without SetProcessAffinityMask (which doesn't change anything)
It seems your Magic value is exactly 100000.
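Where the 100000 comes from, as a quick check in Python: the printed loop values are raw 100 ns counts divided by Magic, so multiplying back recovers the raw step. The helper name is made up for this sketch; the numbers are the ones from the run above.

```python
MAGIC = 0x4C4B4   # 312500, the value printed as Magic=0004C4B4


def to_100ns_units(loop_value, magic=MAGIC):
    """Undo the fidiv by Magic and round to the nearest whole 100 ns count."""
    return round(loop_value * magic)


for printed in (0.32, 0.64, 1.2800032, 1.6):
    units = to_100ns_units(printed)
    print(printed, "->", units, "units =", units / 10000, "ms")
```

The first step comes out as 100000 units of 100 ns, i.e. 10 ms, which is why dividing by 100000 instead of 312500 would give round numbers on that machine.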
One thing I have never understood about the timers is how the high-resolution performance frequency (returned by QueryPerformanceFrequency) can be 3579546 Hz, or for the current topic why Microsoft selected a time unit of 100ns, when the system timers have a 1193182 Hz clock, and so a minimum period of ~838ns.
Don't they use the APIC timers nowadays?
One problem with the old ISA timer was binary doesn't work well in decimal, my real mode millisecond timer actually runs at 1090 or something.
>which doesn't change anything
Yes, I got the same results either way. Makes a nice hex (binary?) progression though :bdg
this works perfectly on my machine (Win7-x64) :U
include \masm32\include\masm32rt.inc
KSYSTEM_TIME struct
LowPart ULONG ?
High1Time LONG ?
High2Time LONG ?
KSYSTEM_TIME ends
gns macro t:req
mov eax,7FFE0000h+8
.while 1
mov edx,[eax].KSYSTEM_TIME.High1Time
mov ecx,[eax].KSYSTEM_TIME.LowPart
.break .if edx == [eax].KSYSTEM_TIME.High2Time
.endw
mov DWORD ptr t,ecx
mov DWORD ptr t+4,edx
endm
.code
main proc
LOCAL t1:QWORD
LOCAL t2:QWORD
gns t1
fild t1
fmul FP4(100.0E-9)
fstp t1
xor ebx,ebx
.while ebx < 100
gns t2
fild t2
fmul FP4(100.0E-9)
fsub t1
fstp t2
print real8$(t2),13,10
invoke Sleep,250
lea ebx,[ebx+1]
.endw
inkey
exit
main endp
end main
Quote
Don't they use the APIC timers nowadays?
Even though I recall reading about the APIC years ago, I had forgotten all about it. I now have a system that was built in December 2003, and on it QueryPerformanceFrequency returns 2992580000.
Quote from: qWord on January 05, 2012, 02:17:56 PM
this works perfectly on my machine (Win7-x64) :U
You got it :U
Actually, my macros worked, too. The tricky part causing the irregular sequences (6.0, 6.5, 8.0 (http://www.masm32.com/board/index.php?topic=18104.msg152631#msg152631)) is the delay loop - probably it cuts through time slices :'(
Quote from: jj2007 on January 05, 2012, 10:29:59 AM
I got it, Edgar. Open this page (http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/) and search for consistent.
The structure there doesn't match the one MS uses; there are a number of discrepancies. It is a bit of a complicated structure to translate. This is how far I've gotten (the same listing as in the first post of this thread); it checks out against the offsets they identify in hex in the structure. It is complete to Win8, however I have to check a few fields to verify it.
Nice work qWord. works here too.
EDIT:
There were a couple of offset issues with the version of the structure I had in this post previously. They have been fixed and all offsets match the ASSERTs in ntddk.h, so the structure is definitely correct. I have left in the comment for this structure to assure anyone that it will not change and is the same for 32-bit and 64-bit.
Beginning with Windows Vista, SharedDataFlags has changed to TraceLogging. This is just a name change and offsets remain the same, so SharedDataFlags has been commented out.
Nice as a replacement for OSVERSIONINFO; I always hated the way you got the OS version and never liked the hacks I have seen using FS segment overrides:
mov ecx,MM_SHARED_USER_DATA_VA
mov eax,[ecx+KUSER_SHARED_DATA.NtMajorVersion]
mov edx,[ecx+KUSER_SHARED_DATA.NtMinorVersion]
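For anyone who wants to double-check the offsets without an assembler, here is a sketch that rebuilds only the first part of the structure with Python's ctypes and prints where the fields land. The class name KUSER_SHARED_DATA_PREFIX and the helper functions are mine; the field list follows the translation in this thread, with NtSystemRoot declared as 260 16-bit units. The actual read of the shared page is only attempted on Windows.

```python
import ctypes
import sys

MM_SHARED_USER_DATA_VA = 0x7FFE0000


class KSYSTEM_TIME(ctypes.Structure):
    _fields_ = [("LowPart", ctypes.c_uint32),
                ("High1Time", ctypes.c_int32),
                ("High2Time", ctypes.c_int32)]


class KUSER_SHARED_DATA_PREFIX(ctypes.Structure):
    """Only the leading fields, up to ProcessorFeatures (illustrative subset)."""
    _fields_ = [
        ("TickCountLowDeprecated", ctypes.c_uint32),
        ("TickCountMultiplier", ctypes.c_uint32),
        ("InterruptTime", KSYSTEM_TIME),
        ("SystemTime", KSYSTEM_TIME),
        ("TimeZoneBias", KSYSTEM_TIME),
        ("ImageNumberLow", ctypes.c_uint16),
        ("ImageNumberHigh", ctypes.c_uint16),
        ("NtSystemRoot", ctypes.c_uint16 * 260),
        ("MaxStackTraceDepth", ctypes.c_uint32),
        ("CryptoExponent", ctypes.c_uint32),
        ("TimeZoneId", ctypes.c_uint32),
        ("LargePageMinimum", ctypes.c_uint32),
        ("Reserved2", ctypes.c_uint32 * 7),
        ("NtProductType", ctypes.c_uint32),
        ("ProductTypeIsValid", ctypes.c_ubyte),
        ("Padding0", ctypes.c_ubyte * 3),
        ("NtMajorVersion", ctypes.c_uint32),
        ("NtMinorVersion", ctypes.c_uint32),
        ("ProcessorFeatures", ctypes.c_ubyte * 64),
    ]


def field_offset(name):
    """Byte offset of a field inside the prefix structure."""
    return getattr(KUSER_SHARED_DATA_PREFIX, name).offset


def nt_version():
    """Read (major, minor) from the shared page; returns None off Windows."""
    if sys.platform != "win32":
        return None
    shared = KUSER_SHARED_DATA_PREFIX.from_address(MM_SHARED_USER_DATA_VA)
    return shared.NtMajorVersion, shared.NtMinorVersion


for name in ("InterruptTime", "SystemTime", "NtMajorVersion", "ProcessorFeatures"):
    print(name, hex(field_offset(name)))
print("version:", nt_version())
```

The InterruptTime and SystemTime offsets agree with the 7FFE0008h and 7FFE0014h addresses used in the timing tests earlier in the thread.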
I got some unexpected insights into Sleep and its granularity... very nice :bg
Loop 1 0.0156250000000000000
Loop 8 0.125000000000000000
Loop 32 0.500000000000000000
Loop 64 1.00000000000000000
Sleep 16 - one more? Esc=exit
Executable attached.
include \masm32\MasmBasic\MasmBasic.inc
Ticks2FPU MACRO
mov edx, 7FFE0008h ; start of KUSER_SHARED_DATA.InterruptTime, i.e. LowPart (http://www.nirsoft.net/kernel_struct/vista/KUSER_SHARED_DATA.html)
@@: mov eax, [edx+4]
cmp eax, [edx+8] ; See Windows Research Kernel @ HPI (http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/) and look for consistent
jne @B
fild qword ptr [edx]
ENDM
OneEm9 REAL10 100.0E-9
Init
mov ecx, 16
.Repeat
Ticks2FPU
For_ n=1 To 65
push ecx
dec ecx
invoke Sleep, ecx
pop ecx
fld st
Ticks2FPU
fsub
fchs
fld OneEm9
fmul
Print Str$("Loop %i \t", n), Str$("%If\n", ST(0))
fstp st
Next
Inkey Str$("Sleep %i - one more? Esc=exit\n", ecx-1)
add ecx, ecx ; double Sleep delay
.Until eax==VK_ESCAPE
Exit
end start
EDIT: Don't trust the code posted above!
With a Sleep 1000 (i.e. mov ecx, 1001 after Init) I get:
Loop 40 40.0000000000000000
Loop 41 41.0000000000000000 << great precision until here
Loop 42 42.0019531000000000 << starts misbehaving
Loop 43 43.0039062000000000
Loop 44 44.0058594000000000
Sometimes it keeps 19-digits precision until the end (Loop 65), sometimes it starts misbehaving earlier, i.e. invoke Sleep, n produces delays slightly longer than n ms. It's on a single core CPU.
Greetings to Redmond ::)
This thing just has a lot of information. Processor features are determined as follows (or appear to be):
// replacement for IsProcessorFeaturePresent function
mov ecx,MM_SHARED_USER_DATA_VA
mov edx,[ecx+KUSER_SHARED_DATA.ProcessorFeatures+PF_XXXXXXXXXX]
and edx,1
EDX will be TRUE if the feature is present, FALSE otherwise.
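A note on why the dword load followed by and edx,1 works: ProcessorFeatures is an array of one-byte BOOLEANs, and on a little-endian machine the byte at the indexed address ends up in the low 8 bits of EDX. The Python sketch below compares that with a plain byte read. The sample bytes are made up, and the masking only matches the byte read as long as each BOOLEAN is stored as 0 or 1, which I am assuming here.

```python
import struct

PF_MMX_INSTRUCTIONS_AVAILABLE = 3   # index taken from the PF_ list in this thread


def feature_via_dword(features, index):
    """What the mov/and pair computes: bit 0 of a little-endian dword at index."""
    (dword,) = struct.unpack_from("<I", features, index)
    return dword & 1


def feature_via_byte(features, index):
    """Byte-sized read (movzx style): any nonzero BOOLEAN counts as present."""
    return 1 if features[index] else 0


# Made-up 64-byte feature array, plus 4 spare bytes so a dword read at
# index 63 stays inside the buffer (on Windows the page continues anyway).
sample = bytes([0, 0, 1, 1, 0, 0, 1, 0]) + bytes(60)

print(feature_via_dword(sample, PF_MMX_INSTRUCTIONS_AVAILABLE),
      feature_via_byte(sample, PF_MMX_INSTRUCTIONS_AVAILABLE))
```

A byte-sized load would be the more defensive choice if a flag could ever hold a nonzero value other than 1.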
KUSER_SHARED_DATA.ActiveProcessorCount will give the number of logical processors (hardware threads), not the number of cores.
There is quite a lot of stuff to play with here and the best part is it is pretty much guaranteed never to be altered or deprecated, only expanded so it can be used without much worry.
correct me if i'm wrong, here - lol
but, i would think the API code looks much the same as your code
Quote from: dedndave on January 05, 2012, 07:19:36 PM
correct me if i'm wrong, here - lol
but, i would think the API code looks much the same as your code
Probably, but this has no PUSH/CALL/RET overhead so you can save a few cycles (not to mention cache hits), and it also does very well at obfuscating code if you don't appreciate people disassembling your program. Any time I can replace an API reliably is a good thing in my book, one more tool in the box.
For example the KUSER_SHARED_DATA.ActiveProcessorCount can pretty much replace this whole DLL and program (http://www.codeproject.com/KB/system/countingprocessors.aspx) with a fairly reliable method using only 2 opcodes. Well, the Logical Processors part anyway.
Warning - see update of #22!
Quote from: jj2007 on January 05, 2012, 06:16:44 PM
Sometimes it keeps 19-digits precision until the end (Loop 65), sometimes it starts misbehaving earlier, i.e. invoke Sleep, n produces delays slightly longer than n ms. It's on a single core CPU.
This is unsurprising - Windows is not a real-time OS.
You may get better result by restarting a waitable timer each iteration (Create/SetWaitableTimer) (http://msdn.microsoft.com/en-us/library/windows/desktop/ms686289(v=vs.85).aspx)
I've kind of given up on it as a timer, at least for now. The reason I got into the structure was that I'm translating the DDK for GoAsm and ran across it. There are plenty of cycle-saving tidbits in there to keep me interested though. According to the article Jochen linked (http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/) in an earlier post, the interrupt timer might be one to try for timing functions.
Quote from: The article Jochen posted
The interrupt time is the only Windows clock that guarantees to be monotonous that is, its value only increases over timer. Its value represents the time in units of 100 ns since the system was booted. The interrupt time is the base clock for all timers in Windows (see my recent article A Bug in Windows Timer Management). It is updated every clock interrupt.
The following is a dump of the InterruptTime field with Sleep,100 between each...
Line 228: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1366442026 (0x5172402A)
Line 230: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1367534028 (0x5182E9CC)
Line 232: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1368626030 (0x5193936E)
Line 234: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1369718032 (0x51A43D10)
Line 236: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1370810034 (0x51B4E6B2)
Line 238: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1371902036 (0x51C59054)
Line 240: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1372994038 (0x51D639F6)
Line 242: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1374086040 (0x51E6E398)
Line 244: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1375178042 (0x51F78D3A)
Line 246: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1376270044 (0x520836DC)
Line 248: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1377362046 (0x5218E07E)
Line 250: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1378454048 (0x52298A20)
Haven't checked them all but about 1,092,000 ticks (roughly 109 ms) for each Sleep of 100 ms, seems pretty consistent.
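To check the whole dump rather than eyeballing it, the twelve values can be differenced in a few lines of Python. The list is typed in from the dump above; it is only the low dword of InterruptTime, so the subtraction assumes no wrap between samples. The remark about seven clock interrupts is my inference from the size of the step, not something the dump states.

```python
# Low dword of InterruptTime from the dump, one sample per Sleep,100.
samples = [
    1366442026, 1367534028, 1368626030, 1369718032,
    1370810034, 1371902036, 1372994038, 1374086040,
    1375178042, 1376270044, 1377362046, 1378454048,
]


def deltas(values):
    """Differences between consecutive samples, in 100 ns units."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


steps = deltas(samples)
print("distinct step sizes:", sorted(set(steps)))
print("step in ms:", steps[0] / 10000)
print("step / 7 in ms:", steps[0] / 7 / 10000)
```

Every interval comes out as 1,092,002 units, about 109.2 ms, which would fit seven clock interrupts of roughly 15.6 ms each for a requested 100 ms.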
Quote from: donkey on January 06, 2012, 01:49:20 AM
The following is a dump of the InterruptTime field with Sleep,100 between each...
...
Haven't checked them all but about 1,092,000 ticks (roughly 109 ms) for each Sleep of 100 ms, seems pretty consistent.
That's the whole point, Edgar - it looks consistent, with 19-digits precision. Until loop 42, when Sleep decides to relax a little bit, around 0.2% :bg
The timer macro is ok, and it's the interrupt one, so the culprit is obviously good ol' Sleep.
Sleep isn't intended to be a timing device, strictly speaking
because it does not return until the beginning of the next time slice
not really sure how long a time slice is under windows - different versions of windows, different CPU's, etc :P
but it could account for small variations
Hi folks,
Here (http://www.masm32.com/board/index.php?topic=12460.msg153367#msg153367) is a new NanoTimer() macro based on this thread. Thanks to all but especially to Edgar for having discovered this marvelous Windows feature :U