The MASM Forum Archive 2004 to 2012

Project Support Forums => GoAsm Assembler and Tools => Topic started by: donkey on January 05, 2012, 07:19:07 AM

Title: Fast file time
Post by: donkey on January 05, 2012, 07:19:07 AM
I was playing around with getting the current system time (UTC) as fast as is reasonably possible; the result had to be in a FILETIME structure. Luckily, if you're looking for a FILETIME format you can simply avoid using any API calls at all by using a little-known shared area of memory. Address 0x7FFE0000 contains an area of memory shared between kernel and user mode that contains the current clock in 100ns intervals since January 1, 1601 (FILETIME). You can also obtain the current tick count without an API call. Here's a sample of how to get the UTC system time:


/*
<link removed at Microsoft's request>
*/

#define MAX_WOW64_SHARED_ENTRIES 16
#define PROCESSOR_FEATURE_MAX 64
#define MM_SHARED_USER_DATA_VA 0x7FFE0000

// enum NT_PRODUCT_TYPE
NtProductWinNt = 1
NtProductLanManNt = 2
NtProductServer = 3

// enum ALTERNATIVE_ARCHITECTURE_TYPE
StandardDesign = 0
NEC98x86 = 1
EndAlternatives = 2

#define MAXIMUM_XSTATE_FEATURES 64
#define XSTATE_LEGACY_FLOATING_POINT        0
#define XSTATE_LEGACY_SSE                   1
#define XSTATE_GSSE                         2

#define XSTATE_MASK_LEGACY_FLOATING_POINT   (1 << (XSTATE_LEGACY_FLOATING_POINT))
#define XSTATE_MASK_LEGACY_SSE              (1 << (XSTATE_LEGACY_SSE))
#define XSTATE_MASK_LEGACY                  (XSTATE_MASK_LEGACY_FLOATING_POINT | XSTATE_MASK_LEGACY_SSE)
#define XSTATE_MASK_GSSE                    (1 << (XSTATE_GSSE))

#define NX_SUPPORT_POLICY_ALWAYSOFF 0
#define NX_SUPPORT_POLICY_ALWAYSON 1
#define NX_SUPPORT_POLICY_OPTIN 2
#define NX_SUPPORT_POLICY_OPTOUT 3

// Processor features
#define PF_FLOATING_POINT_PRECISION_ERRATA  0   
#define PF_FLOATING_POINT_EMULATED          1   
#define PF_COMPARE_EXCHANGE_DOUBLE          2   
#define PF_MMX_INSTRUCTIONS_AVAILABLE       3   
#define PF_PPC_MOVEMEM_64BIT_OK             4   
#define PF_ALPHA_BYTE_INSTRUCTIONS          5   
#define PF_XMMI_INSTRUCTIONS_AVAILABLE      6   
#define PF_3DNOW_INSTRUCTIONS_AVAILABLE     7   
#define PF_RDTSC_INSTRUCTION_AVAILABLE      8   
#define PF_PAE_ENABLED                      9   
#define PF_XMMI64_INSTRUCTIONS_AVAILABLE   10   
#define PF_SSE_DAZ_MODE_AVAILABLE          11   
#define PF_NX_ENABLED                      12   
#define PF_SSE3_INSTRUCTIONS_AVAILABLE     13   
#define PF_COMPARE_EXCHANGE128             14   
#define PF_COMPARE64_EXCHANGE128           15   
#define PF_CHANNELS_ENABLED                16   
#define PF_XSAVE_ENABLED                   17

XSTATE_FEATURE STRUCT
Offset LONG
Size LONG
ENDS

XSTATE_CONFIGURATION  STRUCT
EnabledFeatures LONG64
Size LONG
OptimizedSave LONG
Features XSTATE_FEATURE MAXIMUM_XSTATE_FEATURES DUP
ENDS

KSYSTEM_TIME STRUCT
LowPart LONG
High1Time LONG
High2Time LONG
ENDS

KUSER_SHARED_DATA STRUCT
//
// WARNING: This structure must have exactly the same layout for 32- and
//    64-bit systems. The layout of this structure cannot change and new
//    fields can only be added at the end of the structure (unless a gap
//    can be exploited). Deprecated fields cannot be deleted. Platform
//    specific fields are included on all systems.
//
//    Layout exactness is required for Wow64 support of 32-bit applications
//    on Win64 systems.
//
//    The layout itself cannot change since this structure has been exported
//    in ntddk, ntifs.h, and nthal.h for some time.

TickCountLowDeprecated LONG
TickCountMultiplier LONG
InterruptTime KSYSTEM_TIME
SystemTime KSYSTEM_TIME
TimeZoneBias KSYSTEM_TIME
ImageNumberLow SHORT
ImageNumberHigh SHORT
NtSystemRoot SHORT 260 DUP
MaxStackTraceDepth LONG
CryptoExponent LONG
TimeZoneId LONG
LargePageMinimum LONG
Reserved2  LONG 7 DUP
NtProductType ENUM // NT_PRODUCT_TYPE
ProductTypeIsValid BOOLEAN
Padding0 CHAR 3 DUP
NtMajorVersion LONG
NtMinorVersion LONG
ProcessorFeatures BOOLEAN PROCESSOR_FEATURE_MAX DUP
Reserved1 LONG
Reserved3 LONG
TimeSlip LONG
AlternativeArchitecture ENUM // ALTERNATIVE_ARCHITECTURE_TYPE
AltArchitecturePad LONG
SystemExpirationDate LARGE_INTEGER
SuiteMask LONG
KdDebuggerEnabled BOOLEAN
NXSupportPolicy CHAR
Padding CHAR 2 DUP
ActiveConsoleId LONG
DismountCount LONG
ComPlusPackage LONG
LastSystemRITEventTickCount LONG
NumberOfPhysicalPages LONG
SafeBootMode BOOLEAN

TscQpcData CHAR
TscQpcPad CHAR 2 DUP

// > Vista only
TraceLogging LONG

; SharedDataFlags LONG
DataFlagsPad LONG

TestRetInstruction LONGLONG
SystemCall LONG
SystemCallReturn LONG
SystemCallPad LONGLONG 3 DUP

UNION
TickCount KSYSTEM_TIME
TickCountQuad LONG64
ENDUNION

// The following padding is documented in the above union
// it is added separately to bypass a bug in GoAsm - Do not change !
TickCountPad DD

Cookie LONG
CookiePad LONG
ConsoleSessionForegroundProcessId LONGLONG
Wow64SharedInformation LONG MAX_WOW64_SHARED_ENTRIES DUP
UserModeGlobalLogger SHORT 16 DUP
ImageFileExecutionOptions LONG

// Pre vista 4 bytes padding instead of LangGenerationCount
LangGenerationCount LONG
Reserved5 LONGLONG
InterruptTimeBias LONG64
TscQpcBias LONG64
ActiveProcessorCount LONG
ActiveGroupCount SHORT
Reserved4 SHORT
AitSamplingValue LONG
AppCompatFlag LONG
SystemDllNativeRelocation LONGLONG
SystemDllWowRelocation LONG
XStatePad LONG
XState XSTATE_CONFIGURATION
ENDS

DATA SECTION
ksystime KSYSTEM_TIME <>
CODE SECTION
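mov eax,MM_SHARED_USER_DATA_VA          ; base of the shared page (0x7FFE0000)
add eax,KUSER_SHARED_DATA.SystemTime    ; eax -> SystemTime (KSYSTEM_TIME)

mov ecx,[eax]                           ; low dword of the 100ns count
mov [ksystime.LowPart], ecx
add eax,4
mov ecx,[eax]                           ; high dword of the 100ns count
mov [ksystime.High1Time], ecx
add eax,4
mov ecx,[eax]                           ; third dword of KSYSTEM_TIME
mov [ksystime.High2Time], ecx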


You can use ksystime directly with any function that requires the current UTC time in FILETIME format. I am not sure what High2Time is but it seems to be equivalent to High1Time on my system.
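
Since the value is a 64-bit count of 100ns intervals since January 1, 1601, it can be turned into Unix time with plain arithmetic. Here is a minimal sketch in C (not part of the code above). The function names are made up for illustration, and the only outside fact it relies on is the well-known offset of 116444736000000000 intervals between 1601 and 1970.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 100 ns intervals between 1601-01-01 and 1970-01-01 (UTC). */
#define FILETIME_UNIX_EPOCH       116444736000000000ULL
#define FILETIME_TICKS_PER_SECOND 10000000ULL

/* Combine the low and high dwords (LowPart / High1Time) into one 64-bit count. */
uint64_t filetime_to_u64(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* Convert a FILETIME count to whole seconds since the Unix epoch.
   Hypothetical helper: the caller must pass a time at or after 1970-01-01. */
uint64_t filetime_to_unix_seconds(uint64_t filetime)
{
    assert(filetime >= FILETIME_UNIX_EPOCH);
    return (filetime - FILETIME_UNIX_EPOCH) / FILETIME_TICKS_PER_SECOND;
}
```

Feeding it the two dwords copied into ksystime.LowPart and ksystime.High1Time should give the same seconds value a C runtime time() call would report, within the clock update interval.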

Hope someone will find this interesting. I'm not sure which OSes support it; I tested it in 32-bit mode on Win7-x64 and it works perfectly. According to some sources the shared area of memory is available from Win2K on; others say WinXP.

Edgar

EDIT:

The structures have been changed to a final version current to Windows 8. Offsets have been checked to be sure they match the ASSERTs in ntddk.h.
Title: Re: Fast file time
Post by: jj2007 on January 05, 2012, 07:29:31 AM
Quote from: donkey on January 05, 2012, 07:19:07 AM
Address 0x7FFE0000 contains an area of memory shared between kernel and user mode that contains the current clock in 100ns intervals since January 1, 1601 (FILETIME). You can also obtain the current tick count without an API call.

Yeah, GetTick is probably the fastest API you can find ;-)

GetTickCount  BA 0000FE7F      mov edx, 7FFE0000        ; INT kernel32.GetTickCount(void)
7C8092BD      8B02             mov eax, [edx]
7C8092BF      F762 04          mul dword ptr [edx+4]
7C8092C2      0FACD0 18        shrd eax, edx, 18
7C8092C6      C3               retn
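
The disassembly above boils down to a 32x32 multiply followed by a 24-bit shift of the 64-bit product. A rough C sketch of the same arithmetic follows. The helper names are invented, and the second function assumes that the multiplier is the tick period in milliseconds scaled by 2^24, which matches the shift but is not stated anywhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as mov/mul/shrd: multiply the raw tick count by the
   multiplier, then keep bits 24..55 of the 64-bit product. */
uint32_t tick_count_ms(uint32_t tick_count_low, uint32_t tick_count_multiplier)
{
    uint64_t product = (uint64_t)tick_count_low * tick_count_multiplier;
    return (uint32_t)(product >> 24);
}

/* Assumption: multiplier = tick period in ms * 2^24.
   The period is given in 100 ns units and must stay below 256 ms
   so the result still fits in 32 bits. */
uint32_t multiplier_from_period(uint32_t period_100ns)
{
    assert(period_100ns > 0 && period_100ns < 2560000u);
    return (uint32_t)(((uint64_t)period_100ns << 24) / 10000u);
}
```

With a 15.625 ms period (64 ticks per second) the multiplier comes out as 0x0FA00000, and 64 raw ticks map to 1000 ms.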
Title: Re: Fast file time
Post by: donkey on January 05, 2012, 07:34:17 AM
However, by getting the raw tick count directly you can avoid the load of the multiplier and the shift and get an even higher resolution. I'm not sure about the granularity (probably 100ns), but it might be a useful high-resolution timer for testing code execution, processor speed, etc. Just a thought.
Title: Re: Fast file time
Post by: donkey on January 05, 2012, 07:53:01 AM
Hi Dave,

The best place to find this sort of thing is the DDK headers, which is what I was looking through when I got the idea for the code above.

BTW for my purposes I wanted to save the UTC FILETIME structure; if you just need to use it once you can avoid moving it into a structure. For example:

mov eax,MM_SHARED_USER_DATA_VA
add eax,KUSER_SHARED_DATA.SystemTime
invoke FileTimeToSystemTime,eax,offset systime
Title: Re: Fast file time
Post by: jj2007 on January 05, 2012, 08:04:09 AM
Quote from: donkey on January 05, 2012, 07:34:17 AM
However, getting the raw tick count directly you can avoid the load of the multiplier and the shift and get an even higher resolution

I made a quick test and it always yields 64 ticks/second - same for a non-FPU solution. Probably there is an error in my logic ::)

include \masm32\MasmBasic\MasmBasic.inc
Ticks2Stack MACRO
   mov edx, 7FFE0000h
   fild dword ptr [edx]
ENDM

TicksPop MACRO
   mov edx, 7FFE0000h
   fisub dword ptr [edx]
   fchs
ENDM

   Init
   Ticks2Stack
   invoke Sleep, 1000
   TicksPop
   Inkey Str$("Ticks=%i", ST(0))
   Exit
end start
Title: Re: Fast file time
Post by: donkey on January 05, 2012, 08:41:25 AM
Hi Jochen,

Yeah, it's a weird one, I just get 0 in the field. I'll have to play with it a bit; obviously GetTickCount is getting useful data...
Title: Re: Fast file time
Post by: jj2007 on January 05, 2012, 09:39:56 AM
I played a bit with the 7FFE0008h and 7FFE0014h dwords, they get updated much more frequently. But there is nonetheless some weird granularity involved, both for the Sleep and the .Repeat test. The latter yields
Ticks=156265    1 loops
Ticks=312530    2 loops
Ticks=312530    3 loops
Ticks=625060    4 loops
Ticks=625060    5 loops
Ticks=781325    6 loops
Ticks=781325    7 loops

... which doesn't make sense to me. I'd also like to know why commenting out the useless mov eax below takes away the granularity:

.Repeat
;              mov eax, 123
              imul eax, eax, 123

include \masm32\MasmBasic\MasmBasic.inc
Ticks2Stack MACRO
   mov edx, 7FFE0014h   ; or 7FFE0008h
   fild dword ptr [edx]
ENDM

TicksPop MACRO
   mov edx, 7FFE0014h
   fisub dword ptr [edx]
   fchs
ENDM

   Init

   For_ n=0 To 20
      Ticks2Stack
      imul eax, n, 10  ; increase delay proportionally to n
      push eax
      if 0
         invoke Sleep, eax
      else
         mov ecx, eax
         .Repeat
            mov edx, 1000000
            .Repeat
              mov eax, 123
              imul eax, eax, 123
              dec edx
            .Until Sign?
            dec ecx
         .Until Sign?
      endif
      TicksPop
      pop ecx
      Print Str$("Ticks=%i\t", ST(0)), Str$("%i loops\n", n)
   Next
   Exit
end start
Title: Re: Fast file time
Post by: donkey on January 05, 2012, 09:47:36 AM
Hi Jochen,

The tick count is not useful; it is marked as deprecated in later versions of the DDK. I downloaded and installed the Win7 DDK and it is now called TickCountLowDeprecated. The tick count is now derived from the TickCount union in the structure; I'm still calculating the offset...

#define KeQueryTickCount(CurrentCount) { \
    KSYSTEM_TIME volatile *_TickCount = *((PKSYSTEM_TIME *)(&KeTickCount)); \
    for (;;) {                                                              \
        (CurrentCount)->HighPart = _TickCount->High1Time;                   \
        (CurrentCount)->LowPart = _TickCount->LowPart;                      \
        if ((CurrentCount)->HighPart == _TickCount->High2Time) break;       \
        YieldProcessor();                                                   \
    }                                                                       \
}
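
For anyone who finds the macro hard to read, here is roughly the same loop as a plain C function. It is only a sketch: the struct and function names are made up, it takes a pointer so it can be tried on an ordinary variable, and it relies on x86 keeping loads in program order (other architectures would need explicit barriers).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for KSYSTEM_TIME; the name is hypothetical. */
typedef struct {
    volatile uint32_t LowPart;
    volatile int32_t  High1Time;
    volatile int32_t  High2Time;
} KSYSTEM_TIME_SKETCH;

/* Read High1Time, then LowPart, then compare against High2Time.
   If the writer changed the high part in between, the two copies
   differ and the read is retried. */
uint64_t read_ksystem_time(const KSYSTEM_TIME_SKETCH *t)
{
    assert(t != 0);
    for (;;) {
        int32_t  high = t->High1Time;
        uint32_t low  = t->LowPart;
        if (high == t->High2Time)
            return ((uint64_t)(uint32_t)high << 32) | low;
        /* The macro calls YieldProcessor() (a pause instruction) here. */
    }
}
```

The for (;;) is just an endless loop; the only way out is the return (the break in the macro) once both high copies agree.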
Title: Re: Fast file time
Post by: donkey on January 05, 2012, 09:54:02 AM
Mmmmm, it's hard to calculate by simply adding; I'll have to translate the whole structure to GoAsm format since the tick count is now pretty deep inside it. Oh well, it will be a good addition to the header project.
Title: Re: Fast file time
Post by: jj2007 on January 05, 2012, 10:29:59 AM
I got it, Edgar. Open this page (http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/) and search for consistent.
Title: Re: Fast file time
Post by: MichaelW on January 05, 2012, 10:33:41 AM
On my Windows 2000 system I get 100 ticks per second, and on my newly acquired XP system I get 64.
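
Those two rates translate directly into the step size you would expect to see in the 100ns counters. A tiny sketch of the arithmetic in C (the function name is invented, and it assumes the counters advance once per clock interrupt):

```c
#include <assert.h>
#include <stdint.h>

/* One second is 10,000,000 units of 100 ns, so the step per clock
   interrupt is that divided by the interrupt rate. */
uint32_t period_100ns(uint32_t interrupts_per_second)
{
    assert(interrupts_per_second > 0);
    return 10000000u / interrupts_per_second;
}
```

64 ticks per second gives a step of 156250 (15.625 ms) and 100 ticks per second gives 100000 (10 ms).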
Title: Re: Fast file time
Post by: jj2007 on January 05, 2012, 12:35:56 PM
Further testing still shows odd things:
Loop 0          0.0
Loop 1          1.0000000000000000
Loop 2          2.0000000000000000
Loop 3          3.0000000000000000
Loop 4          4.0000000000000000
Loop 5          5.0000000000000000
Loop 6          6.0000000000000000
Loop 7          6.5000000000000000
Loop 8          8.0000000000000000
Loop 9          8.5000000000000000
Loop 10         10.000000000000000

::)
And no, it's not a rounding problem... Str$() does have 18.5 digits precision, and it takes them directly from the FPU.
The second branch of the switch below (Interrupt time) reliably produces "round" figures, but with an occasional x.5; the first one (System time) sticks a while to round figures but then the OS spontaneously decides that 1.000023423423 looks better. Adjust Magic until you find round numbers again. Setting the affinity mask doesn't help.

include \masm32\MasmBasic\MasmBasic.inc
if 0
   UsedTime = 7FFE0014h   ; System time
   Magic=312588   ; may differ
   Magic=312562   ; according to
   Magic=312558   ; the processor's mood
else
   UsedTime = 7FFE0008h   ; Interrupt time
   Magic=312500   ; always the same value
endif

KSYSTEM_TIME STRUCT
LowPart   dd ?
High1Time   dd ?      ; UsedTime
High2Time   dd ?
KSYSTEM_TIME ENDS

Ticks2Stack MACRO
  mov edx, UsedTime   ; aka KUSER_SHARED_DATA.SystemTime.High1Time (http://www.nirsoft.net/kernel_struct/vista/KUSER_SHARED_DATA.html)
; @@:
; mov eax, [edx]      ; fails miserably, see...
; cmp eax, [edx+4]      ; Windows Research Kernel @ HPI (http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/)
; jne @B      ; ... and look for consistent
  fild qword ptr [edx]
ENDM

TicksPop MACRO
   mov edx, UsedTime
   fild qword ptr [edx]
   fsub
   fchs
ENDM

   Init
   ; invoke SetProcessAffinityMask, rv(GetCurrentProcess), 1
   PrintLine "Magic=", Hex$(Magic), CrLf$
   For_ n=0 To 10
      Ticks2Stack
      imul eax, n, 10
      push eax
      mov ecx, eax
      .Repeat
         mov edx, 1000000
         .Repeat
;           mov eax, 123
           imul eax, eax, 123
           dec edx
         .Until Sign?
         dec ecx
      .Until Sign?
      TicksPop
      push Magic
      fidiv dword ptr [esp]   ; divide ticks by magic number
      pop eax
      pop ecx
      Print Str$("Loop %i  \t", n), Str$("%Hf\n", ST(0))
      fstp st
   Next
   Exit
end start
Title: Re: Fast file time
Post by: sinsi on January 05, 2012, 01:16:16 PM

C:\Users\John\Desktop>NanoTicks.exe
Magic=0004C4B4

Loop 0          0.0
Loop 1          0.32000000000000000
Loop 2          0.64000000000000000
Loop 3          0.64000000000000000
Loop 4          1.2800032000000000
Loop 5          1.2800032000000000
Loop 6          1.6000000000000000
Loop 7          1.9200032000000000
Loop 8          1.9200032000000000
Loop 9          2.5600032000000000
Loop 10         2.5600032000000000

fwiw, I also got 0 in the original, not 64

I was going to suggest an affinity thingie but saw you had put it in, is the asm in the attachment the exe? (without the commented-out SetProcessAffinityMask).
I wonder how long it takes to update every process's structure too, and if it matters which cpu does it.

Looking at YieldProcessor, it's just a macro for a pause instruction (or rep nop for older cpus), usually used in spinlocks?
So if they match it returns, if they don't does it loop again? (The "for (;;)" I am not sure about.)
Title: Re: Fast file time
Post by: jj2007 on January 05, 2012, 01:24:58 PM
Quote from: sinsi on January 05, 2012, 01:16:16 PM
I was going to suggest an affinity thingie but saw you had put it in, is the asm in the attachment the exe? (without the commented-out SetProcessAffinityMask).

Yes it is, without SetProcessAffinityMask (which doesn't change anything)
It seems your Magic value is exactly 100000.
Title: Re: Fast file time
Post by: MichaelW on January 05, 2012, 01:35:35 PM
One thing I have never understood about the timers is how the high-resolution performance frequency (returned by QueryPerformanceFrequency) can be 3579546 Hz, or for the current topic why Microsoft selected a time unit of 100ns, when the system timers have a 1193182 Hz clock, and so a minimum period of ~838ns.
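
For reference, the numbers above can be checked with a few lines of C. This is only the arithmetic, with a made-up function name. As far as I know both frequencies are commonly derived from the same 14.31818 MHz crystal (divided by 12 and by 4), but treat that as an assumption rather than something established in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Period of one timer count in nanoseconds. */
double timer_period_ns(uint32_t frequency_hz)
{
    assert(frequency_hz > 0);
    return 1e9 / (double)frequency_hz;
}
```

timer_period_ns(1193182) is a little over 838 ns, so one 100ns unit is roughly an eighth of a timer count, and 3579546 is exactly 3 x 1193182.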
Title: Re: Fast file time
Post by: sinsi on January 05, 2012, 01:46:47 PM
Don't they use the APIC timers nowadays?
One problem with the old ISA timer was binary doesn't work well in decimal, my real mode millisecond timer actually runs at 1090 or something.

>which doesn't change anything
Yes, I got the same results either way. Makes a nice hex (binary?) progression though  :bdg
Title: Re: Fast file time
Post by: qWord on January 05, 2012, 02:17:56 PM
this works perfectly on my machine (Win7-x64)  :U
include \masm32\include\masm32rt.inc

KSYSTEM_TIME struct
LowPart    ULONG ?
High1Time   LONG ?
High2Time   LONG ?
KSYSTEM_TIME ends

gns macro t:req
mov eax,7FFE0000h+8
.while 1
mov edx,[eax].KSYSTEM_TIME.High1Time
mov ecx,[eax].KSYSTEM_TIME.LowPart
.break .if edx == [eax].KSYSTEM_TIME.High2Time
.endw
mov DWORD ptr t,ecx
mov DWORD ptr t+4,edx
endm

.code
main proc
LOCAL t1:QWORD
LOCAL t2:QWORD

gns t1
fild t1
fmul FP4(100.0E-9)
fstp t1
xor ebx,ebx
.while ebx < 100
gns t2
fild t2
fmul FP4(100.0E-9)
fsub t1
fstp t2
print real8$(t2),13,10
invoke Sleep,250
lea ebx,[ebx+1]
.endw

inkey
exit
main endp
end main
Title: Re: Fast file time
Post by: MichaelW on January 05, 2012, 02:20:40 PM
QuoteDon't they use the APIC timers nowadays?

Even though I recall reading about the APIC years ago, I had forgotten all about it. I now have a system that was built in December 2003, and on it QueryPerformanceFrequency returns 2992580000.
Title: Re: Fast file time
Post by: jj2007 on January 05, 2012, 03:01:48 PM
Quote from: qWord on January 05, 2012, 02:17:56 PM
this works perfectly on my machine (Win7-x64)  :U

You got it :U

Actually, my macros worked, too. The tricky part causing the irregular sequences (6.0, 6.5, 8.0 (http://www.masm32.com/board/index.php?topic=18104.msg152631#msg152631)) is the delay loop - probably it cuts through time slices :'(
Title: Re: Fast file time
Post by: donkey on January 05, 2012, 04:13:26 PM
Quote from: jj2007 on January 05, 2012, 10:29:59 AM
I got it, Edgar. Open this page (http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/) and search for consistent.
The structure there doesn't match the one MS uses; there are a number of discrepancies. It is a bit of a complicated structure to translate. This is how far I've gotten; it checks out against the offsets they identify in hex in the structure. It is complete to Win8, however I have to check a few fields to verify it.

Nice work qWord. works here too.

#define MAX_WOW64_SHARED_ENTRIES 16
#define PROCESSOR_FEATURE_MAX 64
#define MM_SHARED_USER_DATA_VA 0x7FFE0000

// enum NT_PRODUCT_TYPE
NtProductWinNt = 1
NtProductLanManNt = 2
NtProductServer = 3

// enum ALTERNATIVE_ARCHITECTURE_TYPE
StandardDesign = 0
NEC98x86 = 1
EndAlternatives = 2

#define MAXIMUM_XSTATE_FEATURES 64
#define XSTATE_LEGACY_FLOATING_POINT        0
#define XSTATE_LEGACY_SSE                   1
#define XSTATE_GSSE                         2

#define XSTATE_MASK_LEGACY_FLOATING_POINT   (1 << (XSTATE_LEGACY_FLOATING_POINT))
#define XSTATE_MASK_LEGACY_SSE              (1 << (XSTATE_LEGACY_SSE))
#define XSTATE_MASK_LEGACY                  (XSTATE_MASK_LEGACY_FLOATING_POINT | XSTATE_MASK_LEGACY_SSE)
#define XSTATE_MASK_GSSE                    (1 << (XSTATE_GSSE))

#define NX_SUPPORT_POLICY_ALWAYSOFF 0
#define NX_SUPPORT_POLICY_ALWAYSON 1
#define NX_SUPPORT_POLICY_OPTIN 2
#define NX_SUPPORT_POLICY_OPTOUT 3

// Processor features
#define PF_FLOATING_POINT_PRECISION_ERRATA  0   
#define PF_FLOATING_POINT_EMULATED          1   
#define PF_COMPARE_EXCHANGE_DOUBLE          2   
#define PF_MMX_INSTRUCTIONS_AVAILABLE       3   
#define PF_PPC_MOVEMEM_64BIT_OK             4   
#define PF_ALPHA_BYTE_INSTRUCTIONS          5   
#define PF_XMMI_INSTRUCTIONS_AVAILABLE      6   
#define PF_3DNOW_INSTRUCTIONS_AVAILABLE     7   
#define PF_RDTSC_INSTRUCTION_AVAILABLE      8   
#define PF_PAE_ENABLED                      9   
#define PF_XMMI64_INSTRUCTIONS_AVAILABLE   10   
#define PF_SSE_DAZ_MODE_AVAILABLE          11   
#define PF_NX_ENABLED                      12   
#define PF_SSE3_INSTRUCTIONS_AVAILABLE     13   
#define PF_COMPARE_EXCHANGE128             14   
#define PF_COMPARE64_EXCHANGE128           15   
#define PF_CHANNELS_ENABLED                16   
#define PF_XSAVE_ENABLED                   17

XSTATE_FEATURE STRUCT
Offset LONG
Size LONG
ENDS

XSTATE_CONFIGURATION  STRUCT
EnabledFeatures LONG64
Size LONG
OptimizedSave LONG
Features XSTATE_FEATURE MAXIMUM_XSTATE_FEATURES DUP
ENDS

KSYSTEM_TIME STRUCT
LowPart LONG
High1Time LONG
High2Time LONG
ENDS

KUSER_SHARED_DATA STRUCT
//
// WARNING: This structure must have exactly the same layout for 32- and
//    64-bit systems. The layout of this structure cannot change and new
//    fields can only be added at the end of the structure (unless a gap
//    can be exploited). Deprecated fields cannot be deleted. Platform
//    specific fields are included on all systems.
//
//    Layout exactness is required for Wow64 support of 32-bit applications
//    on Win64 systems.
//
//    The layout itself cannot change since this structure has been exported
//    in ntddk, ntifs.h, and nthal.h for some time.

TickCountLowDeprecated LONG
TickCountMultiplier LONG
InterruptTime KSYSTEM_TIME
SystemTime KSYSTEM_TIME
TimeZoneBias KSYSTEM_TIME
ImageNumberLow SHORT
ImageNumberHigh SHORT
NtSystemRoot SHORT 260 DUP
MaxStackTraceDepth LONG
CryptoExponent LONG
TimeZoneId LONG
LargePageMinimum LONG
Reserved2  LONG 7 DUP
NtProductType ENUM // NT_PRODUCT_TYPE
ProductTypeIsValid BOOLEAN
Padding0 CHAR 3 DUP
NtMajorVersion LONG
NtMinorVersion LONG
ProcessorFeatures BOOLEAN PROCESSOR_FEATURE_MAX DUP
Reserved1 LONG
Reserved3 LONG
TimeSlip LONG
AlternativeArchitecture ENUM // ALTERNATIVE_ARCHITECTURE_TYPE
AltArchitecturePad LONG
SystemExpirationDate LARGE_INTEGER
SuiteMask LONG
KdDebuggerEnabled BOOLEAN
NXSupportPolicy CHAR
Padding CHAR 2 DUP
ActiveConsoleId LONG
DismountCount LONG
ComPlusPackage LONG
LastSystemRITEventTickCount LONG
NumberOfPhysicalPages LONG
SafeBootMode BOOLEAN

TscQpcData CHAR
TscQpcPad CHAR 2 DUP

// > Vista only
TraceLogging LONG

; SharedDataFlags LONG
DataFlagsPad LONG

TestRetInstruction LONGLONG
SystemCall LONG
SystemCallReturn LONG
SystemCallPad LONGLONG 3 DUP

UNION
TickCount KSYSTEM_TIME
TickCountQuad LONG64
ENDUNION

// The following padding is documented in the above union
// it is added separately to bypass a bug in GoAsm - Do not change !
TickCountPad DD

Cookie LONG
CookiePad LONG
ConsoleSessionForegroundProcessId LONGLONG
Wow64SharedInformation LONG MAX_WOW64_SHARED_ENTRIES DUP
UserModeGlobalLogger SHORT 16 DUP
ImageFileExecutionOptions LONG

// Pre vista 4 bytes padding instead of LangGenerationCount
LangGenerationCount LONG
Reserved5 LONGLONG
InterruptTimeBias LONG64
TscQpcBias LONG64
ActiveProcessorCount LONG
ActiveGroupCount SHORT
Reserved4 SHORT
AitSamplingValue LONG
AppCompatFlag LONG
SystemDllNativeRelocation LONGLONG
SystemDllWowRelocation LONG
XStatePad LONG
XState XSTATE_CONFIGURATION
ENDS


EDIT:

There were a couple of offset issues with the version of the structure I had in this post previously; they have been fixed and all offsets match the ASSERTs in ntddk.h, so the structure is definitely correct. I have left in the comment for this structure to assure anyone that it will not change and is the same for x32 and x64.

Beginning with Windows Vista, SharedDataFlags has changed to TraceLogging, this is just a name change, offsets remain the same so SharedDataFlags has been commented out.
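
If you want to double-check the offsets without the DDK, the first part of the structure can be mirrored in C and tested with offsetof. This is a sketch only: the type names are made up, it stops at TickCount, and the expected values are simply what the field sizes in the listing add up to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t LowPart; int32_t High1Time; int32_t High2Time; } KSYSTEM_TIME_SKETCH;

/* Prefix of KUSER_SHARED_DATA, field for field as in the GoAsm listing. */
typedef struct {
    uint32_t TickCountLowDeprecated;
    uint32_t TickCountMultiplier;
    KSYSTEM_TIME_SKETCH InterruptTime;
    KSYSTEM_TIME_SKETCH SystemTime;
    KSYSTEM_TIME_SKETCH TimeZoneBias;
    uint16_t ImageNumberLow;
    uint16_t ImageNumberHigh;
    uint16_t NtSystemRoot[260];
    uint32_t MaxStackTraceDepth;
    uint32_t CryptoExponent;
    uint32_t TimeZoneId;
    uint32_t LargePageMinimum;
    uint32_t Reserved2[7];
    uint32_t NtProductType;
    uint8_t  ProductTypeIsValid;
    uint8_t  Padding0[3];
    uint32_t NtMajorVersion;
    uint32_t NtMinorVersion;
    uint8_t  ProcessorFeatures[64];
    uint32_t Reserved1;
    uint32_t Reserved3;
    uint32_t TimeSlip;
    uint32_t AlternativeArchitecture;
    uint32_t AltArchitecturePad;
    int64_t  SystemExpirationDate;
    uint32_t SuiteMask;
    uint8_t  KdDebuggerEnabled;
    uint8_t  NXSupportPolicy;
    uint8_t  Padding[2];
    uint32_t ActiveConsoleId;
    uint32_t DismountCount;
    uint32_t ComPlusPackage;
    uint32_t LastSystemRITEventTickCount;
    uint32_t NumberOfPhysicalPages;
    uint8_t  SafeBootMode;
    uint8_t  TscQpcData;
    uint8_t  TscQpcPad[2];
    uint32_t TraceLogging;
    uint32_t DataFlagsPad;
    uint64_t TestRetInstruction;
    uint32_t SystemCall;
    uint32_t SystemCallReturn;
    uint64_t SystemCallPad[3];
    KSYSTEM_TIME_SKETCH TickCount;  /* a union with TickCountQuad in the listing */
} KUSER_SHARED_DATA_PREFIX;

/* Returns 1 when the sketch reproduces the offsets implied by the listing. */
int layout_matches(void)
{
    assert(sizeof(KSYSTEM_TIME_SKETCH) == 12);
    return offsetof(KUSER_SHARED_DATA_PREFIX, InterruptTime)     == 0x008
        && offsetof(KUSER_SHARED_DATA_PREFIX, SystemTime)        == 0x014
        && offsetof(KUSER_SHARED_DATA_PREFIX, NtMajorVersion)    == 0x26C
        && offsetof(KUSER_SHARED_DATA_PREFIX, ProcessorFeatures) == 0x274
        && offsetof(KUSER_SHARED_DATA_PREFIX, TraceLogging)      == 0x2F0
        && offsetof(KUSER_SHARED_DATA_PREFIX, TickCount)         == 0x320;
}
```

None of the 64-bit fields needs hidden alignment padding at these offsets, which fits the remark that the layout is identical for 32-bit and 64-bit builds.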
Title: Re: Fast file time
Post by: donkey on January 05, 2012, 05:00:58 PM
Nice for a replacement for OSVERSIONINFO, always hated the way you got the OS version and never liked the hacks I have seen using FS segment overrides:

mov ecx,MM_SHARED_USER_DATA_VA
mov eax,[ecx+KUSER_SHARED_DATA.NtMajorVersion]
mov edx,[ecx+KUSER_SHARED_DATA.NtMinorVersion]
Title: Re: Fast file time
Post by: jj2007 on January 05, 2012, 06:16:44 PM
I got some unexpected insights into Sleep and its granularity... very nice :bg

Loop 1          0.0156250000000000000
Loop 8          0.125000000000000000
Loop 32         0.500000000000000000
Loop 64         1.00000000000000000
Sleep 16 - one more? Esc=exit


Executable attached.

include \masm32\MasmBasic\MasmBasic.inc
Ticks2FPU MACRO
   mov edx, 7FFE0008h   ; aka KUSER_SHARED_DATA.InterruptTime (http://www.nirsoft.net/kernel_struct/vista/KUSER_SHARED_DATA.html)
@@:   mov eax, [edx+4]
   cmp eax, [edx+8]   ; See Windows Research Kernel @ HPI (http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/) and look for consistent
   jne @B
   fild qword ptr [edx]
ENDM

OneEm9   REAL10 100.0E-9

   Init
   mov ecx, 16
   .Repeat
      Ticks2FPU
      For_ n=1 To 65
         push ecx
         dec ecx
         invoke Sleep, ecx
         pop ecx
         fld st
         Ticks2FPU
         fsub
         fchs
         fld OneEm9
         fmul
         Print Str$("Loop %i  \t", n), Str$("%If\n", ST(0))
         fstp st
      Next
      Inkey Str$("Sleep %i - one more? Esc=exit\n", ecx-1)
      add ecx, ecx   ; double Sleep delay
   .Until eax==VK_ESCAPE
   Exit
end start

EDIT: Don't trust the code posted above!

With a Sleep 1000 (i.e. mov ecx, 1001 after Init) I get:
Loop 40         40.0000000000000000
Loop 41         41.0000000000000000  << great precision until here
Loop 42         42.0019531000000000  << starts misbehaving
Loop 43         43.0039062000000000
Loop 44         44.0058594000000000


Sometimes it keeps 19-digits precision until the end (Loop 65), sometimes it starts misbehaving earlier, i.e. invoke Sleep, n produces delays slightly longer than n ms. It's on a single core CPU.

Greetings to Redmond ::)
Title: Re: Fast file time
Post by: donkey on January 05, 2012, 07:15:35 PM
This thing just has a lot of information, processor features are determined as follows (or appear to be)

// replacement for IsProcessorFeaturePresent function
mov ecx,MM_SHARED_USER_DATA_VA
mov edx,[ecx+KUSER_SHARED_DATA.ProcessorFeatures+PF_XXXXXXXXXX]
and edx,1


EDX will be TRUE if the feature is present, FALSE otherwise.

KUSER_SHARED_DATA.ActiveProcessorCount will give the number of logical processors, i.e. the hardware threads the system supports (not the number of cores).

There is quite a lot of stuff to play with here and the best part is it is pretty much guaranteed never to be altered or deprecated, only expanded so it can be used without much worry.
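
In C terms the feature test is just an index into an array of one-byte BOOLEANs. A small sketch, with an invented function name and an ordinary array standing in for the shared page so it can be tried anywhere:

```c
#include <assert.h>
#include <stdint.h>

#define PROCESSOR_FEATURE_MAX            64
#define PF_PAE_ENABLED                    9   /* values from the listing */
#define PF_XMMI64_INSTRUCTIONS_AVAILABLE 10

/* ProcessorFeatures holds one byte per PF_ index; nonzero means present. */
int feature_present(const uint8_t *features, unsigned pf_index)
{
    assert(features != 0 && pf_index < PROCESSOR_FEATURE_MAX);
    return features[pf_index] != 0;
}
```

Because each entry is a single byte, a byte-sized load is all that is needed; the dword load followed by and edx,1 in the snippet above relies on the flag byte being 0 or 1.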
Title: Re: Fast file time
Post by: dedndave on January 05, 2012, 07:19:36 PM
correct me if i'm wrong, here - lol

but, i would think the API code looks much the same as your code
Title: Re: Fast file time
Post by: donkey on January 05, 2012, 07:26:03 PM
Quote from: dedndave on January 05, 2012, 07:19:36 PM
correct me if i'm wrong, here - lol

but, i would think the API code looks much the same as your code

Probably, but this has no PUSH/CALL/RET overhead so you can save a few cycles (not to mention cache hits) and it also does very well at obfuscating code if you don't appreciate people disassembling your program. Any time I can replace an API reliably is a good thing in my book, one more tool in the box.

For example the KUSER_SHARED_DATA.ActiveProcessorCount can pretty much replace this whole DLL and program (http://www.codeproject.com/KB/system/countingprocessors.aspx) with a fairly reliable method using only 2 opcodes. Well, the Logical Processors part anyway.
Title: Re: Fast file time
Post by: jj2007 on January 05, 2012, 11:59:44 PM
Warning - see update of #22!
Title: Re: Fast file time
Post by: qWord on January 06, 2012, 12:45:37 AM
Quote from: jj2007 on January 05, 2012, 06:16:44 PM
Sometimes it keeps 19-digits precision until the end (Loop 65), sometimes it starts misbehaving earlier, i.e. invoke Sleep, n produces delays slightly longer than n ms. It's on a single core CPU.

This is unsurprising - Windows is not a real-time OS.
You may get better results by restarting a waitable timer each iteration (Create/SetWaitableTimer) (http://msdn.microsoft.com/en-us/library/windows/desktop/ms686289(v=vs.85).aspx)
Title: Re: Fast file time
Post by: donkey on January 06, 2012, 01:49:20 AM
I've kind of given up on it as a timer at least for now, the reason I got into the structure was that I'm translating the DDK for GoAsm and ran across it. There are plenty of cycle saving tidbits in there to keep me interested though. According to the article Jochen linked (http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/) in an earlier post the interrupt timer might be one to try for timing functions.

Quote from: The article Jochen posted
The interrupt time is the only Windows clock that guarantees to be monotonous that is, its value only increases over timer. Its value represents the time in units of 100 ns since the system was booted. The interrupt time is the base clock for all timers in Windows (see my recent article A Bug in Windows Timer Management). It is updated every clock interrupt.

The following is a dump of the InterruptTime field with Sleep,100 between each...

Line 228: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1366442026 (0x5172402A)
Line 230: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1367534028 (0x5182E9CC)
Line 232: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1368626030 (0x5193936E)
Line 234: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1369718032 (0x51A43D10)
Line 236: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1370810034 (0x51B4E6B2)
Line 238: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1371902036 (0x51C59054)
Line 240: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1372994038 (0x51D639F6)
Line 242: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1374086040 (0x51E6E398)
Line 244: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1375178042 (0x51F78D3A)
Line 246: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1376270044 (0x520836DC)
Line 248: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1377362046 (0x5218E07E)
Line 250: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1378454048 (0x52298A20)


Haven't checked them all, but it's about 1,092,000 ticks (roughly 109 ms) for each Sleep,100, and it seems pretty consistent.
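
The spacing in the dump can be checked mechanically. A minimal sketch in C with made-up helper names; it only looks at the low dword, as the dump does, and unsigned subtraction keeps the result right across a single wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Difference between two readings of the low dword, in 100 ns units. */
uint32_t interrupt_time_delta(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}

/* Average spacing over a run of readings (needs at least two samples). */
uint32_t mean_delta(const uint32_t *samples, size_t count)
{
    assert(samples != 0 && count >= 2);
    return (samples[count - 1] - samples[0]) / (uint32_t)(count - 1);
}

/* 100 ns units to microseconds. */
uint32_t ticks_to_microseconds(uint32_t ticks_100ns)
{
    return ticks_100ns / 10u;
}
```

For the first two lines of the dump the difference is 1092002 units, i.e. about 109.2 ms per Sleep,100.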
Title: Re: Fast file time
Post by: jj2007 on January 06, 2012, 07:41:10 AM
Quote from: donkey on January 06, 2012, 01:49:20 AM
The following is a dump of the InterruptTime field with Sleep,100 between each...
...
Haven't checked them all, but it's about 1,092,000 ticks (roughly 109 ms) for each Sleep,100, and it seems pretty consistent.

That's the whole point, Edgar - it looks consistent, with 19-digits precision. Until loop 42, when Sleep decides to relax a little bit, around 0.2% :bg
The timer macro is ok, and it's the interrupt one, so the culprit is obviously good ol' Sleep.
Title: Re: Fast file time
Post by: dedndave on January 06, 2012, 10:45:26 AM
Sleep isn't intended to be a timing device, strictly speaking
because it does not return until the beginning of the next time slice

not really sure how long a time slice is under windows - different versions of windows, different CPU's, etc  :P
but it could account for small variations
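
One way to picture this is a simple model, sketched in C below. It is an assumption for illustration, not documented Sleep behaviour: the sleep is taken to start on a clock interrupt and to end on the first interrupt at or after the requested timeout. The function names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Number of clock interrupts a Sleep(request_ms) spans under the
   assumed model: round the request up to a whole number of periods. */
uint32_t sleep_ticks(uint32_t request_ms, uint32_t period_100ns)
{
    assert(period_100ns > 0);
    uint64_t request_100ns = (uint64_t)request_ms * 10000u;
    return (uint32_t)((request_100ns + period_100ns - 1) / period_100ns);
}

/* Resulting delay in 100 ns units. */
uint64_t sleep_duration_100ns(uint32_t request_ms, uint32_t period_100ns)
{
    return (uint64_t)sleep_ticks(request_ms, period_100ns) * period_100ns;
}
```

With a period of roughly 15.6 ms, a 100 ms request rounds up to 7 periods (about 109 ms), while a 1000 ms request at exactly 15.625 ms is a whole 64 periods, which would fit the round figures seen earlier in the thread.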
Title: Re: Fast file time, NanoTimer macro
Post by: jj2007 on January 18, 2012, 08:43:22 PM
Hi folks,

Here (http://www.masm32.com/board/index.php?topic=12460.msg153367#msg153367) is a new NanoTimer() macro based on this thread. Thanks to all but especially to Edgar for having discovered this marvelous Windows feature :U