Fast file time

Started by donkey, January 05, 2012, 07:19:07 AM

donkey

I was playing around with getting the current system time (UTC) as fast as is reasonably possible; the result had to be in a FILETIME structure. Luckily, if you're looking for a FILETIME format you can simply avoid API calls altogether by using a little-known shared area of memory. Address 0x7FFE0000 contains an area of memory shared between kernel and user mode that contains the current clock in 100ns intervals since January 1, 1601 (FILETIME). You can also obtain the current tick count without an API call. Here's a sample of how to get the UTC system time:


/*
<link removed at Microsoft's request>
*/

#define MAX_WOW64_SHARED_ENTRIES 16
#define PROCESSOR_FEATURE_MAX 64
#define MM_SHARED_USER_DATA_VA 0x7FFE0000

// enum NT_PRODUCT_TYPE
NtProductWinNt = 1
NtProductLanManNt = 2
NtProductServer = 3

// enum ALTERNATIVE_ARCHITECTURE_TYPE
StandardDesign = 0
NEC98x86 = 1
EndAlternatives = 2

#define MAXIMUM_XSTATE_FEATURES 64
#define XSTATE_LEGACY_FLOATING_POINT        0
#define XSTATE_LEGACY_SSE                   1
#define XSTATE_GSSE                         2

#define XSTATE_MASK_LEGACY_FLOATING_POINT   (1 << (XSTATE_LEGACY_FLOATING_POINT))
#define XSTATE_MASK_LEGACY_SSE              (1 << (XSTATE_LEGACY_SSE))
#define XSTATE_MASK_LEGACY                  (XSTATE_MASK_LEGACY_FLOATING_POINT | XSTATE_MASK_LEGACY_SSE)
#define XSTATE_MASK_GSSE                    (1 << (XSTATE_GSSE))

#define NX_SUPPORT_POLICY_ALWAYSOFF 0
#define NX_SUPPORT_POLICY_ALWAYSON 1
#define NX_SUPPORT_POLICY_OPTIN 2
#define NX_SUPPORT_POLICY_OPTOUT 3

// Processor features
#define PF_FLOATING_POINT_PRECISION_ERRATA  0   
#define PF_FLOATING_POINT_EMULATED          1   
#define PF_COMPARE_EXCHANGE_DOUBLE          2   
#define PF_MMX_INSTRUCTIONS_AVAILABLE       3   
#define PF_PPC_MOVEMEM_64BIT_OK             4   
#define PF_ALPHA_BYTE_INSTRUCTIONS          5   
#define PF_XMMI_INSTRUCTIONS_AVAILABLE      6   
#define PF_3DNOW_INSTRUCTIONS_AVAILABLE     7   
#define PF_RDTSC_INSTRUCTION_AVAILABLE      8   
#define PF_PAE_ENABLED                      9   
#define PF_XMMI64_INSTRUCTIONS_AVAILABLE   10   
#define PF_SSE_DAZ_MODE_AVAILABLE          11   
#define PF_NX_ENABLED                      12   
#define PF_SSE3_INSTRUCTIONS_AVAILABLE     13   
#define PF_COMPARE_EXCHANGE128             14   
#define PF_COMPARE64_EXCHANGE128           15   
#define PF_CHANNELS_ENABLED                16   
#define PF_XSAVE_ENABLED                   17

XSTATE_FEATURE STRUCT
Offset LONG
Size LONG
ENDS

XSTATE_CONFIGURATION  STRUCT
EnabledFeatures LONG64
Size LONG
OptimizedSave LONG
Features XSTATE_FEATURE MAXIMUM_XSTATE_FEATURES DUP
ENDS

KSYSTEM_TIME STRUCT
LowPart LONG
High1Time LONG
High2Time LONG
ENDS

KUSER_SHARED_DATA STRUCT
//
// WARNING: This structure must have exactly the same layout for 32- and
//    64-bit systems. The layout of this structure cannot change and new
//    fields can only be added at the end of the structure (unless a gap
//    can be exploited). Deprecated fields cannot be deleted. Platform
//    specific fields are included on all systems.
//
//    Layout exactness is required for Wow64 support of 32-bit applications
//    on Win64 systems.
//
//    The layout itself cannot change since this structure has been exported
//    in ntddk, ntifs.h, and nthal.h for some time.

TickCountLowDeprecated LONG
TickCountMultiplier LONG
InterruptTime KSYSTEM_TIME
SystemTime KSYSTEM_TIME
TimeZoneBias KSYSTEM_TIME
ImageNumberLow SHORT
ImageNumberHigh SHORT
NtSystemRoot SHORT 260 DUP
MaxStackTraceDepth LONG
CryptoExponent LONG
TimeZoneId LONG
LargePageMinimum LONG
Reserved2  LONG 7 DUP
NtProductType ENUM // NT_PRODUCT_TYPE
ProductTypeIsValid BOOLEAN
Padding0 CHAR 3 DUP
NtMajorVersion LONG
NtMinorVersion LONG
ProcessorFeatures BOOLEAN PROCESSOR_FEATURE_MAX DUP
Reserved1 LONG
Reserved3 LONG
TimeSlip LONG
AlternativeArchitecture ENUM // ALTERNATIVE_ARCHITECTURE_TYPE
AltArchitecturePad LONG
SystemExpirationDate LARGE_INTEGER
SuiteMask LONG
KdDebuggerEnabled BOOLEAN
NXSupportPolicy CHAR
Padding CHAR 2 DUP
ActiveConsoleId LONG
DismountCount LONG
ComPlusPackage LONG
LastSystemRITEventTickCount LONG
NumberOfPhysicalPages LONG
SafeBootMode BOOLEAN

TscQpcData CHAR
TscQpcPad CHAR 2 DUP

// > Vista only
TraceLogging LONG

; SharedDataFlags LONG
DataFlagsPad LONG

TestRetInstruction LONGLONG
SystemCall LONG
SystemCallReturn LONG
SystemCallPad LONGLONG 3 DUP

UNION
TickCount KSYSTEM_TIME
TickCountQuad LONG64
ENDUNION

// The following padding is documented in the above union
// it is added separately to bypass a bug in GoAsm - Do not change !
TickCountPad DD

Cookie LONG
CookiePad LONG
ConsoleSessionForegroundProcessId LONGLONG
Wow64SharedInformation LONG MAX_WOW64_SHARED_ENTRIES DUP
UserModeGlobalLogger SHORT 16 DUP
ImageFileExecutionOptions LONG

// Pre vista 4 bytes padding instead of LangGenerationCount
LangGenerationCount LONG
Reserved5 LONGLONG
InterruptTimeBias LONG64
TscQpcBias LONG64
ActiveProcessorCount LONG
ActiveGroupCount SHORT
Reserved4 SHORT
AitSamplingValue LONG
AppCompatFlag LONG
SystemDllNativeRelocation LONGLONG
SystemDllWowRelocation LONG
XStatePad LONG
XState XSTATE_CONFIGURATION
ENDS

DATA SECTION
ksystime KSYSTEM_TIME <>
CODE SECTION
mov eax,MM_SHARED_USER_DATA_VA
add eax,KUSER_SHARED_DATA.SystemTime

mov ecx,[eax]
mov [ksystime.LowPart], ecx
add eax,4
mov ecx,[eax]
mov [ksystime.High1Time], ecx
add eax,4
mov ecx,[eax]
mov [ksystime.High2Time], ecx


You can use ksystime directly with any function that requires the current UTC time in FILETIME format. I am not sure what High2Time is but it seems to be equivalent to High1Time on my system.
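
As a rough illustration of what can be done with the value once it has been copied (a sketch in C, not part of the original code): LowPart and High1Time together form the 64-bit FILETIME count, and subtracting the well-known offset between 1601 and 1970 gives Unix time. The function and macro names below are hypothetical.

```c
#include <stdint.h>

/* 100ns intervals between 1601-01-01 and 1970-01-01 (both UTC). */
#define FILETIME_UNIX_EPOCH_DIFF 116444736000000000ULL
/* Number of 100ns intervals in one second. */
#define FILETIME_UNITS_PER_SECOND 10000000ULL

/* Combine the two 32-bit halves (LowPart, High1Time) into one 64-bit count. */
static uint64_t filetime_to_u64(uint32_t low_part, uint32_t high1_time)
{
    return ((uint64_t)high1_time << 32) | low_part;
}

/* Convert a FILETIME count to whole seconds since the Unix epoch.
   Assumes the time is not earlier than 1970. */
static uint64_t filetime_to_unix_seconds(uint64_t filetime)
{
    return (filetime - FILETIME_UNIX_EPOCH_DIFF) / FILETIME_UNITS_PER_SECOND;
}
```

Passing the two dwords saved in ksystime through filetime_to_u64 and then filetime_to_unix_seconds would give the current time in seconds since 1970.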

Hope someone will find this interesting. I'm not sure which OSes support it; I tested it in the 32-bit mode of Win7-x64 and it works perfectly. According to some sources the shared area of memory is available from Win2K on; others say WinXP.

Edgar

EDIT:

The structures have been changed to a final version current as of Windows 8. Offsets have been checked to be sure they match the ASSERTs in ntddk.h.
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

jj2007

Quote from: donkey on January 05, 2012, 07:19:07 AM
Address 0x7FFE0000 contains an area of memory shared between kernel and user mode that contains the current clock in 100ns intervals since January 1, 1601 (FILETIME). You can also obtain the current tick count without an API call.

Yeah, GetTick is probably the fastest API you can find ;-)

GetTickCount  BA 0000FE7F      mov edx, 7FFE0000        ; INT kernel32.GetTickCount(void)
7C8092BD      8B02             mov eax, [edx]
7C8092BF      F762 04          mul dword ptr [edx+4]
7C8092C2      0FACD0 18        shrd eax, edx, 18
7C8092C6      C3               retn
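
In C terms, the disassembly above appears to boil down to a 32x32-bit multiply followed by a 24-bit right shift of the 64-bit product. A minimal sketch with a hypothetical function name; the multiplier mentioned afterwards is an illustrative assumption, not a value read from a real system:

```c
#include <stdint.h>

/* Mirrors: mov eax,[edx] / mul dword ptr [edx+4] / shrd eax,edx,18h
   tick_count_low  = dword at 7FFE0000h (TickCountLow)
   tick_multiplier = dword at 7FFE0004h (TickCountMultiplier) */
static uint32_t tick_count_ms(uint32_t tick_count_low, uint32_t tick_multiplier)
{
    uint64_t product = (uint64_t)tick_count_low * tick_multiplier; /* edx:eax */
    return (uint32_t)(product >> 24);  /* shrd leaves the low 32 bits in eax */
}
```

With an assumed multiplier of 0FA00000h (15.625 in 8.24 fixed point), 64 raw ticks come out as 1000, i.e. one second expressed in milliseconds.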

donkey

However, by getting the raw tick count directly you can avoid the load of the multiplier and the shift and get an even higher resolution. I'm not sure about the granularity (probably 100ns), but it might be a useful high-resolution timer for testing code execution, processor speed, etc. Just a thought.

donkey

Hi Dave,

The best place to find this sort of thing is the DDK headers, which is what I was checking when I got the idea for the code above.

BTW, for my purposes I wanted to save the UTC FILETIME structure; if you just need to use it once you can avoid moving it into a structure. For example:

mov eax,MM_SHARED_USER_DATA_VA
add eax,KUSER_SHARED_DATA.SystemTime
invoke FileTimeToSystemTime,eax,offset systime

jj2007

Quote from: donkey on January 05, 2012, 07:34:17 AM
However, by getting the raw tick count directly you can avoid the load of the multiplier and the shift and get an even higher resolution

I made a quick test and it always yields 64 ticks/second - same for a non-FPU solution. Probably there is an error in my logic ::)

include \masm32\MasmBasic\MasmBasic.inc
Ticks2Stack MACRO
   mov edx, 7FFE0000h
   fild dword ptr [edx]
ENDM

TicksPop MACRO
   mov edx, 7FFE0000h
   fisub dword ptr [edx]
   fchs
ENDM

   Init
   Ticks2Stack
   invoke Sleep, 1000
   TicksPop
   Inkey Str$("Ticks=%i", ST(0))
   Exit
end start

donkey

Hi Jochen,

Yeah, it's a weird one; I just get 0 in the field. I'll have to play with it a bit; obviously GetTickCount is getting useful data...

jj2007

I played a bit with the 7FFE0008h and 7FFE0014h dwords; they get updated much more frequently. But there is nonetheless some weird granularity involved, both for the Sleep and the .Repeat test. The latter yields
Ticks=156265    1 loops
Ticks=312530    2 loops
Ticks=312530    3 loops
Ticks=625060    4 loops
Ticks=625060    5 loops
Ticks=781325    6 loops
Ticks=781325    7 loops

... which doesn't make sense to me. I'd also like to know why commenting out the useless mov eax below takes away the granularity:

.Repeat
;              mov eax, 123
              imul eax, eax, 123

include \masm32\MasmBasic\MasmBasic.inc
Ticks2Stack MACRO
   mov edx, 7FFE0014h   ; or 7FFE0008h
   fild dword ptr [edx]
ENDM

TicksPop MACRO
   mov edx, 7FFE0014h
   fisub dword ptr [edx]
   fchs
ENDM

   Init

   For_ n=0 To 20
      Ticks2Stack
      imul eax, n, 10  ; increase delay proportionally to n
      push eax
      if 0
         invoke Sleep, eax
      else
         mov ecx, eax
         .Repeat
            mov edx, 1000000
            .Repeat
              mov eax, 123
              imul eax, eax, 123
              dec edx
            .Until Sign?
            dec ecx
         .Until Sign?
      endif
      TicksPop
      pop ecx
      Print Str$("Ticks=%i\t", ST(0)), Str$("%i loops\n", n)
   Next
   Exit
end start

donkey

Hi Jochen,

The tick count is not useful; it is marked as deprecated in later versions of the DDK. I downloaded and installed the Win7 DDK and it is now called TickCountLowDeprecated. The tick count is now derived from the TickCount union in the structure; I'm still calculating the offset...

#define KeQueryTickCount(CurrentCount) { \
    KSYSTEM_TIME volatile *_TickCount = *((PKSYSTEM_TIME *)(&KeTickCount)); \
    for (;;) {                                                              \
        (CurrentCount)->HighPart = _TickCount->High1Time;                   \
        (CurrentCount)->LowPart = _TickCount->LowPart;                      \
        if ((CurrentCount)->HighPart == _TickCount->High2Time) break;       \
        YieldProcessor();                                                   \
    }                                                                       \
}

donkey

Mmmmm, it's hard to calculate by simply adding; I'll have to translate the whole structure to GoAsm format, as the tick count is now pretty deep inside it. Oh well, it will be a good addition to the header project.

jj2007

I got it, Edgar. Open this page and search for consistent.

MichaelW

On my Windows 2000 system I get 100 ticks per second, and on my newly acquired XP system I get 64.
eschew obfuscation

jj2007

Further testing still shows odd things:
Loop 0          0.0
Loop 1          1.0000000000000000
Loop 2          2.0000000000000000
Loop 3          3.0000000000000000
Loop 4          4.0000000000000000
Loop 5          5.0000000000000000
Loop 6          6.0000000000000000
Loop 7          6.5000000000000000
Loop 8          8.0000000000000000
Loop 9          8.5000000000000000
Loop 10         10.000000000000000

::)
And no, it's not a rounding problem... Str$() does have 18.5 digits precision, and it takes them directly from the FPU.
The second branch of the switch below (Interrupt time) reliably produces "round" figures, but with an occasional x.5; the first one (System time) sticks a while to round figures but then the OS spontaneously decides that 1.000023423423 looks better. Adjust Magic until you find round numbers again. Setting the affinity mask doesn't help.

include \masm32\MasmBasic\MasmBasic.inc
if 0
   UsedTime = 7FFE0014h   ; System time
   Magic=312588   ; may differ
   Magic=312562   ; according to
   Magic=312558   ; the processor's mood
else
   UsedTime = 7FFE0008h   ; Interrupt time
   Magic=312500   ; always the same value
endif

KSYSTEM_TIME STRUCT
LowPart   dd ?
High1Time   dd ?      ; UsedTime
High2Time   dd ?
KSYSTEM_TIME ENDS

Ticks2Stack MACRO
  mov edx, UsedTime   ; start of the chosen KSYSTEM_TIME (LowPart, followed by High1Time)
; @@:
; mov eax, [edx]      ; fails miserably, see...
; cmp eax, [edx+4]      ; Windows Research Kernel @ HPI
; jne @B      ; ... and look for "consistent"
  fild qword ptr [edx]
ENDM

TicksPop MACRO
   mov edx, UsedTime
   fild qword ptr [edx]
   fsub
   fchs
ENDM

   Init
   ; invoke SetProcessAffinityMask, rv(GetCurrentProcess), 1
   PrintLine "Magic=", Hex$(Magic), CrLf$
   For_ n=0 To 10
      Ticks2Stack
      imul eax, n, 10
      push eax
      mov ecx, eax
      .Repeat
         mov edx, 1000000
         .Repeat
;           mov eax, 123
           imul eax, eax, 123
           dec edx
         .Until Sign?
         dec ecx
      .Until Sign?
      TicksPop
      push Magic
      fidiv dword ptr [esp]   ; divide ticks by magic number
      pop eax
      pop ecx
      Print Str$("Loop %i  \t", n), Str$("%Hf\n", ST(0))
      fstp st
   Next
   Exit
end start

sinsi


C:\Users\John\Desktop>NanoTicks.exe
Magic=0004C4B4

Loop 0          0.0
Loop 1          0.32000000000000000
Loop 2          0.64000000000000000
Loop 3          0.64000000000000000
Loop 4          1.2800032000000000
Loop 5          1.2800032000000000
Loop 6          1.6000000000000000
Loop 7          1.9200032000000000
Loop 8          1.9200032000000000
Loop 9          2.5600032000000000
Loop 10         2.5600032000000000

fwiw, I also got 0 in the original, not 64

I was going to suggest an affinity thingie but saw you had put it in, is the asm in the attachment the exe? (without the commented-out SetProcessAffinityMask).
I wonder how long it takes to update every process's structure too, and if it matters which cpu does it.

Looking at YieldProcessor, it's just a macro for a pause instruction (or rep nop for older cpus), usually used in spinlocks?
So if they match it returns, if they don't does it loop again? (The "for (;;)" I am not sure about.)
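
To sketch how that loop behaves (an illustration in portable C, not the DDK code): the writer is assumed to store High2Time first, then LowPart, then High1Time, so a reader that finds High1Time equal to High2Time can take LowPart as belonging to the same update. If they differ, for (;;) simply goes round again. The struct and function names here are hypothetical stand-ins.

```c
#include <stdint.h>

/* Same field order as KSYSTEM_TIME in the header earlier in the thread. */
typedef struct {
    volatile uint32_t LowPart;
    volatile int32_t  High1Time;
    volatile int32_t  High2Time;
} ksystem_time_sketch;

/* Read a 64-bit value that another CPU may be updating.
   Retries until both copies of the high half agree.
   volatile only stops the compiler from caching the fields. */
static uint64_t read_ksystem_time(const ksystem_time_sketch *t)
{
    int32_t  high;
    uint32_t low;

    for (;;) {                      /* no exit condition: leave via break */
        high = t->High1Time;
        low  = t->LowPart;
        if (high == t->High2Time)   /* halves match: snapshot is consistent */
            break;
        /* a real spin loop would pause here (YieldProcessor) */
    }
    return ((uint64_t)(uint32_t)high << 32) | low;
}
```

Note that the comparison is between the two high dwords, not between LowPart and High1Time.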
Light travels faster than sound, that's why some people seem bright until you hear them.

jj2007

Quote from: sinsi on January 05, 2012, 01:16:16 PM
I was going to suggest an affinity thingie but saw you had put it in, is the asm in the attachment the exe? (without the commented-out SetProcessAffinityMask).

Yes it is, without SetProcessAffinityMask (which doesn't change anything)
It seems your Magic value is exactly 100000.
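
One way to read these Magic values, offered as a guess rather than a confirmed explanation: the time fields count 100ns units, so a clock that is only refreshed N times per second advances in steps of 10,000,000 / N units. A small sketch of that arithmetic, with a hypothetical function name:

```c
#include <stdint.h>

/* 100ns units that elapse between two updates of a clock
   that is refreshed updates_per_second times a second. */
static uint32_t units_per_update(uint32_t updates_per_second)
{
    return 10000000u / updates_per_second;
}
```

units_per_update(64) is 156250, half of the 312500 used for the interrupt time above, and units_per_update(100) is 100000.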

MichaelW

One thing I have never understood about the timers is how the high-resolution performance frequency (returned by QueryPerformanceFrequency) can be 3579546 Hz, or for the current topic why Microsoft selected a time unit of 100ns, when the system timers have a 1193182 Hz clock, and so a minimum period of ~838ns.
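
The figures quoted above can be checked with a few lines of integer arithmetic; this is only a sketch with hypothetical function names:

```c
#include <stdint.h>

/* Period of one clock cycle in whole nanoseconds (truncated). */
static uint32_t period_ns(uint32_t frequency_hz)
{
    return 1000000000u / frequency_hz;
}

/* How many times the slower clock fits into the faster one (truncated). */
static uint32_t clock_ratio(uint32_t fast_hz, uint32_t slow_hz)
{
    return fast_hz / slow_hz;
}
```

period_ns(1193182) gives 838, and 3579546 is exactly three times 1193182, so the two frequencies are at least related to each other.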