Windows File Caching semantics?

Started by Mark Jones, May 11, 2009, 04:29:42 AM

Mark Jones

Question about Windows file caching. According to this MS article, "The [Win2k] file cache ... operates on files, or, more specifically, [256 KB] sections of files. When file sections are referenced, they are mapped into an area of virtual memory by the Cache Manager."

Q: Does this mean ALL file sections are cached automatically? And by "section," are they referring to actual PE sections, or just 256 KB chunks of data? Lastly, are there any file sections which are excluded, or which have a higher or lower priority?

The goal is to write an application that keeps frequently accessed data cached for as long as possible (if such a thing is possible).
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

UtillMasm

Windows 2000?

I only have Vista and Win7.

My question:

       'Physical memory: Total 2045, Cached: 1422, Free: 0'

How do I disable the 'Cached' guy?

Mark Jones

The 'Cached' part is good: the bigger it is, the better Windows will run overall, because more data will be in memory. As soon as a program asks for more memory than is free, however, the cache shrinks automatically. This is normal behavior. Zero free looks a little strange, though. I do not have Vista or Win7, but it looks like a larger pagefile may help.

dedndave

there are several schools of thought about memory management and disk caches
(the two are very similar, as well as closely related)
each school of thought varies slightly in details
they are best viewed in flow-chart form, generally
to discover the precise school of thought selected by the MS programmers when they write the OS would be a real neat trick
in fact, they probably consider it "proprietary information" (basically because they would have difficulty fully describing it)
i remember seeing one memory manager described (this was about 1981, so the details have likely been much improved upon)
it was described in terms of a bartering system and a fisherman taking his catch to market
it talked about fish getting smelly and the wife not liking the smell (she shouldn't have married a fisherman) - lol
my point is that, for every programmer that touches the OS code, there is a slightly different
interpretation of what a memory manager or disk cache should do and how it should do it

as i understand it, programs are cached quite differently than data (a different management flow chart)
for larger programs, the program file itself may not be cached at all; rather, the location of all its pieces on the drive is saved (indexed)
for other (data) files, it is more likely that the data is cached
the managers cache program files and data files differently for different sizes

the OS gives you a few controls over cache sizes, etc, but does not give you much control over the flow charts
likewise, it gives you very simplified metering of performance
to be honest, if they gave you more controls (knobs), it would be difficult to understand or alter the settings
it would be equally difficult to measure the performance advantages gained from these settings
this is exacerbated by the fact that any given user operates the computer quite differently from another
many generalizations are made about the users' actions during a given session
if you are trying to optimize performance of the OS, there is only so much you can do

MichaelW

This quick and dirty test running under Windows 2000 seems to indicate that at least three sections are cached when the first section is accessed.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    include \masm32\macros\timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      buff db 40000h dup(0)
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    ; open the existing file; the masm32 fopen macro returns the handle, kept in ESI
    mov esi, fopen("\masm32\include\windows.inc")

    ; wait 5 seconds before starting the timed reads
    invoke Sleep, 5000

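    ; three passes, each timing three consecutive 256KB (40000h) reads and
    ; printing the elapsed time that timer_end leaves in EAX
    REPEAT 3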

      timer_begin 1, HIGH_PRIORITY_CLASS
        mov eax, fread( esi, ADDR buff, 40000h )
      timer_end
      print ustr$(eax),","

      timer_begin 1, HIGH_PRIORITY_CLASS
        mov eax, fread( esi, ADDR buff, 40000h )
      timer_end
      print ustr$(eax),","

      timer_begin 1, HIGH_PRIORITY_CLASS
        mov eax, fread( esi, ADDR buff, 40000h )
      timer_end
      print ustr$(eax),13,10

      ; rewind to the start of the file for the next pass
      mov eax, fseek( esi, 0, BEGIN )

    ENDM

    fclose esi

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


The first time I run it after building it I typically get:

10,2,2
1,1,2
1,1,1


And if I run it again:

3,2,2
2,1,1
1,1,1


I'm guessing that the build process works the disk and cache hard enough to effectively clear windows.inc from the cache (my system cache typically occupies ~256MB).
eschew obfuscation

redskull

My understanding is that it will map a file in 256K chunks, depending on what is accessed. For example, if you were to open a 10 MB file and read through it from top to bottom, it would initially map only the first 256K; as soon as you read past the first 256K, it would map the second chunk into another view, and so on. If you opened another file but started reading in the middle, it would only map the middle 256K chunk. So yes, everything is done automatically, and no, it doesn't depend on PE sections; it is based simply on byte offsets within the file.
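The chunk arithmetic described above is plain integer division by the view size. A minimal sketch in C, assuming the 256 KB view size quoted from the Windows 2000 article; `view_index`, `view_base` and `view_count` are hypothetical helper names, not Windows APIs:

```c
#include <stdint.h>

/* Assumption: 256 KB cache views, as quoted for the Windows 2000 Cache Manager. */
#define VIEW_SIZE ((uint64_t)256 * 1024)

/* Which 256 KB view a byte offset falls in (0 = first view). */
uint64_t view_index(uint64_t offset)
{
    return offset / VIEW_SIZE;
}

/* File offset where that view starts (offset rounded down to a view boundary). */
uint64_t view_base(uint64_t offset)
{
    return offset - offset % VIEW_SIZE;
}

/* How many views it takes to cover a file of file_size bytes (rounded up). */
uint64_t view_count(uint64_t file_size)
{
    return (file_size + VIEW_SIZE - 1) / VIEW_SIZE;
}
```

With these numbers, byte 257K (offset 263168) falls in view 1, the second chunk, and a 10 MB file spans 40 views.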

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

Mark Jones

So, pretty much, to force (any) file to be completely cached, perform a full read (or read one byte every 256K). Interesting; perhaps there is some I/O savings to be had in the latter. Thanks.
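A minimal sketch of the "one byte every 256K" idea in portable C; `touch_every_view` is a hypothetical name. One hedged caveat: reading a byte causes its view to be mapped, but the Cache Manager may only fault in the touched page plus whatever read-ahead decides to fetch, not the whole 256 KB, so a full sequential read remains the surer way to get everything cached.

```c
#include <stdio.h>

/* Assumed cache view size (256 KB), as quoted earlier in the thread. */
#define VIEW_SIZE (256L * 1024L)

/* Read one byte at the start of each 256 KB view of the file.
   Returns the number of views touched, or -1 if the file cannot be opened.
   Note: fseek takes a long, so on some platforms this is limited to 2 GB files. */
long touch_every_view(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    long touched = 0;
    for (long off = 0; ; off += VIEW_SIZE) {
        if (fseek(f, off, SEEK_SET) != 0)
            break;
        if (fgetc(f) == EOF)   /* seeked past the end of the file: done */
            break;
        touched++;
    }
    fclose(f);
    return touched;
}
```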

redskull

There's also "intelligent read-ahead", which tries to detect an access pattern: if you read page 10 and then page 11, it will automatically load page 12 in anticipation, so it's always best to read in one direction (up or down). This behavior also depends on the flags you pass to CreateFile() (for example FILE_FLAG_SEQUENTIAL_SCAN or FILE_FLAG_RANDOM_ACCESS).

Also, though I have no proof to back up this claim, I think Windows will "fill up" empty memory with whatever it wants, based on your access patterns; e.g., if you use a web browser every single day, and you are only using 500M of physical RAM out of 3G, then it will cache the web browser program and libraries 'just in case'. But again, that depends on how much free RAM you have when you run the program. Depending on 'how important' everything else is, it might uncache the first few blocks of your file as you're reading in the last few. I have to assume that any active program will be put completely in the cache (if possible), on the basis that it would create the biggest visible performance gain.
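To cooperate with read-ahead, read the file front to back in large chunks and, on Windows, tell CreateFile the access will be sequential. A minimal sketch in C, assuming a 64 KB chunk size (an arbitrary choice) and using FILE_FLAG_SEQUENTIAL_SCAN, the documented CreateFile hint for sequential access; `read_sequential` is a hypothetical name, and the non-Windows branch only exists so the sketch compiles elsewhere:

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Read a whole file front to back in 64 KB chunks (an arbitrary size) and
   return the number of bytes read, or -1 if the file cannot be opened. */
long long read_sequential(const char *path)
{
    char buf[64 * 1024];
    long long total = 0;
#ifdef _WIN32
    /* FILE_FLAG_SEQUENTIAL_SCAN hints to the cache manager that access will
       be sequential, so read-ahead can be more aggressive. */
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    DWORD got = 0;
    while (ReadFile(h, buf, (DWORD)sizeof buf, &got, NULL) && got > 0)
        total += got;
    CloseHandle(h);
#else
    /* Portable fallback: plain stdio, no cache hint. */
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got;
    while ((got = fread(buf, 1, sizeof buf, f)) > 0)
        total += (long long)got;
    fclose(f);
#endif
    return total;
}
```

FILE_FLAG_RANDOM_ACCESS is the opposite hint, for files that are accessed at scattered offsets.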

-r

dedndave

I think if you want the manager to behave "unnaturally", you have to access the file of interest "unnaturally".
If you want a specific file to stay in cache, access it every so often, without need.
Try to access different sections of the file.
The frequency and size of your accesses will cause the manager to raise its "importance".
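One way to act on this advice is a "keep warm" pass that a program repeats on a timer, touching a different part of every 256 KB view each time. A minimal sketch in C, assuming 256 KB views and 4 KB pages; `keep_warm_pass` is a hypothetical name, and whether this actually raises the data's priority in the cache is the manager's decision, not something the code can guarantee:

```c
#include <stdio.h>

#define VIEW_SIZE (256L * 1024L)  /* assumed cache view size */
#define STRIDE    4096L           /* assumed page size (x86) */

/* One "keep warm" pass: read one byte from each 256 KB view, at a page offset
   within the view that advances with `round`, so successive passes touch
   different sections of the file. A tail view shorter than the chosen offset
   is skipped on that pass. Returns the number of bytes read, or -1 if the
   file cannot be opened. */
long keep_warm_pass(const char *path, unsigned round)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    long pages_per_view = VIEW_SIZE / STRIDE;                 /* 64 */
    long within = (long)(round % pages_per_view) * STRIDE;    /* 0, 4K, 8K, ... */
    long touched = 0;
    for (long base = 0; ; base += VIEW_SIZE) {
        if (fseek(f, base + within, SEEK_SET) != 0)
            break;
        if (fgetc(f) == EOF)   /* past the end of the file: pass complete */
            break;
        touched++;
    }
    fclose(f);
    return touched;
}
```

A caller would run it with round = 0, 1, 2, ... from a timer; how often to repeat it is a judgment call that depends on how much other memory pressure the machine sees.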

Mark Jones

On a side note, the Lib Search Tool creates a list of valid filenames in a buffer upon startup, then once a search is initiated, loads each file from the list into RAM and searches it. Because of this, the first search takes considerably longer than subsequent searches, as each file must be sequentially read from disk. Once the data is cached however, it is quite fast.

It seems like sometime around when I wrote this tool, Windows Search changed, maybe in a service pack or something. Windows Search now seems to pre-cache file data when activated, before a search is even performed. (Try right-clicking a large folder tree in Explorer and choosing "Search...": it will sit there for a few seconds and "scan" something.) That is rather intelligent.