
How the cache works

Started by NightWare, June 04, 2009, 10:16:29 PM


jj2007

Sounds possible, although a bit "constructed". Filling 32 or 64k of cache with the kind of code that is painted in yellow above requires 500...1000 inner loops ::)

Mark Jones

Perhaps a dumb question, but can the cache on modern CPUs be disabled in the BIOS (like in the days of old)? If not, how about "filling it" with junk from a loop of 1,000, THEN running the algo under test once, rinse & repeat...
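
Mark's "fill it with junk, then run the algo once" idea could look roughly like the C sketch below. It assumes 64-byte cache lines and that a 32 MB junk buffer is bigger than every cache level on the machine; `pollute_cache`, `algo_under_test` and `time_cold` are hypothetical names made up for this illustration. `clock()` is used only because it is portable standard C; its resolution is coarse, so a single run of a short algo will often read 0 ticks, and real measurements would want a cycle counter such as rdtsc.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Assumptions: 64-byte cache lines and a 32 MB junk buffer that is
   larger than the biggest cache; adjust both for your CPU. */
#define LINE_SIZE  64
#define JUNK_BYTES (32u * 1024u * 1024u)

/* Write one byte per cache line of the junk buffer so the cache fills
   with junk and evicts whatever the algo left there last time. The
   checksum keeps the compiler from removing the loop. */
static unsigned pollute_cache(unsigned char *junk, size_t size)
{
    unsigned sum = 0;
    for (size_t i = 0; i < size; i += LINE_SIZE) {
        junk[i] = (unsigned char)(junk[i] + 1);
        sum += junk[i];
    }
    return sum;
}

/* Hypothetical algorithm under test: sums an array. */
static uint64_t algo_under_test(const uint32_t *data, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += data[i];
    return s;
}

/* Pollute, run once, repeat. Returns the smallest cold-cache time seen
   in clock() ticks, or -1 if the junk buffer could not be allocated. */
static long time_cold(const uint32_t *data, size_t n, int rounds,
                      uint64_t *result)
{
    unsigned char *junk = calloc(JUNK_BYTES, 1);
    long best = -1;
    if (junk == NULL)
        return -1;
    for (int r = 0; r < rounds; r++) {
        volatile unsigned sink = pollute_cache(junk, JUNK_BYTES);
        (void)sink;
        clock_t t0 = clock();
        *result = algo_under_test(data, n);
        long ticks = (long)(clock() - t0);
        if (best < 0 || ticks < best)
            best = ticks;
    }
    free(junk);
    return best;
}
```

Taking the minimum over several polluted rounds filters out interrupts and task switches while still measuring only cold-cache runs.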

One thing is certain: the cache adds a very complex dynamic to the timing results.
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

dedndave

there is a CD (Cache Disable) control bit - i dunno if windows wants you to diddle with it or not
(same goes for NW, the Not Write-through bit)
both bits are in CR0

dedndave

apparently, it is safe to modify these bits
i guess the OS doesn't really know the difference, other than by reading the CR0 register
CR0 - bit 29 - NW - Not Write-through (with CD=0 and NW=0, writes that hit the cache are write-through on a 486 and write-back on Pentium and later; setting NW=1 while CD=0 is an invalid combination and faults)
CR0 - bit 30 - CD Cache Disable (1 = cache disabled, 0 = cache enabled)

both bits are set to 1 at reset - the OS sets them both to 0
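
The four CD/NW combinations can be summarised in a small C decoder. This follows the CD/NW table in Intel's SDM for P6-family and later processors; the enum names are made up for the illustration, and since CR0 itself can only be read in ring 0, the function just takes the value as a plain number.

```c
#include <stdint.h>

/* CR0 bit positions as given in the post. */
#define CR0_NW (1u << 29)   /* Not Write-through */
#define CR0_CD (1u << 30)   /* Cache Disable */

enum cache_mode {
    MODE_NORMAL,        /* CD=0 NW=0: what the OS sets up */
    MODE_INVALID,       /* CD=0 NW=1: MOV to CR0 raises #GP */
    MODE_NO_FILL,       /* CD=1 NW=0: hits still served, misses not cached */
    MODE_NO_COHERENCY   /* CD=1 NW=1: the state after power-up/reset */
};

/* Decode the CD/NW pair of a CR0 value. */
static enum cache_mode cr0_cache_mode(uint32_t cr0)
{
    int cd = (cr0 & CR0_CD) != 0;
    int nw = (cr0 & CR0_NW) != 0;
    if (!cd)
        return nw ? MODE_INVALID : MODE_NORMAL;
    return nw ? MODE_NO_COHERENCY : MODE_NO_FILL;
}
```

For example, the reset value 60000010h has both bit 29 and bit 30 set, so it decodes as MODE_NO_COHERENCY, which is why the OS clears both bits during boot.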

disable the cache:

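        mov     eax,CR0
        or      eax,40000000h   ; set CD (bit 30): stop filling the cache
        mov     CR0,eax         ; privileged: faults outside ring 0
        wbinvd                  ; write back and invalidate what is cached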

enable the cache:

        mov     eax,CR0
        and     eax,0BFFFFFFFh  ; clear CD (bit 30), NW (bit 29) left as is
        mov     CR0,eax         ; privileged: faults outside ring 0
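
The mask arithmetic in the two snippets can be checked from user mode with a tiny C model. This is only a sketch: the real MOV to or from CR0 is privileged, so here "CR0" is just a value passed in and returned, and the function names are hypothetical.

```c
#include <stdint.h>

/* User-mode model of the two snippets: no real CR0 access here. */
static uint32_t apply_disable(uint32_t cr0)
{
    return cr0 | 0x40000000u;   /* or  eax,40000000h  : set CD (bit 30)   */
}

static uint32_t apply_enable(uint32_t cr0)
{
    return cr0 & 0xBFFFFFFFu;   /* and eax,0BFFFFFFFh : clear CD (bit 30) */
}
```

Both masks touch bit 30 only. That is fine while the OS keeps NW at 0, but if NW were ever 1, clearing CD alone would produce the invalid CD=0/NW=1 combination.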

might be a good idea to flush the cache (WBINVD) right after setting CD, so nothing gets refilled in between - REALTIME_PRIORITY_CLASS during the switch also
it would be interesting to see some timing comparisons on different processors, toggling each of the two bits
Jochen will be on this like a kid at Christmas time

dedndave

nope - lol
XP rejected my attempt to disable the cache - let me play with NW
it may not allow me to play with CR0 at all for security reasons

EDIT
XP won't even let me read CR0 - hasta be a work-around someplace

drizz

Quote from: dedndave on June 07, 2009, 06:10:53 PMXP won't even let me read CR0 - hasta be a work-around someplace
Ring0
The truth cannot be learned ... it can only be recognized.

dedndave

ty Drizz - i am reading now - it is do-able

BlackVortex

One method I use to get as much info as possible about my own thread is an exception handler: you get the full (?) thread context, even the debug registers.
I'm kinda bored now, look here:
http://msdn.microsoft.com/en-us/library/ms679331(VS.85).aspx

Is the register you're looking for there ?

dedndave

well - that will tell me about the exception
it appears you need to enter ring 0, make the changes to CR0, then leave ring 0
it can be done, but i think this may be a topic that the forum avoids
too bad, as it would be interesting to see some comparisons, especially with the NW bit toggled
but, temporarily disabling the cache would also be nice to test theories of cache operation

NightWare

Quote from: jj2007 on June 07, 2009, 08:07:23 AM
Now the crucial question: In which real life situation would it matter to be a few hundred cycles slower, once? Can you give an example?
it's never executed just once: your app/program is a loop, nothing else, so it will ALWAYS call a function an enormous number of times while it's in use. the problem is that your app (and your app isn't the only one running...) usually uses a lot of different algos to accomplish a task, and the cache is constantly being refilled. so it's not a few hundred cycles you lose once, but a few hundred cycles you lose on every iteration (and you can multiply that when your main loop (your app) has several nested loop levels...).
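
NightWare's point can be made concrete with a toy cache simulator in C. Everything here is a hypothetical model, not how any real CPU works: an 8-line fully associative cache with true LRU replacement (real caches are set-associative and usually use pseudo-LRU), an "algo A" with a 4-line working set, and an "algo B" that touches 8 other lines. It counts algo A's misses with and without B running between A's calls.

```c
#include <stdint.h>

#define CACHE_LINES 8   /* hypothetical tiny fully associative cache */

typedef struct {
    uint32_t tag[CACHE_LINES];   /* line address held in each slot */
    uint32_t age[CACHE_LINES];   /* last-use time stamp for LRU */
    int      valid[CACHE_LINES];
    uint32_t now;                /* access counter used as a clock */
} lru_cache;

/* Access one cache-line address; returns 1 on a miss, 0 on a hit. */
static int cache_access(lru_cache *c, uint32_t line)
{
    int victim = 0;
    c->now++;
    for (int i = 0; i < CACHE_LINES; i++) {
        if (c->valid[i] && c->tag[i] == line) {
            c->age[i] = c->now;          /* hit: refresh LRU age */
            return 0;
        }
    }
    /* miss: take an empty slot, else the least recently used one */
    for (int i = 0; i < CACHE_LINES; i++) {
        if (!c->valid[i]) { victim = i; break; }
        if (c->age[i] < c->age[victim]) victim = i;
    }
    c->valid[victim] = 1;
    c->tag[victim] = line;
    c->age[victim] = c->now;
    return 1;
}

/* "Algo A": small working set of 4 lines (addresses 0..3). */
static int run_algo_a(lru_cache *c)
{
    int misses = 0;
    for (uint32_t line = 0; line < 4; line++)
        misses += cache_access(c, line);
    return misses;
}

/* "Algo B": another routine touching 8 other lines (100..107), enough
   to push all of algo A's lines out of this tiny cache. */
static void run_algo_b(lru_cache *c)
{
    for (uint32_t line = 100; line < 108; line++)
        cache_access(c, line);
}

/* Algo A's total misses over `iters` iterations of the main loop,
   optionally running algo B in between. */
static int misses_of_a(int iters, int interleave_b)
{
    lru_cache c = {{0}};
    int misses = 0;
    for (int i = 0; i < iters; i++) {
        misses += run_algo_a(&c);
        if (interleave_b)
            run_algo_b(&c);
    }
    return misses;
}
```

Run alone for 10 iterations, algo A misses only on its first pass (4 misses in total); with algo B in between, it misses on all 4 lines every pass (40 misses). That is the "lose it on every iteration" effect, scaled down.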

jj2007

Quote from: dedndave on June 07, 2009, 06:58:01 PM
well - that will tell me about the exception
it appears you need to enter ring 0, make the changes to CR0, then leave ring 0
it can be done, but i think this may be a topic that the forum avoids

You would need a device driver and a call gate to use ring 0 functions from userland. Feasible, apparently, but not my cup of tea either...

dedndave

nah - i have some code that will do it i found on a RE forum site
in fact, it will do some other stuff i didn't want to do - lol

BlackVortex

Quote from: dedndave on June 07, 2009, 10:47:22 PM
nah - i have some code that will do it i found on a RE forum site
in fact, it will do some other stuff i didn't want to do - lol

Link plz? :green

Or PM if you prefer.

EDIT: Thanks

drizz

There is a very similar project (a driver) coded by Opcode. Look for "iopl_module_732.zip" in:
http://ghirai.com/hutch/files/win32asmboard_code_arhive.tar.bz2

NightWare

Quote from: jj2007 on June 07, 2009, 02:27:44 PM
Filling 32 or 64k of cache with the kind of code that is painted in yellow above requires 500...1000 inner loops ::)
why do that? :lol maybe you planned to scroll the page afterwards...

Quote from: dedndave on June 07, 2009, 06:58:01 PM
temporarily disabling the cache would also be nice to test theories of cache operation
yep, but if you put yourself in the cpu's point of view, you can deduce a lot of things... for example, reading from memory is easy because memory is naturally indexed by address, but once you put portions of that memory in the L2 cache there is no natural index anymore. yet the cpu NEEDS a way to quickly know whether a value is in the cache or not, so you can SEE how it could work (try to code an algo for the possible options and you'll understand quickly...).
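
One common answer to the "no natural index" problem is a set-associative cache: the middle bits of the address pick a set, and only the few tags stored in that set get compared. The C sketch below assumes a hypothetical 256 KB, 8-way cache with 64-byte lines (256 KB / (64 B x 8 ways) = 512 sets); real L2 geometries differ, and hardware compares the tags in parallel rather than in a loop.

```c
#include <stdint.h>

/* Hypothetical geometry: 256 KB, 8-way set associative, 64-byte lines. */
#define LINE_BITS 6                  /* 64-byte line -> 6 offset bits */
#define SET_BITS  9                  /* 512 sets     -> 9 index bits  */
#define NUM_SETS  (1u << SET_BITS)
#define WAYS      8

typedef struct {
    uint32_t tag[NUM_SETS][WAYS];
    uint8_t  valid[NUM_SETS][WAYS];
} cache_dir;

/* Middle address bits select the set: this is the cache's "index". */
static uint32_t set_index(uint32_t addr) { return (addr >> LINE_BITS) & (NUM_SETS - 1); }

/* Remaining high bits are stored as the tag to tell lines apart. */
static uint32_t tag_of(uint32_t addr) { return addr >> (LINE_BITS + SET_BITS); }

/* Hit test: only the WAYS entries of one set need to be checked. */
static int lookup(const cache_dir *d, uint32_t addr)
{
    uint32_t s = set_index(addr), t = tag_of(addr);
    for (int w = 0; w < WAYS; w++)
        if (d->valid[s][w] && d->tag[s][w] == t)
            return 1;                /* hit */
    return 0;                        /* miss */
}

/* Install the line holding addr into a chosen way of its set
   (no replacement policy in this sketch). */
static void fill(cache_dir *d, uint32_t addr, int way)
{
    uint32_t s = set_index(addr);
    d->tag[s][way] = tag_of(addr);
    d->valid[s][way] = 1;
}
```

For example, address 12345678h lands in set 345 with tag 2468h; any address in the same 64-byte line hits once that line is filled, while an address exactly 32 KB higher maps to the same set with a different tag and misses.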