how the cache work.

jj2007 · June 08, 2009, 06:59:50 AM

Quote from: NightWare on June 08, 2009, 02:56:23 AM
Quote from: jj2007 on June 07, 2009, 02:27:44 PM
Filling 32 or 64k of cache with the kind of code that is painted in yellow above requires 500...1000 inner loops ::)
why do that?

Because you assume that your 100-byte algo 1. is not in the cache and 2. is called many times, so that you lose 100 cycles every time. Let's assume a worst case scenario:

- You have been granted 32k of virgin cache and a 20 ms time slice for your exclusive use.
- You have a very complex graphics routine of say, 32k.
- You call memfill to clear the screen. For the sake of the argument, do it three times to make sure it's in the cache.
- The complex algo could fill the cache entirely and thus evict the 100 bytes of memfill.
- However, running it once is not sufficient: The watchdog at the cache entry lets in only those with a frequent visitor card. The others pay a penalty ;)
- So you need to run first the 32k algo three times to make sure memfill is evicted.
- Then memfill comes as infrequent visitor and has to pay 100 cycles.

Now, in my naive understanding of things, 32k*3 means roughly 100,000 cycles, and 100 cycles (= 0.1%) more would not hurt me. But I have probably overlooked something - you are the expert. Seriously. I want to learn and understand this.
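
The estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the figure of roughly one cycle per byte of code is an assumption implied by "32k*3 means roughly 100,000 cycles", not a measured value, and `miss_overhead` is a hypothetical helper name.

```python
# Back-of-the-envelope check of the worst case scenario.
# Assumption (not measured): roughly one cycle per byte of code executed.
ALGO_BYTES = 32 * 1024      # the hypothetical 32k graphics routine
RUNS = 3                    # runs needed to evict memfill in the scenario
MISS_PENALTY_CYCLES = 100   # cost of memfill not being in the cache


def miss_overhead(algo_bytes, runs, penalty, cycles_per_byte=1.0):
    """Return (cycles spent in the big routine, penalty as a fraction of that)."""
    total = algo_bytes * runs * cycles_per_byte
    return total, penalty / total


total, fraction = miss_overhead(ALGO_BYTES, RUNS, MISS_PENALTY_CYCLES)
print(f"{total:.0f} cycles, penalty = {fraction:.2%}")
# prints "98304 cycles, penalty = 0.10%"
```

Under that assumption the single cache miss costs about a thousandth of the time spent in the big routine; a different cycles-per-byte figure scales the result accordingly.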

NightWare · June 09, 2009, 09:16:03 PM

> You have been granted 32k of virgin cache and a 20 ms time slice for your exclusive use.
- technically, there is no "virgin" cache, and you have considerably less time per OS iteration, because the OS hands you control every ~15 ms (so that slice of time covers your app, the other apps, and the OS functionality).

> You have a very complex graphics routine of say, 32k.
- ok,

> You call memfill to clear the screen. For the sake of the argument, do it three times to make sure it's in the cache.
- not exactly. in my explanation i reduced the info to the minimum (to make the principle understandable), but in reality, after just 3 calls the memfill algo will not be entirely in the cache, because there is an ORDER in the execution. place yourself in the cpu's point of view: in your case YOU know where the loops are, the cpu does NOT. (in fact, there is a system to avoid mispredictions, but it only works when the jumps always go in the same direction, or when the direction always changes. that's why i prefer to consider every jump as a misprediction, because in most cases it's impossible to respect this principle...) so, in the example, the part from Label2 to Label3 will be considered by the cpu as THE loop to store (before it considers the entire function).
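
The remark about jumps that always go the same way, or always alternate, can be illustrated with a toy predictor. This is a textbook-style sketch only, assuming two 2-bit saturating counters selected by the previous outcome of the branch; it does not describe the predictor of any particular cpu, and `mispredictions` is a hypothetical name.

```python
import random


def mispredictions(outcomes):
    """Count misses of a toy predictor over a sequence of taken/not-taken outcomes.

    Two 2-bit saturating counters are kept, one per previous outcome of the
    branch; a counter value of 2 or 3 predicts "taken".
    """
    counters = [1, 1]   # index 0: last outcome was not taken, index 1: taken
    last = 0
    misses = 0
    for taken in outcomes:
        value = counters[last]
        if (value >= 2) != taken:
            misses += 1
        # Move the selected counter towards the actual outcome, saturating at 0 and 3.
        counters[last] = min(value + 1, 3) if taken else max(value - 1, 0)
        last = int(taken)
    return misses


rng = random.Random(0)
always = [True] * 1000
alternating = [True, False] * 500
irregular = [rng.random() < 0.5 for _ in range(1000)]

for name, pattern in (("always", always),
                      ("alternating", alternating),
                      ("irregular", irregular)):
    print(name, mispredictions(pattern))
```

In this model the regular patterns cost only a couple of misses during warm-up, while the irregular one is mispredicted about half the time, which is the case the "treat every jump as a misprediction" rule of thumb is guarding against.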

> The complex algo could fill the cache entirely and thus evict the 100 bytes of memfill.
- no and yes. no, the complex algo will not fill the cache: there are requirements for that, linked to the loop status, execution order, frequency, size, etc... and 32k is clearly beyond the cache size.
example: for the librarian, if someone wants a trilogy but he only has 2 free slots, then it's useless/unproductive to store 2 books and make the trip for the last one.
it's the same for the cpu, for an additional reason: there is a COST to filling a large amount of bytes. if you look at ch 2.1.4 (dtlb part) you will see that there are locations reserved for large pages. in your opinion, why?
and yes, it will probably disappear from the cache (here it's a question of logistics/logic).

> However, running it once is not sufficient: The watchdog at the cache entry lets in only those with a frequent visitor card. The others pay a penalty
- more or less. there are requirements to be in the cache. IF other algos don't enter the cache, it's because there is no reason for them to, so they don't pay a penalty (here, the penalty would have been the extra work of putting the code in the cache); they are just executed normally.

> So you need to run first the 32k algo three times to make sure memfill is evicted.
- no, it depends on the requirements. IF your 32k algo has no loop, it will not fit in the cache, so something else is used. IF your 32k algo has loops, then those loops will be stored (in the order of execution).

> Then memfill comes as infrequent visitor and has to pay 100 cycles.
- it depends, here again it's a logistics problem. the memfill algo is condemned to disappear in ALL cases, it's just a question of time/frequency (the principle of the cache itself).
example: for the librarian, if a book is used less, it returns to its "normal" place/location and a more used book replaces it. AND IF the book is used more often (again), the process restarts.
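
The librarian picture can be sketched as a small simulation. It assumes plain least-recently-used replacement over a handful of slots, which is a simplification chosen for illustration (real caches are set-associative and use their own replacement policies); `Shelf`, `fetch` and `trips` are hypothetical names.

```python
from collections import OrderedDict


class Shelf:
    """Toy librarian: a fixed number of slots, least recently used book goes back."""

    def __init__(self, slots):
        self.slots = slots
        self.books = OrderedDict()  # oldest use first, most recent use last
        self.trips = 0              # trips to the stacks, i.e. cache misses

    def fetch(self, title):
        """Return True if the book was already on the shelf (a hit)."""
        if title in self.books:
            self.books.move_to_end(title)   # mark as most recently used
            return True
        self.trips += 1
        self.books[title] = True
        if len(self.books) > self.slots:
            self.books.popitem(last=False)  # least recently used book returns
        return False


shelf = Shelf(slots=4)
for _ in range(3):
    shelf.fetch("memfill")                  # first call misses, the next two hit
for block in ("gfx0", "gfx1", "gfx2", "gfx3"):
    shelf.fetch(block)                      # the big routine pushes memfill out
hit = shelf.fetch("memfill")                # memfill has to be brought in again
print(hit, shelf.trips)                     # prints "False 6"
```

With this policy memfill does disappear as soon as enough other code has been used, and it comes back the next time it is needed, at the price of one more trip.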

> you are the expert. Seriously. I want to learn and understand this.
- i'm no more of an expert than you are. the difference is that i've just spent some time trying to understand how it could work, with speed optimization in mind, because the cpu can't use complex operations for that (by this i mean division or multiplication, other than register shifts) at THIS level.
