only innerloop that fits in cache+recursion faster than innerloop+several outerL

daydreamer · March 16, 2006, 08:05:14 AM

???
I have an algo that uses almost the same code for the inner loop and the 2 outer loops.
If I make it fit in 32 bytes and align it, can it be faster than code that is 3 times bigger?
I am reusing the same regs plus push/pop for the outer loops anyway.
Also, how much do I lose on the P4 (Pentium 4), which has slow shifts,
if I have to choose between the partial-register penalty of mov dl,ah and sar eax,8 + add edx,eax?
The code does full 32-bit register operations earlier in the loop.
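
For readers comparing the two instruction sequences named above, here is a minimal C sketch of the values each one produces. It models register contents only and says nothing about timing on any CPU. The function names are made up for illustration and the surrounding algorithm is not known, so the assumption that the two sequences are interchangeable only holds when the low byte of edx is zero and eax is in the range 0..0xFFFF.

```c
#include <stdint.h>

/* Models `mov dl, ah`: the low byte of edx is replaced by bits 8..15 of eax.
   The upper 24 bits of edx are left untouched. */
uint32_t model_mov_dl_ah(uint32_t edx, uint32_t eax)
{
    return (edx & 0xFFFFFF00u) | ((eax >> 8) & 0xFFu);
}

/* Models `sar eax, 8`: arithmetic shift right, done on unsigned values so
   the sign extension is explicit and portable. */
uint32_t model_sar8(uint32_t eax)
{
    uint32_t shifted = eax >> 8;
    if (eax & 0x80000000u) {
        shifted |= 0xFF000000u;   /* copy the sign bit into the top byte */
    }
    return shifted;
}

/* Models `sar eax, 8` followed by `add edx, eax` (wraps modulo 2^32). */
uint32_t model_sar_add(uint32_t edx, uint32_t eax)
{
    return edx + model_sar8(eax);
}
```

With edx = 0x12345600 and eax = 0x0000AB12 both models give 0x123456AB. If dl is not zero beforehand, the add carries into the upper bytes and the results differ, so the replacement is only valid when the earlier code guarantees those preconditions.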

Ratch · March 16, 2006, 09:57:16 PM

!Czealot,

Quote: ???

Question marks usually go after a sentence, not before. For example, what is your question? Ratch

daydreamer · March 17, 2006, 05:47:00 AM

Quote from: Ratch on March 16, 2006, 09:57:16 PM
!Czealot,

Quote: ???

Question marks usually go after a sentence, not before. For example, what is your question? Ratch

only innerloop that fits in cache+recursion faster than innerloop+several outerLoops ??? (it doesn't fit into the title, it gets too long)

Tedd · March 17, 2006, 11:53:06 AM

only innerloop that fits in cache+recursion faster than innerloop+several outerLoops ???

Translation:
which of the following would be faster:
- an inner-loop that fits into cache, called recursively;
- or a (possibly too large to fit in cache) inner-loop, with several outer-loops?

My guess would be the first. But it's just a guess :lol
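
To make the two options concrete, here is a rough C sketch of the same three-level iteration written both ways. The original poster's algorithm has not been shown, so `work`, `sum_nested` and `sum_recursive` are hypothetical stand-ins. The sketch only shows that one small loop body called recursively can cover the same index space as three explicit loops. It does not measure which is faster.

```c
/* Hypothetical per-element work; stands in for the real inner-loop body. */
unsigned work(unsigned i, unsigned j, unsigned k)
{
    return i * 31u + j * 7u + k;
}

/* Option 2: one inner loop wrapped in two explicit outer loops. */
unsigned sum_nested(unsigned n)
{
    unsigned sum = 0;
    for (unsigned i = 0; i < n; ++i)
        for (unsigned j = 0; j < n; ++j)
            for (unsigned k = 0; k < n; ++k)
                sum += work(i, j, k);
    return sum;
}

/* Option 1: a single loop body reused for every nesting level.
   `depth` says which of the three indices this call is iterating. */
unsigned sum_level(unsigned depth, unsigned n, unsigned idx[3])
{
    if (depth == 3)
        return work(idx[0], idx[1], idx[2]);

    unsigned sum = 0;
    for (unsigned v = 0; v < n; ++v) {
        idx[depth] = v;
        sum += sum_level(depth + 1, n, idx);
    }
    return sum;
}

unsigned sum_recursive(unsigned n)
{
    unsigned idx[3] = { 0, 0, 0 };
    return sum_level(0, n, idx);
}
```

The recursive form has a smaller code footprint but pays for a call and return at every level, so which one wins would have to be timed on the target CPU.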

Mincho Georgiev · March 17, 2006, 02:01:18 PM

Tedd, recursion is not necessarily the way to solve the problem. Just remember Fibonacci for a moment: recursion is an infelicitous choice for calculating Fibonacci numbers (except maybe with de Moivre's formula).
Depending on the exact algorithm, an iterative method may be better, but that DEPENDS on the algo.
Maybe posting a piece of your code is a good idea, !Czealot.
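
The Fibonacci point can be illustrated with a short C sketch (function names chosen here for illustration). The naive recursive version recomputes the same subproblems over and over, so its call count grows exponentially with n, while the iterative version does n additions.

```c
/* Naive recursion: each call spawns two more, so the work grows
   exponentially with n. */
unsigned long fib_recursive(unsigned n)
{
    return n < 2 ? n : fib_recursive(n - 1) + fib_recursive(n - 2);
}

/* Iteration: keeps only the last two values and does n additions. */
unsigned long fib_iterative(unsigned n)
{
    unsigned long a = 0, b = 1;   /* F(0) and F(1) */
    while (n--) {
        unsigned long next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

Both return the same values, for example 55 for n = 10. Only the amount of work differs.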

Tedd · March 17, 2006, 06:19:45 PM

Shaka: you're right, a faster algorithm will usually beat any type of optimization.
But, assuming the algorithm stays (almost) the same, keeping code in cache should cause it to be more efficient than constantly swapping.

P1 · March 17, 2006, 06:44:11 PM

I believe you have run into cache issues with the uP. Code that is 32 bytes long (the size of a cache line, which depends upon the uP) and sits on an alignment boundary is quicker to execute.
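
As a rough illustration of why the alignment matters, the C sketch below counts how many cache lines a block of code touches for a given start address. The helper name `lines_touched` is hypothetical, and the 32-byte line size used in the examples is the figure from this thread, not a property of every processor.

```c
#include <stdint.h>

/* Number of cache lines covered by `size` bytes starting at `addr`,
   for a cache line of `line_size` bytes (line_size must be non-zero). */
unsigned lines_touched(uintptr_t addr, unsigned size, unsigned line_size)
{
    if (size == 0)
        return 0;

    uintptr_t first = addr / line_size;               /* line holding the first byte */
    uintptr_t last  = (addr + size - 1) / line_size;  /* line holding the last byte  */
    return (unsigned)(last - first + 1);
}
```

A 32-byte loop at an address that is a multiple of 32 occupies exactly one line, whereas the same loop starting 8 bytes later straddles two. Code three times as large covers at least three lines even when aligned.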

Regards, P1 :8)
