???
I have an algo where the inner loop and the 2 outer loops are almost the same code.
If I make it fit in 32 bytes and align it, can it be faster than code that is 3 times bigger?
I am reusing the same regs + push/pop for the outer loops anyway.
Also, how much do I lose on the Pentium 4 (PIV), which has slow shifts,
if I have to choose between the partial-register penalty of mov dl,ah and sar eax,8 + add edx,eax?
The code does full 32-bit register operations earlier in the loop.
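To make the two options concrete, here is a rough C model of what each sequence leaves in edx. It is only a sketch: nothing above says what eax and edx hold at that point, so the sample values are assumptions, and the names mov_dl_ah, sar8_add and self_check are made up for illustration. It says nothing about timing. It only shows that the two sequences give the same edx when the low byte of edx is zero and eax fits in 16 bits, and differ otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Models "mov dl, ah": the low byte of edx is replaced by bits 8..15 of eax. */
uint32_t mov_dl_ah(uint32_t edx, uint32_t eax)
{
    return (edx & 0xFFFFFF00u) | ((eax >> 8) & 0xFFu);
}

/* Models "sar eax, 8" followed by "add edx, eax".
   Returns the new edx only; the real sequence also leaves the
   shifted value in eax, which this model ignores. */
uint32_t sar8_add(uint32_t edx, uint32_t eax)
{
    uint32_t shifted = eax >> 8;      /* logical shift first */
    if (eax & 0x80000000u)
        shifted |= 0xFF000000u;       /* copy the sign bit down, as sar does */
    return edx + shifted;
}

/* Assumed sample values, chosen only to show where the two agree and differ. */
void self_check(void)
{
    /* dl == 0 and eax < 0x10000: same result */
    assert(mov_dl_ah(0x12345600u, 0x0000ABCDu) == 0x123456ABu);
    assert(sar8_add(0x12345600u, 0x0000ABCDu) == 0x123456ABu);

    /* eax has bits above 15 set: the add carries into higher bytes of edx */
    assert(mov_dl_ah(0x12345600u, 0x0001ABCDu) == 0x123456ABu);
    assert(sar8_add(0x12345600u, 0x0001ABCDu) == 0x123457ABu);
}
```

So the swap is only safe if the earlier 32-bit operations guarantee those conditions. Which one is faster on a given CPU still has to be measured.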
!Czealot,
Quote: ???
Question marks usually go after a sentence, not before. For example, what is your question? Ratch
Quote from: Ratch on March 16, 2006, 09:57:16 PM
!Czealot,
Quote: ???
Question marks usually go after a sentence, not before. For example, what is your question? Ratch
only innerloop that fits in cache+recursion faster than innerloop+several outerLoops ??? (it doesn't fit into the title, it gets too long)
only innerloop that fits in cache+recursion faster than innerloop+several outerLoops ???
Translation:
which of the following would be faster:
- an inner-loop that fits into cache, called recursively;
- or a (possibly too large to fit in cache) inner-loop, with several outer-loops?
My guess would be the first. But it's just a guess :lol
Tedd, recursion is not necessarily the way to solve the problem. Just remember Fibonacci for a moment: recursion is an infelicitous choice for calculating Fibonacci numbers. /Except maybe with de Moivre's formula/
Depending on the exact algorithm, an iterative method may be better, but that DEPENDS on the algo.
Maybe posting a piece of your code would be a good idea, !Czealot.
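The Fibonacci point can be sketched in C. This is not the algorithm under discussion, just the textbook illustration, and the function names are made up here. The naive recursive version recomputes the same values over and over (an exponential number of calls), while the loop needs n additions and two variables.

```c
#include <assert.h>
#include <stdint.h>

/* Naive recursion: each call spawns two more, so the same
   Fibonacci numbers are recomputed many times. */
uint64_t fib_recursive(unsigned n)
{
    return n < 2 ? n : fib_recursive(n - 1) + fib_recursive(n - 2);
}

/* Iterative version: one pass, two running values. */
uint64_t fib_iterative(unsigned n)
{
    assert(n <= 93);               /* fib(94) no longer fits in 64 bits */
    uint64_t a = 0, b = 1;         /* a = fib(i), b = fib(i + 1) */
    for (unsigned i = 0; i < n; ++i) {
        uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

Both return the same values; for example, fib_recursive(20) and fib_iterative(20) both give 6765, but the recursive one gets there with thousands of calls.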
Shaka: you're right, a faster algorithm will usually beat any type of optimization.
But, assuming the algorithm stays (almost) the same, keeping code in cache should cause it to be more efficient than constantly swapping.
I believe you have gotten into cache issues with the uP. Code that fits in 32 bytes (the size of a cache line, which depends on the uP) and sits on an alignment boundary is quicker to execute.
Regards, P1 :8)