Title: only innerloop that fits in cache+recursion faster than innerloop+several outerL
Post by: daydreamer on March 16, 2006, 08:05:14 AM

???
I have an algo whose code is almost the same for the inner loop and the 2 outer loops.
If I make it fit in 32 bytes and align it, can it be faster than code that is 3 times bigger?
I am reusing the same regs + push/pop anyway for the outer loops.
Also, how much do I lose on the PIV, which has slow shifts,
if I have to choose between the partial-register penalty of mov dl,ah and sar eax,8 + add edx,eax?
The code does full 32-bit register operations earlier in the loop.
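
For readers comparing the two instruction sequences mentioned above, here is a rough C model of what each one computes. It is only a sketch: the function names are made up for illustration, and the poster's actual algorithm is not shown in the thread. The two sequences are not interchangeable in general. In this model they give the same result only when the upper 16 bits of EAX and the low byte of EDX are zero.

```c
#include <stdint.h>

/* Hypothetical helper: models `mov dl, ah`.
   Copies bits 8..15 of EAX into the low byte of EDX and
   leaves the upper 24 bits of EDX untouched. */
uint32_t mov_dl_ah(uint32_t edx, uint32_t eax)
{
    return (edx & 0xFFFFFF00u) | ((eax >> 8) & 0xFFu);
}

/* Hypothetical helper: models `sar eax, 8` followed by `add edx, eax`.
   The arithmetic shift is written out by hand so the result does not
   depend on how the compiler shifts negative signed values. */
uint32_t sar8_then_add(uint32_t edx, uint32_t eax)
{
    uint32_t shifted = eax >> 8;        /* logical shift right by 8 */
    if (eax & 0x80000000u)
        shifted |= 0xFF000000u;         /* replicate the sign bit into the top 8 bits */
    return edx + shifted;               /* wraps modulo 2^32, like ADD */
}
```

With EDX = 0 and EAX = 0x00001234 both helpers return 0x12. With EAX = 0x00AB1234 the first still returns 0x12 while the second returns 0xAB12, so the shift version needs the upper bits of EAX to be clear (or masked) to match.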

Title: Re: only innerloop that fits in cache+recursion faster than innerloop+several outerL
Post by: Ratch on March 16, 2006, 09:57:16 PM

!Czealot,

Quote: ???

Question marks usually go after a sentence, not before. For example, what is your question? Ratch

Title: Re: only innerloop that fits in cache+recursion faster than innerloop+several ou
Post by: daydreamer on March 17, 2006, 05:47:00 AM

Quote from: Ratch on March 16, 2006, 09:57:16 PM
!Czealot,

Quote: ???

Question marks usually go after a sentence, not before. For example, what is your question? Ratch

only innerloop that fits in cache+recursion faster than innerloop+several outerLoops ??? (It doesn't fit into the title; it gets too long.)

Title: Re: only innerloop that fits in cache+recursion faster than innerloop+several ou
Post by: Tedd on March 17, 2006, 11:53:06 AM

only innerloop that fits in cache+recursion faster than innerloop+several outerLoops ???

Translation:
which of the following would be faster:
- an inner-loop that fits into cache, called recursively;
- or a (possibly too large to fit in cache) inner-loop, with several outer-loops?

My guess would be the first. But it's just a guess :lol

Title: Re: only innerloop that fits in cache+recursion faster than innerloop+several ou
Post by: Mincho Georgiev on March 17, 2006, 02:01:18 PM

Tedd, recursion will not necessarily solve the problem. Just remember Fibonacci for a moment. Recursion is an unfortunate choice for calculating Fibonacci numbers. /Except maybe with de Moivre's formula/
Depending on the exact algorithm, an iterative method may be better, but that's DEPENDING on the algo.
Maybe posting a piece of your code is a good idea, !Czealot.
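
To make the Fibonacci point concrete, here is a minimal C sketch of both approaches. The function names are chosen for illustration and nothing here comes from the original poster's code. The naive recursive version recomputes the same subproblems over and over, so its number of calls grows exponentially with n, while the iterative version runs a single short loop.

```c
#include <stdint.h>

/* Naive recursion: two recursive calls per step,
   so the call count grows exponentially with n. */
uint64_t fib_rec(unsigned n)
{
    return n < 2 ? n : fib_rec(n - 1) + fib_rec(n - 2);
}

/* Iteration: one pass, two running values.
   The result fits in 64 bits for n up to 93. */
uint64_t fib_iter(unsigned n)
{
    uint64_t a = 0, b = 1;      /* F(0) and F(1) */
    while (n--) {
        uint64_t t = a + b;     /* next Fibonacci number */
        a = b;
        b = t;
    }
    return a;
}
```

Both functions return 55 for n = 10. The difference only shows up in running time as n grows, which is the sense in which the choice of algorithm matters more than the loop-versus-recursion packaging.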

Title: Re: only innerloop that fits in cache+recursion faster than innerloop+several ou
Post by: Tedd on March 17, 2006, 06:19:45 PM

Shaka: you're right, a faster algorithm will usually beat any type of optimization.
But, assuming the algorithm stays (almost) the same, keeping code in cache should cause it to be more efficient than constantly swapping.

Title: Re: only innerloop that fits in cache+recursion faster than innerloop+several outerL
Post by: P1 on March 17, 2006, 06:44:11 PM

I believe you have run into cache issues with a uP. Code that fits in 32 bytes (the size of a cache line, which depends on the uP) and sits on an alignment boundary is quicker to execute.

Regards, P1 :8)

The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: daydreamer on March 16, 2006, 08:05:14 AM