The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: loki_dre on April 17, 2008, 06:16:57 AM

Title: math with blocks of memory
Post by: loki_dre on April 17, 2008, 06:16:57 AM
is there a fast way to do math with 2 blocks of memory without creating a for loop?
When I say fast I mean fast in terms of processing time.

e.g.   a[0 to 10] = a[0 to 10] + b[0 to 10]
       a[0 to 10] = a[0 to 10] - b[0 to 10]
       a[0 to 10] = a[0 to 10] * b[0 to 10]
       a[0 to 10] = a[0 to 10] / b[0 to 10]
       a[0 to 10] = a[0 to 10] > b[0 to 10]
       a[0 to 10] = a[0 to 10] >= b[0 to 10]
       a[0 to 10] = a[0 to 10] < b[0 to 10]
       a[0 to 10] = a[0 to 10] <= b[0 to 10]
Title: Re: math with blocks of memory
Post by: hutch-- on April 17, 2008, 07:14:48 AM
loki,

It is not the loop code that determines the speed of calculations like this; it's the speed of the memory access that imposes the speed limit. Just code an efficient loop for the operations and it will be as fast as it can be. The ADD and SUB operations are reasonably fast but MUL and DIV are slower.
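
For a concrete starting point, here is a minimal sketch of such a loop in MASM, assuming two global DWORD arrays (the names a and b and the 11-element size are taken from the question, not from any posted code):

    .data
    a   dd 11 dup(?)
    b   dd 11 dup(?)

    .code
    add_blocks proc
        xor edx, edx                ; index i = 0
      @@:
        mov eax, [b + edx*4]        ; load b[i]
        add [a + edx*4], eax        ; a[i] = a[i] + b[i]
        inc edx
        cmp edx, 11                 ; element count from the question
        jb  @B
        ret
    add_blocks endp

The same skeleton works for SUB; for MUL and DIV you would load a[i] into EAX first (and note that both use EDX, so the index register would have to change).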
Title: Re: math with blocks of memory
Post by: zooba on April 17, 2008, 10:34:53 AM
Quote from: hutch-- on April 17, 2008, 07:14:48 AM
The ADD and SUB operations are reasonably fast but MUL and DIV are slower.

Just to blow this out of the water somewhat: I recently did some basic benchmarking on a range of Core 2 processors and found that multiplication takes roughly as long as addition or subtraction (integer, floating point and SSE alike). Division/modulus typically takes four times as long as the other operations. On earlier processors multiplication certainly is slower, but they've finally reached parity. (I believe AMD has been there longer than Intel, but I can't confirm it.)

As to the original question, I suggest you read http://www.mark.masmcode.com/. It has enough ideas to help you out here. The short answer is yes, it can be done without a loop. The longer answer is that you have to write a lot of code for large blocks and the gain is minimal. A simple unrolled loop using SSE instructions will give you the best performance, well beyond what a C/C++ compiler can give.
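
As a rough sketch of that (the array names and 16-element size here are placeholders, and movups is used so no 16-byte alignment guarantee is needed), an SSE packed-singles loop unrolled twice looks like this:

    .data
    a   real4 16 dup(0.0)
    b   real4 16 dup(0.0)

    .code
    add_blocks_ps proc
        xor edx, edx                              ; byte offset into both arrays
      @@:
        movups xmm0, xmmword ptr [a + edx]        ; 4 singles from a
        movups xmm1, xmmword ptr [b + edx]        ; 4 singles from b
        movups xmm2, xmmword ptr [a + edx + 16]   ; next 4 of each (unrolled x2)
        movups xmm3, xmmword ptr [b + edx + 16]
        addps  xmm0, xmm1                         ; 4 additions per instruction
        addps  xmm2, xmm3
        movups xmmword ptr [a + edx], xmm0        ; store the results back into a
        movups xmmword ptr [a + edx + 16], xmm2
        add    edx, 32                            ; 8 elements per pass
        cmp    edx, SIZEOF a                      ; total byte count
        jb     @B
        ret
    add_blocks_ps endp

Swap addps for subps, mulps or divps to cover the other arithmetic lines of the question; the comparisons map to cmpps, which leaves an all-ones or all-zeros mask in each element rather than a 0/1 value.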

Cheers,

Zooba :U
Title: Re: math with blocks of memory
Post by: loki_dre on April 17, 2008, 03:49:15 PM
do you know why the processing takes longer in C++?
Title: Re: math with blocks of memory
Post by: Ultrano on April 17, 2008, 06:10:00 PM
Quote from: zooba on April 17, 2008, 10:34:53 AM
found that multiplication takes roughly as long as addition or subtraction (integer, floating point and SSE alike).
Then that's some slow-ass add/sub performance :P. (no, really).
On AMD CPUs multiplication is 3 times slower than add/sub/xor/and/or/etc. You can chain these simple ops in such a way that a 1.8GHz CPU will perform like an 11GHz P4.
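For example (an illustrative snippet, not the benchmark code itself), a constant multiply decomposes into a chain of these cheap ops:

    ; compute eax = eax * 10 without MUL
    lea eax, [eax + eax*4]      ; eax = x * 5
    add eax, eax                ; eax = x * 10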

Quote from: loki_dre
do you know why the processing takes longer in C++?
Depends on which compiler and what optimization settings you've set. Compilers know only a subset of optimization patterns and can't compose new ones, unlike a good asm coder.
Title: Re: math with blocks of memory
Post by: loki_dre on April 17, 2008, 06:19:40 PM
hmmmmmm.....
So, does anyone know what the fastest processor is for doing a lot of math?
Title: Re: math with blocks of memory
Post by: Ultrano on April 17, 2008, 09:44:16 PM
Cell BE, judging from Folding@Home.
Title: Re: math with blocks of memory
Post by: zooba on April 17, 2008, 09:48:13 PM
Quote from: Ultrano on April 17, 2008, 06:10:00 PM
Then that's some slow-ass add/sub performance :P. (no, really).

Probably. The test was set up to avoid pairing, so that may be influencing the speed of the add/sub operations.

I have yet to see a C++ compiler produce a loop using SSE packed singles. Having said that, I haven't looked since last year and there are new versions of everything out now.

Quote from: Ultrano on April 17, 2008, 09:44:16 PM
Cell BE, judging from Folding@Home.

I agree. An Intel-based processor is not going to give the best processing performance. I'm pretty sure that within the Intel/AMD world the choice is largely irrelevant anyway, except across large steps in clock speed (500MHz-700MHz intervals).

Cheers,

Zooba :U
Title: Re: math with blocks of memory
Post by: hutch-- on April 18, 2008, 12:36:52 AM
It would be interesting to see whether the Core 2 Duo range is actually faster with DIV and MUL relative to earlier processors. When Intel introduced the PIV it did some things better, like pairing suitable instructions, but it was noticeably slower for a given clock count on a number of commonly used integer instructions like SHL and SHR, and they killed LEA as a fast method as well.

With very late SSE it would be very useful if they have built enough capacity into the critical maths instructions rather than just dumping so much of it off to microcode.