The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: ecube on April 18, 2009, 12:13:52 AM

Title: What's the deal with SSE5
Post by: ecube on April 18, 2009, 12:13:52 AM
http://en.wikipedia.org/wiki/SSE5

I don't get why AMD calls it SSE5 when it doesn't support all of Intel's SSE4. They should just give it a unique name to avoid confusion.

At least

"Intel's pre-Nehalem cores contain only a partial implementation of SSE4, called SSE4.1. This poses some difficulty and extra work for compilers and assembly-level hand tuning of code."

makes some sense.

Also, does anyone have an AMD processor capable of testing SSE5 instructions?
Title: Re: What's the deal with SSE5
Post by: KeepingRealBusy on April 20, 2009, 02:53:44 AM
Give me an SSE5 instruction and I'll check my AMD spec to see if my CPU will execute it. I won't be able to get back to you until Wednesday.
Dave.
Title: Re: What's the deal with SSE5
Post by: ecube on April 20, 2009, 05:01:39 AM

FMADDPS – Multiply and add packed single precision floating point instruction
One of the typical operations computed in transformations such as DFT or FFT is of the form

     p = sum of f(n) * x(n), for n = 0 .. N-1

Let f(n) and x(n) be two source buffers, for example src1 and src2, and let p be the destination to accumulate the results. All the buffers in the discussion are of floating-point type. The implementation in plain C for N = 4 (128 bits) is as follows:

     for (int i = 0; i < 4; i++)
     {
         p = p + src1[i] * src2[i];
     }

The code generated in x86 instructions per iteration is as follows:

     //src1 is on the top of the stack; src1 = src1 * src2
     fmul DWORD PTR _src2$[esp+148]
     //p = ST(1), src1 = ST(0); ST(1) = ST(0)+ST(1);ST-Stack Top
     faddp ST(1), ST(0)

The total number of instructions generated for 4 iterations = 2 * 4 = 8.

The above calculations in SSE instructions (mulps and addps are available from the original SSE onward) are as follows:

     //xmm0 = p, xmm1 = src1, xmm2 = src2
     mulps xmm1, xmm2
     addps xmm0, xmm1

However, the SSE5 instruction accomplishes the same computation in a single instruction:

     //xmm0 = p, xmm1 = src1, xmm2 = src2
     fmaddps xmm0, xmm1, xmm2, xmm0
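
One thing the comparison glosses over: the C loop accumulates a single scalar p, while the packed instructions work on four lanes at once, so xmm0 ends up holding four partial products that still have to be added together. A rough C sketch of that, assuming the operand order implied by the example above (dest = src1 * src2 + src3). The helpers fmadd4 and hsum4 are made-up names for illustration, not real intrinsics, and the single rounding of a true fused operation is not modelled:

```c
#include <stddef.h>

/* Hypothetical helper: models the lane-wise effect assumed for
   fmaddps dest, src1, src2, src3, i.e. dest[i] = src1[i] * src2[i] + src3[i].
   dest may be the same array as src3, as in the example above. */
static void fmadd4(float dest[4], const float src1[4],
                   const float src2[4], const float src3[4])
{
    for (size_t i = 0; i < 4; i++)
        dest[i] = src1[i] * src2[i] + src3[i];
}

/* Hypothetical helper: adds the four lanes together to recover
   the scalar p that the plain C loop computes. */
static float hsum4(const float v[4])
{
    return (v[0] + v[1]) + (v[2] + v[3]);
}
```

Calling fmadd4(p, src1, src2, p) with p zeroed and then hsum4(p) should give the same value as the scalar loop for inputs that are exactly representable; in general the different summation order can change the last bits of the result.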


So the instruction to check is fmaddps xmm0, xmm1, xmm2, xmm0.
Title: Re: What's the deal with SSE5
Post by: KeepingRealBusy on April 21, 2009, 11:53:56 PM
E^cube

Quote: Also, does anyone have an AMD processor capable of testing SSE5 instructions?

Since it won't be in production until 2011, I don't think so.

Dave