What's the deal with SSE5

Started by ecube, April 18, 2009, 12:13:52 AM

ecube

http://en.wikipedia.org/wiki/SSE5

I don't get why AMD calls it SSE5 when it doesn't support all of Intel's SSE4. They should just give it a unique name to avoid confusion.

At least

"Intel's pre-Nehalem cores contain only a partial implementation of SSE4, called SSE4.1. This poses some difficulty and extra work for compilers and assembly-level hand tuning of code."

makes some sense.

Also, does anyone have an AMD processor capable of testing SSE5 instructions?

KeepingRealBusy

Give me an SSE5 instruction and I'll check my AMD spec to see if my CPU will execute it. I won't be able to get back to you until Wednesday.
Dave.

ecube


FMADDPS – Multiply and add packed single precision floating point instruction
One of the typical operations computed in transformations such as the DFT or FFT is a multiply-accumulate of the form

     p = p + f(n) * x(n)

Let f(n) and x(n) be two source buffers, for example src1 and src2, and let p be the destination that accumulates the results. All the buffers in this discussion are of floating point type. The implementation in plain C for N = 4 (128 bits) is as follows:

     for (int i = 0; i < 4; i++)
     {
          p = p + src1[i] * src2[i];
     }

The code generated in x86 instructions per iteration is as follows:

     //src1 is on the top of the stack; src1 = src1 * src2
     fmul DWORD PTR _src2$[esp+148]
     //p = ST(1), src1 = ST(0); ST(1) = ST(0)+ST(1);ST-Stack Top
     faddp ST(1), ST(0)

The total number of instructions generated for 4 iterations = 2 * 4 = 8.

The above calculations in SSE2 instructions are as follows:

     //xmm0 = p, xmm1 = src1, xmm2 = src2
      mulps xmm1, xmm2
      addps xmm0, xmm1

However, the SSE5 instruction accomplishes the same computation in a single instruction:

     //xmm0 = p, xmm1 = src1, xmm2 = src2
      fmaddps xmm0, xmm1, xmm2, xmm0
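
To make the lane-by-lane behaviour concrete, here is a small portable C sketch of what the packed sequences above compute. It is only a model, not real SSE5 code: the names scalar_mac, packed_mac, horizontal_sum and LANES are made up for illustration, and it assumes the four-operand form computes xmm0 = xmm1 * xmm2 + xmm0 in each of the four float lanes. One thing it shows is that the packed forms leave four partial sums in the register, so matching the scalar p from the C loop still needs a final horizontal add.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* four 32-bit floats fill one 128-bit XMM register */

/* Scalar reference: the plain C loop from the quoted article. */
static float scalar_mac(float p, const float *src1, const float *src2)
{
    assert(src1 != NULL && src2 != NULL);
    for (size_t i = 0; i < LANES; i++)
        p = p + src1[i] * src2[i];
    return p;
}

/* Hypothetical lane-wise model of "mulps + addps" (or the single fmaddps):
   every lane does acc[i] = src1[i] * src2[i] + acc[i] independently.
   Rounding differences between the two instruction sequences are not modelled. */
static void packed_mac(float acc[LANES], const float src1[LANES],
                       const float src2[LANES])
{
    assert(acc != NULL && src1 != NULL && src2 != NULL);
    for (size_t i = 0; i < LANES; i++)
        acc[i] = src1[i] * src2[i] + acc[i];
}

/* The packed result is four partial sums; add the lanes to get scalar p. */
static float horizontal_sum(const float acc[LANES])
{
    assert(acc != NULL);
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

For example, with src1 = {1, 2, 3, 4}, src2 = {5, 6, 7, 8} and everything starting at zero, packed_mac leaves {5, 12, 21, 32} in the accumulator, and horizontal_sum of that gives the same 70 that scalar_mac returns.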


So the instruction to check would be fmaddps xmm0, xmm1, xmm2, xmm0.

KeepingRealBusy

E^cube

Quote: "Also does anyone have an AMD processor capable of testing SSE5 instructions?"

Since it won't be in production until 2011, I don't think so.

Dave