SSE2: pxor vs subps

dioxin · July 25, 2010, 11:21:17 PM

Quote: not that I really believed in align 16, but many here do

You don't believe in aligning code?
But it makes a big difference if you align it. I get the following results if I sprinkle a few ALIGN 16s in the code:


 5748 ms for psubb
 5729 ms for psubq
 5730 ms for xorps
 5730 ms for xorpd
 5729 ms for pxor
 5990 ms for subps

That's much better than the original.
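For anyone following along: ALIGN 16 makes the assembler pad the code so that the next instruction or label starts at an address that is a multiple of 16. Here is a minimal sketch of that arithmetic in C. The helper names `align_padding` and `is_aligned` are made up for illustration and are not part of the benchmark code discussed in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: number of padding bytes needed to move `offset`
   up to the next multiple of `alignment`. `alignment` must be a power
   of two (16 for ALIGN 16). Returns 0 if already aligned. */
static uint32_t align_padding(uint32_t offset, uint32_t alignment)
{
    return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}

/* Hypothetical helper: nonzero if `offset` already sits on an
   `alignment`-byte boundary. */
static int is_aligned(uint32_t offset, uint32_t alignment)
{
    return (offset & (alignment - 1)) == 0;
}
```

So a loop label that would otherwise land at offset 0x1003 gets 13 bytes of padding in front of it and ends up at 0x1010. Whether that padding pays off depends on the CPU.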

jj2007 · July 26, 2010, 06:49:50 AM

Putting align 16 in front of the innermost loop has absolutely NO effect on my Celeron M... but it's true that there are more sensitive CPUs around.

Rockoon · July 28, 2010, 12:52:22 AM

Branch target alignment is not so effective on Intel (currently) due to the trace cache, but it doesn't usually hurt to align anyway.
