SSE2: pxor vs subps

Started by jj2007, July 24, 2010, 08:55:43 PM

dioxin

Quote: not that I really believed in align 16, but many here do
You don't believe in aligning code?
But it makes a big difference if you align it. I get the following results if I sprinkle a few ALIGN 16s in the code:
5748 ms for psubb
5729 ms for psubq
5730 ms for xorps
5730 ms for xorpd
5729 ms for pxor
5990 ms for subps

That's much better than the original.

jj2007

Putting align 16 in front of the innermost loop has absolutely NO effect on my Celeron M... but it's true that there are more sensitive CPUs around.

Rockoon

Branch target aligning is not so effective on Intel (currently) due to the trace cache, but it doesn't usually hurt to align anyway.

When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.