SSE floating point math instructions
Besides the SSE versions of the usual ADD, SUB, MUL, DIV and SQRT,
we have RCP, which returns an approximate reciprocal (1/x) of the operand,
and RSQRT, which returns an approximate 1/sqrt(x).
MIN returns the smaller of the two operands.
MAX returns the larger of the two operands.
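There are two types of instructions:
1: scalar instructions, with SS in the name, which do the operation only on the lowest single-precision float and leave the upper three floats of the destination unchanged
2: packed instructions, with PS in the name, which do 4 parallel (packed) math ops, one on each of the 4 single-precision floats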
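The difference between the two types can be sketched in C with the SSE intrinsics from xmmintrin.h, where _mm_add_ss normally compiles to ADDSS and _mm_add_ps to ADDPS. The function names add_packed and add_scalar are made up for this illustration, and the compiler is free to pick the exact instruction encoding.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Packed add (ADDPS): out[i] = a[i] + b[i] for all four floats. */
void add_packed(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* Scalar add (ADDSS): only out[0] = a[0] + b[0] is computed;
   out[1..3] are copied unchanged from a. */
void add_scalar(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ss(va, vb));
}
```

Called with a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, add_packed gives {11, 22, 33, 44}, while add_scalar gives {11, 2, 3, 4}.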
single op on lowest single FP | packed (parallel) op on all 4 single FP |
ADDSS xmm1,xmm2/m32 ; F3 0F 58 /r | ADDPS xmm1,xmm2/m128 ; 0F 58 /r |
SUBSS xmm1,xmm2/m32 ; F3 0F 5C /r | SUBPS xmm1,xmm2/m128 ; 0F 5C /r |
MULSS xmm1,xmm2/m32 ; F3 0F 59 /r | MULPS xmm1,xmm2/m128 ; 0F 59 /r |
DIVSS xmm1,xmm2/m32 ; F3 0F 5E /r | DIVPS xmm1,xmm2/m128 ; 0F 5E /r |
RCPSS xmm1,xmm2/m32 ; F3 0F 53 /r * | RCPPS xmm1,xmm2/m128 ; 0F 53 /r * |
RSQRTSS xmm1,xmm2/m32 ; F3 0F 52 /r ** | RSQRTPS xmm1,xmm2/m128 ; 0F 52 /r ** |
SQRTSS xmm1,xmm2/m32 ; F3 0F 51 /r square root | SQRTPS xmm1,xmm2/m128 ; 0F 51 /r |
MAXSS xmm1,xmm2/m32 ; F3 0F 5F /r | MAXPS xmm1,xmm2/m128 ; 0F 5F /r |
MINSS xmm1,xmm2/m32 ; F3 0F 5D /r | MINPS xmm1,xmm2/m128 ; 0F 5D /r |
* performs a fast approximate reciprocal 1/x from an internal lookup table. The result has only about 12 bits of precision in the mantissa (relative error at most 1.5 * 2^-12), compared to the normal 24 bits. So if you need a fast DIV but don't need full precision, you can replace the slow DIV with a MUL by the reciprocal.
instead of
DIVPS xmm1,xmm2 ;slow division
you write
RCPPS xmm2,xmm2 ;fast lookup of the reciprocal of the divisor (overwrites xmm2)
MULPS xmm1,xmm2 ;multiply by the reciprocal, which is a lot faster than a division
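To get a feel for how much precision the RCPPS + MULPS replacement costs, here is a minimal sketch in C using the SSE intrinsics _mm_div_ps, _mm_rcp_ps and _mm_mul_ps, which normally compile to DIVPS, RCPPS and MULPS. The helper names div_exact and div_approx are hypothetical, and the actual speed difference depends on the CPU, so nothing is measured here.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Full-precision division (DIVPS): out[i] = num[i] / den[i]. */
void div_exact(const float num[4], const float den[4], float out[4])
{
    _mm_storeu_ps(out, _mm_div_ps(_mm_loadu_ps(num), _mm_loadu_ps(den)));
}

/* Approximate division: reciprocal lookup (RCPPS), then multiply (MULPS).
   Only about 12 bits of the mantissa can be trusted in the result. */
void div_approx(const float num[4], const float den[4], float out[4])
{
    __m128 recip = _mm_rcp_ps(_mm_loadu_ps(den));
    _mm_storeu_ps(out, _mm_mul_ps(_mm_loadu_ps(num), recip));
}
```

Comparing the two outputs for the same inputs shows results that agree to roughly 3 decimal digits, which is often enough for graphics or audio work but not for accumulating sums.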
** performs an approximate reciprocal square root 1/sqrt(x), using the same kind of fast table lookup as RCP and with the same reduced precision (about 12 bits)
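The same comparison can be sketched for RSQRTPS in C, assuming the intrinsic _mm_rsqrt_ps (normally RSQRTPS) and, for the full-precision path, _mm_sqrt_ps followed by _mm_div_ps (SQRTPS and DIVPS). The function names rsqrt_approx and rsqrt_exact are made up for this example.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Approximate 1/sqrt(x) on four floats with a single RSQRTPS. */
void rsqrt_approx(const float x[4], float out[4])
{
    _mm_storeu_ps(out, _mm_rsqrt_ps(_mm_loadu_ps(x)));
}

/* Full-precision 1/sqrt(x): SQRTPS followed by DIVPS. */
void rsqrt_exact(const float x[4], float out[4])
{
    __m128 one = _mm_set1_ps(1.0f);
    _mm_storeu_ps(out, _mm_div_ps(one, _mm_sqrt_ps(_mm_loadu_ps(x))));
}
```

The approximate version replaces two slow instructions with one fast one, at the price of the reduced precision described above.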
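SSE2 instructions
SSE2 adds the same kind of operations for double-precision floats:
scalar instructions, with SD in the name, do a single op on the lower double-precision float
packed instructions, with PD in the name, do parallel ops on both double-precision floats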
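single op on low double FP | packed (parallel) op on both double FP |
ADDSD xmm1,xmm2/m64 ; F2 0F 58 /r | ADDPD xmm1,xmm2/m128 ; 66 0F 58 /r |
SUBSD xmm1,xmm2/m64 ; F2 0F 5C /r | SUBPD xmm1,xmm2/m128 ; 66 0F 5C /r |
MULSD xmm1,xmm2/m64 ; F2 0F 59 /r | MULPD xmm1,xmm2/m128 ; 66 0F 59 /r |
DIVSD xmm1,xmm2/m64 ; F2 0F 5E /r | DIVPD xmm1,xmm2/m128 ; 66 0F 5E /r |
SQRTSD xmm1,xmm2/m64 ; F2 0F 51 /r | SQRTPD xmm1,xmm2/m128 ; 66 0F 51 /r |
MAXSD xmm1,xmm2/m64 ; F2 0F 5F /r | MAXPD xmm1,xmm2/m128 ; 66 0F 5F /r |
MINSD xmm1,xmm2/m64 ; F2 0F 5D /r | MINPD xmm1,xmm2/m128 ; 66 0F 5D /r |
RCP and RSQRT have no double-precision versions: SSE2 does not define RCPSD, RCPPD, RSQRTSD or RSQRTPD, so use DIVSD/DIVPD and SQRTSD/SQRTPD instead.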
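As a small illustration of the double-precision instructions, the sketch below uses the SSE2 intrinsics from emmintrin.h. It clamps two doubles into a range with _mm_max_pd and _mm_min_pd (normally MAXPD and MINPD) and shows the scalar _mm_add_sd (normally ADDSD). The function names clamp_pd and add_low_double are hypothetical, chosen only for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Clamp both doubles into [lo, hi]: MAXPD raises values below lo,
   MINPD lowers values above hi. */
void clamp_pd(const double x[2], double lo, double hi, double out[2])
{
    __m128d v = _mm_loadu_pd(x);
    v = _mm_max_pd(v, _mm_set1_pd(lo));
    v = _mm_min_pd(v, _mm_set1_pd(hi));
    _mm_storeu_pd(out, v);
}

/* Scalar add (ADDSD): out[0] = a[0] + b[0]; out[1] is copied from a. */
void add_low_double(const double a[2], const double b[2], double out[2])
{
    _mm_storeu_pd(out, _mm_add_sd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```

For example, clamping {-5.0, 7.5} into [0.0, 1.0] gives {0.0, 1.0}, and add_low_double on {1.5, 2.5} and {10.0, 20.0} gives {11.5, 2.5}.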