I have been reading about Windows XP x64 Edition. x87 FPU and MMX instructions are not supported, SSE/SSE2 only.
Question: Could a sine function written for SSE2 equal or outperform FSIN on a Pentium 4?
I originally thought it would be considerably slower, but now I am not sure. I would try to figure this out myself but I have a Pentium III (no SSE2).
I found the answer. I looked at the Intel 'Approximate Math Library'. It does sine, cosine, tangent etc. in SSE or SSE2 code faster than the equivalent FPU instructions. The accuracy is a little lower, but it is better than what is achievable with lookup tables. I'll be darned.
You can find it here:
http://www.intel.com/design/pentiumiii/devtools/AMaths.zip
(updated the link)
Greg,
Have you checked out the method from this site? Ratch
http://www.bmath.net/bmath/halfstaff.html
Ratch,
Thanks for the link, I'll check it out.
Quote: I have been reading about Windows XP x64 Edition. x87 FPU and MMX instructions are not supported, SSE/SSE2 only.
I know this is an old post, but this is not true. See this post: http://www.masm32.com/board/index.php?topic=4243.msg33220#msg33220
Sine and cosine can be calculated with a Taylor series. Take a look at it and you will see it is highly parallelizable: the terms can be computed in parallel and then combined with a final round of adds and subs.
If you really want performance, write an SSE sine that calculates several sines in parallel, and unroll it for as long as you have free XMM registers.
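To make the "several sines in parallel" idea concrete, here is a minimal sketch in C with SSE intrinsics that evaluates four single-precision sines at once. The names sin4_ps and sin4_taylor are made up for illustration, and this is not the code from Intel's library. It assumes the inputs are already within about [-pi/2, pi/2], since there is no range reduction. It also uses Horner's rule on x*x instead of computing each Taylor term separately, so the parallelism comes from the four lanes rather than from the terms.

```c
#include <emmintrin.h>  /* SSE2 header; also brings in the SSE float intrinsics */

/* Approximate sin(x) for four packed floats using the Taylor polynomial
   x - x^3/3! + x^5/5! - x^7/7! + x^9/9!.
   Assumption: |x| <= pi/2 for every lane (no range reduction is done). */
static __m128 sin4_ps(__m128 x)
{
    const __m128 c3 = _mm_set1_ps(-1.0f / 6.0f);
    const __m128 c5 = _mm_set1_ps( 1.0f / 120.0f);
    const __m128 c7 = _mm_set1_ps(-1.0f / 5040.0f);
    const __m128 c9 = _mm_set1_ps( 1.0f / 362880.0f);

    __m128 x2 = _mm_mul_ps(x, x);

    /* Horner evaluation in x^2: p = c3 + x2*(c5 + x2*(c7 + x2*c9)) */
    __m128 p = c9;
    p = _mm_add_ps(_mm_mul_ps(p, x2), c7);
    p = _mm_add_ps(_mm_mul_ps(p, x2), c5);
    p = _mm_add_ps(_mm_mul_ps(p, x2), c3);

    /* sin(x) ~= x + x^3 * p */
    return _mm_add_ps(x, _mm_mul_ps(_mm_mul_ps(x, x2), p));
}

/* Hypothetical convenience wrapper: four unaligned floats in, four out. */
void sin4_taylor(const float in[4], float out[4])
{
    _mm_storeu_ps(out, sin4_ps(_mm_loadu_ps(in)));
}
```

Unrolling as suggested above would mean calling sin4_ps on two or more independent vectors back to back, so their multiply and add chains overlap while XMM registers last. A real routine would also need range reduction for arguments outside [-pi/2, pi/2], and a minimax polynomial would likely give better accuracy than raw Taylor coefficients for the same number of terms.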