The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: GregL on February 28, 2005, 12:53:18 AM

Title: Sine in SSE2 vs. FSIN
Post by: GregL on February 28, 2005, 12:53:18 AM
I have been reading about Windows XP x64 Edition. x87 FPU and MMX instructions are not supported, SSE/SSE2 only.

Question:  Could a sine function written for SSE2 equal or outperform FSIN on a Pentium 4?

I originally thought it would be considerably slower, but now I am not sure. I would try to figure this out myself but I have a Pentium III (no SSE2).

Title: Re: Sine in SSE2 vs. FSIN
Post by: GregL on February 28, 2005, 06:13:13 AM
I found the answer. I looked at the Intel 'Approximate Math Library'. It does sine, cosine, tangent, etc. in SSE or SSE2 code faster than the equivalent FPU instructions. The accuracy is a little lower, but it is better than what is achievable with lookup tables. I'll be darned. :red :dazzled:

You can find it here:
http://www.intel.com/design/pentiumiii/devtools/AMaths.zip

(updated the link)
Title: Re: Sine in SSE2 vs. FSIN
Post by: Ratch on February 28, 2005, 06:00:43 PM
Greg,

Have you checked out the method from this site? Ratch

http://www.bmath.net/bmath/halfstaff.html
Title: Re: Sine in SSE2 vs. FSIN
Post by: GregL on February 28, 2005, 07:07:08 PM
Ratch,

Thanks for the link, I'll check it out.

Title: Re: Sine in SSE2 vs. FSIN
Post by: GregL on September 18, 2007, 09:15:39 PM
Quote: I have been reading about Windows XP x64 Edition. x87 FPU and MMX instructions are not supported, SSE/SSE2 only.

I know this is an old post but this is not true. See this (http://www.masm32.com/board/index.php?topic=4243.msg33220#msg33220) post.
Title: Re: Sine in SSE2 vs. FSIN
Post by: daydreamer on September 19, 2007, 08:21:03 AM
Sine and cosine can be calculated with a Taylor series. Take a look and you'll see it's highly parallelizable: the terms can be computed in parallel, followed by a final round of adds and subs.
If you really want performance, write an SSE sine that calculates several sines in parallel, and unroll it as far as you have free xmm regs.
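
For anyone who wants to try this, here is a minimal sketch of the idea in C with SSE2 intrinsics rather than MASM. It is not taken from the Intel library or from any post in this thread. The names sin2_taylor and sin_array_sse2 are made up for illustration. It assumes the inputs are already reduced to |x| <= pi/2 (there is no range reduction) and truncates the Taylor series after the x^11 term, so the error should stay below about 1e-7 over that range. It evaluates the polynomial in Horner form instead of computing each term separately and summing at the end; the parallelism comes from doing two doubles per xmm register and unrolling across two registers.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Two sines at once:
   sin(x) ~ x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - x^11/11!
   Assumption: |x| <= pi/2, no range reduction is done here. */
static inline __m128d sin2_taylor(__m128d x)
{
    const __m128d c3  = _mm_set1_pd(-1.0 / 6.0);
    const __m128d c5  = _mm_set1_pd( 1.0 / 120.0);
    const __m128d c7  = _mm_set1_pd(-1.0 / 5040.0);
    const __m128d c9  = _mm_set1_pd( 1.0 / 362880.0);
    const __m128d c11 = _mm_set1_pd(-1.0 / 39916800.0);

    __m128d x2 = _mm_mul_pd(x, x);

    /* Horner form in x^2: p = c3 + x2*(c5 + x2*(c7 + x2*(c9 + x2*c11))) */
    __m128d p = c11;
    p = _mm_add_pd(_mm_mul_pd(p, x2), c9);
    p = _mm_add_pd(_mm_mul_pd(p, x2), c7);
    p = _mm_add_pd(_mm_mul_pd(p, x2), c5);
    p = _mm_add_pd(_mm_mul_pd(p, x2), c3);

    /* sin(x) ~ x + x^3 * p */
    return _mm_add_pd(x, _mm_mul_pd(_mm_mul_pd(x, x2), p));
}

/* Unrolled by two registers: four sines per loop iteration.
   Hypothetical helper; n must be a multiple of 4. */
void sin_array_sse2(const double *in, double *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128d a = _mm_loadu_pd(in + i);      /* unaligned loads */
        __m128d b = _mm_loadu_pd(in + i + 2);
        _mm_storeu_pd(out + i,     sin2_taylor(a));
        _mm_storeu_pd(out + i + 2, sin2_taylor(b));
    }
}
```

The two sin2_taylor calls in the loop are independent, so the compiler (or a hand-written asm version) can interleave them to hide multiply and add latency. With more free xmm registers the same pattern extends to more sines per iteration. Whether this beats FSIN on a given CPU would have to be timed.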