I ran across these SSE3 macros (http://www.intel.com/cd/ids/developer/asmo-na/eng/167741.htm?page=1) for ml 6.15 and 7.x. See 'Appendix A' of that article for the actual macros. I believe ml 8.0 has SSE3 support built in.
I see Agner Fog's macros (http://www.agner.org/optimize/#macros) do SSE3 now too.
I just got a chip that will do this stuff so I am playing catch-up here. :bg
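FISTTP looks really useful: it stores an integer with truncation no matter what rounding mode is set in the FPU control word, so there is no need to save, change and restore the control word around a conversion.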
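
For anyone else catching up, here is a rough C sketch of the difference as I understand it: FISTP converts using the current rounding mode (round-to-nearest-even by default), while FISTTP always chops toward zero. The function names are my own, and this only models the results in plain C for values that fit in a long; it does not execute either instruction.

```c
/* Hypothetical helper: models what FISTP gives under the DEFAULT
   rounding mode (round to nearest, ties to even). */
long model_fistp_default(double x)
{
    long t = (long)x;              /* C casts truncate toward zero */
    double frac = x - (double)t;   /* leftover fraction, in (-1, 1) */

    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        return t + 1;              /* round up, or tie to the even neighbour */
    if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        return t - 1;              /* same for negative values */
    return t;
}

/* Hypothetical helper: models what FISTTP gives. It always truncates
   toward zero, which is the same result as a plain C cast. */
long model_fisttp(double x)
{
    return (long)x;
}
```

So 2.7 comes out as 3 from the first function and 2 from the second, and -2.7 comes out as -3 and -2. The second result is what C float-to-int casts need, which is why the instruction saves the control word juggling.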