
Speed is in the mind of the beholder ?

Started by Draakie, September 12, 2006, 06:31:53 AM


Draakie

Hiya all - I'm probably gonna stir up a hornet's
nest - but just a question... :eek


Is OpenGL's rotation and transformation code faster than a handwritten SSE/SSE2/MMX assembler version of the same code? In my mind, if the hardware is doing the dirty work - with its own
memory to boot - it should be much faster... Although, having heard things through the grape-vine and having had a look
at Intel's Approximate Math library, this could be a misconception. Please set me straight.
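
To be concrete, by a handwritten version I mean roughly something like the sketch below: one vec4 multiplied by a column-major 4x4 matrix using SSE. The procedure name, argument layout, iteration of columns and the 16-byte alignment assumption are just mine for illustration, not taken from any library.

.686
.xmm
.model flat, stdcall
option casemap:none

.code

; TransformVec4 - illustrative sketch only.
; Multiplies one 4-float vector by a column-major 4x4 matrix using SSE.
; pOut -> 4 floats, 16-byte aligned, receives the result
; pMat -> 16 floats, 16-byte aligned, column-major 4x4 matrix
; pVec -> 4 floats, 16-byte aligned, input vector (x y z w)
TransformVec4 proc pOut:DWORD, pMat:DWORD, pVec:DWORD
    mov     eax, pMat
    mov     ecx, pVec
    mov     edx, pOut

    movaps  xmm4, [ecx]            ; xmm4 = x y z w

    movaps  xmm0, xmm4
    shufps  xmm0, xmm0, 00000000b  ; xmm0 = x x x x
    mulps   xmm0, [eax]            ; x * column 0

    movaps  xmm1, xmm4
    shufps  xmm1, xmm1, 01010101b  ; xmm1 = y y y y
    mulps   xmm1, [eax+16]         ; y * column 1
    addps   xmm0, xmm1

    movaps  xmm2, xmm4
    shufps  xmm2, xmm2, 10101010b  ; xmm2 = z z z z
    mulps   xmm2, [eax+32]         ; z * column 2
    addps   xmm0, xmm2

    movaps  xmm3, xmm4
    shufps  xmm3, xmm3, 11111111b  ; xmm3 = w w w w
    mulps   xmm3, [eax+48]         ; w * column 3
    addps   xmm0, xmm3

    movaps  [edx], xmm0            ; store transformed vector
    ret
TransformVec4 endp

end

Broadcasting each component with shufps and accumulating with mulps/addps is the usual way to get the whole matrix*vector done in four packed multiplies.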

Draakie.
Does this code make me look bloated ? (wink)

P1

Not enough information and too many unaccounted-for variables to even discuss this question.   :boohoo:

Regards,  P1  :8)

Draakie

OK - OK ...... :bdg

Seems P1 wants details and stuff. Well, let's put it this way....

Would it be a viable experiment to even consider hand-writing the rotation and transformation
code - rather than using OpenGL's built-in API calls - in the hope of gaining a speedup?

Yes - I understand this is a bit of an abstract question and more data is required,
e.g. given good optimization skills? a large 3D model/world? data storage? etc.
What I was hoping for was that a MASM32 member might have given this some thought
before and come to some sort of conclusion as to the ratio of stupidity vs. brilliance.

And yes P1 - I just put the same question a different way - and I'm hoping to get
away with it :lol

Thanx again....

Does this code make me look bloated ? (wink)

drhowarddrfine

Any properly thought-out assembly code will always be faster than, or at worst just as fast as, anything written in a higher-level language.

Rockoon

Quote from: drhowarddrfine on September 13, 2006, 12:12:43 PM
Any properly thought-out assembly code will always be faster than, or at worst just as fast as, anything written in a higher-level language.

Unless the ASM code is run on one processor and the HLL code is run on a different one ..

You are being way too simplistic.

The problem with the OP's question is that he is trying to compare apples with applesauce and oranges with orange juice.

If the vectors needing transformation are in system RAM (apples), then "properly thought out" apple sauce is best. However, if they are in video RAM (oranges), then orange juice is going to be better, regardless of what language the video card's code is written in.

Still further, you may have oranges when you would prefer apples, or vice versa. If you are CPU limited then you want oranges; if you are GPU limited you want apples. You won't know until you profile your production code. However, in my experience the modern GPU is almost never limited by transforms... it slams against fill-rate restrictions long before that... so transforms on the GPU usually end up being FREE.
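
If you do want to profile the CPU side, the rough idea is just to wrap the transform loop in rdtsc reads, something like the sketch below. It assumes a TransformVec4 routine like the one sketched earlier in the thread; the names, data and the iteration count are arbitrary, purely for illustration.

.686
.model flat, stdcall
option casemap:none

TransformVec4 PROTO :DWORD, :DWORD, :DWORD   ; the SSE sketch from earlier

.data
align 16
matrix   REAL4 16 dup(0.0)
vec_in   REAL4 4 dup(0.0)
vec_out  REAL4 4 dup(0.0)
start_lo DWORD 0
start_hi DWORD 0

.code

TimeTransform proc
    rdtsc                              ; start timestamp -> edx:eax
    mov     start_lo, eax
    mov     start_hi, edx

    mov     ecx, 1000                  ; arbitrary iteration count
@@:
    push    ecx                        ; ecx is not preserved across the call
    invoke  TransformVec4, addr vec_out, addr matrix, addr vec_in
    pop     ecx
    dec     ecx
    jnz     @B

    rdtsc                              ; end timestamp -> edx:eax
    sub     eax, start_lo              ; 64-bit elapsed cycle count in edx:eax
    sbb     edx, start_hi
    ret                                ; low dword of elapsed cycles in eax
TimeTransform endp

end

Keep in mind rdtsc counts raw cycles, so numbers will wobble with frequency scaling and whatever else the OS is doing; it is only good enough to tell you which side of the apples/oranges fence you are on.
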
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

P1

Givens: testing occurs in the exact same hardware and OS environment. Note: it's understood that the same graphics constraints apply.

It boils down to HLL overhead. As the hand-coded assembly approaches the same universal applicability as the HLL (doing everything the other can do), the speed difference drops to the amount of overhead the HLL injects into its own processing of the work, assuming the hand-coded assembly does nothing more than get the task done - which amounts to the assembler's overhead in processing.

Regards,  P1   :8)