Let's talk optimization

ecube · November 20, 2006, 01:45:47 AM

Well, it's clear that what's optimized on one processor may be slower on another, so it's kind of a losing battle, but not really if you're willing to sacrifice size for speed. I recently took a look at AMD's highly optimized strlen algorithm (I forget the link), which showed it had 2 versions in one that it called depending on the string's length. So with that approach in mind, my suggestion is that we use this great code from MichaelW http://www.masm32.com/board/index.php?topic=1909.0, or something similar, to see what CPU we're dealing with and what instructions it supports, and then use the more optimized code accordingly. If the CPU supports SSE2 and it's faster than the alternatives, then use that, etc. This approach obviously isn't practical for everything, since it can mean 4 or more times the code, but I think for massive applications like games it's great, because you can really take advantage of the user's hardware rather than just running the standard code on all processors.
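The "2 versions in one" idea can be sketched in C: a single entry point checks the length and hands off to either a simple byte loop or a loop that moves 8 bytes per iteration. This is not AMD's strlen; the routine names and the 32-byte cutoff are hypothetical, and a real cutoff would have to come from benchmarking on the target CPUs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff: below this many bytes the simple loop is used. */
#define SMALL_COPY_LIMIT 32

/* Version 1: one byte per iteration, no setup cost. */
static void copy_small(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Version 2: 8 bytes per iteration, then the byte loop for the tail. */
static void copy_large(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, src + i, 8);   /* alignment-safe 8-byte load */
        memcpy(dst + i, &w, 8);   /* alignment-safe 8-byte store */
    }
    copy_small(dst + i, src + i, n - i);
}

/* Single entry point: picks a version from the length alone. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    if (n < SMALL_COPY_LIMIT)
        copy_small(dst, src, n);
    else
        copy_large(dst, src, n);
}
```

Callers only ever see `copy_bytes`, so the cutoff or either version can be retuned without touching call sites.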

zooba · November 20, 2006, 06:05:04 AM

I've considered trying to make a 'standard' library which would automatically detect the features of a processor and use the appropriate function. As you mention, though, size is an issue. Generally you have to sacrifice size for speed anyway, though a lot of assembly programmers want both at once (on the bright side, both are quite possible when comparing to C or similar languages).

For a large project, though, selecting the most efficient routines based on the processor's capabilities (at startup, obviously, storing a pointer to each routine) could open up some real possibilities for fast (though large) code.
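A minimal sketch of selecting at startup and storing a pointer, assuming GCC or Clang on x86: `<cpuid.h>`'s `__get_cpuid` and the `target("sse2")` attribute are compiler-specific. It tests the SSE2 bit (CPUID leaf 1, EDX bit 26) once and stores the chosen routine in a pointer; `sum_bytes`, `sum_generic`, `sum_sse2` and `init_dispatch` are illustrative names. On other architectures the generic routine simply stays selected.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <emmintrin.h>
#define HAVE_X86 1
#endif

typedef uint64_t (*sum_fn)(const uint8_t *p, size_t n);

/* Portable fallback: one byte per iteration. */
static uint64_t sum_generic(const uint8_t *p, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

#ifdef HAVE_X86
/* SSE2 version: PSADBW against zero sums 8 bytes into each 64-bit lane. */
__attribute__((target("sse2")))
static uint64_t sum_sse2(const uint8_t *p, size_t n)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero;
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, acc);
    uint64_t total = lanes[0] + lanes[1];
    for (; i < n; i++)            /* remaining 0..15 bytes */
        total += p[i];
    return total;
}
#endif

/* The rest of the program calls through this pointer; safe default. */
sum_fn sum_bytes = sum_generic;

/* Call once at startup, before any heavy use of sum_bytes. */
void init_dispatch(void)
{
#ifdef HAVE_X86
    unsigned int eax, ebx, ecx, edx;
    /* CPUID leaf 1: EDX bit 26 reports SSE2. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 26)))
        sum_bytes = sum_sse2;
#endif
}
```

The per-call cost after startup is one indirect call; the detection itself runs only once.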

hutch-- · November 20, 2006, 11:35:29 AM

Writing mixed-processor code is always a joy to behold, but there is no neutral position: nothing does it well automatically, and here I am particularly commenting about compilers. On average, Intel hardware of similar age is relatively similar, whereas AMD hardware of similar age can differ a lot from Intel hardware, so what you must do is test on both and compromise between the two to get the best average.

With the coming 64-bit era, much of this will be simplified, as the baseline instruction set is far more advanced than the average 32-bit target, where hardware can be up to 10 years old. It's not unknown in critical code to have different algos for different hardware (3DNow! for AMD, SSE3 on Intel, etc.), and if there are not all that many critical routines in the app, it's no big deal to cater for a number of variations. But I agree with Zooba here: at a full library level it just gets too big.
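Choosing between 3DNow! and SSE3 paths is safer when it tests the individual feature bits rather than assuming them from the vendor name. A sketch, again assuming GCC or Clang on x86; `struct cpu_info`, `query_cpu` and `pick_path` are hypothetical names. The vendor string comes from CPUID leaf 0 (EBX, EDX, ECX in that order), SSE3 from leaf 1 ECX bit 0, and 3DNow! from extended leaf 0x80000001 EDX bit 31.

```c
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

struct cpu_info {
    char vendor[13];   /* e.g. "GenuineIntel" or "AuthenticAMD", NUL-terminated */
    int has_sse3;
    int has_3dnow;
};

struct cpu_info query_cpu(void)
{
    struct cpu_info info;
    memset(&info, 0, sizeof info);   /* everything "absent" by default */
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a, b, c, d;
    if (__get_cpuid(0, &a, &b, &c, &d)) {
        /* Vendor string is spread over EBX, EDX, ECX (little-endian bytes). */
        memcpy(info.vendor + 0, &b, 4);
        memcpy(info.vendor + 4, &d, 4);
        memcpy(info.vendor + 8, &c, 4);
    }
    if (__get_cpuid(1, &a, &b, &c, &d))
        info.has_sse3 = (c & 1u) != 0;            /* leaf 1, ECX bit 0 */
    /* __get_cpuid returns 0 if the extended leaf is not supported. */
    if (__get_cpuid(0x80000001, &a, &b, &c, &d))
        info.has_3dnow = (d & (1u << 31)) != 0;   /* EDX bit 31 */
#endif
    return info;
}

/* The order of preference here is an assumption; benchmark to decide. */
const char *pick_path(const struct cpu_info *ci)
{
    if (ci->has_sse3)
        return "sse3";
    if (ci->has_3dnow)
        return "3dnow";
    return "generic";
}
```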

In what is probably a more data-oriented than games-oriented approach, you can sacrifice size for speed by using tables for certain tasks. This is not particularly processor-specific; it will run on anything from a 486 upwards.
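As one illustration of trading table size for speed (the task and names here are chosen for the example, not taken from the post), a 256-entry table can replace a per-bit loop when counting set bits. Nothing in it depends on a particular instruction set.

```c
#include <stddef.h>
#include <stdint.h>

/* 256 bytes of table: number of set bits in each possible byte value. */
static uint8_t bits_in_byte[256];

/* Fill the table once; entry i reuses the already-computed entry i/2. */
void init_bit_table(void)
{
    for (int i = 0; i < 256; i++)
        bits_in_byte[i] = (uint8_t)((i & 1) + bits_in_byte[i / 2]);
}

/* One table read per byte instead of testing 8 bits individually. */
size_t count_bits(const uint8_t *p, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += bits_in_byte[p[i]];
    return total;
}
```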

TNick · November 21, 2006, 12:57:03 PM

This may be a silly question, but why can't we build our program on the target machine during install? That way we can find out what hardware is there and assemble the app accordingly. Because ml.exe and link32.exe are free, this would not be a problem.

???

Nick

ecube · November 22, 2006, 04:42:10 AM

You could, but with something like commercial software you probably wouldn't want the user to have the source code, and I don't really see the benefits of compiling on each system.

Mark_Larson · November 22, 2006, 04:43:51 PM

Quote from: E^cube on November 22, 2006, 04:42:10 AM
I don't really see the benefits of compiling on each system.

I think he meant having the compiler build whichever code is fastest for the processor type.

daydreamer · November 22, 2006, 07:47:12 PM

Quote from: TNick on November 21, 2006, 12:57:03 PM
This may be a silly question, but why can't we build our program on the target machine during install? That way we can find out what hardware is there and assemble the app accordingly. Because ml.exe and link32.exe are free, this would not be a problem.

???

Nick

Isn't the closest term for what you're talking about Just-In-Time compilation? You could even test-run and benchmark different snippets before writing out a full application from the fastest ones. That could be good for newer CPUs you didn't think your code would run on.
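The benchmark-and-pick idea can be tried at a smaller scale than full JIT compilation: time a few equivalent routines at startup, reject any whose result disagrees with a reference, and keep the fastest. This sketch uses only the standard C `clock()` timer, which is coarse; the two candidate routines, the repetition count and the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*checksum_fn)(const uint8_t *p, size_t n);

/* Candidate 1: straightforward loop, also used as the reference. */
static uint32_t sum_simple(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Candidate 2: four independent accumulators to shorten dependency chains. */
static uint32_t sum_unrolled(const uint8_t *p, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += p[i]; s1 += p[i + 1]; s2 += p[i + 2]; s3 += p[i + 3];
    }
    for (; i < n; i++)
        s0 += p[i];
    return s0 + s1 + s2 + s3;
}

/* Time each candidate on a sample buffer; return the fastest correct one. */
checksum_fn pick_fastest(const uint8_t *sample, size_t n, int reps)
{
    checksum_fn candidates[] = { sum_simple, sum_unrolled };
    size_t count = sizeof candidates / sizeof candidates[0];
    uint32_t reference = sum_simple(sample, n);
    checksum_fn best = candidates[0];
    clock_t best_time = 0;
    int found = 0;

    for (size_t c = 0; c < count; c++) {
        if (candidates[c](sample, n) != reference)
            continue;                   /* never keep a routine that disagrees */
        volatile uint32_t sink = 0;     /* keeps the timed calls from being optimized away */
        clock_t start = clock();
        for (int r = 0; r < reps; r++)
            sink += candidates[c](sample, n);
        clock_t elapsed = clock() - start;
        if (!found || elapsed < best_time) {
            best = candidates[c];
            best_time = elapsed;
            found = 1;
        }
    }
    return best;
}
```

Because the choice is made on the machine the program actually runs on, it also covers CPUs that did not exist when the candidates were written, which is the benefit mentioned above.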
E^cube: just look at this experiment with MMX/SSE2 code that runs differently on different CPUs. All you need to change is the pointer advancement and the counter, and it works on both SSE2-capable and non-SSE2 CPUs; on the latter it runs as MMX code, which processes only 8 bytes at a time instead of SSE2's 16 bytes.
http://www.masm32.com/board/index.php?topic=4910.msg36772#msg36772
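The point that only the pointer advance and the counter change can be shown in portable C. This is not the code from the linked thread: in this sketch (hypothetical `xor_buffer`), each 8-byte `uint64_t` stands in for one MMX register's worth of data, and `step` is 8 for the MMX-style path or 16 for the SSE2-style path. In practice `step` would be chosen once from a feature check such as the CPUID SSE2 bit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR every byte of buf with key. step must be a nonzero multiple of 8:
   8 models an MMX register, 16 models an SSE2 register. Only the pointer
   advance (p += step) and the iteration count (n / step) depend on it. */
void xor_buffer(uint8_t *buf, size_t n, uint8_t key, size_t step)
{
    uint64_t pattern = 0x0101010101010101ULL * key;  /* key in all 8 bytes */
    size_t iterations = n / step;
    uint8_t *p = buf;

    for (size_t it = 0; it < iterations; it++, p += step) {
        /* One 8-byte lane for step 8, two lanes for step 16. */
        for (size_t off = 0; off < step; off += 8) {
            uint64_t w;
            memcpy(&w, p + off, 8);
            w ^= pattern;
            memcpy(p + off, &w, 8);
        }
    }
    /* Leftover bytes that do not fill a whole step. */
    for (size_t i = iterations * step; i < n; i++)
        buf[i] ^= key;
}
```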

donkey · November 23, 2006, 02:21:43 AM

Quote from: E^cube on November 22, 2006, 04:42:10 AM
You could, but with something like commercial software you probably wouldn't want the user to have the source code, and I don't really see the benefits of compiling on each system.

If you have written the software, a good installer could easily just use OBJ files to build the executable. It would make a list of files to be linked and send it to the linker; this would have the advantage that you would not need to ship the whole MASM32 package, or even a small set of equates. You could also assemble everything and then, at install time, patch the calls/jmps in the program with an external optimizing program based on the optimal execution path. Adobe Acrobat optimizes the program during the install; I've never looked at the optimizer program, but I have always been curious about it.
