I wrote this one last year when I was working on sorting algorithms. There is a mountain of old C junk awash on the internet and while a reasonable amount of it is crap code, you often get good algorithm design from some of the examples. After testing a heap of "shell" sorts, the fastest I could find was a design by Janet Incerpi in conjunction with Robert Sedgewick.
The tutorial uses the design as a reference and built the C code using VC6 CL.EXE with the maximum of optimisation turned on. A completely unoptimised version was built as a basis for manually optimising the ASM output from VC6 and a manually optimised version is included to demonstrate what can be done in a hour or so of optimisation using MASM.
Particularly at an algorithm level writing reusable code for libraries, the time involved in manually optimising C code at the asm level is worth the effort to gain additional performance not possible from a compiler. The example is classic 486 compatible code using integer registers only, no MMX or SSE(2) and will run on anything from a 486 upwards.
This code was optimised on my old 1.5gig PIV so the timings on the attached text file will be slow in comparison to a later box. I did rebuild the C code using the version of CL.EXE from the VCTOOLKIT but it was no faster than the code created by the VC6 version of CL.EXE .
The timings I get on the current 2.8 gig PIV are as follows.
Optimised C = 266 to 282 MS
Unoptimised C = 454 to 469 MS
Optimised asm = 234 to 250 MS
If anything, the sample is a bit small for a later box and needs to be enlarged to get longer and more reliable timings.
[attachment deleted by admin]
Hutch,
Very nice presentation.
Paul
Very nice to know, if I ever make anything worth optimizing. :)
I like the basic technique and better still, its easy and useful practice at manually optimising assembler code. When I get a bit more time, I have a nice quick sort which was one of Robert Sedgewick's designs that I found written in Pascal dating from about 1988, ported it to basic, ported it from basic to C and converted it to masm with CL.EXE.
Its a good tchnique to this extent that it fully seperates algo design from code optimisation and as long as you have a test piece to make sure you did not mess aything up on the way, it allows you to concentrate on the actual GO FAST code details.
The punchline is that there is a mountain of old C code awash on the internet with a lot of the better stuff dating from before ANSI 89 that performs well when you test it in C with the optimisation maximised and this is a reasonably good test as to whether it will manually optimise well. It gives the assembler programmer the capacity to use a large body of mainstream code and still bash it into shape in assembler performance terms.