How is this? Can it go any faster?
[attachment deleted by admin]
Aero,
It works OK but I wonder why you bother. There is a trick to make at least some API calls faster, and that is to copy them from the resident system DLL into local memory and execute them from there. Use the address from GetProcAddress, make a generous enough guess at how big the algo will be, and allocate enough memory to place it in.
It used to help with some GDI functions that were being hammered fast.
One time just for fun I wrote code to make a faster exit process.
I made the code section writable and patched all the API calls, which are FFxxxx (indirect), to E8-xxxx (direct relative).
As long as you can get an entry point, I think there should be a way to turn all external calls into E8 calls at runtime.
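For anyone who wants to try the FF-to-E8 patch, here is a rough C sketch of just the byte rewriting. It assumes 32-bit x86 and that the call site is the 6-byte form FF 15 followed by a 4-byte import-table address; a call that goes through a jump thunk would look different. The name patch_indirect_call and its parameters are made up for illustration. It works on a plain byte buffer, so making the real code section writable and resolving the target address are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the original post.
 * Rewrites a 6-byte indirect call (FF 15 addr32) at code[off] as a
 * 5-byte direct call (E8 rel32) followed by a 1-byte NOP (90).
 *
 *   site_va   : address at which the byte code[off] executes
 *   target_va : resolved address of the API (e.g. from GetProcAddress)
 *
 * Returns 0 on success, -1 if the bytes at off are not FF 15 or the
 * instruction would run past the end of the buffer.
 */
int patch_indirect_call(uint8_t *code, size_t len, size_t off,
                        uint32_t site_va, uint32_t target_va)
{
    if (off > len || len - off < 6)
        return -1;
    if (code[off] != 0xFF || code[off + 1] != 0x15)
        return -1;

    /* rel32 is measured from the end of the 5-byte E8 instruction.
     * Unsigned arithmetic wraps mod 2^32, which gives the right
     * two's-complement value for backward calls too. */
    uint32_t rel = target_va - (site_va + 5u);

    code[off]     = 0xE8;
    code[off + 1] = (uint8_t)(rel & 0xFF);          /* little-endian */
    code[off + 2] = (uint8_t)((rel >> 8) & 0xFF);
    code[off + 3] = (uint8_t)((rel >> 16) & 0xFF);
    code[off + 4] = (uint8_t)((rel >> 24) & 0xFF);
    code[off + 5] = 0x90;                           /* pad the spare byte */
    return 0;
}
```

The trailing NOP keeps the following instructions at the same offsets, so nothing else in the section needs to move. The target has to be resolved before patching, since the E8 form bakes the address in instead of reading it from the import table each time.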