
Optimization source sites

Started by zemtex, June 11, 2011, 11:01:40 PM


zemtex

I am looking for optimization data. I already have Intel's and AMD's optimization manuals, and I am also aware of the optimization tips by Mark Larson. Does anyone know of very good sources on optimization, preferably extensive documentation about parallelism?

Thanks to anyone who can provide strong sources.
I have been puzzling with lego bricks all my life. I know how to do this. When Peter, at age 6, is competing with me, I find it extremely necessary to show him that I can puzzle bricks better than him, because he is so damn talented that all that is called rational has gone haywire.

dedndave

Agner Fog   :U

http://www.agner.org/optimize/

zemtex

Wonderful.

Do you have more sources on µops, IFETCH and the different decoding pipes? I have some material here and it is very good, but I would like sources that go into more detail, preferably on the Sandy Bridge architecture.

hutch--

zemtex,

Some of this stuff you will only quantify well by writing some test-piece benchmarks and looking at the timings. The later Core2 and i3/5/7 series cores are generally easier to optimise for than the earlier PIV series. If by parallelism you mean multicore operations, you will need to take special notice of OS-level task switching and of how long each core's operation runs. Very short thread operations per core are expensive in terms of task switching, whereas longer-duration threads start to give you some good parallel performance. Also take note of the number of cores you have to work with, as starting many more threads than cores increases your task-switching overhead.
Download site for MASM32: https://masm32.com
New MASM Forum: https://masm32.com/board/index.php
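
The advice above (benchmark it, and keep the thread count near the core count) can be tried with a small test piece. What follows is only a rough sketch in portable C++ rather than MASM. The names sum_range, timed_parallel_sum and suggested_thread_count are made up for illustration, the summing loop is just a stand-in for real CPU-bound work, and an optimizing compiler may simplify that loop, so treat the timings as indicative only.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Stand-in for CPU-bound work: sum the integers in [begin, end).
static std::uint64_t sum_range(std::uint64_t begin, std::uint64_t end) {
    std::uint64_t s = 0;
    for (std::uint64_t i = begin; i < end; ++i) s += i;
    return s;
}

struct RunResult {
    std::uint64_t total;   // sum of 0 .. n-1
    double milliseconds;   // wall-clock time, including thread start and join
};

// Hypothetical helper: one thread per logical core, falling back to 1
// because hardware_concurrency() is allowed to return 0.
unsigned suggested_thread_count() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}

// Split the same total work across thread_count threads and time it.
RunResult timed_parallel_sum(std::uint64_t n, unsigned thread_count) {
    if (thread_count == 0) thread_count = 1;
    std::vector<std::uint64_t> partial(thread_count, 0);
    std::vector<std::thread> workers;
    workers.reserve(thread_count);

    const auto start = std::chrono::steady_clock::now();
    const std::uint64_t chunk = n / thread_count;
    for (unsigned t = 0; t < thread_count; ++t) {
        const std::uint64_t b = t * chunk;
        // The last thread also takes the remainder.
        const std::uint64_t e = (t + 1 == thread_count) ? n : b + chunk;
        workers.emplace_back([&partial, t, b, e] { partial[t] = sum_range(b, e); });
    }
    for (auto& w : workers) w.join();
    const auto stop = std::chrono::steady_clock::now();

    std::uint64_t total = 0;
    for (std::uint64_t p : partial) total += p;
    const double ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    return RunResult{total, ms};
}
```

Calling timed_parallel_sum with the same n and thread counts of 1, suggested_thread_count(), and something much larger (say 64 times the core count) lets you compare the milliseconds field on your own machine. With a small n the thread start-up cost should dominate, which is the "very short thread operations" case described above.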

zemtex

In my own experience, if you code games you should probably use fewer threads (with HTT you can run two threads per physical core, one per logical core), because games don't do a lot of I/O. I/O is a primary reason threads get slowed down, so in an I/O-heavy program more threads typically produce faster execution: the non-I/O threads will not halt, while the I/O threads will. Games usually don't do much I/O, so they benefit from fewer threads, while general Windows applications in many cases probably run better with tens of threads.

clive

Well, it clearly depends. Windows itself has a lot of threads/processes running. The bottlenecks come if your/their threads are all interdependent on each other, or chew up time looking for Sun/Adobe updates, etc. If an operation lends itself to independent parallel execution, use multiple threads. If you've just got a serialized sequence there is less point, unless you can pipeline things, where one thread goes and prefetches the next batch of data while a second/third digests or processes it. I'd use a chain-gang, bucket-brigade type of description for that, and it is analogous to the way the hardware appears to be clocking at stupendous rates, i.e. throughput appears to be 1 cycle, but there is significant real latency.
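
As a rough illustration of that bucket-brigade idea, here is a minimal two-stage sketch in portable C++: one thread "fetches" batches while a second one processes them, with a small bounded queue in between. BatchQueue and run_pipeline are hypothetical names, the fetch stage only generates consecutive integers in place of real I/O, and the queue depth of 2 is an arbitrary choice.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Small bounded queue that hands batches from one stage to the next.
class BatchQueue {
public:
    explicit BatchQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full, so the fetcher cannot run far ahead.
    void push(std::vector<int> batch) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push(std::move(batch));
        not_empty_.notify_one();
    }

    // Signals that no more batches will arrive.
    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        not_empty_.notify_all();
    }

    // Returns false once the queue is closed and fully drained.
    bool pop(std::vector<int>& out) {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        not_full_.notify_one();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    std::queue<std::vector<int>> q_;
    std::size_t capacity_;
    bool closed_ = false;
};

// Stage 1 produces batch_count batches of batch_size consecutive integers
// (starting at 0); stage 2 sums them. Both counts are assumed non-negative.
long long run_pipeline(int batch_count, int batch_size) {
    BatchQueue queue(2);  // at most two batches in flight
    long long total = 0;

    std::thread fetcher([&queue, batch_count, batch_size] {
        int next = 0;
        for (int b = 0; b < batch_count; ++b) {
            std::vector<int> batch(static_cast<std::size_t>(batch_size));
            std::iota(batch.begin(), batch.end(), next);  // stand-in for I/O
            next += batch_size;
            queue.push(std::move(batch));
        }
        queue.close();
    });

    std::thread processor([&queue, &total] {
        std::vector<int> batch;
        while (queue.pop(batch)) {
            total += std::accumulate(batch.begin(), batch.end(), 0LL);
        }
    });

    fetcher.join();
    processor.join();
    return total;  // safe to read: both threads have finished
}
```

Each batch still takes the full fetch-plus-process time to get through (the latency), but once the brigade is full a finished batch comes out at the rate of the slower stage (the throughput). A third stage would just be another BatchQueue and another thread.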

Intel used to publish books, white papers, and have developer conferences which generated the type of material you are looking for. It's been a while since I've been heavily interested in what Intel/Microsoft are up to, but I suspect a quick google on topics like SIMD and encryption optimization might bring some things up.

Intel also used to add a lot of parallelization technology to their C and Fortran compilers, recognizing math/array constructs that would map effectively onto the MMX/SSE SIMD models.

http://www.intel.com/idf/
It could be a random act of randomness. Those happen a lot as well.

zemtex

That is excellent, clive. I will look into the developer pages; there is probably a ton of information to consume there. I'm not developing anything, I am merely feeding my curious mind when I have nothing better to do.  :lol