Number Crunching Processor 30%

Started by AgentSmithers, May 27, 2009, 02:37:52 AM

AgentSmithers

I wrote some simple code that defines a DD, starts it at 0, and INCs it in a loop forever. I know it can only count so high so fast, but when I check my processor it's not using 100% of the CPU. So is it possible to make the rocket hit the roof faster?

I made it print (cout) when it hits 10, 100, 1,000, 10,000 and so on, and I can see it gets to about 1,000,000 in a couple of seconds... so is my next step to make it multithreaded?

I'm running an AMD Phenom 9500 quad-core.

Thanks!

-Agent

mitchi

Yes. Using threads is the only way you can go higher. Use the CreateThread API.
Here you go:
http://msdn.microsoft.com/en-us/library/ms682453(VS.85).aspx

It's going to be something like this in your case:
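invoke CreateThread, 0, 0, addr myLoopLabel, 0, 0, 0   ; default security, default stack size, start address, no parameter, start immediately, thread ID not needed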
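
For comparison outside MASM, here is a minimal C sketch of the same pattern, with POSIX threads (pthread_create/pthread_join) standing in for CreateThread and waiting on the thread handle. The names count_job, count_loop and count_on_worker are hypothetical, and unlike the original loop this one stops at a limit so the thread can be joined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical work item: how far to count, and where to leave the result. */
struct count_job {
    uint64_t limit;
    uint64_t result;
};

/* Thread procedure: plays the role of myLoopLabel. Unlike the original
   endless loop it stops at job->limit so the thread can be joined. */
static void *count_loop(void *arg) {
    struct count_job *job = arg;
    uint64_t n = 0;
    while (n < job->limit)
        n++;               /* an optimizing compiler may fold this loop */
    job->result = n;
    return NULL;
}

/* Start one worker thread (like CreateThread) and wait for it to finish. */
static uint64_t count_on_worker(uint64_t limit) {
    struct count_job job = { limit, 0 };
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, count_loop, &job);
    assert(rc == 0);
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;              /* silence the unused warning when NDEBUG is set */
    return job.result;
}
```

In the MASM version, CreateThread expects the address of a ThreadProc: a stdcall routine that takes one DWORD parameter and returns a DWORD in eax. It is safest to write myLoopLabel as a PROC with that signature rather than a bare label, in case the loop is ever changed to return.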

dedndave


Jimg

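    invoke GetCurrentProcess                ; pseudo-handle for this process
    mov CurProcess, eax                     ; CurProcess: a DWORD variable (assumed declared elsewhere)
    invoke SetPriorityClass, CurProcess, REALTIME_PRIORITY_CLASS
    invoke SetProcessPriorityBoost, CurProcess, FALSE   ; normal behavior (dynamic priority boosts stay enabled)
    invoke GetCurrentThread                 ; SetThreadPriority needs a thread handle, not a process handle
    invoke SetThreadPriority, eax, THREAD_PRIORITY_TIME_CRITICAL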

AgentSmithers

Ah! Now, is it normal that a thread can only get so many clock cycles per second?
Is it due to me having multiple cores?
How many threads can I spawn before there are too many and performance is lost?

Mirno

The priority of a thread doesn't limit its speed in any way; it only changes how likely it is, relative to other threads, to get the next time slice.

All processes running on the OS that are waiting for processor time (not sleep()ing, or waiting for another process or for I/O) have a score associated with them. Whenever the processor switches tasks, the OS switches it to the one with the highest score. After running for a time, a process has its score reduced (based on its priority class); if a process is not chosen, its score is increased (based on its priority class).
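
The scoring idea above can be illustrated with a toy simulation. This is a hypothetical model (essentially smooth weighted round-robin), not how the Windows scheduler is actually implemented: each slice, every runnable thread's score grows by its priority weight, the highest score runs, and the winner pays back the sum of all weights. The names toy_thread, schedule_once and run_slices are made up for the example.

```c
#include <assert.h>

/* Hypothetical toy thread: a priority weight and a running score. */
struct toy_thread {
    const char *name;
    int weight;   /* higher weight = higher priority in this toy model */
    int score;
    int slices;   /* how many time slices it has received */
};

/* One scheduling decision: every thread's score grows by its weight,
   the highest score wins the slice (ties go to the lower index), and the
   winner's score is reduced by the sum of all weights. */
static int schedule_once(struct toy_thread *t, int n) {
    assert(n > 0);
    int total = 0, best = 0;
    for (int i = 0; i < n; i++) {
        t[i].score += t[i].weight;
        total += t[i].weight;
        if (t[i].score > t[best].score)
            best = i;
    }
    t[best].score -= total;
    t[best].slices++;
    return best;
}

/* Hand out a number of time slices and let the counters accumulate. */
static void run_slices(struct toy_thread *t, int n, int slices) {
    for (int s = 0; s < slices; s++)
        schedule_once(t, n);
}
```

With equal weights the two threads strictly alternate; with weights 4 and 1 the low-priority thread still runs, just on 2 of every 10 slices, which matches the two cases described next.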

This means that two threads of the same priority should switch fairly regularly, so neither takes over.
It also means that a high and a low priority thread will both run, but the low will run much less often.

There are also other factors involved (such as how long a thread gets to run), but this is the basic idea of all priority systems.

On a dual-core system there are two processors that can do the work, so the check happens more often, but the same idea applies. The list of tasks may, however, be restricted by a process's processor affinity (telling the process to run only on processors X, Y, and Z, not A, B, and C).

If you set Task Manager to show one graph per CPU and restrict the process's affinity mask to a single processor, you should see that processor max out. If you don't set the affinity mask, you should see different cores max out in turn as the task moves between them.
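
On Windows the mask can be set with Task Manager's Set Affinity option or with SetProcessAffinityMask. As a hedged, Linux-only analogue, the C sketch below pins the calling thread to the first CPU it is currently allowed to use, via glibc's sched_getaffinity/sched_setaffinity (which require _GNU_SOURCE). pin_to_one_cpu is a hypothetical helper name.

```c
#define _GNU_SOURCE          /* must precede all includes for the CPU_* macros */
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread (pid 0 = caller) to a single CPU: the first
   one in its current affinity mask. Returns that CPU number, or -1 on error.
   Linux counterpart of setting an affinity mask with exactly one bit. */
static int pin_to_one_cpu(void) {
    cpu_set_t allowed, one;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof one, &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}
```

Picking a CPU from the current mask, rather than always CPU 0, avoids failing in environments (such as containers) where CPU 0 is not available to the process.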

Also be aware that writing text to the console will severely limit your count! Each execution of your printing code has a noticeable effect on the time it takes. Since you are only printing at certain markers, this should not be an issue.
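
Printing only at markers keeps the per-iteration cost to one comparison; the output routine runs just a handful of times. A minimal C sketch of that loop, where count_with_markers and the markers array are hypothetical names used for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Count from 1 to limit, printing only when the counter reaches the next
   power of ten (10, 100, 1000, ...). Returns how many markers were printed
   and records up to max_markers of them in markers[]. */
static int count_with_markers(uint64_t limit, uint64_t *markers, int max_markers) {
    assert(markers != NULL || max_markers == 0);
    uint64_t next = 10;   /* next value worth reporting */
    int printed = 0;
    for (uint64_t n = 1; n <= limit; n++) {
        if (n == next) {  /* one compare per iteration; I/O only at markers */
            printf("%llu\n", (unsigned long long)n);
            if (printed < max_markers)
                markers[printed] = n;
            printed++;
            next *= 10;
        }
    }
    return printed;
}
```

Counting to 1,000,000 this way prints only six lines (10 through 1,000,000), so console I/O barely affects the timing.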

Also note that multithreading a single shared counter will not help you: to count correctly, every thread has to increment the same memory location with a locked instruction (LOCK INC), and the cost of that locking, plus the cache line bouncing between cores, will be greater than the gain over running single-threaded.
Having said that, four threads, each with its own count, plus a fifth master thread that occasionally locks them, reads the values, and sums them, would work and be approximately four times faster (assuming a quad-core, of course).
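
Both approaches (one shared locked counter vs. per-thread counters summed by a master) can be sketched in C, with POSIX threads and C11 atomics standing in for CreateThread and LOCK INC. NWORKERS, the 64-byte slot size, and all function names are assumptions for illustration, and the "master" step here simply sums the counts after joining the workers rather than sampling them while they run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NWORKERS 4   /* assumption: one worker per core on a quad-core */

/* Strategy 1: all threads share one counter; each increment is an atomic
   read-modify-write (a LOCK-prefixed instruction on x86). */
static atomic_uint_fast64_t shared_count;

/* Strategy 2: each thread owns a counter in its own 64-byte slot (assumed
   cache-line size) so cores never write to the same cache line. */
struct padded_count {
    _Alignas(64) uint64_t value;
};
static struct padded_count private_count[NWORKERS];

static uint64_t iterations;   /* per-thread work, set before threads start */

static void *shared_worker(void *arg) {
    (void)arg;
    for (uint64_t i = 0; i < iterations; i++)
        atomic_fetch_add_explicit(&shared_count, 1, memory_order_relaxed);
    return NULL;
}

static void *private_worker(void *arg) {
    struct padded_count *slot = arg;
    for (uint64_t i = 0; i < iterations; i++)
        slot->value++;        /* no lock needed: only this thread writes here */
    return NULL;
}

/* Run NWORKERS threads with the chosen strategy and return the total count. */
static uint64_t run_counters(int use_private, uint64_t per_thread) {
    pthread_t tid[NWORKERS];
    iterations = per_thread;
    atomic_store(&shared_count, 0);
    for (int i = 0; i < NWORKERS; i++) {
        private_count[i].value = 0;
        int rc = pthread_create(&tid[i], NULL,
                                use_private ? private_worker : shared_worker,
                                &private_count[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);
    if (!use_private)
        return atomic_load(&shared_count);
    uint64_t total = 0;       /* the "master" step: sum the per-thread counts */
    for (int i = 0; i < NWORKERS; i++)
        total += private_count[i].value;
    return total;
}
```

Both strategies produce the same total; the difference is speed, which you would measure by timing each call. Replacing the atomic add in shared_worker with a plain increment would make the shared version lose updates.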

Thanks

Mirno

Neo

Quote from: AgentSmithers on May 27, 2009, 04:22:08 AM
Ah! Now is it normal that a Thread can only go so many Clock Cycles per Sec?
Is it due to me having Multiple Cores?
How many Threads do I need to Spawn to be to the point before Two many are spawned and performance is lost?

Mirno's answer gives more detail, but to summarize and answer this set of questions more directly:

  • The number of clock cycles per second a thread can get is limited only by the CPU's clock speed.  If you have a 3GHz CPU, each core runs 3 billion clock cycles per second.
  • A quad-core CPU acts like 4 single-core CPUs.  You won't get a higher clock speed, but you can run 4 things (threads) at the same time, all running at the same clock speed, each using about 100% of one CPU core.
  • Windows reports the percentage CPU use as the average of the percentage use of each core, so if you're using 1 core 100%, and 3 cores 0%, the average is 25%.
  • There is added overhead from having too many threads, so generally having more active (continually-running) threads than the number of CPU cores will start to eat away at performance.  For a quad-core processor, the ideal number of active threads is 4.  For one of the fancy schmancy new Nehalem quad-cores with hyper-threading, the ideal number of active threads is 8 (4 physical cores x 2 logical cores per physical core).
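
The averaging rule and the one-busy-thread-per-logical-CPU rule of thumb from the list can be written down in a few lines of C. sysconf(_SC_NPROCESSORS_ONLN) is a widely supported extension on Linux, BSD and macOS, used here as a stand-in for Win32's GetSystemInfo (dwNumberOfProcessors); the function names are hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/* Overall CPU use as described above: the plain average of the per-core
   percentages (one core at 100% and three idle gives 25%). */
static double overall_cpu_percent(const double *per_core, int ncores) {
    assert(ncores > 0);
    double sum = 0.0;
    for (int i = 0; i < ncores; i++)
        sum += per_core[i];
    return sum / ncores;
}

/* Suggested number of continuously busy threads: one per online logical
   processor, which counts hyper-threaded siblings separately. */
static long suggested_worker_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;   /* fall back to 1 if the query fails */
}
```

On a quad-core without hyper-threading this suggests 4 workers; on a hyper-threaded quad-core it reports 8 logical processors, matching the rule of thumb above.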

I hope that helps!