Core Blimey

Started by oex, February 13, 2010, 01:46:36 AM


sinsi

I think that you need 2 threads for your GUI program - the main one with the user aspect ("why can't I click cancel") and the one that does the work.

As far as "as many threads as the number of cpu's", this doesn't matter (my fresh install of XP has 400+ threads going now). A lot of them are blocking but I think
a blind "4 cores = 4 threads" is misleading. FWIW, I have been looking around at threads/async IO and still don't have a clear idea (:
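The GUI/worker split above can be modeled in a few lines. This is a hedged sketch (in Python, just to show the shape of it, with made-up names): the "main" thread stays free to handle the user while the worker checks a cancel flag between units of work, which is what makes "click cancel" actually work.

```python
import threading
import time

def worker(cancel: threading.Event, results: list):
    """Long-running job; checks the cancel flag between work units."""
    for i in range(100):
        if cancel.is_set():        # user clicked Cancel in the GUI thread
            return
        results.append(i)          # stand-in for one unit of real work
        time.sleep(0.01)

cancel = threading.Event()
results = []
t = threading.Thread(target=worker, args=(cancel, results))
t.start()
time.sleep(0.05)   # the "GUI" thread stays responsive meanwhile
cancel.set()       # user hits Cancel
t.join()
print(len(results) < 100)  # worker stopped early
```

The same structure applies in a Win32 message-loop program: the UI thread keeps pumping messages while the worker polls a shared flag or event object.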

>Trust in Windows
:bdg
Light travels faster than sound, that's why some people seem bright until you hear them.

Hagrid

Quote from: sinsi on February 19, 2010, 08:48:13 AM
As far as "as many threads as the number of cpu's", this doesn't matter (my fresh install of XP has 400+ threads going now). A lot of them are blocking but I think
a blind "4 cores = 4 threads" is misleading. FWIW, I have been looking around at threads/async IO and still don't have a clear idea (:

You're right about "4 cores = 4 threads" being misleading.  That isn't actually what I said.  I suggested "four compute threads".  You can have as many threads as you want in your application (each has its own overhead).  Threads that are blocked on IO are fine (although there are likely better ways) as such threads are not contending for CPU time.  Having more runnable compute threads than CPU cores is a different deal and this is where unnecessary thread switching will eat into efficiency.
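The "compute workers capped at the core count" idea reads naturally as a pool: submit as many work items as you like, but only core-count threads are ever runnable at once. A minimal sketch (Python, with a hypothetical `crunch` work item standing in for real computation):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def crunch(n: int) -> int:
    """Stand-in for a CPU-bound work item."""
    return sum(i * i for i in range(n))

# Cap runnable compute workers at the core count, per the advice above;
# extra work items wait in the pool's queue instead of spawning more
# threads that would contend for CPU time.
cores = os.cpu_count() or 1
with ThreadPoolExecutor(max_workers=cores) as pool:
    results = list(pool.map(crunch, [10, 100, 1000]))

print(results[0])  # sum of i*i for i in range(10) = 285
```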

If your app does a lot of IO (network, disk, etc.) then you should be considering asynchronous IO combined with an IOCompletionPort.  The IOCP acts as a worker thread dispatch point.  You can create as many worker threads as you want and as an IO completes, the last thread to wait on the IOCP will be released to process the response.  IOCPs are designed to restrict the number of running threads to equal the CPUs automatically.  As you can post your own completion notifications to an IOCP, it also is a nifty method of queueing work for compute threads.
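The IOCP pattern described above can be sketched with a plain queue standing in for the port. This is a conceptual model only, not the Win32 API (the real calls are `CreateIoCompletionPort`, `GetQueuedCompletionStatus`, and `PostQueuedCompletionStatus`); the worker loop blocking on the queue plays the role of a thread waiting on the port, and queuing our own packets mirrors posting your own completion notifications:

```python
import os
import queue
import threading

port = queue.Queue()          # stands in for the I/O completion port
done = queue.Queue()          # collects processed results
STOP = object()               # sentinel packet to shut a worker down

def worker():
    # Analogue of a GetQueuedCompletionStatus loop: block on the port,
    # handle one completion packet, then go back to waiting.
    while True:
        packet = port.get()
        if packet is STOP:
            return
        done.put(packet * 2)  # stand-in for handling the completion

n_workers = os.cpu_count() or 1
threads = [threading.Thread(target=worker) for _ in range(n_workers)]
for t in threads:
    t.start()

# Analogue of PostQueuedCompletionStatus: queue our own work packets.
for item in range(10):
    port.put(item)
for _ in threads:
    port.put(STOP)            # one shutdown packet per worker
for t in threads:
    t.join()

total = sum(done.get() for _ in range(10))
print(total)  # 2 * (0+1+...+9) = 90
```

One thing the model does not capture: a real IOCP also throttles how many of the waiting threads it releases concurrently (normally one per CPU), whereas a plain queue wakes whichever waiter the scheduler picks.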

The LIFO management of worker threads with an IOCP is also intended to prevent unnecessary thread switches - a worker thread that calls into the IOCP will return immediately if a completion notification is ready.  A FIFO strategy would guarantee a thread switch on every IO.

Hagrid

hutch--

This has been a very interesting discussion. Like everyone else, I have seen a mountain of software over the last 10 or so years that pelted threads around all over the place, and the apps were characteristically laggy and slow for exactly the reason that massive active thread counts added far too much overhead to many applications. Multicore processors have relieved this problem considerably, and the later i7 series with hyperthreading appears to have relieved it further, but the fundamentals of the problem are the same: the more active threads you have competing for processor time, the higher the overhead of task switching between them.

Quads are now common and we are entering the era of many-core processors, which opens up some exciting possibilities if the rest of the package is developed as well. Asynchronous parallel processing has been with us since Win95, and many-core processors will make this type of code faster simply by spreading the load across more processors, but the more interesting stuff will come when x86 catches up to the Itanium capacity of running cores in sync. Synchronous parallel processing will see big gains in processing power as the increase in core count can be used in different ways.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

redskull

Quote from: sinsi on February 19, 2010, 08:48:13 AM
FWIW, I have been looking around at threads/async IO and still don't have a clear idea (:

An IOCP is a way that async I/O can be distributed amongst several worker threads, while controlling the number which are running at once.  A good analogy for an IOCP is an IT office: having a dedicated IT employee (a worker thread) assigned to each non-IT employee (an I/O operation) would be easy, but wasteful and inefficient; since you would probably have more IT staff than desks and computers (CPUs), much of the staff would sit around waiting for one to become free.  However, having a single IT employee to serve everybody is no good either, because the work just piles up behind him, and the other desks and computers go to waste (in this analogy, an IT member can't work on other projects while waiting on something to finish for the current one).  The most efficient method is to have one manager, and one IT member for each desk; requests go to the manager, who distributes the jobs to the employees one at a time.  That way, no desks go unused, and no employees sit around waiting.  When an employee finishes a task, he goes back to the manager to get another one.

Each IT employee is a worker thread, and the manager is the IOCP.  Your multithreaded program is MOST efficient when you have a single thread per CPU (all the other threads in the system are out of your control, so you can only worry about how efficient *your* program is).  The general idea is that a worker thread starts an I/O task, and then "checks back in" to the IOCP.  Eventually, when the I/O finishes, the IOCP receives the notification and starts up another worker thread to handle whatever needs to be done.  By having enough worker threads 'queued up', waiting for I/O operations to complete, you get the maximum efficiency from your program.  The trick is that you can configure the IOCP to 'throttle' the number of worker threads it starts at once; normally, 1 per CPU.

For example, imagine you have lots of disk accesses to do; reading the info from the user is an I/O operation, so whenever the read completes, the IOCP will wake up a thread, which will parse the input and start the disk read, and then go back to waiting.  Whenever the disk read completes, the IOCP wakes up the next worker thread, which deals with the results of the disk read (sending them back to the user, etc).  Obviously, the more pending I/O operations you have, the bigger the payoffs.  The problem with programming with IOCPs is that since each worker thread is identical, they all have to be 'smart' enough to deal with any of the results, instead of each thread being dedicated to one task.
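That "each worker must handle any result" point can be sketched by tagging every completion packet with its stage and dispatching on the tag. This is a simplified single-threaded model (hypothetical names, no real I/O): a "read from user" completion parses the input and posts the next-stage packet, and a "disk done" completion produces the reply, exactly the two-step pipeline described above.

```python
import queue

port = queue.Queue()
replies = []

def on_read_user(data):
    # Parse the user's request, then "start the disk read" by posting
    # the next-stage completion packet back to the port.
    port.put(("disk_done", data.upper()))

def on_disk_done(data):
    replies.append(data)           # send the result back to the user

# Each packet carries a stage tag, so any worker can handle any result.
handlers = {"read_user": on_read_user, "disk_done": on_disk_done}

# Simulate two user requests completing.
port.put(("read_user", "hello"))
port.put(("read_user", "world"))

# One worker loop standing in for the whole pool; it dispatches on
# whatever stage arrives rather than being dedicated to one task.
while not port.empty():
    stage, data = port.get()
    handlers[stage](data)

print(replies)  # ['HELLO', 'WORLD']
```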

Obviously, the more I/O intensive your app is and the more CPUs in the computer you intend to run on, the greater the benefits will be; such a setup is ideal for something like an SQL server, which runs on servers with dozens of CPUs, whose sole purpose is to read from a network, read from a disk, write to a network, and repeat on a staggering scale.

-r

Strange women, lying in ponds, distributing swords, is no basis for a system of government

dedndave

in a way, it makes US work harder as programmers
if we want high performance software, we may have to carefully design it to take advantage of what the machine has to offer
i.e., our code has to adapt, which could make it a bit complicated
this is something that may be more prevalent in the near future, as more and more machines have multiple cores
soon, we will be able to say "most machines"
but, that could be anywhere from 2 to lord knows how many cores
we may want to run a few baseline tests in the laboratory
let's wait til one of the members gets a dual package i9   :bg

oex

AMD has been reading my posts, they have a section on getting core count with CPUID in this month's newsletter :lol
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv