Would there be any negative impacts to my program if I were to create 1,000 threads all running one single procedure? They would run continuously until they are stopped at different times, but each thread would be going for at least a few hours.
Maybe there is a more efficient, quicker way instead?
Please note that I haven't done much with multi-threading, so advice would be nice also. :toothy
EDIT: Could anyone recommend a book for me on multi-threading in Windows? I would very much appreciate it!
Thank you.
Quote from: Radio Ga Ga on July 30, 2010, 09:48:07 PM
Would there be any negative impacts to my program if I were to create 1,000 threads all running one single procedure?
Yes, unless you have 1,000 cores.
1 GUI thread + 1 worker thread for each core.
The assumption here being that the GUI thread is idle most of the time, sitting in a typical cpu-yielding pump waiting for messages to arrive.
No, just make sure they don't access the same memory at the same time.
Quote from: Geryon on July 30, 2010, 11:12:25 PM
Quote from: Radio Ga Ga on July 30, 2010, 09:48:07 PM
Would there be any negative impacts to my program if I were to create 1,000 threads all running one single procedure?
Yes, unless you have 1,000 cores.
No, a single-core processor can have many threads running.
If the CPU is maxed out you also carry the 'weight' of the thread code (startup, OS juggling, shutdown), which will ultimately slow down your code; but otherwise, no.
Quote from: Farabi on July 31, 2010, 05:12:30 AM
Quote from: Geryon on July 30, 2010, 11:12:25 PM
Quote from: Radio Ga Ga on July 30, 2010, 09:48:07 PM
Would there be any negative impacts to my program if I were to create 1,000 threads all running one single procedure?
Yes, unless you have 1,000 cores.
No, a single-core processor can have many threads running.
Can, and Should are two different things.
You need a good reason to run a high thread count. A lot of modern programming does not properly account for what is involved, and you get slow, laggy applications due to the thread overhead. The exceptions are threads that are mainly idle; things like internet connections qualify here, as their data transfer rate is very slow compared with what a single core can process.
If performance matters then Rockoon's advice is sound: 1 thread per core, and if the processor supports hyperthreading, an extra helper thread per core.
Quote from: Farabi on July 31, 2010, 05:12:30 AM
Quote from: Geryon on July 30, 2010, 11:12:25 PM
Quote from: Radio Ga Ga on July 30, 2010, 09:48:07 PM
Would there be any negative impacts to my program if I were to create 1,000 threads all running one single procedure?
Yes, unless you have 1,000 cores.
No, a single-core processor can have many threads running.
Well, not exactly. A CPU can execute one thing at a time, so the OS makes the CPU switch between threads (this is called "task switching").
For example, let's say you have a CPU with 10 MIPS (million instructions per second) and 2 threads are active. Then each thread gets less than 5 MIPS of calculation "power".
Why less? Because task switching takes time too.
The one-thread-per-core rule is silly; you can easily run 30-50 threads on a single core and see a nice speed boost, depending on what you're using them for. Windows has a limit on the total number of threads that can be created; I think it's around a few thousand. Anyway, if you want to see examples of programs using lots of threads, look at Firefox or Apache. I've personally used 30 or so for polling in my queue system and it drastically sped up what I was doing. Anything that blocks or holds up the show, you should consider putting in a thread.

To give a practical example, say you wanted to compress a 7 GB file. What I recommend is to have 1 thread read the data and put it in a queue, then have other threads actually process the data; that way you get nice parallelism. :U
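The read-into-a-queue pattern described here can be sketched in a few lines. This is a Python sketch for brevity only; the `run_pipeline` name, the chunk counts, and the record-the-size stand-in for compression are all illustrative, not from the post:

```python
import queue
import threading

def run_pipeline(num_chunks, chunk_size, num_workers):
    """One reader thread fills a bounded queue; worker threads drain it."""
    q = queue.Queue(maxsize=8)   # bounded, so the reader can't race too far ahead
    totals = []
    lock = threading.Lock()

    def reader():
        # Stand-in for reading a huge file in fixed-size chunks.
        for _ in range(num_chunks):
            q.put(b"\x00" * chunk_size)
        for _ in range(num_workers):
            q.put(None)          # one "end of data" sentinel per worker

    def worker():
        while True:
            chunk = q.get()
            if chunk is None:
                break
            # Stand-in for compression: just record the chunk's size.
            with lock:
                totals.append(len(chunk))

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals)
```

The bounded queue is the design point: the reader blocks when the workers fall behind, so memory use stays flat no matter how big the file is.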
Radio Ga Ga ,
1000 threads is probably too much. Is there really a task that would need so many?
If you have low-CPU-use tasks that are blocking other tasks then maybe use a thread to free up resources, but 1,000 of them sounds over the top.
The rest of what follows assumes high-CPU-usage threads, such as those doing long, intensive calculations.
If the threads are CPU intensive then each thread will be given its timeslice in turn. A typical timeslice is 16 ms, so each thread will get some CPU time every 16 ms x 1,000 threads / n cores = 16/n seconds (n = the number of cores you have), which means your GUI threads (the stuff you try to type at the keyboard or click on with the mouse) will become unbearably slow and unusable. You'd need to run your calculation threads at a low priority so the GUI threads get some time to run.
Limiting the number of threads to the number of cores is not always a good idea.
Suppose you have 5 threads and 4 cores, and each thread needs 3 hours to process. Then you'll spend 3 hours on the first 4 threads with the 5th thread waiting for a core to become available. As soon as the first thread finishes, the 5th thread can be scheduled and will take another 3 hours; during this time the other 3 threads will all finish and the cores that ran them will sit idle. The total time to finish will be 6 hours.
On the other hand, if you ran all 5 threads at once on the 4 cores and let the OS timeslice them, each thread would be given timeslices and the whole job would complete in around 3 h 45 m with all 4 cores busy the whole time. A saving of about 37% in time to completion.
If you can follow that explanation you'll see that the best number of threads to use is dependent on the task you're doing.
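Paul's two schedules can be checked with a short sketch. The function names and the idealized no-overhead timeslicing model are mine, not from the post:

```python
import heapq

def batch_makespan(task_hours, cores):
    """Run each task to completion on the next free core (no timeslicing)."""
    free_at = [0.0] * cores              # min-heap of when each core frees up
    heapq.heapify(free_at)
    finish = 0.0
    for t in task_hours:
        start = heapq.heappop(free_at)   # earliest-free core takes the task
        end = start + t
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

def timesliced_makespan(task_hours, cores):
    """Idealized timeslicing: total work spread evenly across all cores,
    ignoring switching overhead, but never faster than the longest task."""
    return max(sum(task_hours) / cores, max(task_hours))
```

For 5 three-hour threads on 4 cores this gives 6.0 hours batched versus 3.75 hours timesliced, matching the 37% (strictly 37.5%) saving above.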
Paul.
When I look at the Performance tab in Task Manager (taskmgr.exe), I see that I have ~890 threads running under normal circumstances. But if I were to increase that to ~1,890, would it wreak havoc on my machine?
Quote from: Radio Ga Ga on July 31, 2010, 03:58:29 PM
When I look at the Performance tab in Task Manager (taskmgr.exe), I see that I have ~890 threads running under normal circumstances. But if I were to increase that to ~1,890, would it wreak havoc on my machine?
The threads that you see there are NOT performing CPU-intensive operations.
Most of the time they just "sit" there doing nothing, waiting for an event or signal or message to wake them up.
A relatively small number of threads that are actually performing CPU-intensive calculations can bog down even the most advanced CPU.
In my experience it was not uncommon for the whole OS/system to become very sluggish and unresponsive when I was running intensive multi-threading tests.
The number varies depending on how many cores your CPU has, but I observed it to be in the range of 8-32 very intensive threads per core.
"One running thread per core" is the optimum, most efficient way; any more, and you gain no more speed, and lose a little bit of task switching overhead for each additional one. Generally, the logical benefits and "perceived responsiveness" of splitting up your program outweighs the fractions of time you lose, but 1000 is far too many (assuming they all run all the time). There's no way to escape the fact that if you have 'x' operations to do, and 'y' CPU's to do them on, you'll never get them done any faster than x/y.
-r
Quote from: BogdanOntanu on July 31, 2010, 05:38:45 PM
Quote from: Radio Ga Ga on July 31, 2010, 03:58:29 PM
When I look at the Performance tab in Task Manager (taskmgr.exe), I see that I have ~890 threads running under normal circumstances. But if I were to increase that to ~1,890, would it wreak havoc on my machine?
The threads that you see there are NOT performing CPU intensive operations.
Most of the time they just "sit" there performing nothing, just waiting for an event or signal or message to wake them up.
That is exactly what I am using those threads for! :clap:
So, referring to my previous question: Will there be anything negative?
I could probably write it in C++ using a class; it would be slow, but at least thousands of threads won't be hitting the same procedure.
Quote from: redskull on July 31, 2010, 07:22:38 PM
"One running thread per core" is the optimum, most efficient way; any more, and you gain no more speed, and lose a little bit of task switching overhead for each additional one. Generally, the logical benefits and "perceived responsiveness" of splitting up your program outweighs the fractions of time you lose, but 1000 is far too many (assuming they all run all the time). There's no way to escape the fact that if you have 'x' operations to do, and 'y' CPU's to do them on, you'll never get them done any faster than x/y.
-r
How do you gain no more speed? I've already given a simple example of how multiple threads can drastically increase throughput. I feel like you guys are giving him a lot of misinformation, so I'm going to have to write a proof of concept to prove my point. Logically or otherwise, it just doesn't make sense to be fixated on this one-thread-per-core nonsense; experience-wise, I know it isn't accurate.
I understand your doubt, E^3. When you write your demonstration, please do not use any I/O, because when your program makes an I/O operation, other mechanisms come into play.
For instance, my download speed is 50 KB/sec for a file with one thread. When I download exactly the same file with two threads, my speed is 100 KB/sec.
Now, can I say multi-threading increased my computational capacity? No. There is no connection between thread count and CPU in this example.
The point, however, is that there is a crucial relation between TCP/IP stack behaviour and threads.
Notice the emphasis I put on *running*. If you have a single thread that adds up a million numbers, and each addition takes one second, then it will take a million seconds. If you split it into two threads, have each do half the numbers, and run both threads on the same CPU, then one thread takes half a million seconds, there's a small fraction of time to switch to the other, and then the other takes another half million seconds.
Using multiple threads to decrease the time spent doing nothing is not the same as speeding up the total work that has to get done. Like you said, "it depends on what you're using them for", and so far the OP hasn't gone into much detail about what those threads would be doing.
-r
Edit - I see now the OP subsequently clarified that most would remain blocked, but the same principle still applies. For example, in your queue example, instead of simply blocking on an I/O call, you could just as easily use a non-blocking version, check the result, and go about doing other tasks until there is I/O available. It would be a programming nightmare to do it all in a single thread, but you would still save the small task-switching time. Which, like I originally said, is usually worth it to avoid the complexity.
Quote from: Radio Ga Ga on July 31, 2010, 07:27:45 PM
...
That is exactly what I am using those threads for! :clap:
So, referring to my previous question: Will there be anything negative?
I could probably write it in C++ using a class; it would be slow, but at least thousands of threads won't be hitting the same procedure.
If the threads do nothing most of the time then they are not using any CPU time, and hence there will be no impact.
Well, some impact must exist because the OS has to manage those threads, but I estimate it to be minimal.
The dioxin argument is interesting, but flawed.
To recap his argument, suppose you have a 4-core CPU and 5 dependency-laden tasks to perform, each task taking 1 hour.
A naive implementation with only 4 threads would take 2 hours to complete all the tasks.
The key word is naive. There is no reason that each thread can't also work on 25% of that 5th task. Is it easier to write than the naive implementation? Probably not. But it's absolutely more efficient.
E^cube's argument is the same, except based on blocking I/O calls instead of work your own code must accomplish, as in the dioxin argument.
http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx
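Rockoon's point, splitting the 5th task itself across the cores, looks like this in outline. This is a Python sketch for the structure only: CPython's GIL means you'd use processes or a compiled language for real CPU parallelism, and the sum-of-squares task and all names are illustrative:

```python
import threading

def parallel_sum_squares(n, num_threads):
    """Split one big loop into contiguous chunks, one chunk per thread."""
    partials = [0] * num_threads

    def worker(idx, lo, hi):
        acc = 0
        for i in range(lo, hi):
            acc += i * i
        partials[idx] = acc              # each thread writes only its own slot

    # Chunk boundaries: thread k handles the half-open range
    # [n*k // num_threads, n*(k+1) // num_threads).
    bounds = [n * k // num_threads for k in range(num_threads + 1)]
    threads = [threading.Thread(target=worker, args=(k, bounds[k], bounds[k + 1]))
               for k in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)
```

The same decomposition applies to dioxin's 5 tasks on 4 cores: instead of a 5th thread, each of the 4 threads takes its share of the 5th task's range.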
Excuse my ignorance, but I can't see how multiple threads could speed up a task, *if* the task itself is efficiently coded. "Dependency-laden tasks" sounds nice, but what are these "dependencies"? Will you get more data from the hard disk if you request it from 4 different threads? Unlikely.
However, you might get more data from a server if you request them from 4 different threads. Browsers do that all the time. But that is cheating, because you explicitly ask for more hardware, on the server side where the bottleneck sits.
Quote from: jj2007 on August 01, 2010, 12:35:08 AM
Excuse my ignorance, but I can't see how multiple threads could speed up a task, *if* the task itself is efficiently coded. "Dependency-laden tasks" sounds nice, but what are these "dependencies"? Will you get more data from the hard disk if you request it from 4 different threads? Unlikely.
However, you might get more data from a server if you request them from 4 different threads. Browsers do that all the time. But that is cheating, because you explicitly ask for more hardware, on the server side where the bottleneck sits.
I've already said it depends on how you're using them; what you need to do is find your bottlenecks. I'll give you some more examples. For a download application, there's a nice one called Download Accelerator Plus which can download files up to 300% faster than normal. It works by finding multiple locations where the file you want is hosted, connecting to each one in a different thread, and downloading different parts of the file at the same time. This
1) avoids hammering any one server's resources
2) bypasses any bandwidth rate limiting any one server has
3) fully utilizes your own internet bandwidth (which usually isn't the bottleneck)
Also, someone mentioned you could just make a socket non-blocking and keep checking it... that's even less efficient than blocking, because your code actually has to keep checking the socket state instead of just waiting on it, which does you no good.
In most cases the bottleneck is processing/handling the data anyway, as in the compression example I already gave. Imagine you used 1 thread to compress an entire 7 GB file: you'd have to read a part, then compress it, then maybe write it back out somewhere...
that's 3 things the thread has to do before it can handle another part. Horrible. It'd be much more efficient for the first thread to read the data, toss it in a queue, and continue reading, and then have other threads process it. We all know playing around in memory is much faster than disk, so it only makes sense to use that to your advantage...
Another example: say you need to encrypt 100 random 80 MB files in a folder, and the encryption algorithm takes 5 seconds per 80 MB. What you could do is have 1 thread FindFirstFile/FindNextFile the directory and add each file name to a queue, and have other threads actually open the files and do the encryption; when they finish their file, they check to see if any more have been added...
So there are countless examples of how multiple threads can drastically increase performance. To jump to the other side of the fence for a second and give an example where threads are an awful idea: a server program that attempts to create 1 thread per client and can have more than 20 simultaneous connections... a much better remedy for that is an I/O system like IOCP with a thread pool/queue system to process requests.
Also, here's a link on thread pooling, which is what I've been describing when I talk about a queuing system, and it rocks :U. http://msdn.microsoft.com/en-us/library/ms686756%28VS.85%29.aspx
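The thread-pool idea maps onto any pool API. Here's the shape of the encrypt-100-files example, sketched with Python's `ThreadPoolExecutor` standing in for the Win32 pool the MSDN link describes, with a toy XOR in place of real encryption (all names here are mine, not from the post):

```python
from concurrent.futures import ThreadPoolExecutor

def xor_encrypt(data, key=0x5A):
    """Placeholder for the real encryption routine (5 seconds per 80 MB)."""
    return bytes(b ^ key for b in data)

def encrypt_files(blobs, workers=4):
    """Hand every file to a fixed pool of worker threads.
    The pool's internal queue plays the role of the hand-rolled queue:
    workers pull the next file as soon as they finish their current one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(xor_encrypt, blobs))
```

The point of the pool is that the thread count stays fixed at `workers` no matter how many files arrive, which is exactly the fix proposed above for the 1-thread-per-client server.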
I would have thought the distinctions are simple: any core in a processor can at best run one thread at a time; with hyperthreading you can run a second thread from that core with about a 30% improvement over a single core without hyperthreading.
What you cannot get is more processing power than the total core capacity has to offer. If you have 4 cores running at full capacity, then any other intensive thread will slow all of the rest down, and it gets worse with every extra thread started.
A high count of suspended threads, as is common in the OS, is viable only while the machine's full capacity is underused; as soon as the total core capacity is in use, everything starts to slow down, and the more threads you start, the worse it gets.
A suspended thread is similar to having a technique for fast-starting a process: its startup overhead has already been performed, so a suspended thread can be restarted much faster than creating a new process every time it is needed.
A typical high thread count can be applied to multiple internet connections, as each connection transfers data far slower than the maximum capacity of a given core; by running multiple threads of this low-data-movement type you use the processing capacity of any given core more efficiently.
Rockoon,
Quote
The dioxin argument is interesting, but flawed.
To recap his argument, suppose you have a 4-core CPU and 5 dependency-laden tasks to perform, each task taking 1 hour.
The argument is not flawed.
Your recap is not a recap of my argument. I stated clearly that my argument applied to cpu intensive threads, not threads which were waiting, doing nothing most of the time.
What I intended to get across is that it is not always the best strategy to limit the number of running cpu intensive threads to 1 per core.
Paul.
Quote from: dioxin on August 01, 2010, 11:30:48 AM
Rockoon,
Quote
The dioxin argument is interesting, but flawed.
To recap his argument, suppose you have a 4-core CPU and 5 dependency-laden tasks to perform, each task taking 1 hour.
The argument is not flawed.
Your recap is not a recap of my argument. I stated clearly that my argument applied to cpu intensive threads, not threads which were waiting, doing nothing most of the time.
What I intended to get across is that it is not always the best strategy to limit the number of running cpu intensive threads to 1 per core.
Paul.
My recap did not presume anything about threads waiting in your argument.
You are just imagining that it doesn't apply, grasping at my side note about E^cube's argument.
Let's take the case of rendering 5 pixels of an extremely deep Mandelbrot zoom: billions of iterations per pixel, with a full sampling taking 1 hour for each pixel.
You have 5 tasks to accomplish, 4 threads, and no imaginary waiting.
At T = 0:
Thread #1 begins Task #1
Thread #2 begins Task #2
Thread #3 begins Task #3
Thread #4 begins Task #4
At T = 15 minutes:
Thread #1 shifts to Task #2
Thread #2 shifts to Task #3
Thread #3 shifts to Task #4
Thread #4 shifts to Task #5
At T = 30 minutes:
Thread #1 shifts to Task #3
Thread #2 shifts to Task #4
Thread #3 shifts to Task #5
Thread #4 shifts to Task #1
At T = 45 minutes:
Thread #1 shifts to Task #4
Thread #2 shifts to Task #5
Thread #3 shifts to Task #1
Thread #4 shifts to Task #2
At T = 60 minutes:
Thread #1 shifts to Task #5
Thread #2 shifts to Task #1
Thread #3 shifts to Task #2
Thread #4 shifts to Task #3
Each task ran for 1 hour, and the total time was only 1 hour and 15 minutes. Your argument insists that it should take 2 hours: 1 hour for 4 tasks and then 1 more hour for 1 task. That's why your argument is flawed.
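The timetable above can be replayed mechanically. A sketch assuming the slices line up exactly as listed (names and parameters are mine):

```python
def rotate_schedule(tasks=5, threads=4, slice_min=15, task_min=60):
    """Replay the rotation: in slice s, thread t works task (t + s) % tasks.
    With 4 threads and 5 tasks, no two threads ever share a task in a slice."""
    work = [0] * tasks                   # minutes of work completed per task
    elapsed = 0
    while min(work) < task_min:
        s = elapsed // slice_min
        for t in range(threads):
            task = (t + s) % tasks
            if work[task] < task_min:    # a thread on a finished task sits idle
                work[task] += slice_min
        elapsed += slice_min
    return elapsed, work
```

Running it confirms the arithmetic: every task accumulates its full 60 minutes, and the whole set finishes at 75 minutes, i.e. total work (5 hours) divided by cores (4).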
Quote from: jj2007 on August 01, 2010, 12:35:08 AM
but what are these "dependencies"?
Dependency chains; non-parallelizable tasks. For instance, calculating a single pixel of the Mandelbrot set: you can't set 4 cores off on the same pixel and get any advantage, because each iteration depends on the previous one. Dependency-laden.
The dioxin argument is not "flawed", but it misses the entire point of the "one per core" guideline: it doesn't mean simply stop running threads, it means redesign the code. In his example, to be *most* efficient, instead of 5 threads that do 3 hours of work each, you would want to build the code so you had 4 threads doing 3.75 hours of work each. That's more efficient than both the example and the counterexample (but is, of course, not always practical).
I can't stress enough that the key word is *running*. As soon as you start in with blocking I/O calls, especially if you have to wait for hard disk platters to spin, you change the entire argument.
-r
A book reference was requested:
Try http://www.amazon.com/Concurrent-Programming-Windows-Joe-Duffy/dp/032143482X/ref=pd_bxgy_b_text_b
and http://www.amazon.com/Multithreading-Applications-Win32-Complete-Threads/dp/0201442345/ref=pd_bxgy_b_img_a