1000 thread polling loop demo

Started by hutch--, August 26, 2005, 03:30:37 AM

hutch--

I have been listening to a lot of noise lately about how wicked, illegal, immoral and fattening the use of a polling loop is in a modern multitasking operating system and had to bother to write a test piece to demonstrate the nonsense that was being flaunted against this technique. You will need to use both Task Manager and Sysinternals Process Explorer to evaluate the test piece but the results on the PIV running Win2000 that I use showed 5% processor usage for 1000 running threads that in turn shelled out to 1000 instances of another app. That is one 200th of one percent of processor usage for each polling loop.
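
The per-loop figure follows from dividing the reported total by the thread count. This assumes the load is spread evenly across the 1000 polling threads, which the post does not measure directly:

$$\frac{5\%}{1000} = 0.005\% = \frac{1}{200}\ \text{of}\ 1\%$$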

Anyone using the MASM32 library module "shell_ex" can rest assured that it is very efficient code that is very hard to improve on.

[attachment deleted by admin]
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Randall Hyde

Quote from: hutch-- on August 26, 2005, 03:30:37 AM
I have been listening to a lot of noise lately about how wicked, illegal, immoral and fattening the use of a polling loop is in a modern multitasking operating system and had to bother to write a test piece to demonstrate the nonsense that was being flaunted against this technique. You will need to use both Task Manager and Sysinternals Process Explorer to evaluate the test piece but the results on the PIV running Win2000 that I use showed 5% processor usage for 1000 running threads that in turn shelled out to 1000 instances of another app. That is one 200th of one percent of processor usage for each polling loop.

Anyone using the MASM32 library module "shell_ex" can rest assured that it is very efficient code that is very hard to improve on.

And you can replace the following statements:

  ; -------------------------------------------
  ; loop while created process is still active
  ; -------------------------------------------
  @@:
    invoke GetExitCodeProcess,pr_info.hProcess,ADDR xc
    invoke Sleep, 1
    cmp xc, STILL_ACTIVE
    je @B

by a single call to WaitForSingleObject and have 0% overhead, less typing, better synchronization, etc., etc.
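
A minimal sketch of that replacement, reusing the pr_info and xc names from the loop above. It is a fragment meant to sit where the loop was, and it assumes CreateProcess has already filled in pr_info; error handling is omitted.

```asm
  ; ---------------------------------------------
  ; block until the created process has terminated
  ; ---------------------------------------------
    invoke WaitForSingleObject, pr_info.hProcess, INFINITE

  ; the exit code can still be read afterwards if the caller needs it
    invoke GetExitCodeProcess, pr_info.hProcess, ADDR xc
```

The calling thread is not scheduled again until the process handle becomes signalled, so no wake-up interval has to be chosen.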

In programming, like many other ventures, it's always best to use the right tool for the job. And WaitForSingleObject *is* the right tool for this job. The polling loop, no matter how *little* time it wastes, still wastes time. It also takes more space. It's also less precise. Maybe these issues don't matter for this trivial example, but given that using WaitForSingleObject is actually *easier*, too, there is no sense in using a polling loop for this task.
Cheers,
Randy Hyde

hutch--

 :bg

Yes you can but does it do it any better and is it as easy to modify ? The answer is NO and NO. I have kept hearing that it's better because it's better because it's better but the proof is the objective testing and I am not hearing any of that. Then I hear it's the right tool for the job because it's the right tool for the job etc ....

Box to box the results will vary some if the OS is capable of handling a 1000 thread demo and to make the point of objective testing over endless opinion, 0.005% or lower for each thread AND polling loop says it's very efficient code and it's still easy to modify.

When someone can show me a meaningful improvement on one 200th of one percent or less I may take notice but I won't hold my breath waiting for an improvement when the processor usage is so low it's hard to measure. None the less, we will still hear that it's better because it's better etc ......  :cheekygreen:

PS: I should have asked, how do you get something out of nothing "0% overhead" ? What operation at any level in a computer happens in ZERO cycles ?

PBrennick

Hutch,
For whatever it is worth, I use both methods.  There are times, however, when it is necessary, because of what I am testing, to use a polling loop.  I have NEVER seen any negative consequences from its use and I do not understand what all the commotion is about.  I have been reading the 'polling wars' since May and have yet to see one person post any valid reason to NOT use a polling loop.  I just see well-meaning suggestions with no proof, and I have a hard time accepting things delivered in this way.

Paul
The GeneSys Project is available from:
The Repository or My crappy website

hutch--

My problem Paul is I can live with system signalled states, polling loops, gui message loops and anything else that does the job. What I bite on is the notion that there is only one way to do anything and that one size fits all. I have seen badly written inadequate code that failed to do the job as often as I have seen bloated slow overkill from applying the wrong technology to a simple task.

The reason why I rely very heavily on testing is that it gives me a method to distinguish between what works well and what does not.

Unfortunately there is no current documentation for the chain of function calls that go from kernel32.dll to ntdll.dll and then into ntoskrnl.exe and this is for security reasons because the virus guys try and use int 2Eh for viral access. The data I could find for slightly earlier versions of win2000 is out of date as Microsoft have deliberately obscured the workings to make life more difficult for them.

Randall Hyde

Quote from: hutch-- on August 26, 2005, 03:07:34 PM
:bg

Yes you can but does it do it any better and is it as easy to modify ?
Using WaitForSingleObject uses fewer resources (however small you may claim the difference is, the fact is that it uses fewer resources) and I'd also argue that it's easier to modify, being a single statement rather than several.

What's your aversion to doing things the right way?

Quote
The answer is NO and NO.
How do you figure it's better and easier to use a polling loop?  Conceptually, I can see that it would be easier to dream the polling loop up if you're unfamiliar with the WaitForSingleObject call, but once you possess the knowledge, WaitForSingleObject *is* easier and *is* better.

Quote
I have kept hearing that it's better because it's better because it's better but the proof is the objective testing and I am not hearing any of that. Then I hear it's the right tool for the job because it's the right tool for the job etc ....
Your testing is *not* objective. Nor are the tools you're using good enough to provide the proper answers.  You've thrown out some numbers where you've measured the polling code results and you've not even run the experiments on the blocking method.

And sometimes, *analysis* is better than empirical measurements when attempting to prove something. And analysis of the problem *clearly* shows the blocking approach to be the better one.

Quote
Box to box the results will vary some if the OS is capable of handling a 1000 thread demo and to make the point of objective testing over endless opinion, 0.005% or lower for each thread AND polling loop says it's very efficient code and it's still easy to modify.
Where are your numbers for the blocking code? How can you consider this to be an "objective experiment" when you've not even measured the other approach?

Quote
When someone can show me a meaningful improvement on one 200th of one percent or less I may take notice but I won't hold my breath waiting for an improvement when the processor usage is so low it's hard to measure. None the less, we will still hear that it's better because it's better etc ......  :cheekygreen:

I've personally not questioned the worthiness of the *performance* difference. But the polling method has some *very* serious synchronization problems. Apparently, you're not too familiar with the problems found in concurrent programming, so I'm (and everyone else who is suggesting how poor your suggestions are) pretty sure that any attempt to go into further detail would be a waste of time. You should invest in a good book on concurrent programming or OS design and dig in. Or at the very least, study these topics on-line.


Quote
PS: I should have asked, how do you get something out of nothing "0% overhead" ? What operation at any level in a computer happens in ZERO cycles ?
If the polling loop executed only once, it would not have zero overhead.
Cheers,
Randy Hyde

Randall Hyde

Quote from: PBrennick on August 26, 2005, 03:44:14 PM
Hutch,
For whatever it is worth, I use both methods.  There are times, however, when it is necessary, because of what I am testing, to use a polling loop.  I have NEVER seen any negative consequences from its use and I do not understand what all the commotion is about.
The problem is with synchronization. Not the loss of cycles. If you're doing trivial things, you can get away with the lax synchronization that results from a polling loop. But it will fail you if true synchronization is required.

And might I ask why polling was *required* in your case? I'm not suggesting polling is never appropriate, but most of the time you don't need it.


Quote
I have been reading the 'polling wars' since May and have yet to see one person post any valid reason to NOT use a polling loop. 
Visit the threads on ALA. There are lots of examples of synchronization problems that occur with the polling approach. Particularly with Hutch's approach to sending an "I'm complete" message to some other process.

Quote
I just see well-meaning suggestions with no proof, I have a hard time accepting things delivered in this way.

Microsoft tells us how it *should* be done in the SDK. The way to wait for another process to complete execution is via WaitForSingleObject (or, if other events are occurring, WaitForMultipleObjects). It's a one-statement function call. Why turn this into a multi-statement loop that runs every millisecond, wasting a bunch of cycles when it is easier and more efficient to use WaitForSingleObject?

The craziness here is that WaitForSingleObject is smaller, more efficient, and easier to program. What justification is there for using a polling loop?
Cheers,
Randy Hyde

PBrennick

I sure hope this does not turn into another nightmare like it did on Spook's board.  I think this topic should be moved or stopped.

Randy,
In all the time I have been here I cannot ever recall ever saying anything negative about you.  Now you are saying that I am a trivial programmer?  You should be ashamed.

Paul

MichaelW

Quote from: Randall Hyde on August 26, 2005, 08:09:49 PM
What justification is there for using a polling loop?

How about the possibility of performing other processing in the polling loop?
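
A minimal sketch of what that could look like, built on the loop quoted earlier in the thread. DoOtherWork is a hypothetical procedure introduced only for illustration, and pr_info and xc are assumed to be set up as in the original loop.

```asm
  ; ------------------------------------------------------
  ; poll the exit code and do other work between each poll
  ; ------------------------------------------------------
  @@:
    invoke GetExitCodeProcess, pr_info.hProcess, ADDR xc
    cmp xc, STILL_ACTIVE
    jne @F                  ; created process has exited, leave the loop
    call DoOtherWork        ; hypothetical procedure, not from the thread
    invoke Sleep, 1         ; give up the rest of the timeslice
    jmp @B
  @@:
```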

eschew obfuscation

hutch--

Paul,

You can be assured that there are many people in this forum that respect your years of proven commercial programming skills and it will tend to come not only from the learners that you have helped but from other programmers who have years of commercial programming skills.

hutch--

Now for the waffle,

> What's your aversion to doing things the right way?

None, that's why I tailor a polling loop to the task of shelling out to an external process.

> And sometimes, *analysis* is better than empirical measurements when attempting to prove something.
> And analysis of the problem *clearly* shows the blocking approach to be the better one.

Metaphysics is no substitute for empirical testing as the axioms for this branch of metaphysics need to be proven like anything else. Inductive method is the technique you use to determine how real world events occur.

When it comes to the repeatedly stated "You get something for nothing" approach with one technique over another, there appears to be a clear failure to understand the von Neumann architecture of current computer hardware, something I have characterised as "doing one damned thing after another". Shift the argument to parallel computing and you have more than one processor "doing one damned thing after another" and synchronising the results where it's required.

> The problem is with synchronization. Not the loss of cycles. If you're doing trivial things, you can get away with
> the lax synchronization that results from a polling loop. But it will fail you if true synchronization is required.

This is simply trying to change the subject. When a polling loop is used to test the exit code for a running application, it IS doing something trivial and problems of synchronisation just don't occur. A "shell" procedure disables the calling app while the called app runs and when the called app is finished, the calling app is re-enabled.

=======================================
Microsoft tells us how it *should* be done in the SDK. The way to wait for another process to complete execution is via WaitForSingleObject (or, if other events are occurring, WaitForMultipleObjects). It's a one-statement function call. Why turn this into a multi-statement loop that runs every millisecond, wasting a bunch of cycles when it is easier and more efficient to use WaitForSingleObject?
=======================================

Herein lies the nature of the assertions about the Wait techniques. WaitForSingleObject() cannot be extended by adding something into it like a polling loop so you have to use another API, WaitForMultipleObjects(), and see if the range of objects it will wait for suits the task you have in mind.

> Why turn this into a multi-statement loop that runs every millisecond,

Herein lies another source of confusion. When you run the Sleep() API with the setting of one (1) millisecond, you force a wait of 1 millisecond then the function yields which places the thread at the end of the list of timesliced threads. The way you would achieve the result suggested here is to create a loop with GetTickCount() in it and make a subloop to waste the duration instead of using the Sleep() API that yields the rest of the timeslice back to the OS.

The reason why a properly written polling loop is so efficient is that it passes unused timeslices back to the OS to use elsewhere and this is the way it's supposed to be done in a multitasking OS.

Now it's not as if the polling technique is unknown or shunned in Windows architecture. EVERY running GUI application not only polls keystrokes in a GetMessage() polling loop but polls anything the OS points at the WndProc style callback procedure and it is a particularly efficient technique, particularly as so many messages get pointed at a WndProc on a regular basis.

When it comes to proper synchronisation on any large scale between far more complex operations of multiple processors, it is done in hardware to make it fast enough, yet it is also common knowledge that the increase in processor count does not give you an exact match in the increase in processor capacity. There is always a loss in parallel processor operation so that 2 processors may give you a 90% increase in processor power but it's not 100%.

When it comes to comparing how much running Windows software depends on polling loops as against those that depend on signalled states, it's not even a competition.

Would anyone who has ever written Windows software try and write a WndProc with signalled state APIs ? I suggest that it would be a nightmare of near amazing proportions and the simple, clear and elegant polling loop design would eat it alive in performance terms.

If anyone was really serious about the "One size fits all" approach, they would use a polling loop instead as it has far wider usage in Windows applications with message loops and WndProc procedures but as a matter of fact, the signalled state APIs are useful in their own context, even if they are rather clunky high level concepts.

It's still a horses for courses world and trying to use a draught horse to win the Kentucky Derby is an exercise in futility.

Regards,

hutch at movsd dot com

PBrennick

Hutch,
Thank you for the nice words.

MichaelW,

How about the possibility of performing other processing in the polling loop?

Of course, you have hit the nail squarely on the head.  I have no clue why this answer was not obvious to ALL.  In this case WaitForMultipleObjects is just as bad for obvious reasons (probably obvious only to trivial programmers).

Paul

James Ladd

Hutch,

Very interesting work and thread.

I like your approach for the task at hand. It could be suitable to many things that are very similar to
this task.

How many resources does the example use ?
I think there is a limited thread count and using 1000 is ok if you're the only person/app using the machine.

It would be very interesting to see the same example with some other work happening in the waiting threads.
While they are all waiting on a task there is little they have to do and contention for resources.

Would you recommend using this approach for a web server ?

Rgs, Striker.

hutch--

James,

A web server is an obviously far more complex animal than a simple "shell" procedure which only has to deactivate the calling process, run an external one and re-activate the caller when the notification is made of termination. It will depend a lot on whether you need event notification for processes or if you need to stop one process while another is run.

I have usually seen web server software as asynchronous thread style applications usually with a thread count limit imposed by the host server but where issues of synchronisation arise in things like disk access and running CGI code etc .... I would take a very good look at what had to be done before I committed it to any particular technique. The important thing when you do use a polling loop for production code is to yield unused processor time back to the OS rather than waste it.

In terms of event notification, the normal GUI messaging system is the heart of Windows and it is very efficient in terms of processor usage and if event notification is the main requirement, it is very hard to improve on but at the other end, if large scale synchronisation of a large count of events is what you need, having a good look at the capacity of the Wait series of functions would probably be useful if you can get them to do the job.

James Ladd

Hutch,

I always like how you want to examine the problem before making a judgement on the best technique.
I liked your example and I would not use it for a web server but thought I'd ask so others may consider
the technique before using it for a web server.

Keep well.

Rgs, Striker.

ps - I don't think Randall had his happy flakes this morning.