Synchronous code design

Started by Jibz, June 12, 2005, 09:51:52 AM


Jibz

Quote from: hutch--The problem is there is a whole mountain of code that you cannot write if you have restrictions on such techniques. Now in response to the GOTO boogie, I instead only use JUMPS and as few as I can get away with and I simply ignore the "rule" on LOCALS only if I need the scope of a GLOBAL, so I wonder if there is a new rule of fashion on using polling loops?

The GOTO rule naturally doesn't apply to assembly language, since it's not structured or scoped in any way .. it's just a series of instructions, and if you had no GOTO/JUMP you couldn't do anything. However, for higher level languages GOTOs tend to mess up the structure and make the code very hard to understand, because they let you suddenly jump in and out of scopes where variables are declared.

There are multiple reasons for promoting local variables over global ones, among them better thread safety and code separation, which helps code reuse. Again, for a single-file asm project it doesn't make much difference.

Quote from: hutch--The basic mechanics are that there is no other way of performing regular interval checks on events that test for a result. You can pass this off to the operating system if you like but it still must do the same to determine the results of events. On the win2k and later OS versions this shows up as the 7 to 8% of processor usage that interrupts take at idle, so I don't see that it is all that efficient in terms of processor usage. While I see that Windows is poorly designed in terms of low level access, this type of loop is trivial to write from ring3 access and they test up fine if they are written properly.

I'm sorry, but that's not entirely correct. What WaitForSingleObject does is switch your thread from the ready state to the waiting state, which means it won't get scheduled at all. It then puts you in the waiting list of the object(s) you are waiting for, which is a simple linked list.

When a key is pressed, an interrupt occurs and (down the line) the OS signals the object you were waiting on, and walks the linked list of waiting threads and wakes up (puts into ready list) those that can now run. So, effectively you are not using any CPU time waiting this way, and there is no polling loop for this.

Even if there were one, it would still be better to have one OS thread handle it instead of every single user program. Things are just not the same as they were 15 years ago in DOS, where you were the only program running and the hardware was just an 'int' away :U.
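To make the contrast concrete, here is a minimal sketch (just an illustration with a made-up name, not your wait_key) of blocking on the console input handle, so the thread sits in the waiting state until there is unread input:


#include <windows.h>

/* Illustration only: block until the console input buffer is signaled,
   i.e. until there is unread input. The thread is moved to the waiting
   state and is not scheduled again until the OS signals the handle --
   there is no Sleep/poll loop in user code. */
void block_until_console_input(void)
{
    HANDLE hin = GetStdHandle(STD_INPUT_HANDLE);

    if (hin == INVALID_HANDLE_VALUE) return;

    /* returns when the input buffer contains at least one unread event */
    WaitForSingleObject(hin, INFINITE);
}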

hutch--

 :bg

Jibz,

Under WaitForSingleObject is WaitForSingleObjectEx, which in turn calls a similar procedure in NTDLL.DLL which invokes int 2Eh, and that is then polled by the OS even though the thread is turned off. On my win2k box, the overhead for system interrupts is about 7 - 8% so it does not come for free. I know it works but I have yet to see the advantage over a polling loop in measured processor usage. The bottom line is you get nothing for nothing and the system polls the interrupt. The polling loop is of course a lot simpler and a lot less theory-laden, which is very much in keeping with low level programming.

I commented on the GOTO boogie as a matter of fashion. Very few are old enough to have seen real spaghetti code that was hung together with line number labels and gotos. It could be written reasonably well but it could also be written very badly, but that is not much different from any language that has that capacity. I have seen really lousy asm, basic and C in my time but I have also seen all 3 written well, and they implement the amount of structure needed, not what is imposed by language theory.

I am of the view that taboos on using GOTO have to do with limitations in the optimisation of complex loop code that can be routinely written in assembler.

LOCAL variables certainly do have their place where you need re-entrant code or you are writing recursive stack based code but at the other end, you can end up with near ridiculous archipelagos of passed stack parameters where a global is far more efficient and faster. One of my favourites when writing code that MUST be self contained is to write a large structure that has every item needed on the way up the function tree, then pass its address as a single DWORD to the first function in the tree.
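In C terms it would look something like this (a rough sketch only, all the names are made up for the example):


#include <windows.h>

/* collect everything the call tree needs into one structure */
typedef struct APP_BLOCK {
    HANDLE hInput;
    HANDLE hOutput;
    DWORD  dwFlags;
    char   szBuffer[256];
} APP_BLOCK;

static void leaf_proc(APP_BLOCK *pab)
{
    /* every level of the tree reads what it needs from the one block */
    pab->dwFlags |= 1;
}

static void first_proc(APP_BLOCK *pab)
{
    leaf_proc(pab);     /* one pointer passed on, no long parameter lists */
}

void run_tree(void)
{
    APP_BLOCK ab = {0};

    ab.hInput  = GetStdHandle(STD_INPUT_HANDLE);
    ab.hOutput = GetStdHandle(STD_OUTPUT_HANDLE);

    /* the address of the block is the single DWORD passed down the tree */
    first_proc(&ab);
}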

I just wondered if the taboo on polling loops was in the same vein as the taboos on GOTO and GLOBAL variables? If so, it's probably a good reason to code such things in assembler.  :bg

Jibz

Quote from: hutch--Under WaitForSingleObject is WaitForSingleObjectEx, which in turn calls a similar procedure in NTDLL.DLL which invokes int 2Eh, and that is then polled by the OS even though the thread is turned off. On my win2k box, the overhead for system interrupts is about 7 - 8% so it does not come for free. I know it works but I have yet to see the advantage over a polling loop in measured processor usage. The bottom line is you get nothing for nothing and the system polls the interrupt. The polling loop is of course a lot simpler and a lot less theory-laden, which is very much in keeping with low level programming.

I'll gladly admit I don't know the implementation in the low-level keyboard routines, which may very well be handled through a polling loop (like e.g. the floppy driver). But the operating system will be doing this anyway, so your polling loop is then just an unnecessary loop checking the result of the OS loop.

If you have 5 programs running waiting on the same event it gets more obvious. You can either have 5 polling loops, besides the OS one, getting scheduled and checking on its result, or stick them all in a list of waiting threads and have the OS (whether it's running a single polling loop or not) notify them all when the event occurs. Clearly it's better to have the OS handle it.

I know you cannot measure the CPU usage of a loop like yours (especially since you used a 20 ms delay this time) on a 3 GHz CPU, but the fact remains that your thread gets scheduled to do a test it doesn't have to do, because the OS can let you know when the event occurs. Under stress conditions like in the shell examples you can sometimes see the difference :U.

Quote from: hutch--I am of the view that taboos on using GOTO have to do with limitations in the optimisation of complex loop code that can be routinely written in assembler.

I think it has more to do with limitations in the brains of programmers :bg. In rare cases you can do something very elegant with a GOTO that you couldn't do without it, but you should know what you're doing .. less experienced coders often end up with a lot of unnecessary GOTOs cluttering up their program so they have no idea what the actual control flow is.

hutch--

The problem is that with 50 copies of the wait_key algo running, they collectively take up 0%, so again, the processor demand is so low that it does not register. The System Idle Process is running at 92% and system interrupts occupy the rest.

Quote
Clearly it's better to have the OS handle it.

Clearly you have more confidence in the OS than I do. Having seen enough Windows code in my time, polling through a pile of junk affords you no measurable advantage but you pay the price of increased complexity.

I do in fact understand the problem with less experienced programmers misusing a capacity, but again this is the case with any capacity that can be misused and no language in particular is immune from this problem. Where compilers do fall down is with complex multidependent loops, since they are written to optimise structured loops that are preknown in their layout. Feed the same compiler a design that I call a "crossfire" loop, where you have non-nested multiple loops performing different tasks on the same data and without direct branching, and they have terrible problems.

I am much of the view that structure is best left to the programmer to implement to the level they need, not as a straitjacket to limit design, and this is why I am not a follower of fashion. Make a rule and someone finds a need to do something different.

The main problem now is to turn off 50 running consoles with a wait_key loop running.  :P

Jibz

Quote from: hutch--Clearly you have more confidence in the OS than I do. Having seen enough Windows code in my time, polling through a pile of junk affords you no measurable advantage but you pay the price of increased complexity.

I probably do :U.

The msvcrt.dll kbhit() function performs locking and calls GetNumberOfConsoleInputEvents, which btw has this info:

Quote from: MSDN
A process can specify a console input buffer handle in one of the wait functions to determine when there is unread console input. When the input buffer is not empty, the state of a console input buffer handle is signaled.

So, the polling loop will not stop until the OS has processed the keyboard input and made it available in the console input buffer. While doing this, the OS will signal the input buffer handle, which includes waking up any threads in the associated waiting list.

Your polling loop is not taking over the work of the OS; it's checking the result of the OS mechanism continuously. Wouldn't you agree it is (however little) more efficient to have the OS schedule your thread again when it is handling the event anyway, instead of being scheduled every 20 ms to check up on the progress of this handling?

hutch--

There is an ancient rule I rigorously apply in programming called "Occam's Razor", which means never do more than you need to do to get the job done. You can do more but it works no better and at the end, I am an empiricist who tests what I do and by test, I need do no more.

The distinction is in fact another ancient one between "efficient" cause and "final" cause. I prefer the testable "efficient" cause over the theoretical notion of "final" cause and I can do it by simple test. The API is an interesting one but I wonder if there is any real gain in manually coding that API to try and get it faster or smaller, both of which are objective tests when it is plenty fast enough and small as well.

There has to be some gain and I don't see one by objective test and I simply don't care about the rest. I have seen so much over coded junk in my time that I don't go in that direction intentionally.

The loop has a 10 millisecond delay but Sleep also drops the thread's time slice, yet even though I have known ladies who could type 120 words a minute (accurately as well), nobody can type 100 characters a second, so it's not like there is any loss there either.

Now I have no doubt that the technique you prefer works if it's coded correctly, but it will be bigger with a higher level of complexity for no measurable gain, and that's why I will continue to use the simplicity of a polling loop where what you see is what you get.


I split the topic so that the MSVCRT topic did not fill up with stuff that is not part of it.

Jibz

Quote from: hutch--There is an ancient rule I rigorously apply in programming called "Occam's Razor", which means never do more than you need to do to get the job done. You can do more but it works no better and at the end, I am an empiricist who tests what I do and by test, I need do no more.

And you consider writing your own polling loop as 'doing less' compared to just calling a function provided by the operating system to do the work?

Jibz

Btw, the Sleep() call is just stretching out the CPU usage so it ends up being less than 1%.

You could claim you can compute 1 million digits of pi using 0% CPU if you do a single loop iteration every 20 ms. That doesn't mean it's not using CPU time, it's just spread out enough not to be visible in the task manager.

I don't care how sharp your razor is; in the end you are using CPU time waiting for something the OS can notify you about for (almost) free once it occurs -- simply because the OS is handling it anyway, and can walk the list of waiting threads when it does :U.

hutch--

 :bg

Seriously, I don't know what it takes. I just shut down 50 running copies of an app running the polling loop and that did not register more than 0% with the Sysinternals Process Explorer. I suggest on the basis of testing of this type that the razor is more than sharp enough, especially when it would be very rare to run 50 console apps at the same time all waiting for user input.

When you work with an operating system, you finally have a published interface and while I do from time to time disassemble a system DLL to find out what a function does, finally it only matters if the interface does what it is supposed to do. As I have little trust in OS design, I also rigorously test what I write in whatever area it is supposed to perform in and if it passes that testing, I use it.

Now I will still make the point that you get nothing for nothing and if you pass this capacity to the OS, it polls for the result and this takes time, even if it's only a longer list of processes to test. Now I am well aware that Sleep() also has an overhead, but when 50 copies of it running at normal priority don't register more than 0%, it's not an overhead that I am going to lose Sleep() over. The more so as the system idle process is at a lower priority than the 50 running processes Sleep()ing away.

What you are running into is the difference between pragmatism and theory. I am a pragmatist and if it tests up correctly, I trust the testing, not the theory, and when the testing demonstrates that the processor usage is immeasurable, it has passed the test.

Jibz

Quote from: hutch--Now I will still make the point that you get nothing for nothing and if you pass this capacity to the OS, it polls for the result and this takes time, even if it's only a longer list of processes to test.

Do you have any background info supporting the claim that WaitForSingleObject will start a polling loop inside the OS doing the same as your loop?

Quote from: Inside Windows 2000
To synchronize with an object, a thread calls one of the wait system services the object manager supplies, passing a handle to the object it wants to synchronize with. The thread can wait on one or several objects and can also specify that its wait should be canceled if it hasn't ended within a certain amount of time. Whenever the kernel sets an object to the signaled state, the kernel's KiWaitTest function checks to see whether any threads are waiting on the object. If they are, the kernel releases one or more of the threads from their waiting state so that they can continue executing.

The following example of setting an event illustrates how synchronization interacts with thread dispatching:


  • A user-mode thread waits on an event object's handle.
  • The kernel changes the thread's scheduling state from ready to waiting and then adds the thread to a list of threads waiting for the event.
  • Another thread sets the event.
  • The kernel marches down the list of threads waiting on the event. If a thread's conditions for waiting are satisfied, the kernel changes the thread's state from waiting to ready.
...
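That sequence is easy to see in a few lines of C. Here is a minimal sketch of my own (names are illustrative): one thread waits on an event object and uses no CPU at all until another thread sets it, at which point the kernel wakes the waiter from its wait list.


#include <windows.h>
#include <stdio.h>

static HANDLE g_hEvent;

static DWORD WINAPI waiter(LPVOID arg)
{
    /* the thread goes from ready to waiting here; no CPU is used while waiting */
    WaitForSingleObject(g_hEvent, INFINITE);
    printf("event signalled, waiter woken\n");
    return 0;
}

int main(void)
{
    HANDLE hThread;

    /* auto-reset event, initially non-signalled */
    g_hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);

    hThread = CreateThread(NULL, 0, waiter, NULL, 0, NULL);

    Sleep(1000);          /* simulate some other work going on */
    SetEvent(g_hEvent);   /* kernel walks the wait list and readies the waiter */

    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
    CloseHandle(g_hEvent);
    return 0;
}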

hutch--

No no, it does not START a polling loop, it adds to a system process that IS a polling loop, and this is because there is no other way to perform the task. You can see the overhead of system polling in the percentage of processor time taken by interrupts. It is still the situation that you get nothing for nothing, and any process that is placed in a wait state must be polled at some interval to determine when it is supposed to finish.

As long as hardware runs clock driven instruction sequences and has no reliable way of determining the delay in user input, there is no other choice than to check at some interval to see if the user has done something. Now this can be hidden in the terminology of system events, wait states, signal states and the like, but if it goes back every so often to check if there has been user input, it is POLLING.

A rose by any other name still has spikes on it and in this case, there is an overhead for dumping something into an OS based process.

Now I think I can fairly say that I have listened to what you have to say and while I have no complaints to you using the technique that you prefer, I am happy enough to differ as I am equally happy enough with the tested results but I will not waste time on this idea when I see the result as having tested perfectly and does what it was intended to do.

Jibz

Quote from: hutch--Now I think I can fairly say that I have listened to what you have to say and while I have no complaints to you using the technique that you prefer, I am happy enough to differ as I am equally happy enough with the tested results but I will not waste time on this idea when I see the result as having tested perfectly and does what it was intended to do.

You continue to amaze me.

hutch--

I had the time to play with a test piece using the C function you posted, which I built into a library along with the polling loop version, and wrote a separate test piece for each technique. I then manually started 50 instances of both to test their processor usage. Both settled down on this particular machine to 0%, but while the polling version showed no change in the processor usage at all, the WaitForSingleObject C version, because of the loop code it runs, showed a spike of roughly 10% which settled down to 0% after it had started up.

There may be some problem with the C version in that it cannot be started from File Manager with the Enter key, as it catches the keystroke and exits. The polling version performs flawlessly as expected and does not have this problem.

The wait version as built by the vctoolkit compiler is 151 bytes in size, the polling loop version is 31 bytes. The attached zip file has both examples and the 2 modules built into a library.

Is it possible to both gut the C code and fix the trapping of the Enter key?

[attachment deleted by admin]

hutch--

My C is very rough these days but here is a C equivalent to the MASM proc wait_key.


// «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

#include <windows.h>
#include <conio.h>

// «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

void __stdcall key_wait()
    {
    /* poll until a key press is waiting in the console input buffer,
       sleeping 10 ms between checks to give up the rest of the time slice */
    while (!_kbhit())
      {
      Sleep(10);
      }
    /* discard the keystroke so it is not left in the buffer */
    FlushConsoleInputBuffer(GetStdHandle(STD_INPUT_HANDLE));
    }

// «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

Jibz

Quote from: hutch--There may be some problem with the C version in that it cannot be started from File Manager with the Enter key, as it catches the keystroke and exits. The polling version performs flawlessly as expected and does not have this problem.

That's actually a fun problem. The reason is that you are holding down the Enter key so long that the key release happens when the executable is running. The C code detects key release instead of key press, so it exits :bg.

To fix it, just change

    if (ir.EventType == KEY_EVENT && ir.Event.KeyEvent.bKeyDown == 0) return;

to

    if (ir.EventType == KEY_EVENT && ir.Event.KeyEvent.bKeyDown) return;

which will make it detect key press instead :U.

Quote from: hutch--The wait version as built by the vctoolkit compiler is 151 bytes in size, the polling loop version is 31 bytes. The attached zip file has both examples and the 2 modules built into a library.

The wonder of a runtime dll :toothy.

The reason the polling loop is so relatively short is of course that it is calling kbhit in msvcrt.dll, which does the work. Here is a rundown of what kbhit does (a rough sketch in C follows after the list):


  • Lock
  • Call GetNumberOfConsoleInputEvents to get the number of events available
  • Allocate memory for those events using malloc
  • Get events using PeekConsoleInput
  • Loop through events checking for a key press event
  • Free memory
  • Unlock

Now, of course most of the time while waiting the number of available events will be zero, which skips a large part of the code. But in the end you need about the same code to loop over events and find those of interest either way.
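As a rough sketch of that sequence in C (not the actual msvcrt source, and with the locking step left out), it could look like this:


#include <windows.h>
#include <stdlib.h>

/* return nonzero if a key-press event is sitting in the console input buffer */
int kbhit_sketch(void)
{
    HANDLE hin = GetStdHandle(STD_INPUT_HANDLE);
    DWORD dwNumEvents = 0, dwRead = 0, i;
    INPUT_RECORD *pir;
    int hit = 0;

    /* the common case while waiting: nothing queued, so bail out early */
    if (!GetNumberOfConsoleInputEvents(hin, &dwNumEvents) || dwNumEvents == 0)
        return 0;

    pir = (INPUT_RECORD *)malloc(dwNumEvents * sizeof(INPUT_RECORD));
    if (pir == NULL)
        return 0;

    /* look at the queued events without removing them from the buffer */
    if (PeekConsoleInput(hin, pir, dwNumEvents, &dwRead))
    {
        for (i = 0; i < dwRead; i++)
        {
            if (pir[i].EventType == KEY_EVENT && pir[i].Event.KeyEvent.bKeyDown)
            {
                hit = 1;
                break;
            }
        }
    }

    free(pir);
    return hit;
}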

Btw, here is a version that will detect a left mouse click as well:


void waitforkeypress()
{
    HANDLE hin;
    DWORD dwNumEvents, dwNumRead;
    INPUT_RECORD ir;
    DWORD dwOldMode;

    /* get standard input handle */
    hin = GetStdHandle(STD_INPUT_HANDLE);

    if (hin == INVALID_HANDLE_VALUE) return;
       
    /* save current console mode */
    GetConsoleMode(hin, &dwOldMode);
   
    /* enable mouse input events */
    SetConsoleMode(hin, ENABLE_MOUSE_INPUT);

    /* flush input buffer */
    FlushConsoleInputBuffer(hin);

    /* wait loop */
    while (1)
    {
        /* wait for an input event */
        WaitForSingleObject(hin, INFINITE);

        /* get number of events */
        if (GetNumberOfConsoleInputEvents(hin, &dwNumEvents) && dwNumEvents)
        {
            /* loop through events */
            while (dwNumEvents--)
            {
                /* read event */
                if (ReadConsoleInput(hin, &ir, 1, &dwNumRead) && dwNumRead)
                {
                    /* if it's a key being pressed, return */
                    if (ir.EventType == KEY_EVENT && ir.Event.KeyEvent.bKeyDown)
                    {
                        SetConsoleMode(hin, dwOldMode);
                        return;
                    }

                    /* if it's the left mouse button being pressed, return */
                    if (ir.EventType == MOUSE_EVENT &&
                        (ir.Event.MouseEvent.dwButtonState & FROM_LEFT_1ST_BUTTON_PRESSED))
                    {
                        SetConsoleMode(hin, dwOldMode);
                        return;
                    }
                }
            }
        }
    }
}