why console subsystem slower

Started by thomas_remkus, July 31, 2009, 07:13:31 PM

thomas_remkus

I have been trying to understand how a friend's NASM code was over 7 times faster than the same code ported to MASM. I was sure I had everything correct, so I dug into it. When I execute the NASM code (from DOS) it runs VERY fast, at about 1/2 a second. The MASM version was taking me around 2.8-3.8 seconds. I found that the NASM build was set to "Windows GUI" and the MASM one I had set to "console". When I changed that in the MASM build, they both ran with the same perf. WHY??? What is it about the console setting that's so ugly? Is it because it's a subsystem and I have to pass through it all the time?

Here's the code I'm working with:

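.586
.model flat, stdcall
option casemap:none

include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

MAX_LOOP_LIMIT equ 429496729       ; number of loop passes

.data
varA dq 123.333                    ; 64-bit double operands
varB dq 1234533.987
varC dq 0                          ; result, overwritten on every pass

.code
start:
finit                              ; reset the x87 FPU
mov ebx, MAX_LOOP_LIMIT            ; ebx = loop counter

__begin:
fld varA                           ; push varA onto the FPU stack
fadd varB                          ; st(0) = varA + varB
fstp varC                          ; store st(0) to varC and pop
dec ebx
jnz __begin                        ; repeat until ebx reaches 0

    invoke ExitProcess, 0
end start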


dedndave

the console is a dog - lol
especially if it is in the process of outputting characters to the console window
although, i am not sure what masm vs nasm has to do with it

redskull

Consoles are slow because they interact with the system indirectly, via message passing to the subsystem, rather than through direct kernel calls (one of the last things in NT to do so). When you specify the console switch, everything is initialized for you automatically at load time, whether you use it or not. If you add an AllocConsole() call to the WINDOWS version, the speeds should match (the CONSOLE switch effectively just calls AllocConsole() for you at startup).

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

dedndave

he is saying he sees a large discrepancy between console mode code assembled with masm vs nasm
any clues about that, redskull?

MichaelW

The console should have no effect on the loop timing, but the processor speed could have a very large effect. The loop would take about 3.6s to execute on my 10-year-old 500MHz P3, and I would not be surprised if a recent, high-end processor could do it in 10% of that time.

eschew obfuscation
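A quick back-of-the-envelope check of MichaelW's figure, taking his 3.6 s on a 500 MHz P3 as given and the first listing's MAX_LOOP_LIMIT of 429496729 passes (the cycles-per-pass result is derived from his estimate, not measured):

$$
\frac{3.6\ \mathrm{s} \times 500 \times 10^{6}\ \mathrm{cycles/s}}{429\,496\,729\ \mathrm{passes}}
= \frac{1.8 \times 10^{9}\ \mathrm{cycles}}{4.295 \times 10^{8}\ \mathrm{passes}}
\approx 4.2\ \mathrm{cycles\ per\ pass}
$$

So roughly 4 cycles per pass of the fld/fadd/fstp/dec/jnz loop, which makes no system calls at all; a faster CPU simply finishes sooner. The later listing, with MAX_LOOP_LIMIT equ 4294967295, runs about ten times as many passes, so times from the two listings are not directly comparable.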

redskull

Quote from: dedndave on August 01, 2009, 04:04:09 AM
he is saying he sees a large discrepancy between console mode code assembled with masm vs nasm

The way I read the OP, he's seeing the discrepancy between linking with /SUBSYSTEM:WINDOWS and /SUBSYSTEM:CONSOLE, not MASM and NASM ("When I changed that in the MASM they both ran the same perf. ").  If that isn't the case, I haven't the foggiest.

-r

hutch--

I do most of my algo testing in console apps as it's easier and faster, but I also do some in GUI apps and there is no speed difference whatsoever between console and GUI mode. CMD.EXE can be slower when you are dumping data to the screen, but to a lesser degree you also slow up an algo in GUI mode by writing results to the screen while the algo is running.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

redskull

I would think consoles would HAVE to be slower, if for no other reason than that they involve an LPC, and hence twice the kernel-mode switches. After all, isn't speed the whole reason they changed the GUI routines from this method to straight kernel calls in the first place? I doubt it would be noticeable, as console use is few and far between, but it would be interesting to test. Of course, all this is moot for this code, as it has no output at all (console or otherwise); the only possible place for a slowdown (all other things being equal) would be in loading the program, and the only thing different about loading it is the allocation of a console when you specify /SUBSYSTEM:CONSOLE. I don't really see that taking 4 seconds, though.... I wish M.R. over at Sysinternals would do a blog post about consoles; I've never actually seen any in-depth info about how they work.

-r

dedndave

i think console mode is regarded as "unimportant" or "merely a tool"
the console window has several bugs and i doubt anyone at ms cares - lol
in a way, i suppose they are right, too
any "real" app is going to be gui
but, console gives you a way to run batches and scripts and test rudimentary things without a lot of code

redskull

Quote from: dedndave on August 01, 2009, 04:15:36 PM
...any "real" app is going to be gui...

I'm of the opinion that any 'real' app should be *both* (app depending, of course).   There's no better program than one that you can start in an interactive GUI session when you want to, or start with 50 different command line switches to do fully automated batch processing and pipe the output to a file, while you sit around and drink coffee.  The new windows PowerShell is a great tool, but not enough cmdlets ship with it.

-r

ecube

I can't wait until everything is like cooliris. If you don't know what that is, it's a free Firefox addon http://www.cooliris.com that lets you zoom through online pics in 3D. It's probably one of the coolest things I've seen. If Firefox etc. were smart, they'd buy it from the company and integrate it right in. Having lil snapshot previews of your favorite sites and being able to zoom through them, choose one so it grows bigger, and interact with it... would be incredible. I'm trying to clone what they did to market it, but my OpenGL/DirectX skills are pretty bad heh.

thomas_remkus

It's true. When I change the subsystem to "windows" I get the same perf. I was amazed at how different the outcome was between console/windows.

.586
.model flat, stdcall
option casemap:none

include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

MAX_LOOP_LIMIT equ 4294967295      ; 0FFFFFFFFh, about ten times the earlier 429496729

.data
varA dq 123.333
varB dq 1234533.987
varC dq 0

.code
start:
finit
mov ebx, MAX_LOOP_LIMIT

__begin:
fld varA
fadd varB
fstp varC
dec ebx
jnz __begin

    invoke ExitProcess, 0
end start


It was hinted that if this were changed to an SSE/SSE2 implementation, it would supposedly be 4-20 times faster.
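For reference, a minimal SSE2 scalar port of the same loop might look like the sketch below. It assumes an assembler that understands SSE2 (ML 6.15 or later, as far as I know; the ML 6.14 shipped with the MASM32 SDK does not) and an SSE2-capable CPU. The 4-20x figure is not something this sketch confirms: each pass still does the same load, add and store, so measure before assuming any speedup.

```asm
; Hypothetical SSE2 scalar port of the posted x87 loop.
; Requires an SSE2-aware assembler and an SSE2-capable CPU.
.686
.xmm
.model flat, stdcall
option casemap:none

include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

MAX_LOOP_LIMIT equ 429496729

.data
varA dq 123.333
varB dq 1234533.987
varC dq 0

.code
start:
    mov ebx, MAX_LOOP_LIMIT      ; no finit needed: SSE2 does not use the x87 stack
__begin:
    movsd xmm0, varA             ; load the double varA into xmm0
    addsd xmm0, varB             ; xmm0 = varA + varB (scalar double add)
    movsd varC, xmm0             ; store the result to varC
    dec ebx
    jnz __begin

    invoke ExitProcess, 0
end start
```

The loop body is a one-for-one translation of fld/fadd/fstp, so any timing difference comes from the instruction set rather than from a change in the work done.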

redskull

Do you have any quantitative times for just the executed instructions, or are your numbers 'guesstimations' from the time you run the .EXE (e.g., including the loading times)? Also, I'd be interested to see what happens if you link it as a WINDOWS app but include an AllocConsole() call at the very beginning. I find it almost unbelievable that the type of subsystem would actually affect the execution speed once the thread is off and running; after all, you're not even using the console.

-r
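redskull's experiment, sketched as a hypothetical variant of the posted code: link it with /SUBSYSTEM:WINDOWS and create the console by hand with AllocConsole before the loop. Whether this reproduces CONSOLE-subsystem startup exactly is an assumption; the point is only to separate console creation from the loop itself.

```asm
; Hypothetical variant: link with /SUBSYSTEM:WINDOWS and create the console
; by hand, to see whether console creation alone reproduces the slowdown.
.586
.model flat, stdcall
option casemap:none

include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

MAX_LOOP_LIMIT equ 429496729

.data
varA dq 123.333
varB dq 1234533.987
varC dq 0

.code
start:
    invoke AllocConsole          ; create a new console for this GUI-subsystem process
    finit
    mov ebx, MAX_LOOP_LIMIT
__begin:
    fld varA
    fadd varB
    fstp varC
    dec ebx
    jnz __begin

    invoke ExitProcess, 0        ; the console is released when the process exits
end start
```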

dedndave

i played with it a little bit
when you link it for subsystem windows, the program takes just as long
because there is no screen output, you do not see anything happen when it finishes
it does show up in the task manager while it runs, though

hutch--

Most of the problem with the example code is that it does not isolate the console loading time from the test algo. Put a key press before the algo starts, time it properly, and you will see why the posted test code fails to compare console to GUI. Once a console is allocated, it barely uses any system resources, as it only dumps a bit of text on a screen. GUI display is a lot faster than console display, as the console does not need to be all that fast, but at the moment the assumptions are like chalk and cheese.
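One way to follow hutch's advice is to time the loop from inside the program, so process loading and console creation are excluded. The sketch below is a hypothetical harness: rather than waiting for a key press, it reads GetTickCount (resolution roughly 10-16 ms, plenty for a multi-second loop) just before and after the loop and reports the result with MessageBox, which works under either subsystem. That way the same .obj can be linked once as CONSOLE and once as WINDOWS. Timing internally also avoids a trap when timing by eye from the command prompt: an interactive cmd.exe returns to the prompt immediately for a GUI-subsystem program instead of waiting for it to exit, which can make the WINDOWS build look much faster than it is. File names and the label names szFmt, szTitle, szBuf and tStart are placeholders.

```asm
; Hypothetical timing harness: only the loop is timed, not process start-up.
; Build (file names are placeholders):
;   ml /c /coff looptime.asm
;   link /SUBSYSTEM:CONSOLE /OUT:loop_con.exe looptime.obj
;   link /SUBSYSTEM:WINDOWS /OUT:loop_gui.exe looptime.obj
.586
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib

MAX_LOOP_LIMIT equ 429496729

.data
varA    dq 123.333
varB    dq 1234533.987
varC    dq 0
szFmt   db "Loop took %u ms", 0
szTitle db "Timing", 0

.data?
tStart  dd ?
szBuf   db 64 dup(?)

.code
start:
    invoke GetTickCount          ; milliseconds since boot
    mov tStart, eax

    finit
    mov ebx, MAX_LOOP_LIMIT
__begin:
    fld varA
    fadd varB
    fstp varC
    dec ebx
    jnz __begin

    invoke GetTickCount
    sub eax, tStart              ; elapsed ms (unsigned subtraction survives wrap-around)
    invoke wsprintf, ADDR szBuf, ADDR szFmt, eax
    invoke MessageBox, NULL, ADDR szBuf, ADDR szTitle, MB_OK
    invoke ExitProcess, 0
end start
```

Run both executables and compare the two message boxes; if the subsystem really does not affect the loop, the reported times should be close.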