I have been trying to understand how a friend's NASM code was over 7 times faster than the same code ported to MASM. I was sure that I had everything correct, so I dug into it. When I execute the NASM code (from a DOS prompt) it runs VERY fast, about 1/2 a second. The same MASM code was taking me around 2.8-3.8 seconds. I found that the NASM build was set to "Windows GUI" and the MASM build I had set to "console". When I changed that in the MASM version, they both ran with the same performance. WHY??? What is it about the console setting that's so ugly? Is it because it's a subsystem and I have to pass through it all the time?
Here's the code I'm working with:
.586
.model flat, stdcall
option casemap:none

include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

MAX_LOOP_LIMIT equ 429496729

.data
varA dq 123.333
varB dq 1234533.987
varC dq 0

.code
start:
    finit                       ; initialize the FPU
    mov ebx, MAX_LOOP_LIMIT     ; loop counter
__begin:
    fld  varA                   ; push varA onto the FPU stack
    fadd varB                   ; add varB
    fstp varC                   ; store the result and pop
    dec  ebx
    jnz  __begin
    invoke ExitProcess, 0
end start
the console is a dog - lol
especially if it is in the process of outputting characters to the con window
although, i am not sure what MASM vs NASM has to do with it
Consoles are slow because they interact indirectly, via message passing to the console subsystem, rather than through direct kernel calls (one of the last things in NT to still work that way). When you specify the CONSOLE subsystem switch, the loader automatically initializes a console for you, whether you use it or not. If you change the WINDOWS-subsystem code to do an AllocConsole() call at startup, the speeds should match (the CONSOLE switch effectively just calls AllocConsole() for you at the beginning).
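Something like this minimal sketch (untested, just to illustrate the idea) - link it with /SUBSYSTEM:WINDOWS and it still creates its own console:

.586
.model flat, stdcall
option casemap:none
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib
.code
start:
    invoke AllocConsole         ; create a console for this process ourselves
    ; ... the timing loop or other test code goes here ...
    invoke FreeConsole          ; detach from the console when done
    invoke ExitProcess, 0
end start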
-r
he is saying he sees a large discrepancy between console mode code assembled with MASM vs NASM
any clues about that, redskull?
The console should have no effect on the loop timing, but the processor speed could have a very large effect. The loop would take about 3.6s to execute on my 10-year-old 500 MHz P3, and I would not be surprised if a recent, high-end processor could do it in 10% of that time.
Quote from: dedndave on August 01, 2009, 04:04:09 AM
he is saying he sees a large discrepancy between console mode code assembled with MASM vs NASM
The way I read the OP, he's seeing the discrepancy between linking with /SUBSYSTEM:WINDOWS and /SUBSYSTEM:CONSOLE, not between MASM and NASM ("When I changed that in the MASM version, they both ran with the same performance."). If that isn't the case, I haven't the foggiest.
-r
I do most of my algo testing in console apps as it's easier and faster, but I also do some in GUI apps, and there is no speed difference whatsoever between console and GUI mode. Where CMD.EXE can be slower is when you are dumping data to the screen, but to a lesser degree you also slow up an algo in GUI mode by writing results to the screen while the algo is running.
I would think consoles would HAVE to be slower, if for no other reason than that it involves an LPC, and hence twice the kernel-mode switches. After all, isn't speed the whole reason they changed the GUI routines from this method to straight kernel calls in the first place? I doubt it would be noticeable, as console use is few and far between, but it would be interesting to test. Of course, all this is moot regarding this code, as it has no output at all (console or otherwise); the only possible place for a slowdown (all other things being equal) would be in the loading of the code, and the only thing different about loading the code is the allocation of a console when you specify /SUBSYSTEM:CONSOLE. I don't really see that taking 4 seconds, though... I wish M.R. over at Sysinternals would do a blog about the consoles; I've never actually seen any in-depth info about how they work.
-r
i think console mode is regarded as "unimportant" or "merely a tool"
the console window has several bugs and i doubt anyone at ms cares - lol
in a way, i suppose they are right, too
any "real" app is going to be gui
but, console gives you a way to run batches and scripts and test rudimentary things without a lot of code
Quote from: dedndave on August 01, 2009, 04:15:36 PM
...any "real" app is going to be gui...
I'm of the opinion that any 'real' app should be *both* (app depending, of course). There's no better program than one that you can start in an interactive GUI session when you want to, or start with 50 different command-line switches to do fully automated batch processing and pipe the output to a file while you sit around and drink coffee. The new Windows PowerShell is a great tool, but not enough cmdlets ship with it.
-r
I can't wait until everything is like Cooliris. If you don't know what that is, it's a free Firefox addon (http://www.cooliris.com) that lets you zoom through online pics in 3D. It's probably one of the coolest things I've seen. If Firefox etc. were smart, they'd buy it from the company and integrate it right in. Having little snapshot previews of your favorite sites and being able to zoom through them, choose one so it grows bigger, and interact... would be incredible. I'm trying to clone what they did to market it, but my OpenGL/DirectX skills are pretty bad, heh.
It's true. When I change the subsystem to "windows" I get the same performance. I was amazed at how different the outcome was between console and windows.
.586
.model flat, stdcall
option casemap:none

include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

MAX_LOOP_LIMIT equ 4294967295

.data
varA dq 123.333
varB dq 1234533.987
varC dq 0

.code
start:
    finit
    mov ebx, MAX_LOOP_LIMIT
__begin:
    fld  varA
    fadd varB
    fstp varC
    dec  ebx
    jnz  __begin
    invoke ExitProcess, 0
end start
It was hinted that if this were changed to an SSE2 implementation it would supposedly be 4-20 times faster.
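I haven't tried it yet, but I imagine the SSE2 version would look roughly like this (just my sketch, untested):

.686
.xmm
.model flat, stdcall
option casemap:none
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

MAX_LOOP_LIMIT equ 4294967295

.data
varA dq 123.333
varB dq 1234533.987
varC dq 0

.code
start:
    movsd xmm0, varA            ; load the operands once, outside the loop
    movsd xmm1, varB
    mov ebx, MAX_LOOP_LIMIT
__begin:
    movsd xmm2, xmm0            ; copy varA
    addsd xmm2, xmm1            ; varA + varB as a scalar double add
    movsd varC, xmm2            ; store the result
    dec ebx
    jnz __begin
    invoke ExitProcess, 0
end start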
Do you have any quantitative times for just the executed instructions, or are your numbers 'guesstimations' from the time you run the .EXE (e.g., including the loading times)? Also, I'd be interested to see what happens if you link it as a WINDOWS app but include an AllocConsole() call at the very beginning. I find it almost unbelievable that the type of subsystem would actually affect the execution speed once the thread is off and running; after all, you're not even using the console.
-r
i played with it a little bit
when you link it for subsystem windows, the program takes just as long
because you have no screen output, you do not see anything happen when it is over
it does show up in the task manager, though
Most of the problem with the example code is that it does not isolate the console loading time from the test algo. Put in a key press to start the algo and time it properly, and you will see why the test code that was posted fails to compare console to GUI. Once a console is allocated, it barely uses any system resources, as it only dumps a bit of text on a screen. GUI display is a lot faster than console display, as a console does not need to be all that fast, but at the moment the assumptions are like chalk and cheese.
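Something along these lines (a quick sketch, untested, using GetTickCount for rough millisecond timing) is what I mean by putting in a key press and timing only the algo:

include \masm32\include\masm32rt.inc    ; masm32 runtime: includes, libs and macros

.data
varA dq 123.333
varB dq 1234533.987
varC dq 0

.code
start:
    inkey "Press any key to start the timed loop ..."
    invoke GetTickCount
    push eax                    ; start time in milliseconds
    finit
    mov ebx, 4294967295
__begin:
    fld  varA
    fadd varB
    fstp varC
    dec  ebx
    jnz  __begin
    invoke GetTickCount
    pop ecx
    sub eax, ecx                ; elapsed milliseconds for the loop only
    print str$(eax)," ms",13,10
    inkey
    exit
end start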
redskull:
I'm working from 'guesstimations' because I don't have any timing code. The AllocConsole() idea is interesting, so I'll try that.
hutch:
I'll put the keypress in there and see how that works. Uh, "chalk and cheese" ... fantastic. I think what you are saying is that once the console portion is loaded, all the other code should run with the same performance.
MichaelW wrote the timing macros we use
they are available in the first post of the first thread of the Laboratory sub-forum
INCLUDE \masm32\include\masm32rt.inc
.686
INCLUDE \masm32\macros\timers.asm
notice that .686 is needed prior to including the timers (i think .586 works, too)
you need to define a loop count
once you get the program running, try to adjust it so the loop test takes roughly 1/2 second (usually gives repeatable readings)
LOOP_COUNT = 100000
it is a good idea to restrict execution to a single core - this works on single-core and multi-core machines
INVOKE GetCurrentProcess
INVOKE SetProcessAffinityMask,eax,1
use HIGH_PRIORITY_CLASS for most testing
when it is done, the EAX register holds the cycle count
counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
; place your test code here
counter_end
print str$(eax),9,"clock cycles",13,10
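put all the pieces together and you get something like this (a rough sketch - i have not run this exact one, so tweak as needed):

INCLUDE \masm32\include\masm32rt.inc
.686
INCLUDE \masm32\macros\timers.asm

LOOP_COUNT = 100000

.data
varA dq 123.333
varB dq 1234533.987
varC dq 0

.code
start:
    INVOKE GetCurrentProcess
    INVOKE SetProcessAffinityMask,eax,1     ; restrict to a single core

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
        fld  varA                           ; the test code being timed
        fadd varB
        fstp varC
    counter_end
    print str$(eax),9,"clock cycles",13,10  ; EAX holds the cycle count

    inkey
    exit
end start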