zeroing stack space

Started by bushpilot, October 22, 2005, 04:40:09 PM

bushpilot

I am working to optimize a compiler's output.  The compiler stores local variables on the stack, and is required to initialize all variables to zero.

The following is an example of the code it produces to clear a 40-byte block of local data. The compiler generates this same code regardless of how much data it needs to clear or where on the stack the data is, changing only the numeric values in the second and third lines.

cld
lea edi,[ebp-60]
mov ecx,10
xor eax,eax
rep stosd
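The fixed template can be sketched as a small emitter in C. This is only an illustration: emit_rep_stosd and its parameters are hypothetical names, not the compiler's real code, and it assumes the block size is a multiple of 4, because REP STOSD stores a whole dword per iteration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical emitter for the compiler's fixed zeroing template.
   Only the LEA offset (line 2) and the ECX count (line 3) vary.
   The block occupies [ebp-offset, ebp-offset+bytes); bytes must be a
   positive multiple of 4 because REP STOSD stores one dword per iteration.
   Returns the snprintf result (characters the full text needs). */
static int emit_rep_stosd(char *buf, size_t cap, int offset, int bytes)
{
    assert(bytes > 0 && bytes % 4 == 0 && offset >= bytes);
    return snprintf(buf, cap,
                    "cld\n"               /* DF = 0: EDI moves upward after each store */
                    "lea edi,[ebp-%d]\n"  /* EDI = lowest address of the block */
                    "mov ecx,%d\n"        /* ECX = number of dwords to store */
                    "xor eax,eax\n"       /* EAX = 0, the value stored */
                    "rep stosd\n",        /* store EAX at [EDI], ECX times */
                    offset, bytes / 4);
}
```

Called as emit_rep_stosd(buf, sizeof buf, 60, 40), it produces exactly the five lines above.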


My gut feeling is that, if there were only one choice of code output, this would be close to optimal. However, it seems that to clear a single dword it would be better to do something like:
mov d[ebp-24],0

Or, for two or more dwords, it may be better to do something like:
xor eax,eax
mov [ebp-24],eax
mov [ebp-28],eax
(etc.)
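The two small-block alternatives can likewise be chosen by size. This C sketch keeps the post's d[...] size notation and addresses the block from EBP, as the LEA in the compiler's listing does; emit_mov_zero is a hypothetical name, and the highest-to-lowest store order simply follows the example ordering.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical emitter for the two small-block forms in the post.
   The block occupies [ebp-offset, ebp-offset+bytes); bytes must be a
   positive multiple of 4 and offset >= bytes. Stores run from the
   highest dword down to the lowest.
   Returns the number of characters produced; if that reaches cap,
   the output was truncated. */
static size_t emit_mov_zero(char *buf, size_t cap, int offset, int bytes)
{
    size_t n;
    assert(bytes > 0 && bytes % 4 == 0 && offset >= bytes);
    if (bytes == 4) {
        /* One dword: store an immediate zero; no register is needed. */
        return (size_t)snprintf(buf, cap, "mov d[ebp-%d],0\n", offset);
    }
    /* Several dwords: zero EAX once, then store it repeatedly. */
    n = (size_t)snprintf(buf, cap, "xor eax,eax\n");
    for (int off = offset - bytes + 4; off <= offset && n < cap; off += 4)
        n += (size_t)snprintf(buf + n, cap - n, "mov [ebp-%d],eax\n", off);
    return n;
}
```

With offset 28 and 8 bytes it emits XOR EAX,EAX followed by stores to [ebp-24] and [ebp-28].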


I am guessing as well that for a big block (what is big?) there may be yet another option that is better.

So, does anyone have any suggestions, perhaps three or so options for different block sizes?

Thanks!

Greg


tenkey

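If your local stack space is at least 4096 bytes under Win32, then you will want to clear it from the high address down to the low address to handle stack growth: Windows commits stack pages on demand, as the guard page just below the committed region is touched. For example, if you allocate 16K by subtracting 16384 from ESP, then accessing the new [ESP] (the old ESP minus 16384) before touching the pages in between will cause an exception.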
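To see why the direction matters, here is a deliberately simplified C model of guard-page stack growth. Its assumptions are not the exact Windows algorithm: 4096-byte pages, EBP page-aligned, exactly one guard page directly below EBP, and any touch deeper than the guard page faults. touch and zero_frame are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u

/* Simplified guard-page model (an assumption, not the exact Windows logic).
   Depth is measured in bytes below EBP: depth 1 is [ebp-1].
   Pages below EBP are numbered 0, 1, 2, ... by depth. Page 0 starts as
   the only guard page and nothing deeper is committed yet. */
static bool touch(size_t depth, size_t *committed_pages)
{
    size_t page = (depth - 1) / MODEL_PAGE_SIZE;
    if (page < *committed_pages)
        return true;              /* page already committed */
    if (page == *committed_pages) {
        (*committed_pages)++;     /* guard page hit: commit it, guard moves one page down */
        return true;
    }
    return false;                 /* skipped past the guard page: access violation */
}

/* Clear a frame of `size` bytes (a multiple of 4) one dword at a time.
   high_to_low = true starts at [ebp-4] and moves down (STD + REP STOSD,
   or PUSHes); false starts at [ebp-size] and moves up (CLD + REP STOSD).
   Returns true if every store succeeds in this model. */
static bool zero_frame(size_t size, bool high_to_low)
{
    size_t committed = 0;
    assert(size % 4 == 0);
    for (size_t i = 0; i < size; i += 4) {
        /* Depth of the lowest byte of this dword; all 4 bytes share a page. */
        size_t depth = high_to_low ? i + 4 : size - i;
        if (!touch(depth, &committed))
            return false;
    }
    return true;
}
```

In this model an upward (CLD-style) clear survives only a frame of at most one page, so zero_frame(16384, false) fails on its very first store, while the downward order always succeeds. If generated code uses STD to clear downward, it should execute CLD afterwards, because Win32 code expects the direction flag to be clear.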

You might want to check if back-to-back PUSHes are better than multiple MOVs for small amounts.
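PUSHes fit at allocation time: instead of SUB ESP,n followed by a separate clear, the prologue pushes n/4 zero dwords, and each PUSH both reserves and clears one dword, moving from high addresses to low. A sketch of an emitter for that form, assuming a standard EBP frame with the locals directly below the saved EBP (emit_push_zero is a hypothetical name, and XOR EAX,EAX plus PUSH EAX is only one possible encoding):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical emitter: allocate AND zero `bytes` of locals with pushes,
   replacing "sub esp,bytes" plus a separate clear. Assumes it is placed
   right after "push ebp / mov ebp,esp", so the pushed dwords land on
   [ebp-4], [ebp-8], ... down to [ebp-bytes]. bytes must be a positive
   multiple of 4. Returns the number of characters produced. */
static size_t emit_push_zero(char *buf, size_t cap, int bytes)
{
    size_t n;
    assert(bytes > 0 && bytes % 4 == 0);
    n = (size_t)snprintf(buf, cap, "xor eax,eax\n");            /* value to push */
    for (int i = 0; i < bytes / 4 && n < cap; i++)
        n += (size_t)snprintf(buf + n, cap - n, "push eax\n");  /* 1-byte opcode */
    return n;
}
```

Because the pushes run high to low, they also avoid the stack-growth issue, but they cost one instruction per dword, so whether they beat a MOV sequence or REP STOSD at a given size is something to time on the target processors.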

Also, which processor model are you targeting? Some of the processors need different optimizations from the others.
A programming language is low level when its programs require attention to the irrelevant.
Alan Perlis, Epigram #8

bushpilot

Quote: Also, which processor model are you targeting? Some of the processors need different optimizations from the others.

No specific target.

Greg

Codewarp

bushpilot,

If you really want to optimize this, don't zero the variables at all when the application gives them initial values (as it should). Also, I question your assertion that "compilers" must zero local variables. That has not been my experience: in C++, uninitialized stack variables have indeterminate values (and the compiler may warn you about using them).

hutch--

Initialising local variables seems to vary from compiler to compiler. BASIC specifies it, but a C compiler, for example, does not. I prefer non-zeroed locals, because you may wish to initialise them yourself, and zeroing them as well makes more work at procedure entry.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

bushpilot

Actually, what I said was: "The compiler stores local variables on the stack, and is required to initialize all variables to zero." It is a specification of this compiler, and I cannot change it. It is a BASIC variant.

I will likely add a new non-initialized variable type, but that does not change this need.

Greg

Codewarp

Oh, you didn't tell us it was a BASIC variant (or I didn't see it). That changes everything. My comment now is this: you had better carefully measure the current speed of the operation you intend to optimize, i.e. its clock count, because if you think that saving a few clock cycles on an operation that already takes perhaps hundreds of cycles is going to buy you anything, you might be fooling yourself.

bushpilot

Tenkey, your back-to-back PUSHes seem to be a winner. Thanks.

I'll also explore using the MMX registers for bigger blocks, maybe more than 512 bytes.

Greg

ToutEnMasm

Hello,
Here is the method I use with MASM32. It clears all the locals in a MASM32 PROC.
You pass the macro only the name of the last local.

ToutEnMasm


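ZEROLOCALES MACRO dernierelocale:REQ
; dernierelocale: name of the LAST local declared (it has the lowest address)
; Overwrites AL, ECX and EDX; EDI is preserved.
mov ecx,ebp             ; ECX = EBP
lea edx,dernierelocale  ; EDX = address of the last local
sub ecx,edx             ; ECX = EBP - EDX = size of the local area in bytes
.if ecx != 0
push edi
mov edi,edx             ; start at the lowest local
mov al,0                ; byte value to store
cld                     ; store upward, towards EBP
rep stosb               ; store AL at [EDI], ECX times
pop edi
.endif
ENDM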

bushpilot

Nice macro. If you rework it to use stosd instead, it will be much faster.

Greg

ToutEnMasm

Hello,
Yes, it is faster with stosd, but some APIs take a byte as a parameter.
Very few do, but what happens if one byte too many is set to zero? A crash?
And with one byte too few, perhaps an API parameter is not correctly initialised and gives a random result.
ToutEnMasm
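A common way to get most of the STOSD speed while clearing exactly the requested number of bytes is to split the count: STOSD for count/4 dwords, then STOSB for the remaining count mod 4 bytes. In the macro that would mean keeping a copy of the byte count, doing SHR ECX,2 before REP STOSD, then reloading the count and doing AND ECX,3 before REP STOSB. The C sketch here models only the arithmetic and store pattern; zero_exact is a hypothetical name, not part of MASM32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the dword-plus-byte-tail split: write n/4 four-byte zeros,
   then the n%4 leftover bytes one at a time, so exactly n bytes are
   cleared and nothing past dst[n-1] is touched. */
static void zero_exact(unsigned char *dst, size_t n)
{
    const uint32_t zero = 0;
    size_t dwords = n / 4;   /* the count REP STOSD would use (SHR ECX,2) */
    size_t tail = n % 4;     /* the count REP STOSB would use (AND ECX,3) */
    for (size_t i = 0; i < dwords; i++, dst += 4)
        memcpy(dst, &zero, sizeof zero);   /* a 4-byte write; memcpy sidesteps alignment rules */
    for (size_t i = 0; i < tail; i++)
        *dst++ = 0;
}
```

Clearing 11 bytes this way does two dword writes and three byte writes, and the bytes on either side of the block keep their old values.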