Macro for initialising local variables

Started by jj2007, October 08, 2007, 03:02:09 PM

jj2007

Inspired by ToutEnMasm's ZEROLOCALES, I am trying to figure out an elegant (= tiny and fast) macro to clear the local variables. Here is what I have; I tested it quite a lot but am still not sure whether my logic is valid in all situations, so feedback is welcome! :bg

##### The Macro #####
ClearLocals MACRO
    mov edx,edi     ; save edi in edx, a free scratch register (eax/ecx/edx)
    mov ecx,ebp     ; base pointer into counter
    mov edi,esp     ; current stack pointer as destination
    sub ecx,edi     ; counter = ebp - esp = size of the locals in bytes
    mov al,0        ; fill with zeros
    cld             ; forward
    rep stosb       ; fill
    mov edi,edx     ; get edi back for Windows
ENDM

##### Its usage: just call ClearLocals directly after the LOCALs #####
##### example taken from the recent MAPISendMail thread #####

SendMail proc MsgSubject:DWORD, MsgBody:DWORD, MsgPathAtt1:DWORD, MsgPathAtt2:DWORD

  LOCAL hMAPI:DWORD
  LOCAL dll_SEND:DWORD
  LOCAL MMessage:MapiMessage   ;48 bytes
  LOCAL MsgFile2:MapiFileDesc   ;structure on stack needs reverse order: File2 first
  LOCAL MsgFile1:MapiFileDesc   ;24 bytes
  LOCAL MsgTO:MapiRecipDesc   ;24 bytes

  ClearLocals       ;zeros from first to last local (eAx, eCx, eDx will be altered)

   invoke LoadLibrary, chr$('MAPI32.dll')
... etc

Mark Jones

"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08


drizz

Quote from: jj2007 on October 08, 2007, 03:02:09 PM
Inspired by ToutEnMasm's ZEROLOCALES, I am trying to figure out an elegant (= tiny and fast) macro to clear the local variables. Here is what I have; I tested it quite a lot but am still not sure whether my logic is valid in all situations, so feedback is welcome!
You can use dword stores (stosd) without any worries, as the stack will never be unaligned (non-dword).
You can also safely remove cld, because by convention all Windows APIs assume that the direction flag is clear.
Hence if you change the direction flag, you should also clear it again (if you use std, use cld when you are done).
ClearLocals MACRO
mov edx,edi
mov ecx,ebp
mov edi,esp
sub ecx,edi
xor eax,eax
shr ecx,2
rep stosd
mov edi,edx
ENDM
The truth cannot be learned ... it can only be recognized.

jj2007

Quote from: drizz on October 08, 2007, 05:35:47 PM
You can use dword stores (stosd) without any worries, as the stack will never be unaligned (non-dword).
You can also safely remove cld, because by convention all Windows APIs assume that the direction flag is clear.
Hence if you change the direction flag, you should also clear it again (if you use std, use cld when you are done).

Re stack:

I guess a LOCAL MyByte:BYTE would cause the stack to increase by a WORD.
However, if you declare a
LOCAL MyWordVar:WORD
will the stack increase by a DWORD? If not, then

shr ecx,1
rep stosw

would be more appropriate.

I vaguely remember that shr with a shift count greater than one costs a cycle more, while mov ax,0 and xor eax,eax cost roughly the same. Any opinions?

By the way, the macro could be modified to become a procedure - an option if you have lots of procedures that require initialising of locals.
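
A rough sketch of what such a procedure version could look like. The name ClearLocalsProc and the layout are assumptions, not tested code from this thread. It assumes 32-bit MASM, that the caller has a standard EBP frame without a USES clause, that nothing has been pushed since the caller's prologue, and that the direction flag is clear:

```asm
; Hypothetical procedure version of ClearLocals (name is an assumption).
OPTION PROLOGUE:NONE            ; make sure MASM emits no frame code here,
OPTION EPILOGUE:NONE            ; so ebp still belongs to the caller

ClearLocalsProc proc
    mov edx, edi                ; save edi in a scratch register
    lea edi, [esp+4]            ; skip our own return address: lowest byte of the caller's locals
    mov ecx, ebp                ; caller's frame pointer = end of the locals
    sub ecx, edi                ; ecx = size of the caller's locals in bytes
    xor eax, eax                ; fill value
    shr ecx, 2                  ; bytes -> dwords
    rep stosd                   ; zero the locals (assumes DF is clear)
    mov edi, edx                ; restore edi for Windows
    ret
ClearLocalsProc endp

OPTION PROLOGUE:PrologueDef     ; back to MASM's default prologue/epilogue
OPTION EPILOGUE:EpilogueDef
```

Each procedure would then need only a `call ClearLocalsProc` (5 bytes) directly after its LOCALs instead of the inline sequence. Like the macro, it alters eax, ecx and edx.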

Re direction flag: you are perfectly right - I looked this up:
http://support.microsoft.com/kb/106262

On Intel chips, DF can be set to 1 with the STD instruction and can be cleared to 0 with the CLD instruction. If a function sets DF to 1, it should clear DF before terminating. This allows all functions to make the assumption that DF is always 0.

All C run-time functions correctly clear DF upon termination. However, if an exception occurs before a function has a chance to clear DF, the flag will still be set when the exception handler is executed.

drizz

Quote from: jj2007 on October 09, 2007, 08:10:28 AM
I guess a LOCAL MyByte:BYTE would cause the stack to increase by a WORD.
However, if you declare a
LOCAL MyWordVar:WORD
will the stack increase by a DWORD? If not, then

shr ecx,1
rep stosw

would be more appropriate.
Do I need to repeat myself? :eek A byte variable will also cause a 4-byte allocation on the stack.

jj2007

Quote from: drizz on October 09, 2007, 02:56:19 PM
Quote from: jj2007 on October 09, 2007, 08:10:28 AM
I guess a LOCAL MyByte:BYTE would cause the stack to increase by a WORD.
However, if you declare a
LOCAL MyWordVar:WORD
will the stack increase by a DWORD? If not, then

shr ecx,1
rep stosw

would be more appropriate.
Do I need to repeat myself? :eek A byte variable will also cause a 4-byte allocation on the stack.

No need to roll your eyes, Drizz - I asked explicitly what happens with a WORD local, not a BYTE. And I wouldn't ask if 1) it had been explained in masm32.hlp (see below), and 2) I were sure that there is no option to "pack" wasteful byte declarations on the stack.

Cheers, JJ :bg

Syntax:   LOCAL name [[count]][:qualifiedtype] [, name [[count]]
            [:qualifiedtype]]...

  Description:

     Generates code to create one or more stack (automatic) variables,
     which can be accessed only within the current procedure. The
     assembler uses the same method used by high-level languages to
     create local variables.

     The <name> parameter is the name of the variable, and <count> is
     an optional expression (which must appear in square brackets)
     indicating the number of elements to allocate. The <qualifiedtype>
     parameter is any qualified type appropriate to <name>. The default
     <qualifiedtype> is WORD in a 16-bit segment and DWORD in a 32-bit
     segment.

     Once declared in a LOCAL statement, local variables can be
     referred to by name. The assembler translates references to these
     variables into references to their actual location on the stack
     using the BP indirect addressing mode.


     The assembler will generate an error if you have already defined
     <name> as a label.

  Example:

     LOCAL     array[20]:BYTE

drizz

>> ::) <<  :P
Quote from: jj2007 on October 09, 2007, 08:10:28 AM
I guess
I advocate the "go and try it out, then ask" way of doing things...  :bg
So you should have made a test project and debugged it with Olly, trying out different combinations of local variables.

Cheers  :bg
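
A minimal test project along those lines could look like the following sketch. The proc names are made up for illustration, and the include paths assume a default MASM32 SDK install in \masm32. Each proc returns ebp - esp in eax, i.e. the number of bytes the assembler reserved below ebp, so the value can be read in the debugger after each call:

```asm
; Hypothetical test project: how many bytes does MASM reserve for locals?
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.code

OneByte proc
    LOCAL MyByte:BYTE
    mov eax, ebp
    sub eax, esp                ; eax = bytes reserved below ebp
    ret
OneByte endp

OneWord proc
    LOCAL MyWordVar:WORD
    mov eax, ebp
    sub eax, esp                ; same measurement for a WORD local
    ret
OneWord endp

start:
    call OneByte                ; set a breakpoint after each call and read eax
    call OneWord
    invoke ExitProcess, 0
end start
```

Adding further procs with several BYTE or WORD locals in a row shows whether the assembler packs them or pads each one.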

hutch--

jj,

Here is what the idea looks like in Basic, which does this by specification. It also sets up a stack frame and preserves registers, and it zeros the locals before any user code is run.


0040111B                    fn_0040111B:
0040111B 55                     push    ebp             ; set stack frame
0040111C 8BEC                   mov     ebp,esp
0040111E 53                     push    ebx             ; preserve 3 registers
0040111F 56                     push    esi
00401120 57                     push    edi
00401121 83EC64                 sub     esp,64h         ; allocate local storage
00401124 681B114000             push    40111Bh
00401129                    loc_00401129:
00401129 31F6                   xor     esi,esi         ; clear ESI & EDI
0040112B 31FF                   xor     edi,edi
0040112D B90D000000             mov     ecx,0Dh         ; set counter
00401132                    loc_00401132:
00401132 56                     push    esi
00401133 49                     dec     ecx
00401134 75FC                   jnz     loc_00401132    ; loop back to count in ECX
00401136 90                     nop
00401137 90                     nop
00401138 90                     nop
00401139 90                     nop
0040113A 90                     nop
0040113B 90                     nop
0040113C 8B8578FFFFFF           mov     eax,[ebp-88h]
00401142 8D65F4                 lea     esp,[ebp-0Ch]
00401145 5F                     pop     edi
00401146 5E                     pop     esi
00401147 5B                     pop     ebx
00401148 5D                     pop     ebp
00401149 C20400                 ret     4
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Quote from: hutch-- on November 02, 2007, 08:12:27 PM
jj,

Here is what the idea looks like in Basic, which does this by specification. It also sets up a stack frame and preserves registers, and it zeros the locals before any user code is run.
Thanks, very interesting. I guess you disassembled Visual Basic? The code is a lot more clumsy than the little macro (largely based on ToutEnMasm's code), but the principle is the same. I was a bit uncertain about the exact rules for local variables, i.e. 4-byte allocation on the stack even for byte & word variables, but colleagues here are always helpful in explaining :U

hutch--

It's actually PowerBASIC, which conforms to the specification for Basic: it sets a stack frame, preserves all of the correct registers, zeros the local variables and sets strings to a null string. Align stack variables to a minimum of 4 bytes; larger data types should be aligned at their data size.

Rockoon

The reason that Basic snippet (which doesn't look like VB, by the way) uses pushes instead of stos instructions is that push doesn't depend on the direction flag. It's usually bad form in an open development environment to alter the direction flag without changing it back (this could be a library procedure called from an arbitrary development environment such as VB, VC, GCC, C#, Delphi, etc.), nor should the code make assumptions about its state.

I would use this sort of strategy to both allocate and initialize the locals:


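init macro vars
    xor eax, eax            ; 2 bytes
  if vars le 256
    mov cl, vars and 255    ; 2 bytes; count encoded modulo 256 (256 -> 0)
@@do:
    dec cl                  ; 2 bytes
    push eax                ; 1 byte; push leaves the flags set by dec intact
    jnz @@do                ; 2 bytes
  else
    mov ecx, vars           ; 5 bytes
@@do:
    dec ecx                 ; 1 byte
    push eax                ; 1 byte
    jnz @@do                ; 2 bytes
  endif
endm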


This weighs in at 9 bytes for 256 or fewer locals, or 11 bytes for 257 or more locals.
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

jj2007

Quote from: Rockoon on November 03, 2007, 05:23:53 PM
I would use this sort of strategy to both allocate and initialize the locals:
Fine if you have access to MASM's "LOCAL" routine (you aren't working for Microsoft by any chance?  :bg)
You would also have to know the number and size of the local variables.
The ClearLocals macro builds on current MASM behaviour, and works with all types of local variables including structures, by simply using the difference between base pointer and stack pointer after allocation of the locals.

jj2007

Quote from: hutch-- on November 03, 2007, 04:34:05 PM
It's actually PowerBASIC
Sigh! I have tinkered a bit with PB, hoping I could replace my trusty old 16-bit GFA with a more recent model. However, PB has a bloat factor of about 5, and the compiler is roughly a factor of 10 slower; that, on top of the porting difficulties for a 20,000-line source, forced me to keep an awkward mix of old Basic with a bit of new MASM... until Vista64 throws me out of the game, haha :bdg

Rockoon

Quote from: jj2007 on November 03, 2007, 06:44:01 PM
You also would have to know the number and size of the local variables.
The ClearLocals macro builds on current MASM behaviour, and works with all types of local variables including structures, by simply using the difference between base pointer and stack pointer after allocation of the locals.

I'm sorry, I thought you wanted "(=tiny and fast)".

stosb is definitely not fast: it has a high minimum latency, and it stores only a byte at a time.

As for your other criticisms... are you lazy or something? :) None of them are actually valid: you can calculate the size of the locals any way you want to and still rip through with pushes.

But now here is a criticism: why would you want to initialize ALL locals? Usually you would only want to initialize structures. Individual temporary machine words should almost never need initialized memory, because they start out as registers.

For instance, take any handle returned from an API call (such as hMAPI in the original post): no initial value is valid here. The code would always ensure that its value was a non-erroring result of an API call, so why initialize it?
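
To make the "only initialize structures" idea concrete, here is a small sketch of a macro that zeros one named local rather than the whole frame. ZeroVar is a hypothetical name, not something from MASM32. The sketch assumes SIZEOF of the variable is a multiple of 4 and that the direction flag is clear:

```asm
; Hypothetical macro: zero a single local variable or structure.
; Alters eax and ecx; edi is saved and restored.
ZeroVar MACRO var
    push edi                    ; edi must be preserved for Windows
    lea edi, var                ; address of the local (ebp-relative, unaffected by the push)
    mov ecx, (SIZEOF var) / 4   ; dword count, computed at assembly time
    xor eax, eax                ; fill value
    rep stosd                   ; clear the variable (assumes DF is clear)
    pop edi
ENDM
```

In the SendMail example from the first post this would be used as `ZeroVar MMessage` or `ZeroVar MsgFile1`, leaving hMAPI and dll_SEND untouched.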