The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: chep on June 22, 2005, 06:58:50 AM

Title: Stack Probing PROLOGUE macro
Post by: chep on June 22, 2005, 06:58:50 AM
Here is a macro that can be used with OPTION PROLOGUE to enable stack probing (needed when the LOCALs total more than 4 KB).

<EDITED>

;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
;
; Allows a procedure to safely use LOCAL variables with a total size of 4kb or more,
; using an unrolled stack probing method by default.
;
;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
;
; Usage:
;
;   OPTION PROLOGUE:STACKPROBE
;   MyProcedure PROC ; ...
;     ; ...
;   MyProcedure ENDP
;   OPTION PROLOGUE:PROLOGUEDEF
;
; The ROLLED macro argument generates a loop rather than the default unrolled code:
;
;   MyProcedure PROC <ROLLED> ; ...
;
;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
;
; Notes:
;   - When the total size of the LOCAL variables is less than 4kb, the code generated is
;     identical to PROLOGUEDEF, so there is no drawback using this macro
;   - See "OPTION PROLOGUE" and "PROC" topics in MASM32.HLP for the macro specifications
;
;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
;
; Limitations compared to PROLOGUEDEF:
;   - Stack probing is relevant only for Windows, ie FLAT model, so it won't accept other models
;   - Due to the FLAT model restriction, LOADDS is not supported
;   - FORCEFRAME argument doesn't generate a correct epilogue when no LOCAL variables are defined
;     So it is not supported for now :(
;
;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

I finally gave up on making this macro fully compatible with MASM's PROLOGUEDEF.
Stack probing is needed only on Windows (i.e. the FLAT model), so the LOADDS argument is not supported, as it only concerns 16-bit code.

Also, when using the FORCEFRAME argument with EPILOGUEDEF, no epilogue is generated (i.e. no leave instruction), although the pop instructions corresponding to the USES directive are generated. I really can't figure out where this problem comes from, so I also gave up trying to implement FORCEFRAME... :red


Enjoy!

[attachment deleted by admin]
Title: Re: Stack Probing PROLOGUE macro
Post by: chep on June 22, 2005, 01:54:21 PM
At last a useable version.

Unfortunately I don't have the time to dig into the FORCEFRAME bug for the moment, nor to add the looped probing option (only unrolled probing for now...).

Anyway, here it is... (The first post has been updated) ::)
Title: Re: Stack Probing PROLOGUE macro
Post by: Petroizki on June 22, 2005, 04:45:05 PM
You still have to make the stack frame, when you have no locals but you do have proc arguments.

So something like:

  ;; Set up stack frame
  IF localbytes GT 0
    ...
  ELSEIF argbytes GT 0
    push ebp
    mov ebp, esp
  ENDIF


EDIT: Also, wouldn't it be better to use 'mov dword ptr [esp], eax' instead of 'mov byte ptr [esp], 0'?
It's one byte smaller, and faster.
Title: Re: Stack Probing PROLOGUE macro
Post by: chep on June 22, 2005, 05:23:24 PM
Quote from: Petroizki on June 22, 2005, 04:45:05 PM
You still have to make the stack frame, when you have no locals but you do have proc arguments.
You're perfectly right, I forgot that! :red :red :red
And your proposed fix is perfect.

Quote from: Petroizki on June 22, 2005, 04:45:05 PM
Also, wouldn't it be better to use 'mov dword ptr [esp], eax' instead of 'mov byte ptr [esp], 0'?
It's one byte smaller, and faster.
You're perfectly right also! :wink


Source code updated...
Thanks for pointing this out! :U :thumbu :thumbu
Title: Re: Stack Probing PROLOGUE macro
Post by: Petroizki on June 23, 2005, 11:38:38 AM
I added stack probing to my own prologue macros (http://www.masmforum.com/simple/index.php?topic=1063.0); they generate code like this at the beginning:

push ebp
mov ebp, esp
sub esp, 3A98h ; reserve stack space
mov dword ptr [ebp-1000h], eax ; probe first page
mov dword ptr [ebp-2000h], eax ; probe second page
mov dword ptr [ebp-3000h], eax ; probe the last page
...

Seems to work; at least it produces less code.
Title: Re: Stack Probing PROLOGUE macro
Post by: chep on June 24, 2005, 03:44:44 AM
You're right (again :green).

A few thoughts, however (correct me if I'm wrong):

- I would adjust esp *after* probing the pages, just in case "something else" would use the stack before the last page is probed. Well, that's how VCToolkit's probing function works anyway. (Ok, I should have looked at it before writing my macro, but well... ::))

- Shouldn't add esp, (-size) be slightly faster than sub? (I suppose it is, as both MASM and VC use it rather than sub. I haven't run any tests though.)

- In the example you give, I think [ebp-3000h] is not the last page. The last probed page should be [ebp-3A98h] (or [ebp-4000h], for instance; it doesn't change anything):
If the final esp lands on a page boundary (i.e. a multiple of 1000h), it will land on the last DWORD of the guard page. But when the next push is made, it will try to access uncommitted memory, and the app will be killed. I'm not sure about this, as I haven't managed to produce the "crashing case", but that's how I understand the VCToolkit probing function:


msvcrt_probe  proc ; argument : eax = localbytes

  cmp     eax, 1000h
  jnb     probe_stack
  neg     eax            ; this part is for localbytes < 4k so it's not relevant in our case
  add     eax, esp
  add     eax, 4
  test    [eax], eax
  xchg    eax, esp
  mov     eax, [eax]
  push    eax
  ret

probe_stack:             ; the interesting part...
  push    ecx
  lea     ecx, [esp+8]

probepages:
  sub     ecx, 1000h
  sub     eax, 1000h
  test    [ecx], eax
  cmp     eax, 1000h
  jnb     probepages

probelastpage:
  sub     ecx, eax
  mov     eax, esp
  test    [ecx], eax
  mov     esp, ecx
  mov     ecx, [eax]
  mov     eax, [eax+4]
  push    eax
  ret

msvcrt_probe  endp

; ... in main() :
  push    ebp
  mov     ebp, esp
  mov     eax, 2328h
  call    msvcrt_probe
  ; ...


This was generated from the following C code (statically linked):


int main()
{
  char test[9000];
  // ...
}
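
To make the probe order easier to follow, here is a minimal Python model of the probe_stack path of the listing above. It is only a sketch: it tracks offsets below the caller's esp (the value of esp just before the call), the name probe_offsets is made up for this illustration, and it assumes localbytes >= 1000h.

```python
PAGE = 0x1000  # page size used by the probing routine


def probe_offsets(localbytes):
    """Return the offsets (below the caller's esp) touched by the
    probe_stack path, in the order they are touched."""
    if localbytes < PAGE:
        raise ValueError("this sketch only models the localbytes >= 1000h path")
    offsets = []
    remaining = localbytes      # plays the role of eax
    offset = 0                  # distance of ecx below the caller's esp
    while True:                 # probepages loop
        offset += PAGE          # sub ecx, 1000h
        remaining -= PAGE       # sub eax, 1000h
        offsets.append(offset)  # test [ecx], eax
        if remaining < PAGE:    # cmp eax, 1000h / jnb probepages
            break
    offset += remaining         # probelastpage: sub ecx, eax
    offsets.append(offset)      # test [ecx], eax ; esp ends up here
    return offsets


if __name__ == "__main__":
    # char test[9000] -> localbytes = 2328h
    print([hex(o) for o in probe_offsets(0x2328)])
```

For the 3A98h example discussed above, this model gives 1000h, 2000h, 3000h and finally 3A98h, i.e. the last probe is made at the new esp itself.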



Well, anyway I have updated the code in the first post.
Title: Re: Stack Probing PROLOGUE macro
Post by: Petroizki on June 24, 2005, 06:34:08 AM
- What do you mean by "something else", a debugger? The probing could easily be done before adjusting esp, but you would have to use an instruction that does not change any values at negative offsets from esp (test, cmp, ...), which would probably make it slightly slower.

- I don't think add and sub differ in speed, at least according to the optimization guides I have. They are basically the same instruction, on the Pentium that is.

- I guess you're right, but I couldn't get it to GPF on Windows XP. Actually, you can remove the last two probes and it still works. I will do some testing on 9x later.
Title: Re: Stack Probing PROLOGUE macro
Post by: chep on June 29, 2005, 03:51:25 PM
Quote from: Petroizki on June 24, 2005, 06:34:08 AM
- What do you mean by "something else", a debugger?
Indeed. I guess a user-mode debugger is the only thing that could tamper with the program's stack.

Quote from: Petroizki on June 24, 2005, 06:34:08 AM
but you would have to use an instruction that does not change any values at negative offsets from esp (test, cmp, ...)
I don't understand why?

In fact I simply meant swapping the probing mov instructions and the esp adjustment:

mov DWORD PTR [ebp-1000h], eax
...
mov DWORD PTR [ebp-4000h], eax
sub esp, 4000h

That seems to work fine.
Title: Re: Stack Probing PROLOGUE macro
Post by: Petroizki on June 29, 2005, 05:08:09 PM
It may not be safe to mess with memory outside the stack (below esp): http://board.win32asmcommunity.net/index.php?topic=20128.0

At least debugging with Whidbey may cause a problem..
Title: Re: Stack Probing PROLOGUE macro
Post by: chep on June 29, 2005, 06:17:22 PM
Ok, I understand now.

But in our case I guess we don't mind if the stack is overwritten by a debugger before esp is adjusted, as we are writing dummy values just to make sure each page is probed.
On the contrary it's more likely a problem could arise if we adjust esp before probing the stack, as a debugger could then hit a non probed, unguarded page, thus leading to a GPF.

Am I wrong?
Title: Re: Stack Probing PROLOGUE macro
Post by: Webring on July 04, 2005, 02:38:05 AM
You are a genius, thank you so much for this.
Title: Re: Stack Probing PROLOGUE macro
Post by: farrier on July 04, 2005, 05:51:00 AM
This was from the post I originally pointed you to:

http://board.win32asmcommunity.net/index.php?topic=19497.15

Code from KetilO:
MainDlgProc proc hWin:HWND,uMsg:UINT,wParam:WPARAM,lParam:LPARAM
  LOCAL buffer[4096]:byte
  LOCAL buffer2[256]:byte
  LOCAL buffer3[256]:byte
  LOCAL printout[4096]:byte
   
  LOCAL pos:dword
  LOCAL hdi:HD_ITEM

  ;Touching the stack frame
  mov eax,ebp
  .while eax>esp
     mov dword ptr [eax],0
     sub eax,4
  .endw
  push edx
  push esi
  push edi


I found that if you replace
sub eax, 4
with
sub eax, 4096

it works just as well, and faster, since we only need to touch each page and not each DWORD.
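
As a rough illustration of the difference, the following Python sketch counts the stores made by the loop for a given step. The helper name touch_count and the 8192-byte frame are arbitrary choices for this example, not taken from KetilO's code.

```python
def touch_count(frame_bytes, step):
    """Count the stores made by the `.while eax>esp` loop when eax
    starts at ebp, esp = ebp - frame_bytes, and eax drops by `step`."""
    count = 0
    distance = frame_bytes   # eax - esp
    while distance > 0:      # .while eax>esp
        count += 1           # mov dword ptr [eax],0
        distance -= step     # sub eax,step
    return count


if __name__ == "__main__":
    print(touch_count(8192, 4))     # one store per DWORD
    print(touch_count(8192, 4096))  # one store per page
```

With an 8192-byte frame, stepping by 4 makes 2048 stores while stepping by 4096 makes only 2.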

My point is that the touching took place after all the stack adjustments were made.

The first problem in the above post was that, when USES was used, the pushes of the "used" registers caused the guard page errors.

farrier
Title: Re: Stack Probing PROLOGUE macro
Post by: hutch-- on July 04, 2005, 06:31:07 AM
farrier,

Compliments, that is a good technique.  :thumbu
Title: Re: Stack Probing PROLOGUE macro
Post by: Mirno on July 04, 2005, 11:49:28 AM
Here is a first hack at a "probelogue" macro.


probelogue MACRO szProcName, flags, cbParams, cbLocals, rgRegs, rgUserParams
  push ebp
  mov  ebp, esp
  sub  esp, cbLocals

  lea eax, [ebp - 4]          ; start just below the saved ebp stored at [ebp]
  .while eax >= esp
     mov dword ptr [eax], 0   ; touch one DWORD per page
     sub eax, 4096
  .endw

  FOR usesreg, rgRegs
    push usesreg
  ENDM

EXITM <0>
ENDM


It should be usable with the "OPTION PROLOGUE:probelogue" directive.
I've not tested it though, and it does not deal with all the fiddly bits that the default prologue handles (near, far, calling convention, and so on).

Mirno
Title: Re: Stack Probing PROLOGUE macro
Post by: Mirno on July 04, 2005, 05:45:07 PM
New and improved (it produces better code in some cases):

probelogue MACRO szProcName, flags, cbParams, cbLocals, rgRegs, rgUserParams
  LOCAL counter
  LOCAL alignedLocals
  LOCAL whileBias

  alignedLocals = (cbLocals + 3) AND NOT(3)

  whileBias = 2
  IFNB <rgUserParams>
    whileBias = rgUserParams
  ENDIF

  push ebp
  mov  ebp, esp

  IF alignedLocals NE 0
    sub  esp, alignedLocals
  ENDIF

  IF alignedLocals GT (4096 * whileBias)
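    sub ebp, 4                          ;; start just below the saved ebp stored at [ebp]
    .while ebp >= esp
       mov DWORD PTR [ebp], 0
       sub ebp, 4096
    .endw
    lea ebp, [esp + alignedLocals]      ;; restore ebp exactly (esp = original ebp - alignedLocals)
  ELSEIF alignedLocals GE 4096
    counter = 4                         ;; first probe just below the saved ebp

    WHILE alignedLocals GE counter
      mov DWORD PTR [ebp - counter], 0  ;; probe downward, inside the locals area
      counter = counter + 4096
    ENDM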
  ENDIF

  FOR usesreg, rgRegs
    push usesreg
  ENDM


EXITM <0>
ENDM


Note that the while bias comes from the user parameters; the default value of 2 was chosen because it gives the smallest code.


.code
start:
option PROLOGUE:probelogue
blah PROC <8>, a:DWORD, b:DWORD
  LOCAL zyx[4096]:BYTE
  ret
blah ENDP
end start


The "<8>" overrides the default whileBias, allowing you to generate unrolled stack probes for locals of up to 4096 * 8 = 32768 bytes rather than the default limit of 8192 bytes.
Assembling with the default prologue still works, but gives a warning about an unknown prologue user argument.
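
The IF / ELSEIF selection in the macro can be modelled in a few lines of Python. This is only a sketch of the two conditions as written in the macro; the function name probe_strategy and the returned labels are made up for the illustration.

```python
PAGE = 4096


def probe_strategy(aligned_locals, while_bias=2):
    """Mirror the macro's IF / ELSEIF conditions on alignedLocals."""
    if aligned_locals > PAGE * while_bias:   # IF alignedLocals GT (4096 * whileBias)
        return "loop"
    elif aligned_locals >= PAGE:             # ELSEIF alignedLocals GE 4096
        return "unrolled"
    return "none"                            # less than one page: no probing emitted


if __name__ == "__main__":
    for size in (100, 4096, 8192, 8196):
        print(size, probe_strategy(size), probe_strategy(size, while_bias=8))
```

With the default bias, 8196 bytes of locals selects the loop; with "<8>" the same procedure gets unrolled probes instead.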

If someone has code they can test this on I'd be grateful. It would also help if you could test with the weird and wonderful combinations of near, far, public, private, uses, calling convention, and so on, as I've not had the chance (or the knowledge of how they should affect the assembly generated by the default prologue).

This is all untested, I've been looking at the list code generated by MASM so there will almost certainly be errors.

Mirno
Title: Re: Stack Probing PROLOGUE macro
Post by: Petroizki on July 04, 2005, 06:11:12 PM
You guys are reinventing what we have already figured out...  :eek

- Yes, each page only needs to be probed at one DWORD.
- Use 'mov dword ptr [ebp], eax' instead of 'mov dword ptr [ebp], 0' to make the probing smaller and faster.
- Just reserve the local stack space at once (sub/add only once), and then probe the pages (or vice versa); it makes less code this way.

Quote from: chep on June 29, 2005, 06:17:22 PM
But in our case I guess we don't mind if the stack is overwritten by a debugger before esp is adjusted, as we are writing dummy values just to make sure each page is probed.
On the contrary it's more likely a problem could arise if we adjust esp before probing the stack, as a debugger could then hit a non probed, unguarded page, thus leading to a GPF.

Am I wrong?
I don't know. What if we overwrite some important value the debugger is currently using? It might be possible that both ways would crash on some debuggers. But I guess your way might be better.
Title: Re: Stack Probing PROLOGUE macro
Post by: chep on July 04, 2005, 07:02:41 PM
Mirno,

All the tests I have done show that only USES has an effect on the generated code. NEAR/FAR/calling convention etc do not affect the generation of the stack frame.

Also, you don't need the alignedLocals thing as MASM automatically rounds up the value for you:
TestProc PROC
  LOCAL odd[3]:BYTE
  ...
TestProc ENDP


generates the following stack frame:
push    ebp
mov     ebp, esp
add     esp, 0FFFFFFFCh ; -4

(even when using a custom prologue, the localbytes argument is already rounded)
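
As a small illustration of the rounding shown here (a sketch of the arithmetic only, not of MASM's internals), the Python below applies the same (n + 3) AND NOT(3) rule as alignedLocals and formats the negated size the way it appears in the listing. Both helper names are made up for this example.

```python
def round_up_to_dword(nbytes):
    """Round a byte count up to the next multiple of 4."""
    return (nbytes + 3) & ~3


def as_dword_hex(value):
    """32-bit two's-complement representation, written MASM style."""
    return "0%08Xh" % (value & 0xFFFFFFFF)


if __name__ == "__main__":
    size = round_up_to_dword(3)       # LOCAL odd[3]:BYTE
    print(size, as_dword_hex(-size))  # operand of `add esp, ...`
```

For the 3-byte LOCAL above, the rounded size is 4 and the operand is 0FFFFFFFCh, matching the generated stack frame.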


Quote from: Petroizki on July 04, 2005, 06:11:12 PM
What if we overwrite some important value the debugger is currently using?
My understanding here is that debuggers don't (or at least shouldn't) leave important values on the stack: as soon as the control is returned to the debugged program, the debugger should assume that the program will mess up the stack (after all, it's the program's stack, not the debugger's).

Well, anyway I guess we'll have a hard time really sorting this out... unless a Visual Studio team member shows up to clarify everything! :P
Title: Re: Stack Probing PROLOGUE macro
Post by: chep on July 04, 2005, 07:41:48 PM
I finally added an option for looped probing, using a macro argument (ROLLED):

OPTION PROLOGUE:STACKPROBE
TestProc PROC <ROLLED> USES esi edi prm:DWORD
  ; ...
TestProc ENDP


It generates the following code:

    push ebp
    mov  ebp, esp
      add  ebp, (-max_probe) ; [1]
    @rolled:
      mov  DWORD PTR [ebp], eax
      add  ebp, page_size
      cmp  ebp, esp
      jne  @rolled ; [2]
      add  esp, (-localbytes)


The loop itself (from [1] to [2] inclusive) takes 19 bytes, while the unrolled version takes 6 bytes *per page*. So the rolled version becomes more space-efficient starting at 4 probed pages (19 < 4 * 6 = 24), i.e. strictly more than 12 KB of LOCALs.
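
The break-even point can be checked with a tiny Python sketch. It uses only the two byte counts quoted above; the constant and function names are made up for the illustration.

```python
ROLLED_BYTES = 19            # size of the loop, as quoted above
UNROLLED_BYTES_PER_PAGE = 6  # size of one probing mov, as quoted above


def unrolled_bytes(pages):
    """Code size of the unrolled probes for a given number of probed pages."""
    return UNROLLED_BYTES_PER_PAGE * pages


def smaller_version(pages):
    """Pick whichever variant produces less code for `pages` probed pages."""
    return "rolled" if ROLLED_BYTES < unrolled_bytes(pages) else "unrolled"


if __name__ == "__main__":
    for pages in range(1, 6):
        print(pages, unrolled_bytes(pages), smaller_version(pages))
```

At 3 probed pages the unrolled code is still smaller (18 bytes against 19); from 4 pages on the loop wins.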

Q: Maybe it would be useful to have FORCEUNROLLED / FORCEROLLED arguments, and by default let the macro pick the most efficient version?