Stack Overflow issue

Started by dedndave, August 31, 2009, 09:30:41 PM

dedndave

i am writing a proc that places an array on the stack
to be OS-friendly, i check the stack space before i allocate
if esp minus the array size is going to be less than 512 bytes, i abort the allocation
initially, esp is at 0012FF80h
i can allocate 12156 (2F7Ch) bytes - no problem (0012FF80-00002F7C=0012D004)
if i try to allocate 12160 (2F80h) bytes - the program exits - with no error messages - no dr watson
is there something magical about esp=0012D000 or subtracting more than 12156 bytes at a time ??

i have a solution, of course - if i need to allocate more than X number of bytes i can use the heap
but, i want to understand the limitations of allocating stack space, as well

EDIT - what are the other restrictions regarding the stack
i thought i could point it anyplace i wanted, so long as the block has write access privilege
of course, i know i am not supposed to wrap it from 00000000 to FFFFFFFC with a push - lol

Tedd

More than likely, you get 12k for the stack (three 4k pages) - so that's 2F80h + whatever's already allocated on the stack (making up the other 80h bytes).
The next 4k page after would either be absent, or a guard page (which is still actually absent) - but then I'd still expect to see an exception generated, or the stack extended so it's not a problem.
No snowflake in an avalanche feels responsible.

dedndave

Thanks Tedd
yes - i would expect to see my good buddy, dr watson
but, you are right - i found this post by Zooba...
Quote: The stack reserves the size specified by the linker but it doesn't commit it until you try and access it. It detects attempted accesses using a guard page, so the first time you attempt to access a page that hasn't been committed, Windows will catch an exception and commit it. However, it only guards one page, which is the next uncommitted page. Attempting to access the page beyond the guard page will cause an access violation exception.

The local variables on the stack are 'allocated' by simply changing the stack pointer (ESP). Until you attempt to access the next page there's no extra memory allocated. By shifting the stack pointer by more than one page (probably 4096 bytes) you risk missing the guard page and attempting to access an uncommitted page which causes an access violation.
i also found some related posts regarding "stack probing" - all very good info   :U
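
For reference, a looped probe along the lines the quote describes might look like the sketch below. It is not taken from the posts mentioned. It assumes a 4096-byte page (PAGE_SIZE and BigArray are names made up here), the MASM32 SDK's masm32rt.inc, and a hypothetical 64KB request.

```
include \masm32\include\masm32rt.inc

PAGE_SIZE equ 4096                  ; assumed page size, see GetSystemInfo

.code

BigArray proc
    push  ebp
    mov   ebp, esp                  ; remember esp so it can be restored
    mov   ecx, 65536                ; bytes wanted (hypothetical size)
probe_next:
    cmp   ecx, PAGE_SIZE
    jb    probe_last
    sub   esp, PAGE_SIZE            ; move down exactly one page
    test  DWORD PTR [esp], eax      ; read touches the page, only flags change
    sub   ecx, PAGE_SIZE
    jmp   probe_next
probe_last:
    sub   esp, ecx                  ; remainder, less than one page
    test  DWORD PTR [esp], eax
    ; esp now points at the start of the 64KB block
    mov   DWORD PTR [esp], 12345678h
    mov   esp, ebp                  ; release the block
    pop   ebp
    ret
BigArray endp

start:
    call  BigArray
    print "allocated and released 64KB of stack",13,10
    exit
end start
```

Because esp never moves more than one page between touches, every access lands either in a committed page or in the guard page, never beyond it.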
in days of old, a "page" was 256 bytes
with newer memory techniques, i am sure that has changed
how big is a page ?
got it - GetSystemInfo
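
A minimal sketch of that call, assuming the MASM32 SDK (masm32rt.inc and its print and str$ macros); si_buf is a name invented for this example:

```
include \masm32\include\masm32rt.inc

.data?
si_buf SYSTEM_INFO <>               ; filled in by GetSystemInfo

.code
start:
    invoke GetSystemInfo, ADDR si_buf
    print "page size = "
    print str$(si_buf.dwPageSize),13,10
    exit
end start
```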

hutch--

Dave,

You can have a play with the two linker options, Stack reserve and Stack commit, to see if you just need more stack space. It's something you have to do if you use large stack recursion for things like Quick sorts and other similar recursive algos.
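
One way to experiment with those two options is sketched below. The file name bigstack.asm and the reserve/commit values are hypothetical, and the sketch assumes the MASM32 SDK. It reads the stack base and the current committed limit from the thread information block (fs:[4] and fs:[8]), so the effect of different /STACK values can be compared.

```
; build (bigstack.asm is a hypothetical file name):
;   \masm32\bin\ml /c /coff bigstack.asm
;   \masm32\bin\link /SUBSYSTEM:CONSOLE /STACK:4194304,65536 bigstack.obj
;   (/STACK takes reserve,commit in bytes)
include \masm32\include\masm32rt.inc

.code
start:
    assume fs:nothing
    mov   esi, fs:[4]               ; TIB StackBase: top of the stack
    mov   edi, fs:[8]               ; TIB StackLimit: lowest committed address
    mov   ebx, esi
    sub   ebx, edi                  ; bytes currently committed
    print "stack base  = "
    print hex$(esi),13,10
    print "stack limit = "
    print hex$(edi),13,10
    print "committed   = "
    print str$(ebx),13,10
    exit
end start
```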
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Astro

I was reading something on this recently.

I was under the impression that when you accessed the "guard" page, it looked to see how much you wanted and allocated it. I couldn't quite see how it knew though if you wanted more than one page worth. I guess it doesn't.

I think the only solutions are:

* To recursively access the guard page for the next allocation, getting the system to slowly expand the stack space
or
* do as Hutch said and allocate at link time.

A little bit more on it. I can't find the stuff I was reading.

http://www.ravenbrook.com/project/mps/master/manual/wiki/c-stack.html

Best regards,
Astro.

dedndave

Thanks, Hutch
this is for a big-num routine
big-nums usually run 512 to 1024 bits
i just wanted to accommodate larger numbers
what i think is a good approach is, allow stack allocation up to some size limit (say, 1024 bytes)
if they are messing with larger values than that, they can expect a little overhead in the way of heap allocation
does that sound like a reasonable plan ?
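
The plan above could be sketched roughly as follows. STACK_LIMIT, WorkBuf and the two test sizes are hypothetical, chosen only for illustration, and the sketch assumes the MASM32 SDK. Requests up to the limit use a fixed local buffer; larger ones fall back to HeapAlloc.

```
include \masm32\include\masm32rt.inc

STACK_LIMIT equ 1024                ; hypothetical threshold in bytes

.code

WorkBuf proc cbSize:DWORD
    LOCAL stackbuf[STACK_LIMIT]:BYTE
    LOCAL pHeap:DWORD
    mov   pHeap, 0
    lea   eax, stackbuf             ; default: buffer on the stack
    mov   ecx, cbSize
    cmp   ecx, STACK_LIMIT
    jbe   have_buf
    invoke GetProcessHeap           ; too big: take it from the heap
    invoke HeapAlloc, eax, 0, cbSize
    test  eax, eax
    jz    done                      ; allocation failed, return 0
    mov   pHeap, eax
have_buf:
    mov   BYTE PTR [eax], 1         ; eax -> buffer, the real work goes here
    .if pHeap != 0
        invoke GetProcessHeap
        invoke HeapFree, eax, 0, pHeap
    .endif
    mov   eax, 1                    ; success
done:
    ret
WorkBuf endp

start:
    invoke WorkBuf, 128             ; 1024-bit number, stays on the stack
    invoke WorkBuf, 100000          ; large request, goes to the heap
    print "done",13,10
    exit
end start
```

The locals total just over 1K, well under one page, so the default prologue needs no probing here.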

hutch--

Dave,

Is there any particular reason to want to use the stack rather than just allocating a big enough block of memory to do what you are after? Your normal PE stack size is 1 meg and while you can make this a lot larger if you need to, allocating 16 meg is trivial these days and it should be a much more convenient play pen for working on large numbers than using the stack. The other factor is you can change the allocation size with dynamically allocated memory, whereas the stack limit is fixed once the application starts.

dedndave

i was thinking of speed, primarily
allocating from the heap slows things down a bit
as i mentioned, normal use is 1024 bits (that's only 128 bytes)
it seems that 1K of stack space is a nice value (could even be 2K, i suppose)
but - a 1K-byte number is 8,192 bits - lol - i have never seen such requirements
if they are playing with numbers that large, they probably aren't expecting lightning speed
i was just trying to weigh what is fast against what is practical

ecube

you can use the stack probe macro and it won't affect performance and will fix stack overflow/silent death issues.

;stackprobe.inc

;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
;
; MASM Stack Probing PROLOGUE macro
; by chep, 2005/06/22
;
;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
;
; Allows a procedure to safely use LOCAL variables with a total size of 4kb or more,
; using an unrolled stack probing method.
;
;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
;
; Usage:
;
;   OPTION PROLOGUE:STACKPROBE
;   MyProcedure PROC ; ...
;     ; ...
;   MyProcedure ENDP
;   OPTION PROLOGUE:PROLOGUEDEF
;
;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
;
; Notes:
;   - When the total size of the LOCAL variables is less than 4kb, the code generated is
;     identical to PROLOGUEDEF, so there is no drawback using this macro
;   - See "OPTION PROLOGUE" and "PROC" topics in MASM32.HLP for the macro specifications
;
;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
;
; Limitations compared to PROLOGUEDEF:
;   - Stack probing is relevant only for Windows, ie FLAT model, so it won't accept other models
;   - Due to the FLAT model restriction, LOADDS is not supported
;   - FORCEFRAME argument doesn't generate a correct epilogue when no LOCAL variables are defined
;     So it is not supported for now :(
;
;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
;
; TODO:
;   - Allow looped probing additionally to the current unrolled probing
;     This behaviour should be controlled through macro arguments
;
;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

STACKPROBE MACRO procname, flags, argbytes, localbytes, reglist, userparms:VARARG
  LOCAL parm, reg, probe, max_probe, line, page_size
  ;; Memory page size, can be changed for other architectures
  page_size = 4096
  ;; Throw an error if not in FLAT model, because stack probing is irrelevant otherwise
  IF @Model NE 7
    line TEXTEQU %@Line
    % ECHO @FileCur(line) : STACKPROBE prologue ERROR: stack probing is irrelevant if not using FLAT model
    .ERR
    EXITM <1>
  ENDIF
  ;; Detect macro arguments to give a warning
  IFNB <userparms>
    FOR parm,<userparms>
      line TEXTEQU %@Line
      % ECHO @FileCur(line) : STACKPROBE prologue warning: unknown prologue argument : parm
    ENDM
  ENDIF

  ;; Set up stack frame
  IF localbytes GT 0
    push ebp
    mov  ebp, esp
    IF localbytes LT page_size
      ;; Normal stack frame, no probing
      add  esp, (NOT localbytes)+1
    ELSE
      ;; Unrolled stack probing
      max_probe = (localbytes+page_size-1) AND (-page_size) ;; round up to next page size
      probe = page_size
      WHILE probe LE max_probe
        ;; Probe stack
        mov  DWORD PTR [ebp-probe], eax
        probe = probe + page_size
      ENDM
      add  esp, (-localbytes)
    ENDIF
  ELSEIF argbytes GT 0
    push ebp
    mov  ebp, esp
  ENDIF

  ;; USES clause
  IFNB <reglist>
    FOR reg,reglist
      push reg
    ENDM
  ENDIF

  EXITM <0>
ENDM


dedndave

thanks E^Cube
yes - i saw that in a previous post
i was "reading your mail" - lol

Gunnar Vestergaard

I must thank you for the stackprobe.inc. Do I understand correctly that this include file will provide me with e.g. 100KB of stack space if necessary?

Gunnar

ecube

Quote from: Gunnar Vestergaard on December 27, 2009, 12:03:51 AM
I must thank you for the stackprobe.inc. Do I understand correctly that this include file will provide me with e.g. 100KB of stack space if necessary?

Gunnar

yeah afaik, since i've been using the inc I haven't had any issues and been able to alloc as much memory as I wanted. Maybe someone can test it on really large memory, or a lot of smaller blocks.

MichaelW

The problem with using the stack for large buffers is that once the stack commit size is exceeded, each probe triggers a guard-page exception. Depending on the number of pages required, reserving space on the stack could be much slower than allocating the space from memory. The attachment compares the cycle counts for both methods, over a range of allocation sizes. As built, the size of the stack commit is 4096 bytes, so for anything larger than 2 pages the probing is required. Typical results on my Windows 2000, P3 system:

77 cycles, 16KB from stack
173 cycles, 16KB with realloc
688 cycles, 64KB from stack
184 cycles, 64KB with realloc
2650 cycles, 256KB from stack
179 cycles, 256KB with realloc

53 cycles, 16KB from stack
158 cycles, 16KB with realloc
644 cycles, 64KB from stack
157 cycles, 64KB with realloc
2491 cycles, 256KB from stack
175 cycles, 256KB with realloc

52 cycles, 16KB from stack
157 cycles, 16KB with realloc
677 cycles, 64KB from stack
158 cycles, 64KB with realloc
2490 cycles, 256KB from stack
174 cycles, 256KB with realloc

eschew obfuscation

Biterider

Another potential problem is that the newly allocated stack pages remain committed until the end of the app. There are tricks to decommit these stack pages but usually it is not the preferred way to go for large allocations.

Biterider

ecube

thanks MichaelW, I guess for really large mem requirements it's better to use the API, but something like the proc below, which is 32KB and IMO quite a bit of memory, only takes 7 cycles on my machines. Also in your test you're assuming it's 1 large buffer. i'm not sure if it'd affect things, but generally speaking most users use multiple buffers of various sizes, so you'd have to call your globalrealloc functions multiple times to make it more fair on large buffers instead of just once.


OPTION PROLOGUE:STACKPROBE
StackTest proc
local buff[1024]:BYTE
local buff1[1024]:BYTE
local buff2[1024]:BYTE
local buff3[1024]:BYTE
local buff4[1024]:BYTE
local buff5[1024]:BYTE
local buff6[1024]:BYTE
local buff7[1024]:BYTE
local buff8[1024]:BYTE
local buff9[1024]:BYTE
local buff10[1024]:BYTE
local buff11[1024]:BYTE
local buff12[1024]:BYTE
local buff13[1024]:BYTE
local buff14[1024]:BYTE
local buff15[1024]:BYTE
local buff16[1024]:BYTE
local buff17[1024]:BYTE
local buff18[1024]:BYTE
local buff19[1024]:BYTE
local buff20[1024]:BYTE
local buff21[1024]:BYTE
local buff22[1024]:BYTE
local buff23[1024]:BYTE
local buff24[1024]:BYTE
local buff25[1024]:BYTE
local buff26[1024]:BYTE
local buff27[1024]:BYTE
local buff28[1024]:BYTE
local buff29[1024]:BYTE
local buff30[1024]:BYTE
local buff31[1024]:BYTE
ret
StackTest endp
OPTION PROLOGUE:PROLOGUEDEF
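
Assuming MASM passes localbytes = 32768 for this proc (32 buffers of 1024 bytes each), the STACKPROBE macro posted earlier should expand to roughly the prologue below: eight page probes followed by one stack adjustment. This is worked out by hand from the macro text, not copied from an assembler listing.

```
    push ebp
    mov  ebp, esp
    mov  DWORD PTR [ebp-4096], eax      ; probe page 1
    mov  DWORD PTR [ebp-8192], eax      ; probe page 2
    mov  DWORD PTR [ebp-12288], eax     ; probe page 3
    mov  DWORD PTR [ebp-16384], eax     ; probe page 4
    mov  DWORD PTR [ebp-20480], eax     ; probe page 5
    mov  DWORD PTR [ebp-24576], eax     ; probe page 6
    mov  DWORD PTR [ebp-28672], eax     ; probe page 7
    mov  DWORD PTR [ebp-32768], eax     ; probe page 8, lowest dword of the frame
    add  esp, -32768                    ; reserve all 32KB in one step
```

Each probe writes whatever happens to be in eax, which is harmless because the locals are uninitialized at that point anyway.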