A stack allocation function

Started by Relvinian, October 14, 2006, 12:16:02 PM

Relvinian

Hey all,

I've been working on some crazy projects that repeatedly made and freed a lot of small allocations inside a single function, and I didn't like having to go to the heap for every one of them. So I wrote a strange yet short ASM function that allocates space on the stack for me instead. That turned out to be highly beneficial for what I was doing...

Anyway, if you want to have fun with a dumb function, enjoy... I'm sure that with a little creativity on your part this could be wrapped in some fancy ASM macros, for example something that makes sure you have the correct ESP value when accessing variables/data on the stack.

Also, if you have any questions, suggestions for enhancement, etc, post away...

Relvinian

; ----------------------------------------------------------------------------
; Prototype:
;     LPVOID __stdcall fmStackAlloc(const size_t sizeAlloc);
;
; Description:
;   Allocates the requested amount of memory on the stack. Stack allocation
;   for this function is limited to 4095 bytes or fewer.
;
;   A size of 4096 or greater on the stack requires that each page (4096
;   bytes) be touched to make sure the Win32/OS has internally allocated it
;   for use by the application. This requires a loop to walk the pages, and I
;   never needed allocations over 2048 bytes, so I didn't bother coding it.
;
;   NOTE:
;       There are some advantages and disadvantages using this function.
;       Advantages:
;          - Any allocation will automatically be freed upon function exit,
;            provided the calling function has an EBP stack frame (LEAVE
;            restores ESP). No need to call any free routines.
;          - Access to this memory is usually fast since the top of the
;            stack is likely to already be in the CPU memory cache.
;       Disadvantages:
;          - Allocations are lost once the calling function returns.
;          - Offsets of function variables are MUCH harder to calculate
;            while writing the function, because ESP moves at runtime.
;          - Changing the function later can lead to many programmer bugs
;            and system crashes.
;
; Returns:
;   Failed : NULL eax (ptr)
;   Success: Ptr to space on the stack
; ----------------------------------------------------------------------------
_fmStackAlloc@4 proc public
    ; get our param from the stack (request size)
    mov eax, [esp + 4]

    ; make sure we aren't trying to allocate more than
    ; 4095 bytes. If so, fail and return 0 (NULL ptr).
    ; (unsigned compare, so sizes of 2 GB and up are rejected too)
    cmp eax, 4096
    jae short notSupported

    ; these next few instructions are the heart of the function.
    ; Amazing what they accomplish. :-)

    ; adjust the stack so we reserve the amount of room requested.
    ; what we will be doing is calculating the new 'ESP' in EAX
    ; and then swapping EAX and ESP.
    neg eax
    add eax, esp
    add eax, 8     ; need to account for return EIP + PARAM

    ; this is what actually makes sure Win32/OS has allocated/loaded
    ; the stack into its internal memory management.
    test [eax], eax ; make sure OS has it active (page with stack)

    ; swap our registers now that we are done calculating and 'activating',
    ; and make sure we get our return address to jump back to.
    mov ecx, eax
    mov edx, [esp]
    mov esp, ecx

    ; push the return function address on the stack for proper returning.
    push edx

    ; even though this is a stdcall procedure with arguments, since
    ; we have manually manipulated the stack, we return as if we
    ; didn't have any arguments.
    ret
   
notSupported:
    ; normal return since no stack adjustments were made.  Return
    ; a failed condition.
    mov eax, 0
    ret 4

_fmStackAlloc@4 endp
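
For anyone who needs more than 4095 bytes, the page-walking loop that the header comment mentions but leaves uncoded could look roughly like the following. This is only a C model of the address arithmetic, not a drop-in replacement: `plan_stack_probes` and `PAGE_SIZE` are hypothetical names, and the code just lists the addresses a probe loop would touch (one per 4096-byte page on the way down, then the new top of stack) instead of actually moving ESP.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* x86 page size, as stated in the header comment */

/*
 * Hypothetical helper: walks downward from `esp` toward `esp - size`
 * and records every address a stack probe loop would touch.
 * Returns the number of addresses written to `out` (at most `cap`).
 */
size_t plan_stack_probes(uintptr_t esp, size_t size,
                         uintptr_t *out, size_t cap)
{
    size_t n = 0;
    uintptr_t cur = esp;

    /* Touch one address in every full page we step over. */
    while (size >= PAGE_SIZE && n < cap) {
        cur -= PAGE_SIZE;
        size -= PAGE_SIZE;
        out[n++] = cur;
    }

    /* Final touch at the new top of stack, like `test [eax], eax`. */
    if (size > 0 && n < cap) {
        cur -= size;
        out[n++] = cur;
    }
    return n;
}
```

For a 10000-byte request starting at 0x130000 this plans three touches: 0x12F000, 0x12E000 and 0x12D8F0.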

Ratch

 Relvinian,

     You might want to look at reply #10 of the link below.  It shows how one can use STRUCTs to manipulate the stack, so that PROCs are not necessary.  Ratch


http://www.masm32.com/board/index.php?topic=3938.0

Relvinian

Quote from: Ratch on October 14, 2006, 05:03:24 PM
Relvinian,

     You might want to look at reply #10 of the link below.  It shows how one can use STRUCTs to manipulate the stack, so that PROCs are not necessary.  Ratch


http://www.masm32.com/board/index.php?topic=3938.0

Ratch,

Below is the structure you are talking about. I am going to reference it as I go so you can see a problem with using it and what my function does differently.

Quote
STKPFR        STRUC            ;tailor the STRUCT according to the subroutine
EBXsave      DWORD ?          ;\
EDIsave      DWORD ?          ; >saved registers
ADR1 = $
iUniTest     DWORD ?          ; \
dwBytesRead  DWORD ?          ;  \
iFileLength  DWORD ?          ;   >local variables for subroutine
pBuffer      DWORD ?          ;  /
pText        DWORD ?          ; /
pConv        DWORD ?          ;/
ADR2 = $
return       DWORD ?          ;return address
ADR3 = $
hwndEdit     DWORD ?          ;\
pstrFileName DWORD ?          ; >pushed parameters
ADR4 = $
return1      DWORD ?          ;\
hwnd         DWORD ?          ; \
msg          DWORD ?          ;  >You don't have to add these unless you are in message
wParam       DWORD ?          ; /     processing, and need to reference these params from the
lParam       DWORD ?          ;/      CALLBACK routine. They are already PUSHed on the stack
STKPFR ENDS

Here, your "structure" has the variables which will be used pre-defined (also known during compile time) so the stack is balanced during COMPILE time. Your stack structure can NOT handle dynamically changing stacks during the course of a program running.

What happens if you want to allocate some memory, anywhere from 1 to 4095 bytes, during runtime but don't have a "free" spot in your structure for it? And this may or may not happen, too. To complicate matters, I want it allocated on the stack and not through the Win32 heap memory management. How do I do this with your "static" structure?

As I see it with your structure, if I wanted space like this, I would have to always code for it during coding time and could waste a mega-load of stack space during runtime.

Also, one of the biggest flaws, and the part most prone to human coding error, is the following:
Quote
INVOKE ReadFile,EBX,[S1$.pBuffer+3*DWORD],[S1$.iFileLength+2*DWORD],EAX,EBP

Why in the world would I want to remember all kinds of offsets like (+3*dword, +2*dword, etc.) for each parameter of a function call? How much work do you think I want to put into writing ASM code? Assembly takes a little longer to write than higher-level languages, but it shouldn't be made tedious by having to remember offset after offset.

That's why MASM gives you stuff like:
Quote
   LOCAL  mii : MENUITEMINFO  ; structure of info about a menu item

   mov [mii.cbSize], sizeof MENUITEMINFO
   mov [mii.fMask], MIIM_DATA

Here is a simple example of using dynamically allocated memory (stack) to parse a string when the contents and length are not known during compile time. This is ideal to use because you don't want to waste stack space with big buffers of nothing.

FuncParse PROC String:dword, SegEndChr:dword
; why use hard-coded buffer sizes, which can lead to wasted stack space
; (and cache thrashing) or a corrupted stack? Instead, use
; dynamically allocated buffers (on the stack of course).

   LOCAL   strHdr  : dword  ; NULL terminated string with the string header info
   LOCAL   strBody : dword  ; NULL terminated string with the string body info
   LOCAL   strTail : dword  ; NULL terminated string with the string tail info
   LOCAL   len     : dword

   ; find the length of the Hdr segment
   mov eax, [String]
   xor ecx, ecx
   
GetHdrSegmentLen:
   movzx edx, byte ptr [eax+ecx]        ; get byte and check for the segment end char
   cmp edx, [SegEndChr]
   je short @F

   add ecx, 1
   jmp short GetHdrSegmentLen

@@:
   ; save our segment length and allocate a stack buffer to hold contents
   mov [len], ecx
   add ecx, 1   ; room for a NULL terminator
   push ecx
   call fmStackAlloc  ; does not NULL out memory contents
   mov [strHdr], eax

   ; NULL out the buffer. The size is pushed only now: anything pushed
   ; before the allocation ends up above the new buffer, not next to the
   ; pointer argument, and fmStackAlloc overwrites ECX anyway.
   mov ecx, [len]
   add ecx, 1   ; size including the NULL terminator
   push ecx
   push eax
   call fmClear ; NULL out the buffer
   
   
   ; copy this segment into our buffer now
   push [len]
   push [String]
   push [strHdr]
   call fmCopy      ; copy the number of bytes into the new buffer now         

   ; Repeat the above code for the body and tail segments. Once you are done,
   ; you have three dynamically allocated buffers (on the stack) holding the
   ; contents of the string passed in, separated into their respective
   ; segments, leaving the contents of the original string untouched.

   ; do something with the segments here.  ;-)

   ; when leaving this function, since we allocated the buffers on the stack,
   ; we don't have to worry about freeing and having memory leaks.
   ret
FuncParse endp
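
As a rough C analog of the header-segment step above (a sketch of the idea, not a translation of FuncParse): a C99 variable-length array is also sized at run time, typically lives on the stack, and is released when the function returns. `header_equals` is a hypothetical name, and VLAs are optional since C11 (MSVC does not support them), so treat this as an illustration only. Unlike the scan loop above, the sketch also stops at the end of the string.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical C analog of the header-segment step in FuncParse:
 * copy everything before `seg_end` into a stack buffer sized at run time,
 * then compare it with `expected`. Returns 1 on a match, 0 otherwise.
 */
int header_equals(const char *s, char seg_end, const char *expected)
{
    /* Find the segment length; stop at the delimiter or end of string. */
    size_t len = 0;
    while (s[len] != '\0' && s[len] != seg_end)
        len++;

    /* Runtime-sized stack buffer (C99 VLA), +1 for the NUL terminator.
       As with fmStackAlloc, keep it small: stack space is limited. */
    char hdr[len + 1];
    memcpy(hdr, s, len);
    hdr[len] = '\0';

    /* hdr goes away automatically when the function returns. */
    return strcmp(hdr, expected) == 0;
}
```

The buffer cannot be handed back to the caller, which mirrors the "allocations are lost" disadvantage listed in the header comment of fmStackAlloc.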


There are other reasons too for using functions over macros, structures, etc. What happens if my coding style (or work's code style) prohibits me from mixing .code and .data segments together in a function? What happens if you are worried about a pre-allocated stack size for a function and wasted extra stack space (producing bad CPU cache thrashing)? What about the code bloating with macros? There are any number of other reasons a programmer would choose a specific approach (function, macro, etc.) for their solution, just as you decided to use your stack structure for functions.

Relvinian

zooba

Great idea, and good use of how procedures are created to make it work that way :U

Personally, I don't mind going to the heap. However, a macro to do this same thing is probably as simple as subtracting the desired size from ESP (and the test [eax], eax, something I wouldn't have thought of doing until it crashed :bg ) which isn't really much of a bloat (especially compared to a call operation). Of course, you need to balance the stack before you deallocate, but I imagine this is the same in yours also.

Great work though.

Cheers,

Zooba :U

Relvinian

Zooba,

Yep, you can create a macro which will contain the code necessary to create a "dynamic length buffer". I personally like functions for most things because, as time goes on, you learn new things and go back to your functions / macros to improve them or add new features. For example, I have designed but not yet coded an addition to fmStackAlloc() which will always make sure the number of bytes requested is divisible by 4, 8 or 16, with another param to align the buffer on a 4, 8 or 16 byte address too.
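
The rounding and alignment mentioned here come down to plain bit masking when the alignment is a power of two. A minimal sketch of the arithmetic in C, assuming alignments of 4, 8 or 16 (`round_up_size` and `align_stack_down` are hypothetical names, not part of fmStackAlloc):

```c
#include <stddef.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of `align`.
   `align` must be a power of two (4, 8, 16, ...). */
size_t round_up_size(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Move a stack address DOWN to an `align` boundary. The stack grows
   toward lower addresses, so rounding down never gives space back. */
uintptr_t align_stack_down(uintptr_t addr, uintptr_t align)
{
    return addr & ~(align - 1);
}
```

In MASM terms the same masks would be something like `add eax, 15` / `and eax, -16` for the size and `and esp, -16` for the address. Rounding matters because the current function leaves ESP at the caller's ESP minus the requested size, so an odd size leaves the stack off a DWORD boundary.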

I thought I would at least give a small example of how to use fmStackAlloc in your own MASM32 code. Here's a simple function which shows how easy dynamically allocated stack buffers are and what is and isn't necessary. So, if you have dynamic allocations of less than 4k that you won't need outside the current function, placing them on the stack is a great solution, both in terms of speed and because it keeps the stack fresh in the cache for faster access in other functions.


NOTE:
   This is a stripped-down version of one of my functions that draws text on an owner-drawn menu control.


Quote
SplitMenuText proc hDC:dword, pSrcString:dword
   ; no USES clause on purpose: MASM's epilogue pops USES registers before
   ; LEAVE, so they would be read back from the wrong place once ESP has
   ; been moved by fmStackAlloc. The length is kept in a local instead.
   local pTabChar  :dword
   local pSegText  :dword
   local pSegAccel :dword
   local cbText    :dword

   ; check and see if this menu item text has a '\t' character which separates the left portion of the text from the right portion
   push 9  ; '\t'
   push [pSrcString]
   call strChar

   test eax, eax
   je short DrawFullText
   mov [pTabChar], eax  ; remember where the '\t' is

   ; we have a menu text item with two parts that we need to draw -- ignoring the \t char itself
   ; 1st part is all text before the '\t' character and left justified
   ; 2nd part is all text after the '\t' character and right justified

   ; Since menu texts are usually small (average of 16-32 bytes per item), we'll just get the length,
   ; and create two allocations on the stack of that size and NULL them out.
   push [pSrcString]
   call strLength
   mov [cbText], eax ; remember length for other calls

   ; create our two dynamic buffers from the stack now.
   push eax
   call fmStackAlloc
   mov [pSegText], eax

   push [cbText]
   call fmStackAlloc
   mov [pSegAccel], eax

   ; null out both buffers
   push [cbText]
   push [pSegText]
   call fmClear

   push [cbText]
   push [pSegAccel]
   call fmClear

   ; find out how long the text is before the '\t' character.
   mov edx, [pSrcString]
   mov eax, [pTabChar]
   sub eax, edx

   ; copy the first segment of text into pSegText --- before the '\t'.
   push eax
   push edx
   push [pSegText]
   call fmCopy

   ; copy the second segment of text into pSegAccel --- after the '\t'
   add [pTabChar], 1    ; skip '\t'
   push [pTabChar]
   push [pSegAccel]
   call strCopy


   ; now you have two working pointers to text strings that you can do anything you want with.
   ; when you are ready to leave this function, just leave normally...Nothing else required. The memory
   ; allocated by the function will be correctly released and the stack balanced. No additional steps.

DrawFullText:
   ; (the full-text drawing path is omitted in this stripped-down version)
   ret
SplitMenuText endp


Ratch

Relvinian,

     Interesting points you make, so I will refute them one by one.  First of all, let me say that I developed my "procless" method using structures in order to be more flexible and transparent than the PROC way of defining and using subroutines.  Both are static creatures, and use structure methods.  PROCs do it internally, and my method does it externally.  There are two main advantages of my method.  First, I do not use a precious register (EBP) in an already register-starved CPU.  I can surely find other uses for it in a non-trivial application.  Second, I can sometimes PUSH parameters for a subroutine many instructions beforehand, if they are available, instead of storing them somewhere and having to PUSH them all at once.  That's efficient, but the pre-call PUSHes must be carefully flagged and documented.  The big disadvantage, as you pointed out, is that stack offsets must be compensated if a stack reference is made after a PUSH.  This might seem daunting at first, but it becomes second nature after a while.

Quote
Here, your "structure" has the variables which will be used pre-defined (also known during compile time) so the stack is balanced during COMPILE time. Your stack structure can NOT handle dynamically changing stacks during the course of a program running.

     At least as much or more than the PROC does.  In the example you gave using the PROC, the buffer allocation is outside of the structure, not within.  I can certainly do it that way too.

Quote
What happens if you want to allocate some memory, anywhere from 1 to 4095 bytes, during runtime but don't have a "free" spot in your structure for it? And this may or may not happen, too. To complicate matters, I want it allocated on the stack and not through the Win32 heap memory management. How do I do this with your "static" structure?

    Same as above.  The allocation occurs outside of the structure, just as it did in the PROC example you submitted.

Quote
As I see it with your structure, if I wanted space like this, I would have to always code for it during coding time and could waste a mega-load of stack space during runtime.

    Only if you coded the space inside of the structure.

Quote
Also, one of the biggest flaws, and the part most prone to human coding error, is the following:

Quote
INVOKE ReadFile,EBX,[S1$.pBuffer+3*DWORD],[S1$.iFileLength+2*DWORD],EAX,EBP


Why in the world would I want to remember all kinds of offsets like (+3*dword, +2*dword, etc.) for each parameter of a function call? How much work do you think I want to put into writing ASM code? Assembly takes a little longer to write than higher-level languages, but it shouldn't be made tedious by having to remember offset after offset.

    I addressed this above.  A little more work, but one doesn't use the EBP register with my method.

    Next comes your example of a dynamic stack allocation. There is nothing that the PROC does that my code cannot do if I make a stackframe using the EBP register, which my method certainly can do.  Otherwise I would have to copy the local variables above all the dynamic allocations, and use them that way with a structure.  

Quote
There are other reasons too for using functions over macros, structures, etc. What happens if my coding style (or work's code style) prohibits me from mixing .code and .data segments together in a function?

    A structure is a static creature.  If you do dynamic allocation, you must use registers or store something somewhere in a static location.  That is what my method can do if necessary and what PROCs do with the EBP register.

Quote
What happens if you are worried about a pre-allocated stack size for a function and wasted extra stack space (producing bad CPU cache thrashing)?

    Don't pre-allocate it until you need it and release it when you are through.  My method does not preclude anyone from doing that.

Quote
What about the code bloating with macros?

    I don't know how to answer that because I don't know what you mean.  Either don't write 'em so they bloat, or don't use them if they do.

Quote
There are any number of other reasons a programmer would choose a specific approach (function, macro, etc.) for their solution, just as you decided to use your stack structure for functions.

    Yep.  Ratch

Relvinian

Quote from: Ratch on October 15, 2006, 07:24:43 AM

Quote
What about the code bloating with macros?

>>>     I don't know how to answer that because I don't know what you mean.  Either don't write 'em so they bloat, or don't use them if they do.



Simple example of code bloating from macro type coding styles (from macros.asm) in MASM32 package:

Quote
TxtItem MACRO tID, cID, strng
      mov tbb.iBitmap,   I_IMAGENONE
      mov tbb.idCommand, cID
      mov tbb.fsStyle,   BTNS_BUTTON or BTNS_AUTOSIZE
      mov tbb.iString,   tID
      invoke SendMessage,TBhWnd,TB_ADDBUTTONS,1,ADDR tbb
      fn SendMessage,TBhWnd,TB_ADDSTRING,0,strng
    ENDM

It all comes down to personal preference, but if you use this macro to add items for toolbars, you are going to have a LOT of duplicate code (bloating) that could easily be encapsulated into a function.
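
The same trade-off can be shown with the C preprocessor, which also pastes a macro's body at every use site. In this sketch `struct toolbar`, `ADD_ITEM_MACRO`, `add_item` and `build_both_ways` are made-up stand-ins, not the MASM32 toolbar code: the macro version repeats its body three times in the generated code, while the function version has one body and three calls, and both produce the same result.

```c
#include <stddef.h>

/* Hypothetical stand-in for a toolbar: it just records command IDs. */
struct toolbar {
    int ids[16];
    size_t count;
};

/* Macro version: the whole body is pasted at every use site. */
#define ADD_ITEM_MACRO(tb, id)                      \
    do {                                            \
        if ((tb)->count < 16)                       \
            (tb)->ids[(tb)->count++] = (id);        \
    } while (0)

/* Function version: one copy of the body, each use is just a call. */
void add_item(struct toolbar *tb, int id)
{
    if (tb->count < 16)
        tb->ids[tb->count++] = id;
}

/* Builds the same toolbar both ways; returns 1 if the results match. */
int build_both_ways(void)
{
    struct toolbar a = { {0}, 0 }, b = { {0}, 0 };

    ADD_ITEM_MACRO(&a, 101);
    ADD_ITEM_MACRO(&a, 102);
    ADD_ITEM_MACRO(&a, 103);   /* three inline copies of the body */

    add_item(&b, 101);
    add_item(&b, 102);
    add_item(&b, 103);         /* three calls to one shared body */

    if (a.count != b.count)
        return 0;
    for (size_t i = 0; i < a.count; i++)
        if (a.ids[i] != b.ids[i])
            return 0;
    return 1;
}
```

An optimizing C compiler may still inline the function, so the size difference is a tendency and not a guarantee. In MASM, a PROC is never expanded inline on its own.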

Relvinian

Ratch

Relvinian,

     OK, now I see what you mean, I think.  A MACRO basically generates inline code.  And if that code is a long sequence of instructions and data that is mostly the same, then some memory savings can be had by making that code sequence into a subroutine.  Ratch