Title: Avoiding stack frames (again!)
Post by: Damos on July 03, 2009, 11:54:41 AM

we're always talking on here about avoiding stack frame setup as an optimization, and i wondered what you guys thought of this:

say we have a routine that needed arguments as in:

PrintSum proc alpha:DWORD,beta:DWORD
...
PrintSum endp

alpha and beta are then addressed as an offset to ebp

but what if we had a macro, so that:

MyProc PrintSum,alpha:DWORD,beta:DWORD

would be interpreted as:

.data?
PrintSum_alpha dd ?
PrintSum_beta dd ?
alpha equ PrintSum_alpha
beta equ PrintSum_beta
.code
PrintSum:
...
at the end of the routine the macro would undefine alpha & beta, to release the names for other routines.
so now we have no need to set up a stack frame; instead we are using uninitialized memory to pass our params to the routine.
we could also have a new invoke macro, so that:

MyInvoke PrintSum,2,4
is interpreted as:
push 2
pop PrintSum_alpha
push 4
pop PrintSum_beta
I know this needs tweaking here and there, but what do you think in principle?
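
For reference, a minimal hand-expanded sketch of the idea (no MyProc/MyInvoke macros, which are still hypothetical at this point). It assumes a standard MASM32 install and reuses the PrintSum_alpha / PrintSum_beta names from above; the body of PrintSum is only an illustration of what such a routine might do.

```masm
include \masm32\include\masm32rt.inc

.data?
PrintSum_alpha dd ?             ; parameter slots live in uninitialized data
PrintSum_beta  dd ?

.code
start:
    ; what a MyInvoke PrintSum,2,4 could boil down to
    mov PrintSum_alpha, 2
    mov PrintSum_beta, 4
    call PrintSum
    exit

PrintSum:                       ; plain label: no prologue, no epilogue
    mov eax, PrintSum_alpha
    add eax, PrintSum_beta
    print str$(eax), 13, 10     ; prints the sum of the two slots
    ret

end start
```

Note that the caller writes the slots with plain mov here instead of push/pop pairs; either would fill the same memory.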

Title: Re: Avoiding stack frames (again!)
Post by: dedndave on July 03, 2009, 12:52:06 PM

it sounds good, but i dunno how to "undefine" data labels - lol

this code assumes the labels are already declared

push 2
pop PrintSum_alpha
push 4
pop PrintSum_beta

for that matter, the values could just as well be permanently declared
but, i think the stack frame method turns out to be faster

i dunno if this is faster or not...

mov dword ptr PrintSum_alpha,2
mov dword ptr PrintSum_beta,4

on an 8088, it would be faster because there are fewer memory references, but that rule doesn't apply for pentiums, i guess
(well, for word-sized values, at least)

personally, i like to pass parms in register, provided there are only a couple (as in most cases)
but, i am a dinosaur programmer - i have dinosaur thoughts and i write dinosaur code - lol
these guys are used to procs that get INVOKEd
they all like to be C-compatible and, let's face it, windows seems to have been designed around C
as for me, i dislike C and that is why i write in assembler - lol

i was playing with another method that has some potential
you may find it interesting
http://www.masm32.com/board/index.php?topic=11671.msg87985#msg87985
as you can see, no one seems to be interested in my dinosaur ideas - lol

Title: Re: Avoiding stack frames (again!)
Post by: dedndave on July 03, 2009, 01:32:44 PM

on a similar note, one of the other guys in here had a good idea (i forget who it was and can't locate the thread)
instead of using the EBP register to reference locals, use ESP directly and design the
assembler so that it keeps track of the PUSHes and POPs to calculate the offsets
this frees up the EBP register and is a little faster and smaller than the regular stack frame

AProc PROC

sub esp,8 ;2 local dword variables
mov dword ptr [esp+4],1 ;first local var
mov dword ptr [esp],2 ;second local var
.
.
.
push eax ;assembler maintains PUSH count
.
.
.
mov edx,[esp+8] ;first local var new offset
mov ecx,[esp+4] ;second local var new offset
.
.
.
pop eax
.
.
.
add esp,8
ret

AProc ENDP

Title: Re: Avoiding stack frames (again!)
Post by: hutch-- on July 03, 2009, 01:52:24 PM

Damos,

Using global memory in the .DATA or .DATA? section is an old trick from the days when stack space was very limited, but there is no reason not to use it today if it does what you want. Stack-based local variables have the advantage that you can call another proc from the current one and the values in the first will be the same when the called proc returns. Global variables give no such guarantee, which limits nesting of procedures. In most instances this would not matter and you could handle it with a few different sets of variables, but you could not perform recursion by this method.
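
A small sketch of the recursion problem, assuming a standard MASM32 install. The names Fact_n, FactGlobal and FactStack are made up for this illustration. Both routines try to compute 5!, one with its argument in a global slot and one with it on the stack (stdcall style, addressed through esp, no frame).

```masm
include \masm32\include\masm32rt.inc

.data?
Fact_n dd ?                     ; hypothetical global "parameter" slot

.code
start:
    mov Fact_n, 5
    call FactGlobal
    print str$(eax), 13, 10     ; should show 1, not 120

    push 5
    call FactStack
    print str$(eax), 13, 10     ; should show 120
    exit

FactGlobal:                     ; n lives in Fact_n
    mov eax, Fact_n
    cmp eax, 1
    jbe done_g
    dec Fact_n                  ; the inner call needs n-1 in the same slot...
    call FactGlobal
    imul eax, Fact_n            ; ...so our own n is gone by now (slot holds 1)
    ret
done_g:
    mov eax, 1
    ret

FactStack:                      ; n lives at [esp+4], one copy per call
    mov eax, dword ptr [esp+4]
    cmp eax, 1
    jbe done_s
    dec eax
    push eax
    call FactStack              ; callee removes its own argument (ret 4)
    imul eax, dword ptr [esp+4] ; our own n is still intact
    ret 4
done_s:
    mov eax, 1
    ret 4

end start
```

Each nested call of FactGlobal overwrites the one shared slot, so the multiplications on the way back all see the last value written.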

Title: Re: Avoiding stack frames (again!)
Post by: ramguru on July 03, 2009, 03:27:45 PM

Sometimes global variables are good sometimes they're bad.
Let's say you want to create a custom control &
use global variables as temporary variables for better speed.
That would be very unwise. 'Cuz if you are to support
multiple instances of the control on the same window,
you gotta take into account concurrent reads/writes...
So stack-based variables suit better here.

Title: Re: Avoiding stack frames (again!)
Post by: jj2007 on July 03, 2009, 04:29:06 PM

Quote from: dedndave on July 03, 2009, 01:32:44 PM
on a similar note, one of the other guys in here had a good idea (i forget who it was and can't locate the thread)
instead of using the EBP register to reference locals, use ESP directly

On a P4, using ESP directly is about 5 cycles faster per call, but the code gets a bit longer with every local variable:

1891    cycles for 100*call stack_frame_on
1417    cycles for 100*call stack_frame_OFF

Code sizes:
Frame on:       42
Frame off:      46

Test yourself...

.nolist
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm			; get them from the Masm32 Laboratory: http://www.masm32.com/board/index.php?topic=770.0
LOOP_COUNT	= 1000000

.code
start:
	counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
		mov eax, 1234578h
		REPEAT 100
			call stack_frame_on
		ENDM
	counter_end
	print str$(eax), 9, "cycles for 100*call stack_frame_on", 13, 10

	counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
		mov eax, 1234578h
		REPEAT 100
			call stack_frame_OFF
		ENDM
	counter_end
	print str$(eax), 9, "cycles for 100*stack_frame_OFF", 13, 10, 10, "Code sizes:", 13, 10, "Frame on: ", 9
	mov eax, stack_frame_on_END
	sub eax, stack_frame_on
	print str$(eax), 13, 10, "Frame off: ", 9
	mov eax, stack_frame_OFF_END
	sub eax, stack_frame_OFF
	print str$(eax)

	inkey chr$(13, 10, "--- ok ---", 13)
	exit

stack_frame_on proc
LOCAL v1, v2, v3, v4, v5, v6
  mov v1, eax
  mov v2, eax
  mov v3, 1234h
  mov v4, 5678h
  mov v5, 5555h
  mov v6, 6666h
  ret
stack_frame_on endp
stack_frame_on_END:


stack_frame_OFF proc
; LOCAL v1, v2, v3, v4, v5, v6
  add esp, -4*6
  mov [esp], eax		; v1
  mov [esp+4], eax		; v2
  mov dword ptr [esp+8], 1234h	; v3
  mov dword ptr [esp+12], 5678h	; v4
  mov dword ptr [esp+16], 5555h	; v5
  mov dword ptr [esp+20], 6666h	; v6
  sub esp, -4*6
  ret
stack_frame_OFF endp
stack_frame_OFF_END:

end start

Title: Re: Avoiding stack frames (again!)
Post by: dedndave on July 03, 2009, 04:48:34 PM

i think that's because the instruction set is optimized for using EBP

[ebp+4] encodes as a ModRM byte plus a byte offset
[esp+4] needs an extra SIB byte on top of the byte offset

of course, the LEAVE saves a couple bytes for you

quite a big difference in speed, don't you think?
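
To make the size difference concrete, here is a sketch with the encodings written out as comments (assuming a standard MASM32 install; the byte values are the usual 32-bit encodings and can be checked in a listing file). Any ESP-based address carries one SIB byte that the EBP-based form does not need.

```masm
include \masm32\include\masm32rt.inc

.code
start:
    push ebp                        ; 55
    mov ebp, esp                    ; 8B EC
    sub esp, 8                      ; 83 EC 08   room for two dwords

    ; [ebp-4] and [esp+4] name the same slot here
    mov eax, [ebp-4]                ; 8B 45 FC                  3 bytes
    mov eax, [esp+4]                ; 8B 44 24 04               4 bytes (24 = SIB)
    mov dword ptr [ebp-4], 1        ; C7 45 FC 01 00 00 00      7 bytes
    mov dword ptr [esp+4], 1        ; C7 44 24 04 01 00 00 00   8 bytes

    leave                           ; C9   undoes mov ebp,esp / push ebp
    exit

end start
```

By my count this matches the 42 vs 46 bytes reported above: five of the six stores in stack_frame_OFF pay one SIB byte each, while the frame setup and LEAVE cost one byte more than the add/sub pair.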

Title: Re: Avoiding stack frames (again!)
Post by: jj2007 on July 03, 2009, 04:54:59 PM

Quote from: dedndave on July 03, 2009, 04:48:34 PM
quite a big difference in speed, don't you think?

5 cycles on a P4, we'll see on others. But the code becomes very difficult to read and maintain, unless you revert to a pair of macros and do not use esp:

MyLocal MACRO args:VARARG
LOCAL tmp$
  .if 1
	MyLocEsp = 0
	FOR arg, <args>
	  	tmp$ CATSTR <arg>, < equ !<dword ptr [esp+>, %MyLocEsp, <]!>>
		tmp$
		MyLocEsp = MyLocEsp + 4
	ENDM
	tmp$ CATSTR <add esp, ->, %MyLocEsp
	tmp$
ENDM

MyRet MACRO
	tmp$ CATSTR <sub esp, ->, %MyLocEsp
	tmp$
	ret
  .endif
ENDM

Usage (dwords only, names can be used only once because they are global):


stack_frame_OFF proc
  MyLocal LocV1, LocV2, LocV3, LocV4, LocV5, LocV6
  mov LocV1, eax
  mov LocV2, eax
  mov LocV3, 1234h
  mov LocV4, 5678h
  mov LocV5, 5555h
  mov LocV6, 6666h
  MyRet
stack_frame_OFF endp
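
For readers who want to see what the macro pair is meant to produce, here is a hand-written sketch of the expected expansion for two locals (assuming a standard MASM32 install; LocA, LocB and TwoLocals are made-up names, and the equates are written out by hand rather than generated by MyLocal).

```masm
include \masm32\include\masm32rt.inc

; what MyLocal LocA, LocB would be expected to define
LocA equ <dword ptr [esp+0]>
LocB equ <dword ptr [esp+4]>

.code
start:
    mov eax, 7
    call TwoLocals
    print str$(eax), 13, 10     ; 7 + 3
    exit

TwoLocals proc                  ; no args, no LOCAL: MASM adds no prologue
    add esp, -8                 ; reserve the two dwords (MyLocal's last line)
    mov LocA, eax
    mov LocB, 3
    mov eax, LocA
    add eax, LocB
    sub esp, -8                 ; release them (MyRet's first line)
    ret
TwoLocals endp

end start
```

The equates are plain text macros, so they stay valid only while esp is where it was right after the add esp.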

Title: Re: Avoiding stack frames (again!)
Post by: dedndave on July 03, 2009, 05:26:21 PM

i see over 400 cycles diff - am i lookin in the wrong spot Jochen ? - lol

Quote:
1891 cycles for 100*call stack_frame_on
1417 cycles for 100*call stack_frame_OFF

Code sizes:
Frame on: 42
Frame off: 46

Title: Re: Avoiding stack frames (again!)
Post by: jj2007 on July 03, 2009, 05:30:52 PM

Quote from: dedndave on July 03, 2009, 05:26:21 PM
i see over 400 cycles diff - am i lookin in the wrong spot Jochen ? - lol
Quote:
1891 cycles for 100*call stack_frame_on
1417 cycles for 100*call stack_frame_OFF

Code sizes:
Frame on: 42
Frame off: 46

Divide by 100: (1891 - 1417) / 100 = 4.74, so about 5 cycles per call.

Celeron M:

980     cycles for 100*call stack_frame_on
871     cycles for 100*stack_frame_OFF

i.e. about 1 (one) cycle faster per call

Title: Re: Avoiding stack frames (again!)
Post by: dedndave on July 03, 2009, 05:39:36 PM

smokin ! - lol

Title: Re: Avoiding stack frames (again!)
Post by: jj2007 on July 03, 2009, 06:40:17 PM

Just for fun, here is a more complex example. On a Celeron M, the proc without frame is about 0.7 cycles faster per call, a bit longer and definitely trickier - see the print str$(LocV2).


.nolist
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm

LOOP_COUNT	= 200000

.code
start:
	print "Test for correctness:", 13, 10
	mov eax, 123456/2		; magic number
	call stack_frame_OFF

	mov eax, 123456/2		; magic number
	call stack_frame_on

	print chr$(13, 10, "Timings:", 13, 10)
	
	counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
		mov eax, 1234578h
		REPEAT 100
			call stack_frame_on
		ENDM
	counter_end
	print str$(eax), 9, "cycles for 100*call stack_frame_on", 13, 10

	counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
		mov eax, 1234578h
		REPEAT 100
			call stack_frame_OFF
		ENDM
	counter_end
	print str$(eax), 9, "cycles for 100*stack_frame_OFF", 13, 10, 10, "Code sizes:", 13, 10, "Frame on: ", 9
	mov eax, stack_frame_on_END
	sub eax, stack_frame_on
	print str$(eax), 13, 10, "Frame off: ", 9
	mov eax, stack_frame_OFF_END
	sub eax, stack_frame_OFF
	print str$(eax)

	inkey chr$(13, 10, "--- ok ---", 13)
	exit

MyPush MACRO arg
  .if 1
	push arg
	MyBase = MyBase + 4
ENDM

MyPop MACRO arg
	pop arg
	MyBase = MyBase - 4
  .endif
ENDM

MyLocal MACRO args:VARARG
LOCAL tmp$
  .if 1
	MyLocEsp = 0
	MyBase = 0
	FOR arg, <args>
	  	tmp$ CATSTR <arg>, < equ !<dword ptr [esp+MyBase+>, %MyLocEsp, <]!>>
		tmp$
		MyLocEsp = MyLocEsp + 4
	ENDM
	tmp$ CATSTR <add esp, ->, %MyLocEsp
	tmp$
ENDM

MyRet MACRO
	tmp$ CATSTR <sub esp, ->, %MyLocEsp
	tmp$
	ret
  .endif
ENDM

stack_frame_OFF proc
  MyLocal LocV1, LocV2, LocV3, LocV4, LocV5, LocV6
  mov LocV1, eax
  add eax, eax
  mov LocV2, eax
  mov LocV3, 1234h
  .if eax==123456
	MyPush eax
	MyPush eax
	MyPush eax
	MyPush eax
	print chr$("Frame OFF: ")
	mov ecx, LocV2
	print str$(ecx), 9
	print str$(LocV2), 13, 10	; wrong variable because we are pushing [eSp+X]
	MyPop ecx
	MyPop ecx
	MyPop ecx
	MyPop ecx
  .endif
  mov LocV4, 5678h
  mov LocV5, 5555h
  mov LocV6, 6666h
  MyRet
stack_frame_OFF endp
stack_frame_OFF_END:

stack_frame_on proc
LOCAL v1, v2, v3, v4, v5, v6
  mov v1, eax
  add eax, eax
  mov v2, eax
  mov v3, 1234h
  .if eax==123456
	Push eax
	Push eax
	Push eax
	Push eax
	print chr$("Frame  ON: ")
	mov ecx, v2
	print str$(ecx), 9
	print str$(v2), 13, 10	; right variable because we are pushing [eBp+X]
	Pop ecx
	Pop ecx
	Pop ecx
	Pop ecx
  .endif
  mov v4, 5678h
  mov v5, 5555h
  mov v6, 6666h
  ret
stack_frame_on endp
stack_frame_on_END:

end start
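
The str$(LocV2) trap can be reproduced in isolation. The sketch below assumes a standard MASM32 install and also assumes that an invoke-style call pushes some other argument before the esp-relative one; the push 0 stands in for that earlier argument. Once esp has moved, the same [esp+4] text points one slot lower.

```masm
include \masm32\include\masm32rt.inc

.code
start:
    add esp, -8                     ; two dword locals
    mov dword ptr [esp+0], 111      ; plays the role of LocV1
    mov dword ptr [esp+4], 222      ; plays the role of LocV2

    ; safe: load into a register while esp is where the equate expects it
    mov ecx, [esp+4]
    print str$(ecx), 13, 10         ; should show 222

    ; unsafe: an earlier argument has already been pushed
    push 0                          ; stand-in for that earlier argument
    push dword ptr [esp+4]          ; esp moved by 4, so this reads the 111 slot
    pop eax
    add esp, 4                      ; drop the stand-in
    print str$(eax), 13, 10         ; should show 111

    sub esp, -8
    exit

end start
```

MyPush and MyPop fix this for pushes the programmer writes, but they cannot see pushes generated inside other macros or by invoke.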

Timers.asm in the Masm32 Laboratory (http://www.masm32.com/board/index.php?topic=770.0)

The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: Damos on July 03, 2009, 11:54:41 AM