From what I have read:
ebp = frame pointer
esp = stack pointer
ESP is simple enough for me to understand: it decreases as the stack grows.
EBP seems to point to the location on the stack where the function's frame starts (set up by the normal prologue and torn down by the epilogue), so that you can refer to arguments and local variables from a single fixed position.
Now finally to my questions :) Since all local variables have to be declared at the beginning of a function, they are easy to locate relative to EBP. But shouldn't it be fairly simple to locate them relative to ESP as well? The only time I can see them being hard to find is when values are pushed inside one conditional block and not popped until a different one. Or are there cases that I am just not thinking of? I am asking because I would like to just use EPILOGUE:NONE, since I imagine I can keep track of local variables off ESP fairly easily.
What you propose is possible, but complex and rarely worth the effort, in terms of either speed or size. Below is a "simple" example; attached is a more elaborate way to do it without EBP, using macros.
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
; usage: invoke MyProc, chr$("Arg1"), chr$("Arg2"), 123
; if you want to use invoke MyProc, ..., you need this line:
; MyProc PROTO :DWORD, :DWORD, :DWORD
MyProc proc arg1_:DWORD, arg2_:DWORD, arg3_:DWORD
args= 3
savedregs= 4
EspOff equ esp+4*savedregs
arg1 equ [EspOff+1*4]
arg2 equ [EspOff+2*4]
arg3 equ [EspOff+3*4]
push edi ; all registers preserved, except eax ecx edx
push esi
push ebx
push ebp ; change savedregs if you do not need ebp
; int 3 ; you may check with Olly what you get here; do not trust Olly's arg.x
mov edi, arg1 ; e.g. lpDest
mov esi, arg2 ; e.g. lpSrc
mov ebx, arg3 ; e.g. count
; print edi, 13, 10
; print esi, 13, 10
; print str$(ebx), 13, 10
; mov ebp, 12345h
pop ebp
pop ebx
pop esi ; all registers preserved, except eax ecx edx
pop edi
ret 4*args
MyProc endp
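The offset arithmetic behind those equates can be sketched in C. This is only a model, assuming 4-byte stack slots and the layout used above (saved registers at the bottom, then the return address, then the arguments). The names arg_offset and ret_bytes are made up for illustration; they are not part of MASM or any library.

```c
#include <assert.h>

/* Hypothetical helper: byte offset from ESP to the n-th argument (1-based)
   once `savedregs` registers have been pushed.
   Assumed layout, from ESP upward: saved registers, return address, arguments. */
unsigned arg_offset(unsigned savedregs, unsigned n)
{
    assert(n >= 1);                  /* arguments are numbered from 1 */
    /* 4*savedregs skips the saved registers; the first slot after them is
       the return address, so argument n sits 4*n bytes further up. */
    return 4u * savedregs + 4u * n;
}

/* Hypothetical helper: bytes the callee removes with "ret 4*args". */
unsigned ret_bytes(unsigned args)
{
    return 4u * args;
}
```

With savedregs = 4 as in MyProc, arg_offset(4, 1) gives 20, i.e. arg1 is at [esp+20]. If you drop the push ebp and set savedregs to 3, the same argument moves to [esp+16], which is why the count has to be kept in sync with the pushes.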
Joe,
It is routine in MASM to write normal stack frames and, in places where it matters, procedures that have no stack frame. This usually applies to procedures in that twilight zone: too big to inline directly, yet not so big that the stack frame overhead stops mattering. One of the main reasons for writing a no-stack-frame procedure is to get the extra register EBP, which can be very useful in high speed algos, but it comes at a price: you must track the stack carefully across pushes, pops and function calls.
For every 32-bit PUSH, ESP drops by 4 bytes, so the ESP-relative offset of every argument and local goes up by 4; for every POP the offset drops back by 4. It can be highly unintuitive code to write. You can justify the extra work for regularly reused code, but it's often not worth the effort for a single procedure.
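To see the bookkeeping in action, here is a toy model in C. It is a sketch under simple assumptions: a 16-slot stack of dwords, with ESP kept as a byte offset into the array. The ToyStack type and the toy_* functions are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit stack: 16 dword slots, ESP held as a byte offset.
   All names are hypothetical; this only illustrates the offset tracking. */
typedef struct {
    uint32_t mem[16];
    uint32_t esp;       /* byte offset of the current top of stack */
} ToyStack;

void toy_init(ToyStack *s)
{
    s->esp = sizeof s->mem;     /* empty stack: ESP at the high end */
}

void toy_push(ToyStack *s, uint32_t v)
{
    assert(s->esp >= 4);
    s->esp -= 4;                /* PUSH: ESP drops by 4 ... */
    s->mem[s->esp / 4] = v;     /* ... then the value lands at [esp] */
}

uint32_t toy_pop(ToyStack *s)
{
    assert(s->esp + 4 <= sizeof s->mem);
    uint32_t v = s->mem[s->esp / 4];
    s->esp += 4;                /* POP: ESP rises by 4 */
    return v;
}

/* Read the dword at [esp+off]. */
uint32_t toy_peek(const ToyStack *s, uint32_t off)
{
    assert(off % 4 == 0 && s->esp + off + 4 <= sizeof s->mem);
    return s->mem[(s->esp + off) / 4];
}
```

Push an argument and then a return address, and the argument is at [esp+4]. Push one more register and the same argument is now at [esp+8]; pop it and you are back to [esp+4]. That moving target is exactly what you have to track by hand in a frameless procedure.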
This technique is referred to by Microsoft as Frame Pointer Omission (FPO). It works well in C because the compiler keeps track of the stack. In some cases it will let pushed parameters build up on the stack and then recover the space later in a single operation. Personally I would not use this method; it requires a lot of unnecessary work to keep track of things that the compiler/assembler would normally do for you. If you need/want to use EBP in a section of code that is not referencing local variables, you can simply push it onto the stack and pop it later. If you use EBP in an FPO routine you *still* need to preserve it, because the calling function presumes EBP, EBX, ESI and EDI are preserved across the call.
Here is an example of what the C compiler does when generating code with and without the frame pointer. FPO saves you 2 bytes here, since the [esp+x] encoding is longer than [ebp+x] and eats into the gain. For a routine like this one processing a few hundred bytes, the benefit would be approaching zero.
-Clive
WORD CRC16_C(DWORD Size, BYTE *Buffer)
{
    WORD Crc; // 16-bits
    static const WORD CrcTable[16]= { // Don't need to copy constants to stack
        0x0000,0x1021,0x2042,0x3063,0x4084,0x50A5,0x60C6,0x70E7,
        0x8108,0x9129,0xA14A,0xB16B,0xC18C,0xD1AD,0xE1CE,0xF1EF };
    Crc = 0;
    while(Size--) // For all bytes in the buffer
    {
        Crc = Crc ^ (*Buffer++ << 8); // Apply the data once, all 8-bits, xor's will cascade
        Crc = (Crc << 4) ^ CrcTable[Crc >> 12]; // Presumes 16-bit register, shift providing 4-bit masking
        Crc = (Crc << 4) ^ CrcTable[Crc >> 12]; // Next 4-bits
    }
    return(Crc);
}
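For anyone who wants to compile the routine outside a Windows build, here is a sketch that swaps WORD/DWORD/BYTE for <stdint.h> types and adds a bit-at-a-time version to check the table against. The names crc16_nibble and crc16_bitwise are made up for this sketch. It assumes the table was generated for polynomial 0x1021 with a zero start value, which is what the table entries suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same nibble-table algorithm as CRC16_C, with fixed-width types. */
uint16_t crc16_nibble(size_t size, const uint8_t *buffer)
{
    static const uint16_t table[16] = {
        0x0000,0x1021,0x2042,0x3063,0x4084,0x50A5,0x60C6,0x70E7,
        0x8108,0x9129,0xA14A,0xB16B,0xC18C,0xD1AD,0xE1CE,0xF1EF };
    uint16_t crc = 0;

    assert(buffer != NULL || size == 0);
    while (size--) {
        crc = (uint16_t)(crc ^ ((uint16_t)*buffer++ << 8)); /* fold in one byte */
        crc = (uint16_t)((crc << 4) ^ table[crc >> 12]);    /* high nibble */
        crc = (uint16_t)((crc << 4) ^ table[crc >> 12]);    /* low nibble */
    }
    return crc;
}

/* Reference version, one bit per step, assuming polynomial 0x1021. */
uint16_t crc16_bitwise(size_t size, const uint8_t *buffer)
{
    uint16_t crc = 0;

    assert(buffer != NULL || size == 0);
    while (size--) {
        crc = (uint16_t)(crc ^ ((uint16_t)*buffer++ << 8));
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

The explicit casts do the job the original comment describes as "presumes 16-bit register": C promotes the operands to int before shifting, so the result has to be cut back to 16 bits each time. A single input byte of 0x01 comes out as 0x1021 from both functions, which matches the second table entry.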
With -Ox -Oy- (Frame Pointer Omission disabled):
00406CE0 _CRC16_C: ; Xref 0040100F
00406CE0 55 push ebp
00406CE1 8BEC mov ebp,esp
00406CE3 8B4D08 mov ecx,[ebp+8]
00406CE6 33C0 xor eax,eax
00406CE8 8BD1 mov edx,ecx
00406CEA 49 dec ecx
00406CEB 85D2 test edx,edx
00406CED 744A jz loc_00406D39
00406CEF 56 push esi
00406CF0 8D7101 lea esi,[ecx+1]
00406CF3 8B4D0C mov ecx,[ebp+0Ch]
00406CF6 57 push edi
00406CF7 loc_00406CF7: ; Xref 00406D35
00406CF7 33D2 xor edx,edx
00406CF9 8A31 mov dh,[ecx]
00406CFB 33C2 xor eax,edx
00406CFD 41 inc ecx
00406CFE 8BD0 mov edx,eax
00406D00 8BF8 mov edi,eax
00406D02 81E2FFFF0000 and edx,0FFFFh
00406D08 C1EA0C shr edx,0Ch
00406D0B C1E704 shl edi,4
00406D0E 668B04551CA04000 mov ax,[off_0040A01C+edx*2]
00406D16 6633C7 xor ax,di
00406D19 8BD0 mov edx,eax
00406D1B 81E2FFFF0000 and edx,0FFFFh
00406D21 C1EA0C shr edx,0Ch
00406D24 C1E004 shl eax,4
00406D27 668B14551CA04000 mov dx,[off_0040A01C+edx*2]
00406D2F 6633D0 xor dx,ax
00406D32 4E dec esi
00406D33 8BC2 mov eax,edx
00406D35 75C0 jnz loc_00406CF7
00406D37 5F pop edi
00406D38 5E pop esi
00406D39 loc_00406D39: ; Xref 00406CED
00406D39 5D pop ebp
00406D3A C3 ret
0040A01C off_0040A01C: ; Xref 00406D0E 00406D27
0040A01C 00 00 21 10 42 20 63 30 - 84 40 A5 50 C6 60 E7 70 ..!.B c0.@.P.`.p
0040A02C 08 81 29 91 4A A1 6B B1 - 8C C1 AD D1 CE E1 EF F1 ..).J.k.........
With -Ox (-Ogityb1 /Gs) (Frame Pointer Omission (FPO) enabled):
00401140 _CRC16_C: ; Xref 0040100F
00401140 8B4C2404 mov ecx,[esp+4]
00401144 33C0 xor eax,eax
00401146 8BD1 mov edx,ecx
00401148 49 dec ecx
00401149 85D2 test edx,edx
0040114B 744B jz loc_00401198
0040114D 56 push esi
0040114E 8D7101 lea esi,[ecx+1]
00401151 8B4C240C mov ecx,[esp+0Ch]
00401155 57 push edi
00401156 loc_00401156: ; Xref 00401194
00401156 33D2 xor edx,edx
00401158 8A31 mov dh,[ecx]
0040115A 33C2 xor eax,edx
0040115C 41 inc ecx
0040115D 8BD0 mov edx,eax
0040115F 8BF8 mov edi,eax
00401161 81E2FFFF0000 and edx,0FFFFh
00401167 C1EA0C shr edx,0Ch
0040116A C1E704 shl edi,4
0040116D 668B04551CA04000 mov ax,[off_0040A01C+edx*2]
00401175 6633C7 xor ax,di
00401178 8BD0 mov edx,eax
0040117A 81E2FFFF0000 and edx,0FFFFh
00401180 C1EA0C shr edx,0Ch
00401183 C1E004 shl eax,4
00401186 668B14551CA04000 mov dx,[off_0040A01C+edx*2]
0040118E 6633D0 xor dx,ax
00401191 4E dec esi
00401192 8BC2 mov eax,edx
00401194 75C0 jnz loc_00401156
00401196 5F pop edi
00401197 5E pop esi
00401198 loc_00401198: ; Xref 0040114B
00401198 C3 ret
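The 2-byte difference can be tallied from the opcode bytes in the two listings. Here is a rough sketch of that arithmetic in C; the enum names and the two helper functions are invented for the example, and the sizes are simply read off the listings above.

```c
#include <assert.h>

/* Encoded sizes in bytes, read off the two listings. */
enum {
    PUSH_EBP      = 1,  /* 55                               */
    MOV_EBP_ESP   = 2,  /* 8B EC                            */
    POP_EBP       = 1,  /* 5D                               */
    LOAD_EBP_DISP = 3,  /* 8B 4D 08     mov ecx,[ebp+8]     */
    LOAD_ESP_DISP = 4   /* 8B 4C 24 04  mov ecx,[esp+4]     */
};

/* Net bytes saved by dropping the frame, for `loads` stack-relative accesses. */
int fpo_bytes_saved(int loads)
{
    assert(loads >= 0);
    int frame_setup = PUSH_EBP + MOV_EBP_ESP + POP_EBP;         /* 4 bytes gone */
    int esp_penalty = loads * (LOAD_ESP_DISP - LOAD_EBP_DISP);  /* 1 extra byte each */
    return frame_setup - esp_penalty;
}

/* Size of a routine from its first address and the address of its 1-byte ret. */
unsigned routine_size(unsigned first, unsigned ret_addr)
{
    return ret_addr - first + 1;
}
```

This routine loads two arguments, so fpo_bytes_saved(2) gives 2, matching the listings: 91 bytes with the frame (00406CE0 to 00406D3A) against 89 without (00401140 to 00401198). By the same count, a routine with four or more ESP-relative accesses would save nothing in size.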
Thank you for all 3 replies; put together, there is everything I could ever need to know :) Unfortunately this is not the only reason a C compiler will write better code than me, and probably not the most important one. I will try to catch up to it :)
Joe,
It just comes with practice, but be aware that some algorithms hit the speed limit imposed by memory well before they reach their most efficient encoding. Compilers get there a lot of the time because of this factor. With practice you will learn what can and what can't be improved, and put your work where it matters.