Hello!
I've got a problem in understanding how function parameters on the stack work. Let's say I have a C function that's called "generate" and I want to translate it to Assembly;
typedef struct {
int foo;
unsigned char *bar;
} generate_struct;
int generate(generate_struct *g, int x, int y, long *res);
How would I access the structure members and the function parameters? I know that you access the function parameters through ESP, but after ESP there's a number which I'm not sure of (I've seen different codes and some start at 4, some start at 8 and some start at 16 :eek). Could anyone explain?
MOV EAX, [ESP+4] ; EAX now points to "generate_struct *g"?
; Would I now access the structure members through EAX like below?
MOV foo, [EAX+4] ; an "int" is 4 bytes long
MOV bar, [EAX+8] ; a pointer is 4 bytes long (?)
Quote: How would I access the structure members and the function parameters? I know that you access the function parameters through ESP, but after ESP there's a number which I'm not sure of (I've seen different codes and some start at 4, some start at 8 and some start at 16 :eek). Could anyone explain?
Assembly was the first language I learned because, funnily enough, it was a lot more straightforward to me than an HLL like C/C++.
I got confused by the different names for the same types and by casting, among other cryptic text available to me.
I didn't have a good enough book or access to proper resources, so assembly was what I learned.
C and C++ do not fix the size of types like int or long; the sizes depend on the compiler and the target platform.
On 32-bit x86, an int is 4 bytes (a DWORD), and a pointer is just another DWORD holding the address of some data.
generate_struct STRUCT
foo DWORD ?
bar DWORD ?
generate_struct ENDS
That's how the structure would look in assembly: foo sits at offset 0 and bar at offset 4.
A double would usually be a QWORD on a 32-bit CPU
(correct me if I'm wrong, guys, I only know the main types in MASM).
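The member offsets can be checked from C. This is only a sketch: generate_struct32 and the three helper functions are hypothetical names, and the pointer member is modelled as a uint32_t so that the layout matches a 32-bit target even when compiled on a 64-bit host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model of generate_struct: the pointer member is
   stored as a uint32_t so the layout is the same on any host. */
typedef struct {
    int32_t  foo;  /* offset 0: mov eax, [esi]     */
    uint32_t bar;  /* offset 4: mov ebx, [esi + 4] */
} generate_struct32;

/* Byte offsets a hand-written routine adds to the structure pointer. */
size_t foo_offset(void)  { return offsetof(generate_struct32, foo); }
size_t bar_offset(void)  { return offsetof(generate_struct32, bar); }
size_t struct_size(void) { return sizeof(generate_struct32); }
```

So with the structure pointer in a register, the first member is read at displacement 0 and the second at displacement 4; there is no member at displacement 8.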
this
char buffer[32];
could become
buffer BYTE 32 dup (?)
an array of characters..
char string[]="ABCDE";
string BYTE "ABCDE",00h
the prototype for
int generate(generate_struct *g, int x, int y, long *res);
could be
generate PROTO :DWORD, :DWORD, :DWORD, :DWORD
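generate PROC USES ebx esi g:DWORD, x:DWORD, y:DWORD, res:DWORD ; USES saves/restores ebx and esi, which C callers expect to be preserved
mov esi, dword ptr[g] ; esi = pointer to the structure
mov eax, dword ptr [esi][generate_struct.foo] ; eax = g->foo
mov ebx, dword ptr [esi][generate_struct.bar] ; ebx = g->bar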
....
ret
generate ENDP
Just keep in mind that the majority of parameters on a 32-bit stack are 4 bytes, or 1 DWORD, unless you are dealing with floating-point numbers
or MMX/SSE/SSE2/SSE3 values.
If I were to call the generate routine in old-style syntax:
.data
g_ptr generate_struct <?>
x_num DWORD 1
y_num DWORD 2
lpRes DWORD ?
.code
..........
push offset lpRes
push dword ptr [y_num]
push dword ptr [x_num]
push offset g_ptr
call generate
........
With INVOKE:
invoke generate, addr g_ptr, x_num, y_num, addr lpRes
Thanks a lot for your answer Kernel_Gaddafi! Even though it wasn't exactly what I meant, it still cleared up a few things for me. Let's say I want to access a variable which has been pushed on the stack by a caller; how do I find it? I've noticed that some programs use, e.g., "[ESP+4]" etc. What does 4 mean? Is 4 the size of the variable type (pointer/DWORD)? Does alignment make any difference?
This is a good example to show you what I mean. This small code snippet is from the Monkey's Audio SDK:
;
; void Adapt ( short* pM, const short* pAdapt, int nDirection, int nOrder )
;
; [esp+16] nOrder
; [esp+12] nDirection
; [esp+ 8] pAdapt
; [esp+ 4] pM
; [esp+ 0] Return Address
align 16
proc Adapt
mov eax, [esp + 4] ; pM
mov ecx, [esp + 8] ; pAdapt
mov edx, [esp + 16] ; nOrder
shr edx, 4
...
How does the author know that "pM" will be located at "[esp+4]" - is it an Assembler rule or what indicates that "4" is the unique number used to find "pM" on the stack pointer? The only thing I can think of is that the size of "pM" is 4 (pointer = DWORD = 4 bytes in 32-bit systems), but is this really true?
Thanks!
Regards,
Seb
Seb,
Knowing where a stack argument is located in memory depends on how you set up the stack. If you use a stack frame as with a normal procedure, the first argument starts at [ebp+8]. It gets a bit more complicated if you don't use a stack frame: the first argument starts at [esp+4] on entry, but every PUSH and POP changes ESP, so you have to count the bytes pushed so far and add them to each argument's offset.
If for example you had a proc with no stack frame that has 3 registers preserved,
push ebx
push esi
push edi
You must ADD 12 bytes to the ESP offset of each argument, so the first argument is now at [esp+16] instead of [esp+4].
The other thing that is CRITICAL under STDCALL is to use the form of RET that has a trailing number after it, as this corrects the stack for you.
If you have 2 x DWORD arguments pushed onto the stack and are using the STDCALL calling convention, at the exit of the procedure you use,
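The arithmetic can be written down as two small C helpers. This is a sketch that assumes every argument is one DWORD; esp_arg_offset and ebp_arg_offset are hypothetical names used only for illustration.

```c
/* Offset of argument `index` (0-based) from ESP in a routine without a
   stack frame, after the routine itself has executed `pushes` DWORD
   PUSH instructions. Assumes every argument is 4 bytes. */
unsigned esp_arg_offset(unsigned index, unsigned pushes)
{
    const unsigned return_address = 4;  /* pushed by CALL */
    return return_address + 4 * pushes + 4 * index;
}

/* Offset of argument `index` from EBP once the usual frame
   (push ebp / mov ebp, esp) has been set up. */
unsigned ebp_arg_offset(unsigned index)
{
    return 8 + 4 * index;  /* saved EBP + return address come first */
}
```

For example, esp_arg_offset(0, 0) gives 4 (the first argument on entry), esp_arg_offset(0, 3) gives 16 (after saving ebx, esi and edi), and ebp_arg_offset(0) gives 8. The EBP offsets stay the same no matter how many registers are pushed afterwards, which is the main convenience of a stack frame.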
RET 8
to balance the stack.
If you call the procedure using the C calling convention you exit with a RET but correct the stack directly after the calling code with something like,
ADD ESP, 8
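To see why both conventions leave the stack balanced, here is a small C model that tracks only the value of ESP across a call. It is a sketch under the assumption that all arguments are DWORDs; the type calling_convention and both functions are hypothetical names, not part of any API.

```c
#include <stdint.h>

typedef enum { CONV_CDECL, CONV_STDCALL } calling_convention;

/* Returns the value of ESP after a complete call sequence. */
uint32_t esp_after_call(uint32_t esp, unsigned arg_count,
                        calling_convention conv)
{
    uint32_t arg_bytes = 4u * arg_count;

    esp -= arg_bytes;  /* caller: one PUSH per DWORD argument   */
    esp -= 4;          /* CALL pushes the return address        */
    /* ... procedure body runs and leaves ESP where it found it */
    if (conv == CONV_STDCALL) {
        esp += 4 + arg_bytes;  /* callee: RET n pops return address and arguments */
    } else {
        esp += 4;              /* callee: plain RET pops the return address only  */
        esp += arg_bytes;      /* caller: ADD ESP, n                               */
    }
    return esp;
}

/* What happens when nobody removes the arguments: a plain RET with no
   ADD ESP afterwards. */
uint32_t esp_after_unbalanced_call(uint32_t esp, unsigned arg_count)
{
    esp -= 4u * arg_count;  /* arguments pushed          */
    esp -= 4;               /* return address pushed     */
    esp += 4;               /* plain RET, nothing else   */
    return esp;
}
```

With two arguments, both conventions return ESP to its starting value, while the unbalanced version leaves it 8 bytes too low, and that error accumulates with every call.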
Quote: How does the author know that "pM" will be located at "[esp+4]" - is it an Assembler rule or what indicates that "4" is the unique number used to find "pM" on the stack pointer? The only thing I can think of is that the size of "pM" is 4 (pointer = DWORD = 4 bytes in 32-bit systems), but is this really true?
Somebody can correct me if I make a mistake, so that I don't explain this incorrectly.
The best thing you can do, Seb, is write some assembly code with push/pop instructions and step through it in a debugger, watching how esp changes with each instruction and after a call to a STDCALL routine.
That's how I came to understand it.
ESP just points to a block of memory, like the one returned by HeapAlloc or LocalAlloc.
When you push a DWORD on the stack, ESP is reduced by 4 bytes.
You don't have to use PUSH and POP to manage the stack, but adjusting ESP manually is easier to get wrong,
so I consider it bad practice.
push 0
push offset szTitle
push offset szMessage
push 0
call MessageBoxA
; you could write the above code like this:
sub esp, 4*4 ; reserve room for four DWORD arguments
and dword ptr [esp], 0 ; first argument
mov dword ptr [esp + 4], offset szMessage ; second
mov dword ptr [esp + 8], offset szTitle ; third
and dword ptr [esp + 12], 0 ; fourth
call MessageBoxA
; the four stores can be done in any order. btw: I didn't test that code, so apologies if it
; crashes
;
; when MessageBoxA is called, the return address (the address of the next instruction) is pushed on the stack
; imagine below to be MessageBoxA routine..
MessageBoxA:
push ebp ; save ebp on stack (esp is reduced by 4)
mov ebp, esp ; copy esp into ebp to set up the stack frame
; dword ptr [esp + 0] = old EBP
; dword ptr [esp + 4] = return address
; dword ptr [esp + 8] = 0
; dword ptr [esp + 12] = offset szMessage
; dword ptr [esp + 16] = offset szTitle
; dword ptr [esp + 20] = 0
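The same sequence can be replayed on a toy stack in C to see which value ends up at which offset. This is only a model: toy_stack, push32, read32, enter_messagebox and the placeholder constants are hypothetical names, and the constants merely stand in for real addresses.

```c
#include <stdint.h>

#define STACK_DWORDS 64

/* Toy model of the stack: esp is a byte offset into mem and the stack
   grows towards lower offsets, as on x86. */
typedef struct {
    uint32_t mem[STACK_DWORDS];
    uint32_t esp;
    uint32_t ebp;
} toy_stack;

/* Placeholder values standing in for real addresses. */
enum { SZ_TITLE = 0x1000, SZ_MESSAGE = 0x2000,
       RETURN_ADDRESS = 0x3000, OLD_EBP = 0x4000 };

/* PUSH: lower ESP by 4, then store the value at the new top. */
void push32(toy_stack *s, uint32_t value)
{
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* Reads the DWORD at [esp + offset]. */
uint32_t read32(const toy_stack *s, uint32_t offset)
{
    return s->mem[(s->esp + offset) / 4];
}

/* Replays the caller's pushes, the CALL, and the callee's prologue. */
void enter_messagebox(toy_stack *s)
{
    for (int i = 0; i < STACK_DWORDS; i++) s->mem[i] = 0;
    s->esp = STACK_DWORDS * 4;
    s->ebp = OLD_EBP;

    push32(s, 0);               /* push 0                (fourth argument) */
    push32(s, SZ_TITLE);        /* push offset szTitle   (third)           */
    push32(s, SZ_MESSAGE);      /* push offset szMessage (second)          */
    push32(s, 0);               /* push 0                (first)           */
    push32(s, RETURN_ADDRESS);  /* call MessageBoxA                        */
    push32(s, s->ebp);          /* push ebp                                */
    s->ebp = s->esp;            /* mov ebp, esp                            */
}
```

After enter_messagebox, read32(&s, 0) is the old EBP, read32(&s, 4) is the return address, and the first argument only appears at read32(&s, 8), which is why framed procedures address their first argument as [ebp+8].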
Quote from: Seb on October 16, 2005, 01:04:44 AM
How does the author know that "pM" will be located at "[esp+4]" - is it an Assembler rule or what indicates that "4" is the unique number used to find "pM" on the stack pointer? The only thing I can think of is that the size of "pM" is 4 (pointer = DWORD = 4 bytes in 32-bit systems), but is this really true?
What you are seeing is the result of calling conventions. If you want to mix assembly with high-level languages, you follow the conventions.
We know from the way CALL works, that when a subroutine starts, the last item on the stack is the return address, and it is a DWORD (occupying 4 bytes).
Everything else is convention. We know that Win32 C compilers, by convention, define int and pointer as 4-byte values in Win32. We know, by compiler convention, that arguments are put on the stack. We know, by compiler convention, that each argument will occupy 4 bytes or some multiple of that. We know, by compiler convention, in what order they will appear in the stack.
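The "4 bytes or some multiple of that" rule can be sketched in C as follows. The function names are hypothetical, and the model assumes the usual 32-bit cdecl/stdcall layout, where arguments are pushed right to left so the first argument sits just above the return address.

```c
#include <stddef.h>

/* Bytes an argument of `size` bytes occupies on the 32-bit stack:
   its size rounded up to the next multiple of 4. */
size_t stack_slot_size(size_t size)
{
    return (size + 3) & ~(size_t)3;
}

/* Offset from ESP, at procedure entry, of argument `index` (0-based),
   given the sizes of the arguments in declaration order. */
size_t arg_offset_at_entry(const size_t *sizes, size_t index)
{
    size_t offset = 4;  /* skip the return address pushed by CALL */
    for (size_t i = 0; i < index; i++)
        offset += stack_slot_size(sizes[i]);
    return offset;
}
```

For Adapt, all four arguments are 4 bytes, so the offsets come out as 4, 8, 12 and 16, matching the comments in the Monkey's Audio snippet. A char argument would still take a 4-byte slot, and an 8-byte double would push the offsets of everything after it up by 8.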
In assembly language code, we do not need to follow these conventions. Some of us follow these conventions because it's consistent, well understood, and allows us to combine code modules without demanding to know how the other modules handle argument passing. Those of us who don't either want to be unconventional, or don't want the overhead associated with the conventions. (or don't know there are conventions)