Looking at a disassembly of this simple program assembled with jwasm:
proc2 proc arg:qword
add rax, arg
ret
proc2 endp
start:
invoke proc2, 1
invoke ExitProcess, 0
ret
end start
It gets disassembled as:
0000000140001000: 55 push rbp
0000000140001001: 48 8B EC mov rbp,rsp
0000000140001004: 48 03 45 10 add rax,qword ptr [rbp+10h]
0000000140001008: C9 leave
0000000140001009: C3 ret
000000014000100A: 48 83 EC 20 sub rsp,20h
000000014000100E: 48 C7 C1 01 00 00 00 mov rcx,1 ;<----arg
0000000140001015: E8 E6 FF FF FF call 0000000140001000 ;<----call to proc2
000000014000101A: 48 83 C4 20 add rsp,20h
000000014000101E: 48 83 EC 20 sub rsp,20h
0000000140001022: B9 00 00 00 00 mov ecx,0
0000000140001027: FF 15 0B 20 00 00 call qword ptr [40003038h]
000000014000102D: 48 83 C4 20 add rsp,20h
0000000140001031: C3 ret
My question is:
Is it hardwired in the microcode that arg in proc2 automatically gets pushed
from rcx onto the stack on a call to proc2, since this is fastcall?
If so, why don't we see this in the disassembly?
Thanks,
Rags
The first few parameters are ALWAYS passed in registers and NEVER on the stack in the 64-bit ABI.
BUT
The caller must RESERVE SPACE for these values AS IF they were passed on the stack, SO THAT if the called procedure needs to save them to the stack, it can.
Just a note that in poasm this is broken and arg evaluates to rcx instead of stack parameter area.
It would be neat to have an option for automatic saving of those params (just before the call) in jwasm.
option storeparams:[on,off]
We could go to https://sourceforge.net/tracker/?group_id=255677&atid=1126896 and suggest this to japheth.
Quote from: Rockoon on July 02, 2010, 04:18:41 PM
The first few parameters are ALWAYS passed in registers and NEVER on the stack in the 64-bit ABI.
BUT
The caller must RESERVE SPACE for these values AS IF they were passed on the stack, SO THAT if the called procedure needs to save them to the stack, it can.
I can't help but wonder what sort of logic led the developers of this to such an apparently inefficient calling convention.
The "sub rsp,20h" / "add rsp,20h" pairs are just jwasm's inefficient invoke handling. Stack-tracking compilers would produce different code.
Quote from: drizz on July 02, 2010, 09:22:38 PM
Just a note that in poasm this is broken and arg evaluates to rcx instead of stack parameter area.
That's what I was wondering about. I don't see in the disassembly where the value in rcx gets moved to
qword ptr [rbp+10h]. I used dumpbin for the disassembly; I don't know if that makes a difference.
How does the value in rcx get moved to the stack area since I don't see the code for it?
Only the fifth and later parameters are actually written to the stack by the caller.
If you need to save it you would do:
mov arg1,rcx
mov arg2,rdx
mov arg3,r8
mov arg4,r9
And in poasm you have to declare a local variable for it, since "arg1" evaluates to rcx.
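Applied to the proc2 from the first post, the manual save could look like the sketch below. It assumes jwasm resolves arg to [rbp+10h], as the disassembly shows; under poasm, arg would have to be replaced by a local.

```asm
proc2 proc arg:qword
    mov arg, rcx        ; store the register argument into its reserved slot, [rbp+10h]
    add rax, arg        ; now reads the value the caller put in rcx
    ret
proc2 endp
```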
Quote from: rags on July 03, 2010, 12:47:29 AM
How does the value in rcx get moved to the stack area since I don't see the code for it?
As has been said, it does not; you need to do it manually.
I mentioned that a switch for jwasm could be implemented to do this automatically.
It would do this before the call:
mov [rsp+3*8],r9; arg count>=4
mov [rsp+2*8],r8; arg count>=3
mov [rsp+1*8],rdx; arg count>=2
mov [rsp+0*8],rcx; arg count>=1
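For the invoke proc2, 1 from the first post, the generated code might then look like the sketch below. The store line is hypothetical (the option is only a suggestion, not an existing jwasm feature); the comments trace how the same slot ends up at [rbp+10h] inside proc2.

```asm
    sub  rsp, 20h        ; reserve slots for the four register parameters
    mov  rcx, 1
    mov  [rsp+0*8], rcx  ; hypothetical automatic store of arg 1 into its slot
    call proc2           ; return address is pushed: the slot is now at [rsp+8]
                         ; in proc2, after push rbp / mov rbp,rsp: the slot is [rbp+10h]
    add  rsp, 20h
```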
Quote from: Rockoon
The caller must RESERVE SPACE for these values AS IF they were passed on the stack, SO THAT if the called procedure needs to save them to the stack, it can.
Yes. Several Windows APIs use the space from RSP+8 to save the parameters for later; one or two I have followed even use it to store RBX instead of a 'push rbx'.
One reason it has to be aligned is so they can store xmm registers there, though this is only my guess. I can't imagine any other reason for the align 16 thing.
Quote from: MichaelW
I can't help but wonder what sort of logic led the developers of this to such an apparently inefficient calling convention.
I had a rant about it in another topic. My guess is they didn't want us 'amateur' asmers to have it too easy.
Luckily for me I am a push/call coder rather than an invoke coder, which makes it easier to port code from 32 to 64-bit. It also makes it easier when you write a bare proc: you
reserve the maximum amount of stack an API needs and reuse it, instead of these 'sub rsp,x - call - add rsp,x' messes.
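A sketch of that approach for the program in the first post, assuming rsp is 8 off a 16-byte boundary on entry to start (the usual state right after a call) and that ExitProcess is declared as an external, as it must be in the original source:

```asm
start:
    sub  rsp, 28h       ; 20h of register-parameter space + 8 to bring rsp to a 16-byte boundary
    mov  rcx, 1
    call proc2          ; reuses the same 20h, no sub/add around the call
    xor  ecx, ecx       ; exit code 0
    call ExitProcess    ; does not return, so rsp is never restored
```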
Quote from: MichaelW on July 02, 2010, 10:18:20 PM
I can't help but wonder what sort of logic led the developers of this to such an apparently inefficient calling convention.
I think the idea that it is inefficient is a confusion over the TWO imposed rules here.
The first rule is that you do not misalign the stack.
The second rule is that you reserve space for register parameters.
As soon as we impose the rule that the stack must be aligned, then the other rule of reserving space is essentially at-worst free from a performance perspective.
Consider the case where alignment is imposed but not the reserving space. If the called function needs temporary space to save off edx, then it must allocate it while continuing to maintain alignment. The caller has a 'sub rsp, xxx' and the called function itself also has a 'sub rsp, xxx'
With the enforced stack reservation, there will be many cases where the called function need not allocate any space, so it does not need to manage the stack at all. That's a performance win.
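For example, a small leaf routine can park its arguments in the space the caller already reserved and never touch rsp. This is only a sketch: leaf_sum is a made-up name, and it is written as a bare label so that the assembler generates no prologue.

```asm
leaf_sum:                   ; hypothetical: returns rcx + rdx
    mov  [rsp+8],  rcx      ; slot for arg 1 sits just above the return address
    mov  [rsp+10h], rdx     ; slot for arg 2
    mov  rax, [rsp+8]
    add  rax, [rsp+10h]
    ret                     ; no sub/add rsp anywhere in the callee
```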
We have been doing this sort of thing already in 32-bit land, but just not as obviously. We pass 8-bit and 16-bit parameters as 32-bit, for instance.. clearly that is inefficient from some perspectives.. but obviously a performance win in general.
I see here that proc2 uses not the value in RCX but the address of the first slot of the 20h-byte shadow space provided for proc2's four register args, and this is why you don't see any RCX in its disassembly.
upd: use brackets (in code)