
64 bit ABI question

Started by rags, July 02, 2010, 02:28:13 PM

rags

Looking at a disassembly of this simple program assembled with JWasm:


proc2 proc arg:qword
add rax, arg
ret
proc2 endp
start:
invoke proc2, 1

invoke ExitProcess, 0
ret

end start

It gets disassembled as:

  0000000140001000: 55                 push        rbp
  0000000140001001: 48 8B EC           mov         rbp,rsp
  0000000140001004: 48 03 45 10        add         rax,qword ptr [rbp+10h]
  0000000140001008: C9                 leave
  0000000140001009: C3                 ret

  000000014000100A: 48 83 EC 20        sub         rsp,20h
  000000014000100E: 48 C7 C1 01 00 00  mov         rcx,1    ;<----arg
                    00
  0000000140001015: E8 E6 FF FF FF     call        0000000140001000 ;<----call to proc2
  000000014000101A: 48 83 C4 20        add         rsp,20h

  000000014000101E: 48 83 EC 20        sub         rsp,20h
  0000000140001022: B9 00 00 00 00     mov         ecx,0
  0000000140001027: FF 15 0B 20 00 00  call        qword ptr [40003038h]
  000000014000102D: 48 83 C4 20        add         rsp,20h
  0000000140001031: C3                 ret

My question is:
Is it hardwired in the microcode that arg in proc2 automatically gets pushed
from rcx onto the stack on a call to proc2, since this is fastcall?

If so, why don't we see this in the disassembly?
Thanks,
        Rags
God made Man, but the monkey applied the glue -DEVO

Rockoon

The first few parameters are ALWAYS passed in registers and NEVER on the stack in the 64-bit ABI.

BUT

The caller must RESERVE SPACE for these values AS IF they were passed on the stack, SO THAT if the called procedure needs to save them to the stack, it can.
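
A rough sketch of what that looks like at a call site with five arguments, in JWasm/MASM syntax. The name func5 is hypothetical, and the sketch assumes it runs inside a routine that was itself entered by a call, so rsp is 8 bytes off a 16-byte boundary on entry:

```asm
    ; func5 is a hypothetical 5-argument routine, used only for illustration
    sub  rsp, 28h                 ; 20h reserved for the 4 register args + 8 for the 5th arg
                                  ; (28h also brings rsp back to a 16-byte boundary here)
    mov  rcx, 1                   ; arg 1 in rcx, its reserved slot [rsp+00h] stays unwritten
    mov  rdx, 2                   ; arg 2 in rdx, slot [rsp+08h]
    mov  r8, 3                    ; arg 3 in r8,  slot [rsp+10h]
    mov  r9, 4                    ; arg 4 in r9,  slot [rsp+18h]
    mov  qword ptr [rsp+20h], 5   ; arg 5 really is written to the stack
    call func5
    add  rsp, 28h
```

The caller only reserves the first four slots. Nothing is stored in them unless the callee decides to do so.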
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

drizz

Just a note that in poasm this is broken and arg evaluates to rcx instead of stack parameter area.

It would be neat to have an option for automatic saving of those params (just before the call) in jwasm.
option storeparams:[on,off]
We could go to  https://sourceforge.net/tracker/?group_id=255677&atid=1126896 and suggest this to japheth.
The truth cannot be learned ... it can only be recognized.

MichaelW

Quote from: Rockoon on July 02, 2010, 04:18:41 PM
The first few parameters are ALWAYS passed in registers and NEVER on the stack in the 64-bit ABI.

BUT

The caller must RESERVE SPACE for these values AS IF they were passed on the stack, SO THAT if the called procedure needs to save them to the stack, it can.

I can't help but wonder what sort of logic led the developers of this to such an apparently inefficient calling convention.


eschew obfuscation

drizz

The "sub rsp,20h" / "add rsp,20h" pairs are just JWasm's inefficient invoke handling. Stack-tracking compilers would produce different code.

rags

Quote from: drizz on July 02, 2010, 09:22:38 PM
Just a note that in poasm this is broken and arg evaluates to rcx instead of stack parameter area.
That's what I was wondering about. I don't see in the disassembly where the value in rcx gets moved to
qword ptr [rbp+10h]. I used dumpbin for the disassembly; I don't know if that makes a difference.

How does the value in rcx get moved to the stack area since I don't see the code for it?

drizz

Only the fifth and later params are saved/pushed to the stack.
If you need to save it you would do:
mov arg1,rcx
mov arg2,rdx
mov arg3,r8
mov arg4,r9
And in poasm you have to write a local var for it, as "arg1" evaluates to rcx.

Quote from: rags on July 03, 2010, 12:47:29 AM
How does the value in rcx get moved to the stack area since I don't see the code for it?
As has been said, it does not; you need to do it manually.
I mentioned that a switch for jwasm could be implemented to do this automatically.
It would do this before the call:
mov [rsp+3*8],r9; arg count>=4
mov [rsp+2*8],r8; arg count>=3
mov [rsp+1*8],rdx; arg count>=2
mov [rsp+0*8],rcx; arg count>=1
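
For comparison, a minimal sketch of the manual save applied to proc2 from the first post. It assumes JWasm's default prologue as seen in the disassembly (push rbp / mov rbp,rsp), so that arg resolves to [rbp+10h], which is the slot the caller reserved for rcx:

```asm
proc2 proc arg:qword
    mov arg, rcx        ; store the register parameter into [rbp+10h] by hand
    add rax, arg        ; arg now really holds the value passed in rcx
    ret
proc2 endp
```

Without the first mov, [rbp+10h] holds whatever happened to be on the stack, because nothing in the caller writes to it.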

sinsi

Quote from: Rockoon
The caller must RESERVE SPACE for these values AS IF they were passed on the stack, SO THAT if the called procedure needs to save them to the stack, it can.
Yes. Several Windows APIs use the space from RSP+8 to save the parameters for later; one or two I have followed even use it to store RBX instead of a 'push rbx'  ::)
One reason it has to be aligned is so they can store xmm registers there, though this is only my guess. I can't imagine any other reason for the align 16 thing.

Quote from: MichaelW
I can't help but wonder what sort of logic led the developers of this to such an apparently inefficient calling convention.
I had a rant about it in another topic. My guess is they didn't want us 'amateur' asmers to have it too easy  :bdg

Luckily for me I am a push/call coder rather than an invoke coder, which makes it easier to port code from 32 to 64-bit. It also makes it easier when you write a bare proc: you reserve the maximum amount of stack an API needs and reuse it, instead of these 'sub rsp,x - call - add rsp,x' messes.
Light travels faster than sound, that's why some people seem bright until you hear them.

Rockoon

Quote from: MichaelW on July 02, 2010, 10:18:20 PM
I can't help but wonder what sort of logic led the developers of this to such an apparently inefficient calling convention.

I think the idea that it is inefficient is a confusion over the TWO imposed rules here.

The first rule is that you do not misalign the stack.
The second rule is that you reserve space for register parameters.

As soon as we impose the rule that the stack must be aligned, then the other rule of reserving space is essentially at-worst free from a performance perspective.

Consider the case where alignment is imposed but not the reserving of space. If the called function needs temporary space to save off rdx, then it must allocate it while continuing to maintain alignment. The caller has a 'sub rsp, xxx' and the called function itself also has a 'sub rsp, xxx'.

With the enforced stack reservation, there will be many cases where the called function need not allocate any space.. so does not need to manage the stack at all. That's a performance win.
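
A small sketch of such a case, in JWasm/MASM syntax. The routine sqadd is hypothetical; it returns rcx*rcx + rdx, and the automatic prologue/epilogue is switched off so the offsets are relative to rsp at entry (return address at [rsp], slot for rcx at [rsp+8], slot for rdx at [rsp+10h]):

```asm
option prologue:none
option epilogue:none

sqadd proc                  ; hypothetical routine, for illustration only
    mov [rsp+10h], rdx      ; park the 2nd arg in the slot the caller reserved for it
    mov rax, rcx
    mul rcx                 ; rdx:rax = rcx*rcx, this clobbers rdx
    add rax, [rsp+10h]      ; add the saved 2nd arg back in
    ret
sqadd endp

option prologue:prologuedef
option epilogue:epiloguedef
```

The routine never touches rsp: the temporary it needs is already there, courtesy of the caller's reservation.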

We have been doing this sort of thing already in 32-bit land, but just not as obviously. We pass 8-bit and 16-bit parameters as 32-bit, for instance.. clearly that is inefficient from some perspectives.. but obviously a performance win in general.

asmfan

I see here that you use not the value but the address of the first slot of the 20h-byte shadow space provided for the 4 args of proc2, and this is why you don't see any RCX here.

upd: use brackets (in code)
Russia is a weird place