Automatic stack alignment on x64

Started by MazeGen, May 09, 2006, 03:06:08 PM

MazeGen

I am working on macros to support the FASTCALL calling convention in 64-bit MASM. I found an interesting article about automatic stack alignment on the GoAsm site.

The first method of achieving the alignment seems to be incorrect, though:

PUSH RSP             ;save current RSP position on the stack
PUSH [RSP]           ;keep another copy of that on the stack
AND SPL,0F0h         ;adjust RSP to align the stack if not already there
                     ;
                     ;  parameters dealt with here
                     ;
SUB RSP,20h          ;adjust RSP to provide placeholders
CALL TheAPI
ADD RSP,xxh          ;get RSP back to correct place for next
POP RSP              ;restore RSP to its original value


Here is the stack layout for this example (I don't take SUB RSP and ADD RSP into account):

(initial)        |     xyz     | <- RSP = 88 (8-byte aligned)
PUSH RSP         |     88      | <- RSP = 80
PUSH [RSP]       |     88      | <- RSP = 72
AND SPL, 0F0h    |   ~HOLE~    | <- RSP = 64
CALL func        | return link | <- RSP = 56 (8-byte aligned!)
POP RSP          |     ??      | <- RSP = ???


After returning from the function, it is not possible to restore the stack!

As for the second method, it seems to be correct for both 8-byte and 16-byte initial alignment:

PUSH RSP             ;save current RSP position on the stack
PUSH [RSP]           ;keep another copy of that on the stack
OR SPL,8h            ;adjust RSP to align the stack if not already there
                     ;
                     ;  parameters dealt with here
                     ;
SUB RSP,20h          ;adjust RSP to provide placeholders
CALL TheAPI
ADD RSP,xxh          ;get RSP back to correct place for next
POP RSP              ;restore RSP to its original value



16-byte aligned initially:

(initial)        |     xyz     | <- RSP = 96 (16-byte aligned)
PUSH RSP         |     96      | <- RSP = 88
PUSH [RSP]       |     96      | <- RSP = 80
OR SPL, 8 (removes previous one) <- RSP = 88
CALL func        | return link | <- RSP = 80 (16-byte aligned)
POP RSP          |     xyz     | <- RSP = 96 (initial)



8-byte aligned initially:

(initial)        |     xyz     | <- RSP = 88 (8-byte aligned)
PUSH RSP         |     88      | <- RSP = 80
PUSH [RSP]       |     88      | <- RSP = 72
OR SPL, 8          (no change)   <- RSP = 72
CALL func        | return link | <- RSP = 64 (16-byte aligned)
POP RSP          |     xyz     | <- RSP = 88 (initial)
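The two traces above can be checked mechanically. The following is only a minimal Python sketch of the arithmetic, not real assembly: it assumes 8-byte stack slots, models stack memory as a dictionary, and skips SUB RSP / ADD RSP just as the tables do. The function name trace_or_method is made up for this illustration.

```python
def trace_or_method(rsp):
    """Trace snippet 2 (OR SPL,8), ignoring SUB RSP / ADD RSP.

    Returns (RSP after the OR, RSP on entry to the callee, RSP after POP RSP).
    """
    mem = {}                     # stack memory: address -> stored value
    initial = rsp

    rsp -= 8                     # PUSH RSP stores the value RSP had before the push
    mem[rsp] = initial

    copy = mem[rsp]              # PUSH [RSP] reads the slot just written ...
    rsp -= 8
    mem[rsp] = copy              # ... and stores a second copy below it

    rsp |= 0x8                   # OR SPL,8h only touches the low byte of RSP
    after_or = rsp

    rsp -= 8                     # CALL pushes the return link
    mem[rsp] = "return link"
    on_entry = rsp

    rsp += 8                     # RET pops the return link again
    rsp = mem[rsp]               # POP RSP loads whichever copy is on top
    return after_or, on_entry, rsp


print(trace_or_method(96))   # (88, 80, 96)
print(trace_or_method(88))   # (72, 64, 88)
```

Both runs end with RSP back at its initial value, which matches the two tables.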

manhattan

The first method is OK. The called function will have an unaligned stack pointer. That is the initial condition. If you follow the convention, RSP cannot be 16-byte aligned when the function is entered.

jorgon

What's missing from this poser is this passage appearing in the article "writing 64-bit programs" immediately before the code snippet quoted by MazeGen:
Quote:
which one is used depends on the number of parameters

In fact, GoAsm with the /x64 switch when INVOKE is used, works as follows:-
  • Where the number of parameters are four or less, or even in number, before any parameters are dealt with, the stack will be set on a xx0h boundary (code snippet 1).
  • Where the number of parameters are more than four and odd in number, before any parameters are pushed, the stack will be set on a xx8h boundary (code snippet 2).
There are no pushes if the parameters are four or less.  In code snippet 2 the odd number of pushes shifts the stack alignment by 8 bytes.  Therefore in each case, irrespective of the initial alignment, the code reaches the CALL instruction with the stack on a 16-byte boundary.  And the ADD RSP,xxh instruction gets everything back to the correct position afterwards.  The API expects the pushed parameters and placeholders to be in an exact position up the stack, so the alignment must be done before parameters are dealt with.

In practical tests this worked well.

The above means that when code actually arrives at the call destination (the API), the stack is offset from the 16-byte boundary by 8 bytes, because CALL itself pushes the 8-byte return address.  This is correct, and is what happens also in a window procedure (which is called by the system).  RSP always arrives in a window procedure offset from the 16-byte boundary by 8 bytes.
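The rule described in this post can be sketched as plain arithmetic. This is a hedged Python model, not GoAsm's actual code generation: it assumes each parameter beyond the fourth costs one 8-byte push and that the placeholder area is always 20h bytes. The function name rsp_at_call is invented for the illustration.

```python
def rsp_at_call(rsp, nparams):
    """Return RSP at the moment the CALL instruction executes."""
    rsp -= 16                            # PUSH RSP and PUSH [RSP]
    if nparams > 4 and nparams % 2 == 1:
        rsp |= 0x8                       # snippet 2: set stack on an xx8h boundary
    else:
        rsp &= ~0xF                      # snippet 1: set stack on an xx0h boundary
    rsp -= 8 * max(0, nparams - 4)       # assumed: one push per parameter after the fourth
    rsp -= 0x20                          # SUB RSP,20h for the placeholders
    return rsp


for start in (0x118, 0x120):             # 8-byte and 16-byte aligned starting points
    for n in range(0, 10):
        at_call = rsp_at_call(start, n)
        on_entry = at_call - 8           # CALL pushes the return address
        assert at_call % 16 == 0
        assert on_entry % 16 == 8
print("aligned at CALL, offset by 8 on entry, for every case tried")
```

Under those assumptions the stack is on a 16-byte boundary at the CALL and offset by 8 bytes on entry to the API, whatever the starting alignment.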
Author of the "Go" tools (GoAsm, GoLink, GoRC, GoBug)

vid

Quote from: jorgon on May 10, 2006, 06:31:40 AM
Where the number of parameters are four or less, or even in number, before any parameters are dealt with, the stack will be set on a xx0h boundary (code snippet 1).
But your AND can move the stack pointer down by 8, and I can't see where you move it back to the position where "push [rsp]" stored the original RSP value. Here is what I mean:


;RSP = 118h
PUSH RSP             ;RSP = 110h
PUSH [RSP]           ;RSP = 108h
AND SPL,0F0h         ;RSP = 100h, value at [ss:100h] is undefined
;4 or less params, just some MOVs
SUB RSP,20h          ;RSP = E0h
CALL TheAPI
ADD RSP,20h          ;RSP = 100h
POP RSP              ;popping from [ss:100h] !!!!

Did I misunderstand something?

jorgon

Quote:
but your AND can move the stack pointer down by 8, and I can't see where you move it back to the position where "push [rsp]" stored the original RSP value
The answer to this is that when AND SPL,0F0h (code snippet 1) is used, an extra 8 bytes are added to restore the stack.  In your example, the "ADD RSP,xxh" in the code snippet is not ADD RSP,20h but in fact ADD RSP,28h.  The exact amount added depends on the number of parameters.
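To see why the extra 8 bytes work for either starting alignment, here is a small Python sketch of vid's listing. It is only a model of the arithmetic under stated assumptions: four or fewer parameters (so no extra pushes), 8-byte slots, and stack memory held in a dictionary. The name restore_after_call is hypothetical.

```python
def restore_after_call(rsp, add_back):
    """Simulate snippet 1 and return the value POP RSP would load.

    Returns None if POP RSP reads a slot that was never written.
    """
    mem = {}                         # stack memory: address -> stored value
    initial = rsp

    rsp -= 8                         # PUSH RSP stores the pre-push value
    mem[rsp] = initial

    copy = mem[rsp]                  # PUSH [RSP] duplicates that value
    rsp -= 8
    mem[rsp] = copy

    rsp &= ~0xF                      # AND SPL,0F0h: drops 8 only if RSP ended in 8h
    rsp -= 0x20                      # SUB RSP,20h
    mem[rsp - 8] = "return address"  # CALL writes below RSP; RET undoes the move
    rsp += add_back                  # ADD RSP,xxh
    return mem.get(rsp)              # POP RSP


print(restore_after_call(0x118, 0x20))        # None: reads the unwritten slot at 100h
print(hex(restore_after_call(0x118, 0x28)))   # 0x118
print(hex(restore_after_call(0x120, 0x28)))   # 0x120
```

With 28h, POP RSP lands on the second copy when the AND moved the stack and on the first copy when it did not, so the same constant works in both cases. That is why two copies of RSP are pushed.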

Sorry if the details are not clear to members of the forum from the "interesting article".  The section of the article partially quoted by MazeGen was not intended to give a full detailed and exact account of GoAsm's working in this area.  It was headed
Quote:
The optimisations and refinements are listed here to help you when you look at the code produced by GoAsm in the debugger
which is why it lacked the detail required to understand it fully.

MazeGen

Thanks for your explanation, jorgon :U

I was still confused about how you can determine at compile time how much to remove from the stack when its alignment is known only at run time. Finally it is clear :)