Japheth,
On Win7 Home Premium, this simple test piece fails when assembled using the -win64 switch:
option casemap :none ; case sensitive
_WIN64 EQU 1
include \jwasm\Win32Inc200\Include\windows.inc ;Japheth's windows.inc
includelib \jwasm\Win32Inc200\Lib64\kernel32.lib
includelib \jwasm\Win32Inc200\Lib64\user32.lib
.data
align 8
szTitle db "64 Bit MessageBox",0
szMsg db "Hey this works!!",0
.code
main:
; int 3 ; for the debugger use
; sub rsp, 8h
invoke MessageBox, NULL, addr szMsg, addr szTitle, MB_OK
; add rsp, 8h
; sub rsp, 8h
invoke ExitProcess, 0
; add rsp, 8h
end main
It seems that JWASM 2.02 when compiling the invoke statement,
should reserve another 8 bytes on the stack to allow room for the push of
the return address.
When the sub/add rsp code is uncommented, making the necessary adjustment
to the stack before and after the call, it works as expected.
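The arithmetic behind this observation can be sketched in a few lines of Python. This is only a model, not JWasm's code: it assumes the Win64 rule that RSP is a multiple of 16 immediately before a CALL, so at the program's entry point (itself reached through a CALL) RSP is 8 modulo 16. The starting value 0x12FF58 and the helper names are made up for illustration.

```python
SHADOW_SPACE = 0x20  # bytes invoke reserves for a call with up to four parameters


def rsp_before_call(rsp_at_entry, extra_adjust=0):
    """RSP at the CALL emitted by invoke, after an optional manual 'sub rsp, extra_adjust'."""
    return rsp_at_entry - extra_adjust - SHADOW_SPACE


def is_aligned16(value):
    """True if value is a multiple of 16."""
    return value % 16 == 0


entry_rsp = 0x12FF58  # hypothetical value, 8 modulo 16 as assumed above

print(is_aligned16(rsp_before_call(entry_rsp)))     # False: invoke alone leaves RSP off by 8
print(is_aligned16(rsp_before_call(entry_rsp, 8)))  # True: the manual 'sub rsp, 8h' fixes it
```

Under these assumptions the extra 8 bytes matter only because the code runs at a label with no prologue; nothing has realigned RSP since the return address was pushed.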
Quote from: rags on January 22, 2010, 10:46:25 PM
It seems that JWASM 2.02 when compiling the invoke statement,
should reserve another 8 bytes on the stack to allow room for the push of
the return address.
the invoke directive assumes that the stack is aligned. It subtracts 20h, 30h, 40h, ... from RSP, fills parameters and emits a CALL. IMO it would be a very bad idea if invoke has to implement some runtime stack alignment tests.
Quote from: japheth on January 23, 2010, 07:53:54 AM
the invoke directive assumes that the stack is aligned.
That is correct.
Quote
It subtracts 20h, 30h, 40h, ... from RSP, fills parameters and emits a CALL.
This is incorrect.
You must also consider the return address pushed by the CALL when you calculate the value to subtract from / add to RSP. Hence you must subtract 28h, 30h, 38h, 40h, 48h ... etc. depending on the even/odd count of invoke parameters.
Think about it... if you do not compensate for the 8 bytes of the return address THEN inside the PROC that you invoke the stack will be unbalanced by 8 and you will have to do it again anyway. This is one of the subtle issues with the 64bit ABI standard.
Quote
IMO it would be a very bad idea if invoke has to implement some runtime stack alignment tests.
Yes, this is also my opinion.
However, in some minor circumstances it might be useful as an option, for example when mixing with other calling conventions that cannot guarantee stack alignment at run time (like stdcall), or when debugging, as a fast check of whether your application has stack alignment issues ;)
Quote
Quote
It subtracts 20h, 30h, 40h, ... from RSP, fills parameters and emits a CALL.
This is incorrect.
I just verified what JWasm currently does:
for 0 - 4 parameters, 20h is subtracted from RSP
for 5 - 6 parameters, 30h is subtracted from RSP
for 7 - 8 parameters, 40h is subtracted from RSP
...
IMO this behavior complies with the Win64 ABI.
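The table above fits a simple rule: reserve at least four 8-byte slots, and round the slot count up to an even number so the total stays a multiple of 16. The Python sketch below reproduces the reported values. The formula is inferred from the three rows listed, not taken from the JWasm source, and the function name is hypothetical.

```python
def invoke_reserved_bytes(param_count):
    """Bytes subtracted from RSP for an invoke with param_count parameters (inferred rule)."""
    slots = max(param_count, 4)  # never fewer than the four register-parameter slots
    slots += slots % 2           # round up to an even slot count (multiple of 16 bytes)
    return slots * 8


for n in range(9):
    print(n, hex(invoke_reserved_bytes(n)))
```

Because every result is a multiple of 16, this adjustment preserves whatever alignment RSP had before the invoke; it neither breaks nor repairs it.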
Quote
You must also consider the return address pushed by the CALL when you calculate the value to subtract from / add to RSP. Hence you must subtract 28h, 30h, 38h, 40h, 48h ... etc. depending on the even/odd count of invoke parameters.
From what I understood when reading the Win64 ABI, the one thing that matters is that RSP is 16-byte aligned just before the CALL instruction.
Quote
Think about it... if you do not compensate for the 8 bytes of the return address THEN inside the PROC that you invoke the stack will be unbalanced by 8 and you will have to do it again anyway. This is one of the subtle issues with the 64bit ABI standard.
Yes, on entry the stack is - always - unbalanced. AFAIU it's the job of the procedure to balance it again if necessary.
Quote from: japheth on January 23, 2010, 02:02:11 PM
I just verified what JWasm currently does:
for 0 - 4 parameters, 20h is subtracted from RSP
for 5 - 6 parameters, 30h is subtracted from RSP
for 7 - 8 parameters, 40h is subtracted from RSP
This seems very odd. What happens when you call the MessageBox API?
*MessageBox is one of the functions which crash immediately if the stack is misaligned.
I'm still not convinced that jwasm balances the stack correctly.
What's worse is that Windows (XP 64 in a VMware machine) silently ignores the access violation and resumes execution!
Let's take this modified WinCUI1 example:
option casemap:none
.nolist
.nocref
WIN32_LEAN_AND_MEAN equ 1
include windows.inc
.list
.cref
.DATA
align 16
txt db "Hello World",0
.CODE
main proc c uses rbx rsi rdi
LOCAL var1:qword,var2:qword
sub esp,16
movdqa xmm0,[esp]; create access violation if unaligned read
add esp,16
invoke MessageBox,0,addr txt,0,0
ret
main endp
mainCRTStartup proc
and rsp,-16
call main
invoke ExitProcess, eax
mainCRTStartup endp
END mainCRTStartup
If I run it directly the message box appears. If I run it through windbg the access violation pops out because rsp is not 16-byte aligned.
Can anyone confirm this behavior on a normal system? Here is the exe file.
And yes, it's the same with FRAME option.
option casemap:none
option frame:auto
include windows.inc
.DATA
align 16
txt db "Hello World",0
.CODE
main proc FRAME uses rbx rsi rdi
LOCAL var1:qword
sub esp,16
movdqa xmm0,[esp]; create access violation if unaligned read
add esp,16
invoke MessageBox,0,addr txt,0,0
ret
main endp
mainCRTStartup proc
and rsp,-16
call main
invoke ExitProcess, eax
mainCRTStartup endp
END mainCRTStartup
Even though the manual states:
Quote
The PROC's FRAME attribute ensures that the stack is correctly aligned after the prologue is done.
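One way to read that promise as arithmetic is sketched below in Python. It is a hypothetical illustration, not JWasm's actual prologue logic: it assumes RSP is 8 modulo 16 on entry (the return address has just been pushed), that each saved register costs one 8-byte push, and that locals are allocated with a single subtraction. The name prologue_padding is invented here.

```python
def prologue_padding(pushed_regs, local_bytes):
    """Extra bytes to subtract so RSP is a multiple of 16 once the prologue is done.

    Assumes RSP is 8 modulo 16 on entry to the procedure.
    """
    used = 8 + 8 * pushed_regs + local_bytes  # return address + register pushes + locals
    return (-used) % 16


print(prologue_padding(pushed_regs=3, local_bytes=8))  # 8: one more qword is needed
print(prologue_padding(pushed_regs=4, local_bytes=8))  # 0: already aligned
```

A prologue that skips this kind of bookkeeping leaves RSP off by 8 for some combinations of saved registers and locals, which is the symptom the movdqa test is designed to expose.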
That test.exe "stopped working" on my real Win7 x64 machine.
Access violation at :
000000014000100E 67660F6F0424 movdqa xmm0,oword [esp] ; [000000000012FF08]=00000000000000000000000000000000
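The operand address in that line can be checked directly. movdqa requires a 16-byte aligned memory operand, and the short Python check below (the helper name is made up) shows the reported address sits 8 bytes past a 16-byte boundary.

```python
def misalignment(address, boundary=16):
    """Number of bytes by which address exceeds the previous multiple of boundary."""
    return address % boundary


fault_address = 0x000000000012FF08  # operand address reported in the disassembly line

print(misalignment(fault_address))       # 8
print(misalignment(fault_address) == 0)  # False, so movdqa faults
```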
Quote from: drizz on March 01, 2010, 06:41:55 PM
And yes, it's the same with FRAME option.
...
Even though the manual states:
Quote
The PROC's FRAME attribute ensures that the stack is correctly aligned after the prologue is done.
You're right, the stack alignment calculations were probably a bit too lazy.
But I tried an improvement. It may work better...
Japheth, thanks for looking into it. It works as expected now (2.03). There is just this tiny issue (or non-issue) of "add rsp,0" being generated.
Cheers!