A helpful article on getting started with ml64.exe.
Moving to Windows Vista x64 - x64 Assembly (http://www.codeproject.com/KB/vista/vista_x64.aspx#x64_Assembly)
My HelloWorld program for ml64
EXTRN GetStdHandle: PROC
EXTRN WriteFile: PROC
EXTRN lstrlen: PROC
EXTRN ExitProcess: PROC
.DATA
hFile QWORD 0
msglen DWORD 0
BytesWritten DWORD 0
msg BYTE "Hello x64 World!", 13, 10, 0
.CODE
main PROC
;int 3 ; breakpoint for debugger
sub rsp, 28h
lea rcx, msg
call lstrlen
mov msglen, eax
mov ecx, -11 ; STD_OUTPUT_HANDLE
call GetStdHandle
mov hFile, rax
mov qword ptr [rsp + 20h], 0 ; lpOverlapped = NULL
lea r9, BytesWritten
mov r8d, msglen
lea rdx, msg
mov rcx, hFile
call WriteFile
xor ecx, ecx ; exit code = 0
call ExitProcess
main ENDP
END
makeit.bat
@echo off
set LIB=%LIB%;C:\Program Files\Microsoft SDKs\Windows\v7.0\Lib\x64;
"C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\amd64\ml64.exe" HelloWorld.asm /link /subsystem:console /defaultlib:kernel32.lib /entry:main
echo.
It works! I really feel like a noob. :bg
In the code above, should I be doing an add rsp, 28h before calling ExitProcess?
The author of the article I linked to above didn't do that in the code (he does in sub-routines). It doesn't seem to matter before calling ExitProcess, but I'd like to know what is correct.
Looking at the disassembly for a similar C program, the compiler (VC 9.0) does adjust rsp before exiting, although the exit is just a ret.
disassembly
#include <stdio.h>
int main(void)
{
000000013F081000 sub rsp,28h
printf("Hello x64 world!\n");
000000013F081004 lea rcx,[__native_dllmain_reason-3Ch (13F083000h)]
000000013F08100B call qword ptr [__imp_printf (13F0820F8h)]
return 0;
000000013F081016 xor eax,eax
}
000000013F081018 add rsp,28h
000000013F08101C ret
If you use RET to exit from a thread (process), you have to adjust the stack. If you call ExitProcess, all that is needed is four QWORDs allocated on the stack before the call (by convention; it should also work without them).
MazeGen,
Thank you. I think I get it now. Before calling functions, you always need at least the four QWORDS allocated on the stack for the 'Register Parameter Stack Area', right?
So, in my code above I should have used sub rsp, 20h instead of sub rsp, 28h, since I never had more than 4 arguments in any function call. I just cut and pasted the code and never changed the stack allocation.
Thanks for the help. Are you still doing any x64 assembly?
That's right, four QWORDs are enough.
There are two ways to call functions in x64 assembler code. You either allocate and deallocate stack space around each call:
push arg6 ; sixth parameter
push arg5 ; fifth parameter
sub rsp, 4*8 ; allocate space for 'Register Parameter Stack Area'
mov r9, arg4
mov r8, arg3
mov rdx, arg2
mov rcx, arg1
call function
add rsp, 4*8 + 2*8 ; release all parameters from stack
Or, you first allocate the number of QWORDs corresponding to the highest number of parameters of a function you call, and use MOVs to fill the stack:
GregsFunc PROC
sub rsp, 4*8 + 2*8 ; part of prologue
...
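mov [rsp + 4*8 + 1*8], arg6 ; sixth parameter, second slot above the register area
mov [rsp + 4*8 + 0*8], arg5 ; fifth parameter, first slot above the register area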
mov r9, arg4
mov r8, arg3
mov rdx, arg2
mov rcx, arg1
call function
... ; call more functions the same way
add rsp, 4*8 + 2*8 ; part of epilogue
ret
GregsFunc ENDP
For a compiler, it is easy to get the highest number of parameters so compilers prefer the latter way, because it also generates less code.
If you know that you will hardly ever call a function with more than, say, 15 parameters, you can always allocate 15 QWORDs on the stack in your function's prologue (SUB RSP, 15*8) and use the latter method as well. It is not a clean way because it may waste stack, but it should work well.
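The size for that single prologue allocation can be worked out mechanically. The following is only a sketch in C, not something from the thread: `frame_bytes` is a made-up helper, and it assumes the function pushes nothing else in its prologue. It also pads for the 16-byte stack alignment that comes up later in this thread, which the snippets above leave out.

```c
#include <stddef.h>

/* Bytes to reserve with one "sub rsp, N" in the prologue so that every
   call made from the function has room for its arguments.
   max_params: largest parameter count among the functions called.
   Assumption: nothing else is pushed in the prologue. */
size_t frame_bytes(size_t max_params)
{
    /* The register parameter area is always four QWORDs, even if the
       callee takes fewer than four parameters. */
    size_t slots = max_params < 4 ? 4 : max_params;
    size_t bytes = slots * 8;

    /* On entry RSP is 8 past a 16-byte boundary (the return address),
       so an odd number of QWORDs puts it back on a boundary. */
    if (bytes % 16 == 0)
        bytes += 8;
    return bytes;
}
```

Under those assumptions, four parameters or fewer give 40 (28h), six give 56 (38h), and fifteen give 120 (78h).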
Well, I'm not really doing x64 assembly, I had to move to C because of portability between x86 and x64. (I was also thinking about portable masm syntax (http://x86asm.net/articles/portable-x86-flat-syntax/index.html), but it would be too complicated.) I'm mostly interested in reversing and general research of x86 architecture.
MazeGen,
Thanks again. That is pretty much how I thought it worked and you confirmed it. x64 is a bit more complicated than 32-bit x86. I'm not sure I'll be writing many full programs in x64 asm, but I definitely want to know how to write asm functions that can be called from C since inline assembler is no longer supported. It's also just fun to learn and a good thing to know for debugging if nothing else.
Hello x64 coders :)
I started to fiddle with x64 MASM just a couple of days ago, and this is my experience so far:
I'm forced to use windbg, which is not as good as olly !!
I had some trouble understanding the new stack layout, but it seems it's not my fault but MASM's.
The "proc" directive does not align the stack to 16 bytes but just leaves it as is, and that's not good.
This is my prologue replacement (you must have some locals/arguments or it won't work with default epilogue):
NewPrologue64 MACRO procname, flags, parambytes, localbytes, reglist, userparms
LOCAL regadjust
regadjust = 0
FOR reg,reglist
regadjust = regadjust XOR 1; odd,even,odd
ENDM
push rbp
; aligned @16 after push
mov rbp,rsp
; has locals?
IF (localbytes GT 0)
IF ((localbytes SHR 3) AND 1) EQ regadjust
;; if both are odd or both are even
sub rsp,localbytes
ELSE
sub rsp,localbytes+8
ENDIF
ELSEIF (regadjust EQ 1); is number of regs to push ODD?
;; rsp is aligned after push rbp; an odd number of pushes needs 8 bytes of padding
sub rsp,8
ENDIF
FOR reg,reglist
push reg
ENDM
EXITM %localbytes
ENDM
OPTION PROLOGUE:NewPrologue64
;; this macro makes rsp aligned after "proc" line
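The parity trick in the "has locals" branch is easy to check on paper. Here is a rough C model of it, not part of the original post: the function names are invented, and it assumes `localbytes` is a non-zero multiple of 8 and that RSP was 16-byte aligned just before the `call` that entered the procedure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bytes subtracted from RSP by the "has locals" branch of the macro.
   local_bytes: size of the locals, assumed a non-zero multiple of 8.
   reg_count:   number of registers in the USES list. */
size_t locals_adjust(size_t local_bytes, size_t reg_count)
{
    size_t regadjust = reg_count & 1;   /* same parity flag the macro builds */
    if (((local_bytes >> 3) & 1) == regadjust)
        return local_bytes;             /* both odd or both even */
    return local_bytes + 8;             /* one extra QWORD of padding */
}

/* True when RSP is 16-byte aligned once the whole prologue has run. */
bool aligned_after_prologue(size_t local_bytes, size_t reg_count)
{
    size_t used = 8;                               /* return address */
    used += 8;                                     /* push rbp */
    used += locals_adjust(local_bytes, reg_count); /* sub rsp, ... */
    used += 8 * reg_count;                         /* one push per saved register */
    return used % 16 == 0;
}
```

For example, 8 bytes of locals with no saved registers gets 16 bytes subtracted, while 24 bytes of locals with three saved registers needs no padding.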
The missing "invoke" can be replaced with a macro, and I wrote a simple one for that.
btw, MazeGen, I think I remember you writing an invoke replacement. Did you finish it, and can you share it? (Or was that for ml32?)
Quote from: drizz on February 19, 2009, 01:32:54 AM
btw, MazeGen, I think I remember you writing an invoke replacement. Did you finish it, and can you share it? (Or was that for ml32?)
Yeah, I wrote a pretty advanced replacement for PROC, INVOKE and LOCAL, but when I was about to finish them, I realized that I had to learn C instead of coding in x64 asm, so I actually never used them :lol
Good you mentioned it, I will review them and try to publish them!
Quote from: drizz on February 19, 2009, 01:32:54 AM
I'm forced to use windbg, which is not as good as olly !!
Have you seen fdbg (http://fdbg.x86asm.net/)? I was trying to convince the author to copy ollydbg's GUI, but I succeeded only partially :wink Anyway, it is simpler than WinDbg and therefore easier to use, IMHO.
The Visual C++ Express versions can also debug 64-bit code symbolically.
Quote from: MazeGen on February 18, 2009, 10:12:09 AM
That's right, four QWORDs are enough.
Are you sure about this?
This doesn't work for me:
sub rsp, 20h
mov r9, MB_OK
mov r8, NULL
mov rdx, NULL
mov rcx, NULL
call MessageBox
and this does work:
sub rsp, 28h
mov r9, MB_OK
mov r8, NULL
mov rdx, NULL
mov rcx, NULL
call MessageBox
I think that the "caller return address" misaligns the stack, so you need another 8 bytes to align it. It's probably better to always align it than to run into random problems later on.
http://msdn.microsoft.com/en-us/library/ms794596.aspx
Igor,
I have wondered why C compilers always use 28h (40) instead of 20h (32). I think you may be right about the alignment.
[Edit] Igor is right about this.
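Igor's observation can be checked with a little arithmetic. The sketch below is not from the thread. It models RSP as a plain integer and assumes the usual Windows x64 rule that RSP is a multiple of 16 just before a `call`, which leaves it 8 past a multiple of 16 on entry to a function. The function names and the sample address are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of RSP after "sub rsp, bytes", given its value on entry. */
uint64_t rsp_after_sub(uint64_t rsp_on_entry, uint64_t bytes)
{
    return rsp_on_entry - bytes;
}

/* True when RSP sits on a 16-byte boundary, as expected before a call. */
bool aligned_for_call(uint64_t rsp)
{
    return (rsp % 16) == 0;
}
```

With a hypothetical entry value of 12FF58h, `sub rsp, 20h` leaves 12FF38h, which is not on a 16-byte boundary, while `sub rsp, 28h` leaves 12FF30h, which is.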
I wrote an asm program and it works well:
include windows.inc
includelib kernel32.lib
includelib user32.lib
includelib gdi32.lib
extrn MessageBoxA : proc
extrn ExitProcess : proc
extrn LoadIconA : proc
extrn LoadCursorA : proc
extrn RegisterClassExA : proc
extrn CreateWindowExA : proc
extrn GetModuleHandleA : proc
extrn ShowWindow : proc
extrn UpdateWindow : proc
extrn GetMessageA : proc
extrn TranslateMessage : proc
extrn DispatchMessageA : proc
extrn PostQuitMessage : proc
extrn DefWindowProcA : proc
extrn DestroyWindow : proc
cbf proto ;:dq, :dd, :dq, :dq
.data
cap db 'Hello !',0
txt db 'Hello world !',0
regw db 'Register Windows failed !',0
crww db 'Create Window failed !',0
shww db 'Show Window failed !',0
upww db 'Update Window failed !',0
hIn qword 0
hWnd qword 0
.code
;program entry point (the window callback, cbf, follows main)
main proc
local wc:WNDCLASSEX
local umsg:MSG
sub rsp,(sizeof WNDCLASSEX) + (sizeof MSG) + 8 * 11 + 28h
;Register the window class
;fill in wc
mov wc.cbSize,sizeof WNDCLASSEX
mov wc.style,CS_HREDRAW or CS_VREDRAW
lea rax,cbf
mov wc.lpfnWndProc,rax
xor rax,rax
mov wc.cbClsExtra,eax
mov wc.cbWndExtra,eax
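xor rcx,rcx ;NULL = module handle of this executable
call GetModuleHandleA
mov hIn,rax ;save the returned module handle
mov wc.hInstance,rax
mov rdx,IDI_APPLICATION
xor rcx,rcx ;hInstance must be NULL for a predefined icon
call LoadIconA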
mov wc.hIcon,rax
mov wc.hIconSm,rax
mov rdx,IDC_ARROW
xor rcx,rcx
call LoadCursorA
mov wc.hCursor,rax
mov wc.hbrBackground,COLOR_WINDOW + 1
xor rax,rax
mov wc.lpszMenuName,rax
lea rax,cap
mov wc.lpszClassName,rax
lea rcx,wc
call RegisterClassExA
test ax,ax ;RegisterClassExA returns a 16-bit ATOM, 0 on failure
jnz conti
xor r9,r9
lea r8,cap
lea rdx,regw
xor rcx,rcx
call MessageBoxA
jmp exit ;no window class, nothing more to do
conti:
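;Create the window
;arguments 5 to 12 go on the stack above the 4-QWORD register area
xor rax,rax
mov [rsp + 8 * 11],rax ;lpParam = NULL
mov rax,hIn
mov [rsp + 8 * 10],rax ;hInstance
xor rax,rax
mov [rsp + 8 * 9],rax ;hMenu = NULL
mov [rsp + 8 * 8],rax ;hWndParent = NULL
mov rax,400
mov [rsp + 8 * 7],rax ;nHeight
mov [rsp + 8 * 6],rax ;nWidth
mov rax,CW_USEDEFAULT
mov [rsp + 8 * 5],rax ;y
mov [rsp + 8 * 4],rax ;x
mov r9,WS_OVERLAPPEDWINDOW ;dwStyle
lea r8,txt ;lpWindowName
lea rdx,cap ;lpClassName
xor rcx,rcx ;dwExStyle = 0
call CreateWindowExA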
mov hWnd,rax
cmp rax,0
jnz n1
xor r9,r9
lea r8,cap
lea rdx,crww
xor rcx,rcx
call MessageBoxA
jmp exit ;no window, nothing more to do
n1:
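mov rdx,SW_SHOW
mov rcx,hWnd
call ShowWindow
;ShowWindow returns the previous visibility state, not an error code,
;so a nonzero result only means the window was already visible
test eax,eax
jz n2
xor r9,r9
lea r8,cap
lea rdx,shww
xor rcx,rcx
call MessageBoxA
n2:
mov rcx,hWnd
call UpdateWindow
test eax,eax ;BOOL result: nonzero means success
jnz lo
xor r9,r9
lea r8,cap
lea rdx,upww
xor rcx,rcx
call MessageBoxA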
lo: ;Message loop
xor r9,r9
xor r8,r8
mov rdx,NULL
lea rcx,umsg
call GetMessageA
test eax,eax ;BOOL result: 0 means WM_QUIT was retrieved
jz exit
lea rcx,umsg
call TranslateMessage
lea rcx,umsg
call DispatchMessageA
jmp lo
exit:
xor rcx,rcx
call ExitProcess
add rsp,(sizeof WNDCLASSEX) + (sizeof MSG) + 8 * 11 + 28h
ret
main endp
cbf proc ;hw : dq , umsg : dd, wp : dq , lpa : dq
;save the register parameters in their home slots
mov [rsp + 20h],r9
mov [rsp + 18h],r8
mov [rsp + 10h],rdx
mov [rsp + 8],rcx
sub rsp,28h ;register area for callees, keeps rsp 16-byte aligned
;Handle WM_DESTROY
cmp edx,WM_DESTROY
jnz notdestroy
xor rcx,rcx
call PostQuitMessage
xor eax,eax ;message handled
jmp ex
notdestroy:
;Handle WM_CLOSE
cmp edx,WM_CLOSE
jnz def
mov rcx,[rsp + 28h + 8] ;hw from its home slot
call DestroyWindow
xor eax,eax ;message handled
jmp ex
;Handle WM_PAINT (not implemented, handled by DefWindowProcA)
def:
;everything else goes to the default handler;
;rbp is nonvolatile, so the home slots are addressed through rsp
mov r9,[rsp + 28h + 20h]
mov r8,[rsp + 28h + 18h]
mov rdx,[rsp + 28h + 10h]
mov rcx,[rsp + 28h + 8]
call DefWindowProcA
ex:
add rsp,28h
ret
cbf endp
end
But when I declare a local PAINTSTRUCT variable, it doesn't work. I wonder why:
;(the includes, extrn declarations, data and main are the same as in the listing above; only cbf differs)
cbf proc ;hw : dq , umsg : dd, wp : dq , lpa : dq
local ps:PAINTSTRUCT
mov [rsp + 20h],r9
mov [rsp + 18h],r8
mov [rsp + 10h],rdx
mov [rsp + 8],rcx
;push rsi
;push rdi
sub rsp,28h + (sizeof PAINTSTRUCT)
;sub rsp,28h
;jmp ex
mov eax,edx
;Handle WM_DESTROY
cmp eax,WM_DESTROY
jnz def
xor rcx,rcx
call PostQuitMessage
;Handle WM_CLOSE
cmp eax,WM_CLOSE
jnz def
mov rbp,rsp
;add rbp,28h
add rbp,28h + (sizeof PAINTSTRUCT)
mov rcx,[rbp + 8]
call DestroyWindow
;Handle WM_PAINT
cmp eax,WM_PAINT
jnz def
def:
mov rbp,rsp
add rbp,28h
mov r9,[rbp +20h]
mov r8,[rbp + 18h]
mov rdx,[rbp + 10h]
mov rcx,[rbp + 8h]
call DefWindowProcA
ex: ;cmp rax,
add rsp,28h + (sizeof PAINTSTRUCT)
;add rsp,28h
;pop rdi
;pop rsi
ret
cbf endp
end
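One thing worth checking in the version with the local is whether the offsets still point at the home slots. The C sketch below is not from the thread and rests on an assumption: that ml64 inserts its own frame setup (a `push rbp` plus a stack allocation for the local) ahead of the first instruction once a `local` is declared. `home_slot_offset` is a made-up helper, and 72 is only a stand-in for `sizeof PAINTSTRUCT`.

```c
#include <stddef.h>

/* Offset from the current RSP to the home slot of register parameter
   `index` (0 = RCX, 1 = RDX, 2 = R8, 3 = R9).
   pushed_bytes:    bytes pushed since function entry (8 per push).
   allocated_bytes: bytes subtracted from RSP since function entry. */
size_t home_slot_offset(size_t index, size_t pushed_bytes, size_t allocated_bytes)
{
    size_t return_address = 8; /* sits at [rsp] on entry */
    return allocated_bytes + pushed_bytes + return_address + index * 8;
}
```

With nothing pushed or allocated, RCX's slot is at `[rsp + 8]`, which is what the four stores at the top of cbf expect. If a `push rbp` and a 72-byte allocation have already run, the same slot is at `[rsp + 88]`, so the stores would land somewhere else. Independently of that assumption, the `def:` path still adds only 28h to rbp although the `sub rsp` above it now also includes `sizeof PAINTSTRUCT`.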
Quote from: leemarx's code
include windows.inc
includelib kernel32.lib
includelib user32.lib
includelib gdi32.lib
If these are MASM32 includes and libraries (32-bit), they are going to cause you major problems. If they are, I'm surprised the program in your previous post worked.