Getting started with ml64.exe

Started by GregL, February 15, 2009, 11:21:31 PM


GregL

A helpful article on getting started with ml64.exe.

Moving to Windows Vista x64 - x64 Assembly


My HelloWorld program for ml64


EXTRN GetStdHandle: PROC
EXTRN WriteFile:    PROC
EXTRN lstrlen:      PROC
EXTRN ExitProcess:  PROC

.DATA

    hFile        QWORD 0
    msglen       DWORD 0
    BytesWritten DWORD 0
    msg          BYTE  "Hello x64 World!", 13, 10, 0

.CODE

    main PROC

        ;int 3                  ; breakpoint for debugger

        sub rsp, 28h

        lea rcx, msg
        call lstrlen
        mov msglen, eax

        mov ecx, -11            ; STD_OUTPUT_HANDLE
        call GetStdHandle
        mov hFile, rax

        mov qword ptr [rsp+20h], 0  ; lpOverlapped = NULL (5th argument of WriteFile)
        lea r9, BytesWritten
        mov r8d, msglen
        lea rdx, msg
        mov rcx, hFile
        call WriteFile

        xor ecx, ecx            ; exit code = 0
        call ExitProcess

    main ENDP

END



makeit.bat


@echo off
set LIB=%LIB%;C:\Program Files\Microsoft SDKs\Windows\v7.0\Lib\x64;
"C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\amd64\ml64.exe" HelloWorld.asm /link /subsystem:console /defaultlib:kernel32.lib /entry:main
echo.


It works!    I really feel like a noob.  :bg


GregL

In the code above, should I be doing an add rsp, 28h before calling ExitProcess?

The author of the article I linked to above didn't do that in the code (he does in sub-routines).  It doesn't seem to matter before calling ExitProcess, but I'd like to know what is correct.

Looking at the disassembly for a similar C program, the compiler (VC 9.0) does adjust rsp before exiting, although the exit is just a ret.


disassembly


#include <stdio.h>

int main(void)
{
000000013F081000  sub         rsp,28h
    printf("Hello x64 world!\n");
000000013F081004  lea         rcx,[__native_dllmain_reason-3Ch (13F083000h)]
000000013F08100B  call        qword ptr [__imp_printf (13F0820F8h)]
    return 0;
000000013F081016  xor         eax,eax
}
000000013F081018  add         rsp,28h
000000013F08101C  ret             


MazeGen

If you use RET to exit from a thread (process), you have to adjust the stack. If you call ExitProcess, all that is needed is four QWORDs allocated on the stack before the call (by convention; it should also work without them).

GregL

MazeGen,

Thank you.  I think I get it now.  Before calling functions, you always need at least the four QWORDS allocated on the stack for the 'Register Parameter Stack Area',  right?

So, in my code above I should have used sub rsp, 20h instead of sub rsp, 28h since I never had more than 4 arguments in any function call.  I just cut and pasted the code and never changed the stack allocation.

Thanks for the help.  Are you still doing any x64 assembly?


MazeGen

That's right, four QWORDs are enough.

There are two ways to call functions in x64 assembler code. You either allocate/deallocate stack before calling each function:


push arg6 ; sixth parameter
push arg5 ; fifth parameter
sub rsp, 4*8 ; allocate space for 'Register Parameter Stack Area'
mov r9, arg4
mov r8, arg3
mov rdx, arg2
mov rcx, arg1
call function
add rsp, 4*8 + 2*8 ; release all parameters from stack


Or, you first allocate the number of QWORDs corresponding to the highest number of parameters of a function you call, and use MOVs to fill the stack:


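GregsFunc PROC
sub rsp, 4*8 + 2*8 + 8 ; part of prologue; the extra 8 bytes keep rsp 16-byte aligned at each call
...
mov [rsp + 4*8 + 1*8], arg6 ; sixth parameter
mov [rsp + 4*8 + 0*8], arg5 ; fifth parameter, just above the 'Register Parameter Stack Area'
mov r9, arg4
mov r8, arg3
mov rdx, arg2
mov rcx, arg1
call function

... ; call more functions the same way

add rsp, 4*8 + 2*8 + 8 ; part of epilogue
ret
GregsFunc ENDP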


For a compiler, it is easy to get the highest number of parameters so compilers prefer the latter way, because it also generates less code.

If you know that you will hardly ever call a function with more than, say, 15 parameters, you can always allocate 15 QWORDs on the stack in your function's prologue (SUB RSP, 15*8) and use the latter method as well. It is not a clean way because it may waste stack, but it should work well.
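To make the offsets in the second method concrete, here is a small C sketch of the arithmetic. The function names are made up for illustration, it only covers integer/pointer arguments, and it deliberately ignores any extra padding needed for stack alignment, which is a separate question.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of outgoing-argument area needed below the return address for calls
   taking at most max_args arguments. The four register arguments always get
   their QWORD slots (the 'Register Parameter Stack Area'), even if unused.
   Hypothetical helper, not part of any toolchain. */
size_t outgoing_area_bytes(size_t max_args)
{
    size_t slots = max_args < 4 ? 4 : max_args;
    return slots * 8;
}

/* Offset from rsp, at the moment of the call instruction, of the stack slot
   for 1-based argument number n. Only arguments 5 and up live on the stack. */
size_t stack_arg_offset(size_t n)
{
    assert(n >= 5);
    return (n - 1) * 8;
}
```

Under these assumptions a function that calls nothing bigger than a six-argument function needs 48 bytes (4*8 + 2*8), with the fifth argument at [rsp + 32] and the sixth at [rsp + 40].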

Well, I'm not really doing x64 assembly, I had to move to C because of portability between x86 and x64. (I was also thinking about portable masm syntax, but it would be too complicated.) I'm mostly interested in reversing and general research of x86 architecture.

GregL

MazeGen,

Thanks again.  That is pretty much how I thought it worked and you confirmed it.  x64 is a bit more complicated than 32-bit x86.  I'm not sure I'll be writing many full programs in x64 asm, but I definitely want to know how to write asm functions that can be called from C since inline assembler is no longer supported.  It's also just fun to learn and a good thing to know for debugging if nothing else.


drizz

Hello x64 coders :)

I started to fiddle with x64 MASM just a couple of days ago, and this is my experience so far:
I'm forced to use WinDbg, which is not as good as Olly!
I had some trouble understanding the new stack layout, but it seems it's not my fault but MASM's.
The "proc" directive does not align the stack to 16 bytes but just leaves it as is, and that's not good.
This is my prologue replacement (you must have some locals/arguments or it won't work with the default epilogue):
NewPrologue64 MACRO procname, flags, parambytes, localbytes, reglist, userparms
    LOCAL regadjust
    regadjust = 0
    FOR reg,reglist
        regadjust = regadjust XOR 1 ; odd,even,odd
    ENDM
    push rbp
    ; aligned @16 after push
    mov rbp,rsp
    ; has locals?
    IF (localbytes GT 0)
        IF ((localbytes SHR 3) AND 1) EQ regadjust
            ;; if both are odd or both are even
            sub rsp,localbytes
        ELSE
            sub rsp,localbytes+8
        ENDIF
    ELSEIF (regadjust EQ 0) ; is number of regs to push EVEN?
        sub rsp,8
    ENDIF
    FOR reg,reglist
        push reg
    ENDM
    EXITM %localbytes
ENDM

OPTION PROLOGUE:NewPrologue64

;; this macro makes rsp aligned after "proc" line


The lack of "invoke" can be worked around with a macro, and I wrote a simple one for that.

btw, MazeGen, I think I remember you writing an invoke replacement, did you finish it and can you share it? (or was that for ml32?)
The truth cannot be learned ... it can only be recognized.

MazeGen

Quote from: drizz on February 19, 2009, 01:32:54 AM
btw, MazeGen, I think I remember you writing an invoke replacement, did you finish it and can you share it? (or was that for ml32?)

Yeah, I wrote a pretty advanced replacement for PROC, INVOKE and LOCAL, but when I was about to finish them, I realized that I had to learn C instead of coding in x64 asm, so I actually never used them :lol

Good you mentioned it, I will review them and try to publish them!

MazeGen

Quote from: drizz on February 19, 2009, 01:32:54 AM
I'm forced to use windbg, which is not as good as olly !!

Have you seen fdbg? I was trying to convince the author to copy ollydbg's GUI, but I succeeded only partially  :wink Anyway, it is simpler than WinDbg and therefore easier to use, IMHO.

Alloy

The Visual C++ Express versions can also do symbolic debugging of 64-bit code.
We all used to be something else. Nature has always recycled.

Igor

Quote from: MazeGen on February 18, 2009, 10:12:09 AM
That's right, four QWORDs are enough.
Are you sure about this?

This doesn't work for me:
sub rsp, 20h
mov r9, MB_OK
mov r8, NULL
mov rdx, NULL
mov rcx, NULL
call MessageBox


and this does work:
sub rsp, 28h
mov r9, MB_OK
mov r8, NULL
mov rdx, NULL
mov rcx, NULL
call MessageBox


I think that the caller's return address misaligns the stack, so you need another 8 bytes to align it, and it's probably better to always align it than to run into random problems later on.
http://msdn.microsoft.com/en-us/library/ms794596.aspx

GregL

Igor,

I have wondered why C compilers always use 28h (40) instead of 20h (32). I think you may be right about the alignment.

[Edit] Igor is right about this.
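The arithmetic behind Igor's two snippets can be sketched in C. This is only a model, assuming the usual Windows x64 rule that rsp is 16-byte aligned at every call instruction and that the function does nothing to the stack except the one sub; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 if rsp is 16-byte aligned at a call made after "sub rsp, sub_bytes",
   given the (aligned) rsp the caller had when it called us. Illustrative only. */
int aligned_at_call(uint64_t caller_rsp, uint64_t sub_bytes)
{
    uint64_t rsp = caller_rsp;  /* 16-byte aligned at the caller's call instruction */
    rsp -= 8;                   /* the call pushed the 8-byte return address */
    rsp -= sub_bytes;           /* our own "sub rsp, sub_bytes" */
    return rsp % 16 == 0;
}
```

With an aligned caller, a sub of 20h leaves rsp 8 bytes off, while 28h brings it back to a multiple of 16, which matches what Igor saw with MessageBox.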


leemarx

I wrote an asm program and it works well:
include      windows.inc
includelib   kernel32.lib
includelib   user32.lib
includelib   gdi32.lib
extrn   MessageBoxA : proc
extrn   ExitProcess : proc
extrn   LoadIconA : proc
extrn   LoadCursorA : proc
extrn   RegisterClassExA : proc
extrn   CreateWindowExA : proc
extrn   GetModuleHandleA : proc
extrn   ShowWindow : proc
extrn   UpdateWindow : proc
extrn   GetMessageA : proc
extrn   TranslateMessage : proc
extrn   DispatchMessageA : proc
extrn   PostQuitMessage : proc
extrn   DefWindowProcA : proc
extrn   DestroyWindow : proc
cbf proto ;:dq, :dd, :dq, :dq
.data
cap         db   'Hello !',0
txt         db   'Hello world !',0
regw      db   'Register Windows failed !',0
crww      db   'Create Window failed !',0
shww      db   'Show Window failed !',0
upww      db   'Update Window failed !',0
hIn         qword   0
hWnd      qword   0

.code
;callback function for the window


main   proc
      local   wc:WNDCLASSEX
      local   umsg:MSG
      sub      rsp,(sizeof WNDCLASSEX) + (sizeof MSG) + 8 * 11 + 28h
;Register Windows Class
      ;fill in wc
      mov      wc.cbSize,sizeof WNDCLASSEX
      mov      wc.style,CS_HREDRAW or CS_VREDRAW
      lea      rax,cbf
      mov      wc.lpfnWndProc,rax
      xor      rax,rax
      mov      wc.cbClsExtra,eax
      mov      wc.cbWndExtra,eax
      xor      rcx,rcx
      call   GetModuleHandleA
      mov      hIn,rax            ;save the module handle returned in rax
      mov      wc.hInstance,rax
      mov      rdx,IDI_APPLICATION
      mov      rcx,rax
      call   LoadIconA
      mov      wc.hIcon,rax
      mov      wc.hIconSm,rax
      mov      rdx,IDC_ARROW
      xor      rcx,rcx
      
      call   LoadCursorA
      mov      wc.hCursor,rax
      mov      wc.hbrBackground,COLOR_WINDOW + 1
      xor      rax,rax
      mov      wc.lpszMenuName,rax
      lea      rax,cap
      mov      wc.lpszClassName,rax
      lea      rcx,wc
      call   RegisterClassExA
      cmp      rax,0
      jnz      conti
      add      rsp,sizeof (WNDCLASSEX) + sizeof (MSG) + 1
      xor      r9,r9
      lea      r8,cap
      lea      rdx,regw
      xor      rcx,rcx
      call   MessageBoxA
conti:
      ;create the window
      ;sub      rsp,8 * 11
      xor      rax,rax
      mov      [rsp + 8 * 11],rax
      mov      rax,hIn
      mov      [rsp + 8 * 10],rax
      xor      rax,rax
      mov      [rsp + 8 * 9],rax
      mov      [rsp + 8 * 8],rax
      mov      rax,400
      mov      [rsp + 8 * 7],rax
      mov      [rsp + 8 * 6],rax
      mov      rax,CW_USEDEFAULT
      mov      [rsp + 8 * 5],rax
      mov      [rsp + 8 * 4],rax
      mov      r9,WS_OVERLAPPEDWINDOW
      lea      r8,txt
      lea      rdx,cap
      xor      rcx,rcx
      call   CreateWindowExA
      mov      hWnd,rax
      cmp      rax,0
      jnz      n1
      
      xor      r9,r9
      lea      r8,cap
      lea      rdx,crww
      xor      rcx,rcx
      call   MessageBoxA
n1:      
      mov      rdx,SW_SHOW
      mov      rcx,hWnd
      call   ShowWindow
      
      cmp      rax,0
      jz      n2
      
      xor      r9,r9
      lea      r8,cap
      lea      rdx,shww
      xor      rcx,rcx
      call   MessageBoxA
      
n2:      
      mov      rcx,hWnd
      call   UpdateWindow
      cmp      rax,0
      jz      lo
      
      xor      r9,r9
      lea      r8,cap
      lea      rdx,upww
      xor      rcx,rcx
      call   MessageBoxA
      
lo:      ;Message loop
      
      xor      r9,r9
      xor      r8,r8
      mov      rdx,NULL
      lea      rcx,umsg
      call   GetMessageA
      
      cmp      rax,0
      jz      exit
      lea      rcx,umsg
      ;call   TranslateMessage
      lea      rcx,umsg
      call   DispatchMessageA
      jmp      lo
exit:
      xor      rcx,rcx
      call   ExitProcess
   
      add      rsp,(sizeof WNDCLASSEX) + (sizeof MSG) + 8 * 11 + 28h
      ret
main   endp
cbf      proc ;hw : dq , umsg : dd, wp : dq , lpa : dq
      ;local   ps:PAINTSTRUCT
      mov      [rsp + 20h],r9
      mov      [rsp + 18h],r8
      mov      [rsp + 10h],rdx
      mov      [rsp + 8],rcx
            ;push        rsi
            ;push        rdi
            ;sub      rsp,28h + (sizeof PAINTSTRUCT)
      sub          rsp,28h
      ;jmp      ex
      mov      eax,edx
      ;Handle WM_DESTROY
      cmp      eax,WM_DESTROY
      jnz      def
      xor      rcx,rcx
      call   PostQuitMessage
      ;Handle   WM_CLOSE
      cmp      eax,WM_CLOSE
      jnz      def
      mov      rbp,rsp
      add      rbp,28h
            ;add      rbp,28h + (sizeof PAINTSTRUCT)
      mov      rcx,[rbp + 8]
      call   DestroyWindow
      ;Handle WM_PAINT
      cmp      eax,WM_PAINT
      jnz      def
      
      
def:
      mov      rbp,rsp
      add      rbp,28h
      mov      r9,[rbp +20h]
      mov      r8,[rbp + 18h]
      mov      rdx,[rbp + 10h]
      mov      rcx,[rbp + 8h]
      call   DefWindowProcA
ex:      ;cmp      rax,
      ;add      rsp,28h + (sizeof PAINTSTRUCT)
            add         rsp,28h
            ;pop         rdi
            ;pop         rsi
          ret
cbf      endp

      end

leemarx

But when I declare a local PAINTSTRUCT variable, it doesn't work, and I wonder why:
include      windows.inc
includelib   kernel32.lib
includelib   user32.lib
includelib   gdi32.lib
extrn   MessageBoxA : proc
extrn   ExitProcess : proc
extrn   LoadIconA : proc
extrn   LoadCursorA : proc
extrn   RegisterClassExA : proc
extrn   CreateWindowExA : proc
extrn   GetModuleHandleA : proc
extrn   ShowWindow : proc
extrn   UpdateWindow : proc
extrn   GetMessageA : proc
extrn   TranslateMessage : proc
extrn   DispatchMessageA : proc
extrn   PostQuitMessage : proc
extrn   DefWindowProcA : proc
extrn   DestroyWindow : proc
cbf proto ;:dq, :dd, :dq, :dq
.data
cap         db   'Hello !',0
txt         db   'Hello world !',0
regw      db   'Register Windows failed !',0
crww      db   'Create Window failed !',0
shww      db   'Show Window failed !',0
upww      db   'Update Window failed !',0
hIn         qword   0
hWnd      qword   0

.code
;callback function for the window


main   proc
      local   wc:WNDCLASSEX
      local   umsg:MSG
      sub      rsp,(sizeof WNDCLASSEX) + (sizeof MSG) + 8 * 11 + 28h
;Register Windows Class
      ;fill in wc
      mov      wc.cbSize,sizeof WNDCLASSEX
      mov      wc.style,CS_HREDRAW or CS_VREDRAW
      lea      rax,cbf
      mov      wc.lpfnWndProc,rax
      xor      rax,rax
      mov      wc.cbClsExtra,eax
      mov      wc.cbWndExtra,eax
      xor      rcx,rcx
      call   GetModuleHandleA
      mov      hIn,rax            ;save the module handle returned in rax
      mov      wc.hInstance,rax
      mov      rdx,IDI_APPLICATION
      mov      rcx,rax
      call   LoadIconA
      mov      wc.hIcon,rax
      mov      wc.hIconSm,rax
      mov      rdx,IDC_ARROW
      xor      rcx,rcx
      
      call   LoadCursorA
      mov      wc.hCursor,rax
      mov      wc.hbrBackground,COLOR_WINDOW + 1
      xor      rax,rax
      mov      wc.lpszMenuName,rax
      lea      rax,cap
      mov      wc.lpszClassName,rax
      lea      rcx,wc
      call   RegisterClassExA
      cmp      rax,0
      jnz      conti
      add      rsp,sizeof (WNDCLASSEX) + sizeof (MSG) + 1
      xor      r9,r9
      lea      r8,cap
      lea      rdx,regw
      xor      rcx,rcx
      call   MessageBoxA
conti:
      ;create the window
      ;sub      rsp,8 * 11
      xor      rax,rax
      mov      [rsp + 8 * 11],rax
      mov      rax,hIn
      mov      [rsp + 8 * 10],rax
      xor      rax,rax
      mov      [rsp + 8 * 9],rax
      mov      [rsp + 8 * 8],rax
      mov      rax,400
      mov      [rsp + 8 * 7],rax
      mov      [rsp + 8 * 6],rax
      mov      rax,CW_USEDEFAULT
      mov      [rsp + 8 * 5],rax
      mov      [rsp + 8 * 4],rax
      mov      r9,WS_OVERLAPPEDWINDOW
      lea      r8,txt
      lea      rdx,cap
      xor      rcx,rcx
      call   CreateWindowExA
      mov      hWnd,rax
      cmp      rax,0
      jnz      n1
      
      xor      r9,r9
      lea      r8,cap
      lea      rdx,crww
      xor      rcx,rcx
      call   MessageBoxA
n1:      
      mov      rdx,SW_SHOW
      mov      rcx,hWnd
      call   ShowWindow
      
      cmp      rax,0
      jz      n2
      
      xor      r9,r9
      lea      r8,cap
      lea      rdx,shww
      xor      rcx,rcx
      call   MessageBoxA
      
n2:      
      mov      rcx,hWnd
      call   UpdateWindow
      cmp      rax,0
      jz      lo
      
      xor      r9,r9
      lea      r8,cap
      lea      rdx,upww
      xor      rcx,rcx
      call   MessageBoxA
      
lo:      ;Message loop
      
      xor      r9,r9
      xor      r8,r8
      mov      rdx,NULL
      lea      rcx,umsg
      call   GetMessageA
      
      cmp      rax,0
      jz      exit
      lea      rcx,umsg
      ;call   TranslateMessage
      lea      rcx,umsg
      call   DispatchMessageA
      jmp      lo
exit:
      xor      rcx,rcx
      call   ExitProcess
   
      add      rsp,(sizeof WNDCLASSEX) + (sizeof MSG) + 8 * 11 + 28h
      ret
main   endp
cbf      proc ;hw : dq , umsg : dd, wp : dq , lpa : dq
      local   ps:PAINTSTRUCT
      mov      [rsp + 20h],r9
      mov      [rsp + 18h],r8
      mov      [rsp + 10h],rdx
      mov      [rsp + 8],rcx
            ;push        rsi
            ;push        rdi
            sub      rsp,28h + (sizeof PAINTSTRUCT)
      ;sub          rsp,28h
      ;jmp      ex
      mov      eax,edx
      ;Handle WM_DESTROY
      cmp      eax,WM_DESTROY
      jnz      def
      xor      rcx,rcx
      call   PostQuitMessage
      ;Handle   WM_CLOSE
      cmp      eax,WM_CLOSE
      jnz      def
      mov      rbp,rsp
      ;add      rbp,28h
            add      rbp,28h + (sizeof PAINTSTRUCT)
      mov      rcx,[rbp + 8]
      call   DestroyWindow
      ;Handle WM_PAINT
      cmp      eax,WM_PAINT
      jnz      def
      
      
def:
      mov      rbp,rsp
      add      rbp,28h
      mov      r9,[rbp +20h]
      mov      r8,[rbp + 18h]
      mov      rdx,[rbp + 10h]
      mov      rcx,[rbp + 8h]
      call   DefWindowProcA
ex:      ;cmp      rax,
      add      rsp,28h + (sizeof PAINTSTRUCT)
            ;add         rsp,28h
            ;pop         rdi
            ;pop         rsi
          ret
cbf      endp

      end
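One thing that stands out in the listing above, independent of whatever else may be wrong: the register arguments are saved at [rsp + 8] and up before the sub, but the def: block reads them back through rbp = rsp + 28h even though rsp was lowered by 28h + (sizeof PAINTSTRUCT). The C sketch below only does that offset arithmetic. The function name is hypothetical, and the 72-byte PAINTSTRUCT size in the note is an assumption for the x64 layout, not something taken from the thread.

```c
#include <stddef.h>

/* Offset from the current rsp of the home slot for register argument `index`
   (0 = rcx, 1 = rdx, 2 = r8, 3 = r9), after the function has executed
   "sub rsp, frame_bytes". Assumes the slots were written at [rsp + 8],
   [rsp + 10h], ... before the sub, as in the listing. Illustrative only. */
size_t home_slot_offset(size_t frame_bytes, size_t index)
{
    return frame_bytes + 8 + index * 8;
}
```

With a frame of 28h the saved rcx is at [rsp + 30h], which is what the first program reads. With 28h + 72 it would be at [rsp + 78h], so adding only 28h to rbp appears to hand DefWindowProcA values from somewhere inside the local area instead of the saved arguments.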

GregL

Quote from: leemarx's code
include      windows.inc
includelib   kernel32.lib
includelib   user32.lib
includelib   gdi32.lib

If these are MASM32 includes and libraries (32-bit), they are going to cause you major problems. If they are, I'm surprised the program in your previous post worked.