Translating from 32 bit to 64 bit

Started by frktons, August 25, 2010, 08:00:29 PM

frktons

Hi all.

rep stosd was considered the fastest way to initialize a block of
memory in 32-bit assembly.
Now SSE instructions beat it, on Intel machines at least.
In my opinion, on 64-bit machines working with native 64-bit operations,
we could get better results than with SSE instructions just by using the 64-bit general-purpose registers.

Could anyone translate the following code into 64-bit code
and post its performance versus the 32-bit version?

Any help is welcome.

Thanks


include \masm32\include\masm32rt.inc


ClearBuffer PROTO :DWORD


;----------------------------------------------------------------------


.data?

    buf2clear CHAR_INFO 2000 dup (<>)

.code

start:

Main PROC

    INVOKE ClearBuffer, ADDR buf2clear
   
    print "Clearing done",13,10,13,10
   
    inkey

finish: INVOKE ExitProcess,0

    ret

Main ENDP

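ClearBuffer PROC AddrBuffer:DWORD

    push edi                ; EDI must be preserved for the caller
    mov ecx, 2000           ; 2000 CHAR_INFO entries of 4 bytes = 2000 dwords
    mov edi, AddrBuffer     ; destination for STOSD
    mov eax, 20202020h      ; four space characters
    rep stosd               ; store EAX at [EDI], ECX times
    pop edi
    ret

ClearBuffer ENDP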

; -------------------------------------------------------------------------

end start




Mind is like a parachute. You know what to do in order to use it :-)

frktons

Is nobody here using JWASM, GoASM or any 64-bit assembler who can change
a couple of instructions and post the source and exe to test on a 64-bit OS?

Or can anybody tell me which command-line options to use with JWASM
to build for a 64-bit OS?

Frank

GregL

Frank,

Would an ml64 version be of use to you?

[Edit] It's late now, I'll check back in the morning.

frktons

Quote from: GregL on September 04, 2010, 04:27:32 AM
Frank,

Would an ml64 version be of use to you?

[Edit] It's late now, I'll check back in the morning.


I don't know whether I can use ML64, or how to use it.
What about include \masm32\include\masm32rt.inc?
Is there anything to change in the source other than


    push edi

    mov ecx, 2000

    mov edi, AddrBuffer

    mov eax,20202020h

    rep stosd

    pop edi


That should be translated into something like:


    push rdi

    mov rcx, 1000

    mov rdi, AddrBuffer

    mov rax,2020202020202020h

    rep stosq

    pop rdi


?

And what parameters do I have to pass to ML64.EXE to assemble
native 64-bit code?

Thanks


GregL

Frank,

Sorry, I just realized my Visual Studio 2010 Professional trial has expired and the Express Edition does not build 64-bit without some substantial modifications.  So, I'm not set up to build 64-bit with ml64. Using include \masm32\include\masm32rt.inc won't work because it is for 32-bit. Your 64-bit translation of the core code looks good. I'm going to work on getting my 2010 Express Edition building 64-bit. It should be possible, I was able to do it with the 2008 Express Edition.

Maybe someone else could build it with GoASM or JWASM in the meantime.
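
A minimal sketch of the routine as a standalone Win64 procedure, assuming the Microsoft x64 calling convention, where the first argument arrives in RCX rather than on the stack. Because RCX is also the REP counter, the pointer has to be copied to RDI before the count is loaded. The file name in the build comment is hypothetical, and unwind directives (PROC FRAME, .pushreg) are left out for brevity.

```asm
; Sketch only, not taken from the thread.
; Assumed build step: ml64 /c clearbuf.asm   (clearbuf.asm is a made-up name)

.code

; In:  RCX = address of an 8000-byte buffer (2000 CHAR_INFO entries * 4 bytes)
ClearBuffer proc
    push rdi                        ; RDI is callee-saved in Win64
    mov  rdi, rcx                   ; copy the pointer before RCX becomes the counter
    mov  rcx, 1000                  ; 8000 bytes / 8 bytes per STOSQ = 1000 stores
    mov  rax, 2020202020202020h     ; eight space characters
    rep  stosq                      ; store RAX at [RDI], RCX times
    pop  rdi
    ret
ClearBuffer endp

end
```

A caller would load the buffer address into RCX (for example lea rcx, buf2clear) before call ClearBuffer.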


GregL

If anyone is interested I got VC++ 2010 Express Edition building x64 C programs by installing the latest Windows SDK and setting the VC++ LIB directory to the one from the SDK for x64. That's it, it works.

Now to get it to build x64 MASM programs. Trouble is, I can't find ml64.exe anywhere, and I uninstalled the VC++ Pro trial.

sinsi

I've never had any C stuff, only SDKs, but found ML64 here:

C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64
Light travels faster than sound, that's why some people seem bright until you hear them.

frktons

Quote from: sinsi on September 05, 2010, 04:20:03 AM
I've never had any C stuff, only SDKs, but found ML64 here:

C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64

Well, I've got ML64 in the same directory, sinsi. The trouble is how to translate
a simple routine like the one I posted into 64-bit code, assemble it, and measure its
performance with some kind of CPU timing. All of that is beyond my current knowledge.

Frank

sinsi

The timing is simple, just wrap ClearBuffer in a couple of 'rdtsc' for an easy test, the problem is the lack of a print/inkey macro for 64-bit.
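
A rough sketch of that idea in 64-bit code, assuming the Microsoft x64 calling convention and a 64-bit ClearBuffer that takes the buffer address in RCX. TimeClear is a hypothetical helper name. RDTSC is not a serializing instruction, so for a routine this short the result is only an estimate.

```asm
; Sketch only: time one call to ClearBuffer with a pair of RDTSC reads.

.data?
buf2clear db 8000 dup (?)           ; same size as 2000 CHAR_INFO entries

.code

; In:  RCX = buffer address. Fills 8000 bytes with spaces.
ClearBuffer proc
    push rdi
    mov  rdi, rcx
    mov  rcx, 1000
    mov  rax, 2020202020202020h
    rep  stosq
    pop  rdi
    ret
ClearBuffer endp

; Out: RAX = elapsed timestamp-counter ticks for one call.
TimeClear proc
    sub  rsp, 28h                   ; 20h shadow space + 8 bytes: realigns RSP, holds the start value
    rdtsc                           ; EDX:EAX = timestamp counter
    shl  rdx, 32
    or   rax, rdx                   ; combine into one 64-bit value
    mov  [rsp+20h], rax             ; keep the start value across the call
    lea  rcx, buf2clear
    call ClearBuffer
    rdtsc
    shl  rdx, 32
    or   rax, rdx
    sub  rax, [rsp+20h]             ; end minus start
    add  rsp, 28h
    ret
TimeClear endp

end
```

The tick count is left in RAX for the caller to print or store. Repeating the measurement and taking the lowest figure would reduce noise.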

japheth

Quote from: sinsi on September 05, 2010, 04:48:13 AM
The timing is simple, just wrap ClearBuffer in a couple of 'rdtsc' for an easy test, the problem is the lack of a print/inkey macro for 64-bit.

A workaround may be to use MSVCRT's printf and kbhit.

Here's a sample:

;--- Win64 "hello world" console application.
;--- uses CRT functions.
;--- assemble: JWasm -win64 Win64_6.asm
;---       or: ml64 -c Win64_6.asm
;--- link:     Link /subsystem:console Win64_6.obj

option casemap:none

includelib msvcrt.lib

externdef printf : near
externdef kbhit : near

.data

string   db 10,"hello, world.",10,0

.code

main proc
sub rsp, 28h        ; space for 4 arguments + 16byte aligned stack
mov rcx, offset string
call printf
call kbhit
xor eax, eax
add rsp,28h
ret
main endp

end


You'll need a 64-bit version of msvcrt.lib.

frktons

Quote from: japheth on September 05, 2010, 05:43:20 AM
Quote from: sinsi on September 05, 2010, 04:48:13 AM
The timing is simple, just wrap ClearBuffer in a couple of 'rdtsc' for an easy test, the problem is the lack of a print/inkey macro for 64-bit.

A workaround may be to use MSVCRT's printf and kbhit.

Thanks japheth.

I was thinking about using your JWASM for this task, but I lack the necessary
expertise. Anyway, I'll try to convert the routine myself, add a couple
of rdtsc instructions as sinsi suggested, and see if I can get it working.

Next week I hope to find the time to take on the task.

Frank

sinsi

Thanks japheth. I noticed you aligned the stack, does calling the C dll need alignment too?
I usually align it at the start of a proc even if I don't use an API, mainly so I know that RSP will be 8-aligned if I call another proc.

What I am trying to get at is, the windows API requires 16-byte alignment but does anything else? Or is it just a good habit to get into?

japheth

Quote from: sinsi on September 05, 2010, 06:05:17 AM
Thanks japheth. I noticed you aligned the stack, does calling the C dll need alignment too?

I don't know and can't cite any serious sources, but I guess that it is required for CRT functions as well because the 16-byte alignment stuff is a basic requirement of Win64 SEH.

sinsi

It doesn't really cost too much to do, which is why I've got into the habit even when it's unneeded.

>Win64 SEH.
yuck *shudder*

Rockoon

It really isn't a performance problem. As you can see above, including the extra space to maintain alignment was free, and it's even cheaper than free if more calls with five or fewer parameters are made (rsp was already set to ensure alignment, so there is no need to adjust it again).

Also note that you do not have to maintain alignment if you make no calls, so leaf functions don't need to worry about it and can use the more typical push and pop mechanics for temporarily saving registers.
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.
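
A short sketch of the point about reusing one stack adjustment, assuming the Microsoft x64 calling convention and a 64-bit msvcrt.lib that exports printf. The strings and values are illustrative only.

```asm
; Sketch only: one RSP adjustment shared by two calls,
; the second of which passes five arguments.

includelib msvcrt.lib
extern printf : proc

.data
msg db "hello",10,0
fmt db "%d %d %d %d",10,0

.code

main proc
    sub  rsp, 28h                   ; 20h shadow space + one 8-byte slot; RSP is 16-byte aligned again

    lea  rcx, msg                   ; first call: one argument
    call printf

    lea  rcx, fmt                   ; second call: five arguments, same adjustment
    mov  edx, 1
    mov  r8d, 2
    mov  r9d, 3
    mov  qword ptr [rsp+20h], 4     ; fifth argument sits just above the shadow space
    call printf

    xor  eax, eax                   ; return 0
    add  rsp, 28h
    ret
main endp

end
```

The 28h bytes hold exactly five 8-byte slots, which is why calls with up to five parameters need no further change to RSP.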