Is there any performance difference between using a 32-bit and a 64-bit register?
sub r8d,r8d
sub r8,r8
Both do the same thing, both are encoded as 3 bytes, but is one better?
I used r8 in this example because with eax/rax there is a size difference (rax needs an extra byte for the REX prefix).
Maybe I should qualify my question: it's not so much about performance (1 clock cycle ain't a killer) but more about gotchas.
I am thinking of stalls, like the ones you get when using an 8- or 16-bit register in 32-bit mode.
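For reference, here is how the four forms from the question encode. This is a sketch worked out from the standard x86-64 encoding rules rather than copied from assembler output. The opcode byte is shown as 29h (SUB r/m,r); an assembler may pick the equivalent 2Bh (SUB r,r/m) form instead, which does not change the length.

```asm
; REX prefix = 0100WRXB
;   W = 64-bit operand size, R and B = extend the ModRM reg and r/m fields
sub eax,eax     ; 29 C0       2 bytes, no prefix needed
sub rax,rax     ; 48 29 C0    3 bytes, REX.W
sub r8d,r8d     ; 45 29 C0    3 bytes, REX.R+B to reach r8
sub r8,r8       ; 4D 29 C0    3 bytes, REX.W+R+B
```

So for r8-r15 the REX prefix is needed either way, and the 64-bit form only sets one more bit in it.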
Hi sinsi,
I use XOR reg,reg instead of SUB reg,reg,
even though both instructions take 1 clock cycle on a 486 processor.
XOR looks better and more sophisticated because it looks like you understand binary numbers.
You can check the clock cycles here: http://classes.engr.oregonstate.edu/eecs/summer2008/cs271/Instructions.htm
regards
sinsi,
I don't know if 64-bit capable hardware suffers from the problem that early PIVs had, where a partial register write stalled a read or write of the larger register shortly after it. I personally doubt that a zeroing operation fits into that style of problem, as both SUB and XOR tend to live in silicon, not microcode, but probably the only safe way is to make a small test piece and time it. I remember that on a PIII you used to get very bad stalls if you performed a BYTE operation followed shortly afterwards by a DWORD operation on the same register, and it was blatantly obvious that the timing was different.
If you don't get major differences in the timing, then it probably is not a big deal.
Yeah, I can't really see a problem with zeroing the upper bits, that's built into all sorts of other instructions.
Interesting. I was wondering about 32/64 and never thought about e.g. r8b and how that affects r8/r8d/r8w. The same, I should think, as al/eax on 32-bit CPUs.
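A small sketch of the difference, following the architectural rules for 64-bit mode: a write to a 32-bit register clears bits 63:32 of the full register, while a write to the 8- or 16-bit part leaves the rest untouched and so has to be merged with the old value. Whether that merge costs anything measurable depends on the CPU and would need to be timed.

```asm
mov r8,-1           ; r8 = FFFFFFFFFFFFFFFFh
mov r8d,1           ; r8 = 0000000000000001h  (upper 32 bits zeroed)

mov r8,-1           ; r8 = FFFFFFFFFFFFFFFFh
mov r8w,1           ; r8 = FFFFFFFFFFFF0001h  (bits 63:16 preserved)
mov r8b,2           ; r8 = FFFFFFFFFFFF0002h  (bits 63:8 preserved)
```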
All we need is for MichaelW to make timers64...although I am having a go at it on and off.
If Dave would hurry up and win the lottery he could buy me a new system as he promised, and then I could make the move to 64 bits :bg
Quote from: sinsi on February 04, 2012, 10:13:25 AM
All we need is for MichaelW to make timers64...although I am having a go at it on and off.
I translated them a while ago:
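; x64-Version of MichaelW's macros
; CPUID overwrites rax, rbx, rcx and rdx, and the API calls expect the
; caller to have reserved shadow space and aligned the stack.
counter_begin MACRO loopcount:REQ, priority
    LOCAL label
    IFNDEF tmcb__nLoops
        .data
        align 16
        tmcb__nLoops dd 0               ; loop count, used for the final average
        tmcb__cntr   dd 0               ; running loop counter
        tmcb__qw     dq 2 dup (?)       ; [0] = loop overhead, [8] = start TSC
        .code
    ENDIF
    mov tmcb__nLoops,loopcount
    IFNB <priority>
        call GetCurrentProcess
        mov rdx,priority
        mov rcx,rax
        call SetPriorityClass
    ENDIF
    ; first pass: time an empty loop to measure the loop overhead
    xor rax,rax
    cpuid                               ; serialize before reading the TSC
    rdtsc
    mov DWORD ptr tmcb__qw[0],eax
    mov DWORD ptr tmcb__qw[4],edx
    mov tmcb__cntr, loopcount
    xor rax,rax
    cpuid
    align 16
  @@:
    sub tmcb__cntr,1
    jnz @B
    xor rax,rax
    cpuid
    rdtsc
    shl rdx,32
    or rax,rdx                          ; rax = full 64-bit TSC
    sub rax,tmcb__qw[0]
    mov tmcb__qw[0],rax                 ; [0] = overhead of the empty loop
    ; second pass: record the start TSC for the code under test
    xor rax, rax
    cpuid
    rdtsc
    mov tmcb__cntr,loopcount
    mov DWORD ptr tmcb__qw[8],eax
    mov DWORD ptr tmcb__qw[12],edx
    xor rax,rax
    cpuid
    align 16
  label:
    tmcb__label equ <label>
ENDM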
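; x64-Version of MichaelW's macros
counter_end MACRO
    sub tmcb__cntr,1
    jnz tmcb__label
    xor rax,rax
    cpuid                               ; serialize before reading the TSC
    rdtsc
    shl rdx,32
    or rax,rdx                          ; rax = full 64-bit TSC
    sub rax,tmcb__qw[0]                 ; subtract the empty-loop overhead
    sub rax,tmcb__qw[8]                 ; subtract the start TSC
    mov tmcb__qw[0],rax                 ; [0] = net cycles for all iterations
    call GetCurrentProcess
    mov rdx,NORMAL_PRIORITY_CLASS
    mov rcx,rax
    call SetPriorityClass
    IFDEF _EMMS
        EMMS
    ENDIF
    finit
    fild tmcb__qw[0]
    fild tmcb__nLoops
    fdiv                                ; net cycles / loop count
    fistp tmcb__qw[0]
    mov rax,tmcb__qw[0]                 ; rax = average cycles per iteration
ENDM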
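A minimal usage sketch for these macros, comparing the two instructions from the original question. It assumes the code sits inside a procedure that has already reserved shadow space and aligned the stack (the macros call GetCurrentProcess and SetPriorityClass), that HIGH_PRIORITY_CLASS is defined by your include files, and that rbx is saved by the procedure, since CPUID overwrites it. The names cycles32 and cycles64 and the loop count are made up for the example.

```asm
.data
cycles32 dq ?               ; hypothetical result variables
cycles64 dq ?
.code
    ; ... inside a procedure with shadow space reserved and rbx saved ...
    counter_begin 10000000, HIGH_PRIORITY_CLASS
        sub r8d,r8d
    counter_end             ; rax = average cycles per iteration
    mov cycles32,rax

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        sub r8,r8
    counter_end
    mov cycles64,rax
```

A single instruction is at the edge of what this method can resolve once the loop overhead is subtracted, so repeating the instruction several times inside the loop will probably give a more usable number.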