Meta branch predictor Core2?

Started by theunknownguy, June 24, 2010, 08:14:28 AM

qWord

Quote from: theunknownguy on June 29, 2010, 09:34:15 PM
Got me on that, then how could it calculate:

LEA EAX, [EDX]

Without memory access?
this is the same as:
mov eax,edx
FPU in a trice: SmplMath
It's that simple!

theunknownguy

Quote from: qWord on June 29, 2010, 09:36:51 PM
Quote from: theunknownguy on June 29, 2010, 09:34:15 PM
Got me on that, then how could it calculate:

LEA EAX, [EDX]

Without memory access?
this is the same as:
mov eax,edx

What is the real intention of having LEA then? (Regarding alignment, apart from filling more bytes?)

qWord

Quote from: theunknownguy on June 29, 2010, 09:38:29 PM
what is the real intention of having LEA then?

Computing the effective address of memory locations (using the ModRM and SIB bytes). A typical example is local variables, which are addressed relative to EBP.

theunknownguy

Quote from: qWord on June 29, 2010, 09:41:24 PM
Quote from: theunknownguy on June 29, 2010, 09:38:29 PM
what is the real intention of having LEA then?
computing the effective address of memory locations. A typical example is local variables, which are relative to EBP.

Yes, sorry, I always correct my post a little after you answer  :lol

But regarding the alignment:

00401000 >   8D00           LEA EAX,DWORD PTR DS:[EAX]
00401002     8BC0           MOV EAX,EAX

What would be the difference?...

qWord

Quote from: theunknownguy on June 29, 2010, 09:43:21 PM
What would be the difference?...

Different opcode, same operation: both are effectively a nop.

redskull

See the Agner Fog instruction timing manual, under Core2 (65nm), pages 31-34.

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

clive

Quote from: theunknownguy on June 29, 2010, 09:43:21 PM
00401000 >   8D00           LEA EAX,DWORD PTR DS:[EAX]
00401002     8BC0           MOV EAX,EAX

What would be the difference?...

There are also the BYTE and DWORD displacement forms of lea reg,[reg+0]

00000000 8D4000                 lea     eax,[eax]
00000003 8D642400               lea     esp,[esp]
00000007 8D5200                 lea     edx,[edx]
0000000A 8D8000000000           lea     eax,[eax]
00000010 8DA42400000000         lea     esp,[esp]
00000017 8D9200000000           lea     edx,[edx]
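
To make the length differences in that listing concrete, here is a minimal C sketch that splits the ModRM byte (the byte right after the 8D opcode) into its mod, reg and rm fields and derives the instruction length from them. The names decode_modrm, has_sib, disp_size and lea_length are made up for this illustration. It assumes 32-bit addressing with no prefixes, and it ignores the special case of a SIB byte with base=101 under mod=00.

```c
#include <stdint.h>

/* Fields of an x86 ModRM byte: mod = bits 7-6, reg = bits 5-3, rm = bits 2-0. */
typedef struct {
    unsigned mod;
    unsigned reg;
    unsigned rm;
} ModRM;

/* Hypothetical helper: split a ModRM byte into its three fields. */
ModRM decode_modrm(uint8_t b)
{
    ModRM m;
    m.mod = (b >> 6) & 3u;
    m.reg = (b >> 3) & 7u;
    m.rm  = b & 7u;
    return m;
}

/* In 32-bit addressing, rm = 100 with a memory operand means a SIB byte follows. */
int has_sib(ModRM m)
{
    return m.mod != 3u && m.rm == 4u;
}

/* Displacement size in bytes: mod=01 -> disp8, mod=10 -> disp32,
   mod=00 with rm=101 -> disp32 and no base register, otherwise none.
   Simplification: the SIB base=101 case under mod=00 is not handled. */
unsigned disp_size(ModRM m)
{
    if (m.mod == 1u) return 1u;
    if (m.mod == 2u) return 4u;
    if (m.mod == 0u && m.rm == 5u) return 4u;
    return 0u;
}

/* Length of "8D /r" (LEA) given its ModRM byte: opcode + ModRM + SIB + disp.
   mod=11 is not a valid LEA operand; this sketch does not reject it. */
unsigned lea_length(uint8_t modrm)
{
    ModRM m = decode_modrm(modrm);
    return 1u + 1u + (unsigned)has_sib(m) + disp_size(m);
}
```

Fed the ModRM bytes from the listings in this thread, lea_length gives 2 for 00 (no displacement), 3 for 40 and 52 (disp8), 4 for 64 (disp8 plus SIB), 6 for 80 and 92 (disp32), and 7 for A4 (disp32 plus SIB), which matches the address gaps in the listing.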
It could be a random act of randomness. Those happen a lot as well.

clive

Quote from: theunknownguy
Got me on that, then how could it calculate:

LEA EAX, [EDX]

Without memory access?

What is the real intention of having LEA then? (Regarding alignment, apart from filling more bytes?)

There are at least 3 forms of LEA that can be used to expand the size of the opcodes without materially affecting the speed of execution. I'd recommend picking a register that is not otherwise in use, so as not to create a dependency.

It is computing the address that would be accessed, without performing the access. The computation is done by the address computation logic, using simple adders and barrel shifters (for 1x, 2x, 4x, 8x). Given the simplicity of pipelining this computation it's hard to imagine it being sent to a complex ALU.
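
As a rough model of that computation, the C sketch below does what the address logic does for a 32-bit base + index*scale + displacement operand: one shift and two additions, with no memory access anywhere. The function name effective_address and its parameter names are assumptions made up for this illustration, and the scale is passed as a shift count (0, 1, 2, 3 for 1x, 2x, 4x, 8x).

```c
#include <stdint.h>

/* Model of the value LEA writes for [base + index*scale + disp] in 32-bit mode.
   scale_log2 is the shift count for the index: 0..3 means 1x, 2x, 4x, 8x.
   Unsigned arithmetic wraps modulo 2^32, like the hardware adder.
   Nothing is read from memory: only the address is computed. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale_log2, int32_t disp)
{
    return base + (index << (scale_log2 & 3u)) + (uint32_t)disp;
}
```

With index and displacement set to zero the result is just the base register, which is why lea eax,[edx] behaves like mov eax,edx. With base = index = x and a 4x scale it returns 5*x, the usual lea eax,[eax+eax*4] trick.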

LEA is less relevant these days as the computational costs are fairly minimal, but on the 8086 the cost of recomputing the assorted index modes was higher. As I recall [si+bx] was 2 cycles more than [bx] for the 8086, but the same on a 386, and [si+bx+1] was only one cycle longer on the 386.

For example in C, calculating a pointer address that is used later.

char *ptr;
char Buffer[0x200];
ptr = (char *)&Buffer[0x100];


would convert to (with the address of Buffer in edx)
lea eax,byte ptr [edx + 0100h]

char *ptr;
long LongBuffer[0x200];
int i = 0x100;
ptr = (char *)&LongBuffer[i];


mov eax,0100h
lea eax,dword ptr [edx + eax*4]


It also has the benefit of NOT changing the flags.


  xor ecx,ecx ; clc
  mov ecx,10 ; 40 byte multi precision addition
@@:
  mov eax,[esi]
  lea esi,[esi+4]
  adc eax,[edi]
  mov [edi],eax
  lea edi,[edi+4]
  dec ecx ; NOT sub ecx,1 which destroys carry
  jnz @B
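
For comparison, here is a minimal C sketch of the same multi-precision addition. C has no carry flag, so the carry is kept in an ordinary variable, which is exactly the state the asm loop protects by using LEA and DEC instead of ADD and SUB. The name mp_add and its signature are assumptions made up for this illustration; dst plays the role of [edi] and src the role of [esi], with the least significant dword first.

```c
#include <stddef.h>
#include <stdint.h>

/* dst += src over n 32-bit limbs, least significant limb first.
   Returns the carry out of the top limb (0 or 1). */
unsigned mp_add(uint32_t *dst, const uint32_t *src, size_t n)
{
    unsigned carry = 0;                       /* the "xor ecx,ecx ; clc" step */
    for (size_t i = 0; i < n; i++) {
        /* 64-bit sum so the carry out of bit 31 is visible in bit 32 */
        uint64_t sum = (uint64_t)dst[i] + src[i] + carry;   /* adc eax,[edi] */
        dst[i] = (uint32_t)sum;                             /* mov [edi],eax */
        carry = (unsigned)(sum >> 32);
    }
    return carry;
}
```

Calling mp_add(dst, src, 10) corresponds to the 40 byte addition in the loop above.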