The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: bomz on July 04, 2011, 10:00:07 PM

Title: LEA
Post by: bomz on July 04, 2011, 10:00:07 PM
(http://smiles.kolobok.us/he_and_she/girl_impossible.gif)
Does somebody use it? Say a few words about LEA.
String2Dword proc uses ecx edi ebx edx esi String:DWORD

mov esi, String
xor eax, eax
xor ecx, ecx
@@:
mov cl, byte ptr[esi]
cmp cl, 0
jz @F

mov ebx, eax
shl eax, 2
add eax, ebx

shl eax, 1

sub cl, 48
add eax, ecx
add esi, 1
jmp @B
@@:
        ret

String2Dword endp



.386

.model flat, stdcall
option casemap :none

include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib

.data
form db "Number: %u", 0
String db "4294967295",0

.data?
buffer db 512 dup(?)

.code
start:
lea esi, String
xor eax, eax
xor ecx, ecx
@@:
mov edx, eax
mov cl, byte ptr[esi]
cmp cl, 0
jz @F
lea eax, [4*EAX+EDX] ;1 tick!!!
add esi, 1
lea eax, [2*EAX-48+ECX] ;1 tick!!!

jmp @B
@@:

invoke wsprintf,ADDR buffer,ADDR form,eax
invoke MessageBox,0,ADDR buffer,0,MB_ICONASTERISK
invoke ExitProcess,0
end start


1.5 times quicker in ticks
Title: Re: LEA
Post by: dedndave on July 04, 2011, 11:12:22 PM
mov esi, String
lea esi, String

not the same thing

mov esi, String
loads ESI with the dword value at String

lea esi, String
loads ESI with the address of String

mov esi, offset String
loads ESI with the address of String
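
To make the difference concrete, here is a minimal sketch. The variable name Value is made up for the illustration, and a flat-model MASM32 program like the one in the first post is assumed:

```asm
.data
Value dd 12345678h              ; hypothetical dword variable

.code
    mov eax, Value              ; eax = 12345678h, the contents of Value
    mov eax, offset Value       ; eax = address of Value, resolved by assembler and linker
    lea eax, Value              ; eax = the same address, produced as an effective address
```

Note that inside the String2Dword proc, String is a proc argument that already holds a pointer, so mov esi, String is right there. In the stand-alone program String is the data label itself, so lea (or offset) is needed.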
Title: Re: LEA
Post by: bomz on July 04, 2011, 11:15:23 PM
This I know. I am talking about the fact that LEA can do one shift, two add/sub operations, and move the result to a register, all in 1 tick. I never knew about this and am not sure how to use it. I am trying it and it works, but is it correct?

maybe not in one tick, but quickly

lea eax, [2*EAX-48+ECX]
Title: Re: LEA
Post by: redskull on July 04, 2011, 11:32:25 PM
LEA is just a way to make the CPU perform its fancy effective address calculation, i.e. "mov eax, [displacement+base+index*scale]", without actually moving anything in or out of memory.  It's most effective when the address requires the fancy calculations; without it, you would have to manually perform the additions and multiplications using the ADD and MUL commands.  If you are just talking about regular, non-stack, non-array, data section values, LEA is equivalent to a MOV OFFSET. 

The "trick" is that if you have to perform two additions and a multiply (by the allowed values), you don't have to be "calculating" an address at all; you can use it to calculate any result you care to know, and use it as a "super adder" instruction. 

-r
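
As a small sketch of that "super adder" use (the register choice is arbitrary and nothing here touches memory):

```asm
    ; eax = ebx + ecx*4 + 8 in a single instruction;
    ; no memory is read and the flags are left unchanged
    lea eax, [ebx+ecx*4+8]

    ; the same result with plain arithmetic, which does modify the flags
    mov eax, ecx
    shl eax, 2
    add eax, ebx
    add eax, 8
```

The scale factor can only be 1, 2, 4 or 8, and it applies to one of the two registers only.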
Title: Re: LEA
Post by: bomz on July 04, 2011, 11:35:53 PM
the situation becomes clear


AGI  PPlain  PMMX -- ????
Title: Re: LEA
Post by: redskull on July 05, 2011, 12:10:06 AM
Quote from: bomz on July 04, 2011, 11:35:53 PM
AGI  PPlain  PMMX -- ????

Are you really optimizing for P5?  Either way, an "Address Generation Interlock" happens when the CPU needs the value of one of the registers to calculate the address, but the result isn't ready.  Because P5's use "pairing" and not uOps, you obviously can't execute an instruction that calculates the value of a register at the same time as an instruction (i.e. LEA) that needs that value.  New CPU's don't really have it (well, just have it in other forms).

-r
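
A sketch of the kind of sequence that would run into an AGI on a P5, going by the description above. The register choice is arbitrary and ebx is assumed to point at valid data:

```asm
    add ebx, 4                  ; ebx is written here...
    mov eax, [ebx]              ; ...and needed for the address right away: AGI on P5

    add ecx, 1                  ; ecx is written here...
    lea edx, [ecx+ecx*4]        ; ...and lea needs it for its address calculation too
```

Whether moving an unrelated instruction in between is enough to hide the stall depends on how the instructions pair, so it has to be measured.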
Title: Re: LEA
Post by: bomz on July 05, 2011, 12:17:54 AM
Quote

AGI  PPlain  PMMX

Are you really optimizing for P5? 

(http://smiles.kolobok.us/light_skin/unknw.gif)

I see that it works and is more effective, but what this P5, AGI, PPlain, PMMX is, I don't know.

Does this not work on Pentium 4? I have a Pentium 4
Title: Re: LEA
Post by: bomz on July 05, 2011, 12:20:40 AM
lea eax, [2*EAX-48+ECX]
does this make sense, or should I leave
mov ebx, eax
shl eax, 2
add eax, ebx
shl eax, 1
sub cl, 48
add eax, ecx


?????????????????????????
Title: Re: LEA
Post by: redskull on July 05, 2011, 12:40:56 AM
Quote from: bomz on July 05, 2011, 12:17:54 AM
I see it working and more effective, but what it is this P5 AGI  PPlain  PMMX I don't now.
This not working on Pentium 4? I have Pentium 4

P5 - CPU microarchitecture including the original Pentium chip (PPlain) and the Pentium with MMX extensions (PMMX).  AGI is the Address Generation Interlock previously described.

Pentium 4 (PIV) is built on the "Netburst" microarchitecture, which also includes the Pentium D.  I personally have very little experience with Netburst, so another member would be better equipped to answer the question.  I would assume it doesn't, as I believe it uses the same basic uop setup as the P6-based chips.

-r
Title: Re: LEA
Post by: raymond on July 05, 2011, 03:22:20 AM
A few suggestions.

Instead of:
   cmp cl, 0
    jz @F
Do:
   sub cl,48
   jc  @F

That would exit the conversion with any character having an ascii value lower than "0". If you ever intend that other unknown persons could use your app, you should also add the following for error checking so that conversion would stop with any non-numerical character input:
   cmp cl,9
   ja  @F

Also, instead of:
   mov edx, eax
   lea eax, [4*EAX+EDX]

You can do:
   lea eax,[eax*4+eax]

and, since cl would already be converted to binary, you would only need:
   lea eax,[eax*2+ecx]
resulting in a reduction of code size by 4 bytes. :clap:
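
Putting these suggestions together gives a loop along the following lines. This is a sketch, not raymond's exact code: the proc name Str2Dw is hypothetical, it works on the full ecx after a movzx instead of on cl, and a single unsigned compare replaces the jc/ja pair. It assumes .model flat, stdcall, and overflow past 32 bits is not checked.

```asm
; Hypothetical helper: converts a zero-terminated decimal string to a dword in eax.
; Conversion stops at the terminator or at the first non-digit character.
Str2Dw proc pStr:DWORD
    mov edx, pStr               ; edx -> string
    xor eax, eax                ; running result
@@:
    movzx ecx, byte ptr [edx]   ; next character, zero-extended
    sub ecx, 48                 ; "0".."9" -> 0..9, anything below "0" wraps around
    cmp ecx, 9
    ja @F                       ; unsigned: catches chars below "0" and above "9"
    lea eax, [eax*4+eax]        ; result * 5
    add edx, 1
    lea eax, [eax*2+ecx]        ; result * 10 + digit
    jmp @B
@@:
    ret
Str2Dw endp
```

Only eax, ecx and edx are used, so no registers need to be preserved; a call would look like invoke Str2Dw, ADDR String.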
Title: Re: LEA
Post by: bomz on July 05, 2011, 08:21:36 AM
Bytes... I have a 500 GB HDD and 2.5 GB of memory. What matters is how many ticks it needs. It is better to add 10 MB and do the same thing 10 times quicker.


lea eax, [4*EAX+EAX] - This works. I didn't know whether it is possible to use the same register, or whether this violates the rule that using a register changed in the previous ticks makes the processor pause. If you need bytes, use mul 10.


   lea eax, [4*EAX+EAX] - my processor does this slightly quicker than    lea eax, [4*EAX+EDX], and after shl the value of eax doesn't change
Title: Re: LEA
Post by: bomz on July 05, 2011, 09:50:34 AM
lea edx, String
xor eax, eax
xor ecx, ecx
@@:
mov cl, byte ptr[edx]
sub cl, 48
jc @F
lea eax, [4*EAX+EAX]
add edx, 1
lea eax, [2*EAX+ECX]
jmp @B
@@:


This code needs only the registers that Windows API calls usually destroy anyway, so in many cases it doesn't need push/pop.
Title: Re: LEA
Post by: bomz on July 05, 2011, 06:48:38 PM
Strange, but there is very little information about this use of LEA. As I understand it, the processor has an arithmetic part, but the lea command does the same kind of work when it computes a real address, so we can use it for some restricted arithmetic operations, and it does them quickly because, for example, it doesn't change the flags. I can't find any mention of using the same register as source and destination, but it really works. How it affects speed is unknown: using a register that was changed in the previous ticks may cause the processor to stop. Maybe this works only on my old P4.
Title: Re: LEA
Post by: jj2007 on July 05, 2011, 07:02:55 PM
Quote from: raymond on July 05, 2011, 03:22:20 AM
A few suggestions.
...
You can do:
   lea eax,[eax*4+eax]

and, since cl would already be converted to binary, you would only need:
   lea eax,[eax*2+ecx]
resulting in a reduction of code size by 4 bytes. :clap:


Ray,
That looks damn close to my favourite (read: fastest) ascii to float algo. Here is its first innermost loop:

QuoteIsDot1:   inc esi
   mov ecx, edx   ; first zero, then dotpos, if any
align 8         ; this loop is align 8 by default
   .Repeat
      movzx ebx, byte ptr [esi]   ; much faster than mov bl on P4 and Celeron M
      cmp ebx, "."
      je IsDot1
      cmp ebx, "9"   ; faster than cmp bl
      ja Done
      sub ebx, "0"   ; could move up, saves one byte with test ebx, ebx below but is ca. 1% slower
      js Done
      lea eax, [eax+4*eax]   ; *5 - imul much slower
      inc edx    ; dot pos count
      lea eax, [2*eax+ebx]   ; *2 (*10 in total), plus new digit (...+ebx-48 plus cmp instead of sub: slower on CM)
      inc esi
   .Until edx>=8   ; zero flag set
   dec esi
Done:   ...; follows FPU part
Title: Re: LEA
Post by: bomz on July 05, 2011, 07:31:04 PM
my P4 runs it 1.5 times quicker with lea than without

QuoteField   Value
CPU Properties   
CPU Type   Intel Pentium 4, 2266 MHz (17 x 133)
CPU Alias   Northwood
CPU Stepping   C1
Instruction Set   x86, MMX, SSE, SSE2
Original Clock   2266 MHz
Min / Max CPU Multiplier   17x / 17x
Engineering Sample   No
L1 Trace Cache   12K Instructions
L1 Data Cache   8 KB
L2 Cache   512 KB  (On-Die, ECC, ATC, Full-Speed)


lea edx, String
xor eax, eax
;xor ecx, ecx
@@:
movzx ecx, byte ptr[edx]
sub cl, 48
jc @F
lea eax, [4*EAX+EAX]
add edx, 1
lea eax, [2*EAX+ECX]


any code may be optimized (http://smiles.kolobok.us/light_skin/girl_haha.gif)
Title: Re: LEA
Post by: qWord on July 05, 2011, 07:41:32 PM
optimized for speed, but also anti-optimized for data validation  :wink
Title: Re: LEA
Post by: bomz on July 05, 2011, 07:44:14 PM
Speed is important; data is already bigger than really needed. The asm operating system Kolibri fits on 1 floppy disk. If Microsoft used asm more widely, all the HDD, CD, DVD and Flash manufacturers would lose their work.
Data size was important on the 8086 with its segments.
Title: Re: LEA
Post by: qWord on July 05, 2011, 07:50:50 PM
Data validation (http://en.wikipedia.org/wiki/Data_validation)
Title: Re: LEA
Post by: bomz on July 05, 2011, 08:06:40 PM
Hm.
in masm examples
AsciiBase proc uses  esi InPut:DWORD
;INVOKE     AsciiBase, addr szBuff0

         xor     eax, eax
         mov     esi, InPut
         xor     ecx, ecx
         xor     edx, edx
         mov     al, [esi]
         inc     esi
      .while al != 0
            sub     al, '0'          ; Convert to bcd
            lea     ecx, [ecx+ecx*4] ; ecx = ecx * 5
            lea     ecx, [eax+ecx*2] ; ecx = eax + old ecx * 10
            mov     al, [esi]
            inc     esi
      .endw
         lea     eax, [ecx+edx]     ; Move to eax
         ret

AsciiBase endp

in Iczelion's tutorials
   String2Dword proc uses ecx edi edx esi String:DWORD
     LOCAL Result:DWORD

     mov Result,0
     mov edi,String
     invoke lstrlen,String
     .while eax!=0
       xor edx,edx
       mov dl,byte ptr [edi]
       sub dl,"0"
       mov esi,eax
       dec esi
       push eax
       mov eax,edx
       push ebx
       mov ebx,10
       .while esi > 0
         mul ebx
         dec esi
       .endw
       pop ebx
       add Result,eax
       pop eax
       inc edi
       dec eax
     .endw
     mov eax,Result
     ret
   String2Dword endp


Here and data and time...
Title: Re: LEA
Post by: jj2007 on July 05, 2011, 08:43:25 PM
Quote from: bomz on July 05, 2011, 07:31:04 PM
lea edx, String
xor eax, eax
;xor ecx, ecx
@@:
movzx ecx, byte ptr[edx]
sub cl, 48
jc @F
lea eax, [4*EAX+EAX]
add edx, 1
lea eax, [2*EAX+ECX]


any code may be optimized (http://smiles.kolobok.us/light_skin/girl_haha.gif)

Looks clever but is 2% slower on my CPU, which is quite a lot. "inc slower than add" is valid for old CPUs only. And as qWord remarked already, it's not so wise to drop the data check.
Title: Re: LEA
Post by: bomz on July 05, 2011, 08:46:56 PM
My processor does inc slower, so I use add. As I read, inc is quicker only on the 386. Maybe. About movzx I don't know.
Title: Re: LEA
Post by: dedndave on July 06, 2011, 03:28:20 AM
i am finding that DEC/JZ depends on the surrounding instructions
in some cases, it may be 1 cycle slower
in others, it may be 10 cycles slower
if you put an unrelated instruction in there, it may not be slower at all
if it makes enough difference to keep the loop under 128 bytes, it is probably faster
this is measured on my P4, which is old, nowadays
        dec     ecx
        mov     edx,SomeValue
        jnz     top_of_loop

still, there is something to be said for optimizing on an old P4
there are a lot of them still in use
and code should run faster, in general, on newer machines
Title: Re: LEA
Post by: bomz on July 07, 2011, 12:06:57 PM
lea eax, [EAX*4] - two time slower than
lea eax, [EDX*4]
Title: Re: LEA
Post by: bomz on July 07, 2011, 12:08:19 PM
Quote from: bomz on July 07, 2011, 12:06:57 PM
lea eax, [EAX*4] - two time slower than
lea eax, [EDX*4]

mov edx, String
xor eax, eax
@@:
movzx ecx, byte ptr[edx]
sub cl, 48
jc @F
lea ebx, [4*EAX+EAX]
add edx, 1
lea eax, [2*EBX+ECX]
jmp @B
@@:
Title: Re: LEA
Post by: jj2007 on July 07, 2011, 01:23:54 PM
Quote from: bomz on July 07, 2011, 12:06:57 PM
lea eax, [EAX*4] - two time slower than
lea eax, [EDX*4]

That's miraculous :U
Which CPU?

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
38      cycles for lea ebx, res=123456789
38      cycles for lea eax, res=123456789
Title: Re: LEA
Post by: bomz on July 07, 2011, 01:30:55 PM
I keep googling and can't find any example that uses the same register inside the brackets. Why? Maybe because using in lea a register that was changed in the previous ticks may cause the processor to pause.

QuoteField   Value
CPU Properties   
CPU Type   Intel Pentium 4, 2266 MHz (17 x 133)
CPU Alias   Northwood
CPU Stepping   C1
Instruction Set   x86, MMX, SSE, SSE2
Original Clock   2266 MHz
Min / Max CPU Multiplier   17x / 17x
Engineering Sample   No
L1 Trace Cache   12K Instructions
L1 Data Cache   8 KB
L2 Cache   512 KB  (On-Die, ECC, ATC, Full-Speed)
Title: Re: LEA
Post by: bomz on July 07, 2011, 02:35:49 PM
(http://xmages.net/storage/10/1/0/b/0/upload/77603a71.png)
(http://xmages.net/storage/10/1/0/b/0/upload/2d6ed4e8.png)
(http://xmages.net/storage/10/1/0/b/0/upload/7dacb40b.png)
Title: Re: LEA
Post by: dedndave on July 07, 2011, 02:59:10 PM
Pentium 4 Prescott (2005+), MMX, SSE3
1659 1748 1665 1721 1697
1646 1629 1673 1650 1697
2159 2202 2168 2134 2119
Title: Re: LEA
Post by: bomz on July 07, 2011, 03:04:19 PM
"may cause processor stop" - I don't know how to translate it correctly, or what it means either
(http://xmages.net/storage/10/1/0/6/e/upload/ca625841.png)
Title: Re: LEA
Post by: dedndave on July 07, 2011, 03:09:06 PM
it means the problem is worse on your itanium than on a P4
it is an interesting case
something i will watch for

in a project i am working on, i use...
        lea     edx,[esi+edx+2]
notice - no multiplication, here
i used it because the registers are full - i may juggle things around a bit   :P

note:
the empty loop stalls, so you can more or less ignore that set of numbers
Title: Re: LEA
Post by: bomz on July 07, 2011, 03:12:23 PM
try

lea     edx,[edx+esi+2]
Title: Re: LEA
Post by: bomz on July 07, 2011, 03:17:31 PM
(http://xmages.net/storage/10/1/0/c/6/upload/5901a7fa.png)
(http://xmages.net/storage/10/1/0/c/6/upload/d81509ca.png)
(http://xmages.net/storage/10/1/0/c/6/upload/871beed3.png)

not use eax in lea
Title: Re: LEA
Post by: bomz on July 07, 2011, 03:20:47 PM
*
Title: Re: LEA
Post by: ERNST on July 07, 2011, 04:15:07 PM
That's interesting.

Q6600:
QuoteCore Duo (2006+), MMX, SSE3
1033 1033 1034 1032 1033
1030 1032 1030 1032 1031
1031 1031 1032 1028 1032
Title: Re: LEA
Post by: bomz on July 07, 2011, 04:19:56 PM
code must consider old processor's first
Title: Re: LEA
Post by: dedndave on July 08, 2011, 02:09:04 AM
not everyone will agree with you on that   :P
there are good and bad things, either way
Title: Re: LEA
Post by: bomz on July 08, 2011, 12:35:17 PM
mov<movzx  ~2
Title: Re: LEA
Post by: qWord on July 08, 2011, 01:04:50 PM
Quote from: bomz on July 08, 2011, 12:35:17 PM
mov<movzx  ~2
apples <> oranges
using movzx avoids problems with partial register accesses.
You will find this information in both Intel's and AMD's optimization manuals.
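
A minimal sketch of the two patterns being compared, with edx assumed to point at the string as in the loops above:

```asm
    ; partial-register write: only cl changes, the upper 24 bits of ecx
    ; keep whatever they held before, so ecx still depends on its old value
    xor ecx, ecx
    mov cl, byte ptr [edx]
    lea eax, [eax*2+ecx]        ; reads all of ecx right after an 8-bit write

    ; full-register write: movzx replaces all 32 bits of ecx,
    ; so no earlier value of ecx is needed
    movzx ecx, byte ptr [edx]
    lea eax, [eax*2+ecx]
```

How much the first form costs differs from CPU to CPU, which is why the timings posted in this thread do not all agree.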
Title: Re: LEA
Post by: bomz on July 08, 2011, 03:11:02 PM
xor ecx, ecx
@@:
mov cl, byte ptr [edx]

quicker even if the string is 1 character
Title: Re: LEA
Post by: hutch-- on July 08, 2011, 03:35:15 PM
bomz,

On most recent processors MOVZX is faster than XOR / MOV CL, [EDX]. You need to go back to a PIII to see MOVZX slower.
Title: Re: LEA
Post by: bomz on July 08, 2011, 03:38:28 PM
I would go back, but I have a P4

the xor is done 1 time, the mov on each iteration
Title: Re: LEA
Post by: MichaelW on July 08, 2011, 04:52:38 PM
In my tests on my P3 movzx is ~1.5x faster.

;==============================================================================
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm
;==============================================================================
printf MACRO format:REQ, args:VARARG
    IFNB <args>
        invoke crt_printf, cfm$(format), args
    ELSE
        invoke crt_printf, cfm$(format)
    ENDIF
    EXITM <>
ENDM
;==============================================================================
    .data
        x db 100 dup(0)
    .code
;==============================================================================
start:
;==============================================================================

    mov esi, OFFSET x

    invoke Sleep, 3000

    REPEAT 3

        counter_begin 1000, HIGH_PRIORITY_CLASS
            mov edi, 16
          @@:
            sub edi, 1
            jnz @B
        counter_end
        printf( "%d cycles, loop only\n", eax )

        counter_begin 1000, HIGH_PRIORITY_CLASS
            mov edi, 16
            xor eax, eax
            xor ebx, ebx
            xor ecx, ecx
            xor edx, edx
          @@:
            mov al, BYTE PTR [esi+edi]
            mov bl, BYTE PTR [esi+edi+1]
            mov cl, BYTE PTR [esi+edi+2]
            mov dl, BYTE PTR [esi+edi+3]
            sub edi, 1
            jnz @B
        counter_end
        printf( "%d cycles, xor + mov byte ptr\n", eax )

        counter_begin 1000, HIGH_PRIORITY_CLASS
            mov edi, 16
          @@:
            movzx eax, BYTE PTR [esi+edi]
            movzx ebx, BYTE PTR [esi+edi+1]
            movzx ecx, BYTE PTR [esi+edi+2]
            movzx edx, BYTE PTR [esi+edi+3]
            sub edi, 1
            jnz @B
        counter_end
        printf( "%d cycles, movzx\n\n", eax )

    ENDM

    inkey "Press any key to exit..."
    exit
;==============================================================================
end start


37 cycles, loop only
85 cycles, xor + mov byte ptr
68 cycles, movzx

37 cycles, loop only
85 cycles, xor + mov byte ptr
68 cycles, movzx

37 cycles, loop only
85 cycles, xor + mov byte ptr
68 cycles, movzx

Title: Re: LEA
Post by: bomz on July 08, 2011, 05:29:39 PM
the xor is done 1 time, the mov on each iteration
Title: Re: LEA
Post by: MichaelW on July 08, 2011, 05:38:43 PM
OK, so movzx is still ~1.5x faster.
Title: Re: LEA
Post by: bomz on July 08, 2011, 05:47:51 PM
mov (http://i034.radikal.ru/1107/98/9e069a538617.gif)
Title: Re: LEA
Post by: dedndave on July 08, 2011, 06:56:11 PM
i had a case the other day.....

EDX was already 0 as a result of code at the bottom of the loop
i only had to zero it in loop init code
at the top of the loop, i needed the byte in CH extended to a dword, so...
        mov     dl,ch

it was faster to do this....
        movzx   edx,ch
Title: Re: LEA
Post by: jj2007 on July 08, 2011, 07:06:57 PM
My Celeron doesn't care...

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
48      cycles for REP100 mov
46      cycles for REP100 zx
172     cycles for 100*mov (with dec ecx & jne)
171     cycles for 100*zx
Title: Re: LEA
Post by: bomz on July 08, 2011, 07:13:47 PM
My results were wrong. I used dedndave's tick-count program; movzx is also faster.

QuoteItanium 2 (2002+), MMX, SSE2
1234 1237 1234 1231 1232 - empty cycle
1839 1835 1846 1845 1833 - mov
1533 1534 1536 1538 1530 - movzx
Press any key to continue ...

QuoteItanium 2 (2002+), MMX, SSE2
1238 1231 1234 1239 1235 - empty cycle
1535 1535 1537 1531 1540 - sub cl, 48
1560 1558 1564 1553 1563 - sub ecx, 48
Press any key to continue ...
Title: Re: LEA
Post by: bomz on July 08, 2011, 11:11:41 PM
QuoteItanium 2 (2002+), MMX, SSE2
1233 1228 1223 1231 1230 - empty
4021 4021 4020 4021 4020 - shl edx, 3
2114 2109 2114 2112 2113 - lea edx,[eax*8]
Press any key to continue ...

Itanium 2 (2002+), MMX, SSE2
1228 1232 1225 1227 1226 - empty
4021 4021 4020 4020 4020 - shl edx, 3
5027 5026 5027 5027 5027 - lea eax,[eax*8]
Press any key to continue ...

Itanium 2 (2002+), MMX, SSE2
1231 1234 1236 1222 1234 - empty
4021 4021 4020 4020 4020 - shl edx, 2
2112 2106 2116 2113 2115 - lea edx,[eax*4]
Press any key to continue ...

QuoteItanium 2 (2002+), MMX, SSE2
1232 1231 1226 1229 1229 - empty
2031 2031 2028 2031 2027 - lea edi, [2*EAX-1]
6031 6030 6030 6030 6031 - shl edi, 1; sub edi, 1
Press any key to continue ...
Title: Re: LEA
Post by: qWord on July 08, 2011, 11:21:15 PM
Quote from: bomz on July 08, 2011, 11:11:41 PM
Itanium 2 (2002+), MMX, SSE2
wow! - Itanium 2 server or workstation as home PC?
Title: Re: LEA
Post by: jj2007 on July 08, 2011, 11:23:41 PM
Strange...
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
190     cycles for shl eax, 2
196     cycles for 100*lea eax, [4*eax]
205     cycles for 100*lea edx, [4*eax]
190     cycles for shl eax, 3
195     cycles for 100*lea eax, [8*eax]
205     cycles for 100*lea edx, [8*eax]
Title: Re: LEA
Post by: bomz on July 08, 2011, 11:28:46 PM
Quote from: qWord on July 08, 2011, 11:21:15 PM
Quote from: bomz on July 08, 2011, 11:11:41 PM
Itanium 2 (2002+), MMX, SSE2
wow! - Itanium 2 server or workstation as home PC?

Pentium 4, 9 years old. I don't know why the program recognizes it as an Itanium
Title: Re: LEA
Post by: bomz on July 08, 2011, 11:30:01 PM
nothing strange
lea MAY cause a processor stop (pause) if it uses a register which was changed in the previous ticks - not a good translation
Title: Re: LEA
Post by: qWord on July 08, 2011, 11:43:08 PM
Quote from: bomz on July 08, 2011, 11:30:01 PM
nothing strange
lea MAY call processor stop (pause) if use register which changed in previous ticks - not good translation
the processor never stops :wink
Also: the test has no validity - the advantage of LEA (e.g. no flag stalls) may be visible only in the context of a sensible instruction sequence.
Title: Re: LEA
Post by: bomz on July 08, 2011, 11:44:26 PM
I don't know how to translate it and can't find anything about LEA in English. In Russian a word is used which I can translate as ~ stop or pause
Title: Re: LEA
Post by: qWord on July 08, 2011, 11:45:58 PM
maybe: wait ?
Title: Re: LEA
Post by: bomz on July 08, 2011, 11:52:51 PM
stop + pause + wait /3


   1. stop
   2. halt
   3. halting
   4. arrester
   5. dog
   6. lock

maybe LOCK is the best, or HALT

and "in previous ticks" is used in the sense of the previous wave - oscillation

I am not a programmer, my English is very poor and I don't know special computer English, and this is a back translation eng-rus-eng
Title: Re: LEA
Post by: redskull on July 09, 2011, 12:27:39 AM
"Stall". Though it keeps on executing other stuff
Title: Re: LEA
Post by: bomz on July 09, 2011, 01:09:53 AM
Google says stall means to lose speed. redskull, you give more professional comments, so you must know what calling the same register is named without my back translation. Strange, I can't find anything in English about lea. In Russian there are only a few words, and they are the same on different sites.

At first I understood 'at previous oscillation' as 'in the previous command', so I did:
Quotelea eax, [4*EAX+EAX]
   add edx, 1
   lea eax, [2*EAX+ECX]

but now I see that lea eax, [2*EAX+ECX] also counts as 'in previous'
Title: Re: LEA
Post by: hutch-- on July 09, 2011, 01:19:01 AM
bomz,

LEA was only slow on a PIV, in some instances it was faster to do 2 adds than 1 LEA and this is from the Intel manual. Earlier and later Intel hardware were faster with LEA in most instances.
Title: Re: LEA
Post by: bomz on July 09, 2011, 01:27:56 AM
The main thing is how to use LEA right. Now I am measuring my last code in dedndave's tick-count program and optimizing it

QuoteItanium 2 (2002+), MMX, SSE2
1233 1227 1229 1235 1230 - empty
8872 8886 8885 8897 8877 - mov eax, value; mov value, eax
14120 14121 14118 14119 14119 - push value; pop value
Press any key to continue ...
Title: Re: LEA
Post by: hutch-- on July 09, 2011, 04:45:47 AM
bomz,

Try a few of these and you can experiment with more.


    lea eax, [eax+eax]      ; *2
    lea eax, [eax+eax*2]    ; *3
    lea eax, [eax*4]        ; *4
    lea eax, [eax+eax*4]    ; *5

    lea eax, [eax+eax*2]    ;
    lea eax, [eax+eax]      ; *6

    mov ecx, eax            ;
    lea eax, [eax*8]        ;
    sub eax, ecx            ; *7

    lea eax, [eax*8]        ; *8
    lea eax, [eax+eax*8]    ; *9

    lea eax, [eax+eax*4]    ;
    lea eax, [eax+eax]      ; *10

    lea eax, [eax+eax*2]    ;
    lea eax, [eax*4]        ; *12

    lea eax, [eax+eax*4]    ;
    lea eax, [eax+eax*2]    ; *15

    lea eax, [eax*4]        ;
    lea eax, [eax*4]        ; *16

    lea eax, [eax+eax*8]    ;
    lea eax, [eax+eax]      ; *18
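
The same pattern extends to other constants by chaining. The following are not from the list above, just further sketches in the same style; the last one uses ecx as a scratch register:

```asm
    lea eax, [eax+eax*4]    ;
    lea eax, [eax+eax*4]    ; *25

    lea eax, [eax+eax*4]    ;
    lea eax, [eax+eax*4]    ;
    shl eax, 2              ; *100

    lea ecx, [eax+eax*2]    ; ecx = 3*eax, eax unchanged
    lea eax, [eax+ecx*4]    ; *13 (1 + 3*4)
```

Whether a two- or three-instruction chain beats a single imul depends on the CPU, so it is worth timing both.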
Title: Re: LEA
Post by: bomz on July 09, 2011, 01:26:00 PM
QuoteItanium 2 (2002+), MMX, SSE2
1231 1231 1229 1226 1229 - empty
5027 5028 5026 5027 5027 - lea eax, [eax+eax*2]; lea eax, [eax+eax]
5027 5027 5027 5027 5027 - lea edx, [eax+eax*2]; lea eax, [edx+edx]
Press any key to continue ...

Itanium 2 (2002+), MMX, SSE2
1228 1234 1231 1231 1230 - empty
5027 5028 5027 5027 5027 - lea eax, [eax+eax*2]; lea eax, [eax+eax]
5033 5032 5033 5032 5032 - mov edx, eax; lea edx, [edx+eax*2] ;lea eax, [edx+edx]      ; *6
Press any key to continue ...

Itanium 2 (2002+), MMX, SSE2
1228 1226 1227 1227 1231 - empty
5028 5028 5028 5026 5026 - lea eax, [eax+eax*2]; lea eax, [eax+eax]
5663 5662 5659 5665 5663 - mov ecx, eax; lea edx, [ecx+eax*2]; mov ecx, edx; lea eax, [ecx+edx]
Press any key to continue ...

Itanium 2 (2002+), MMX, SSE2
1234 1228 1228 1234 1229 - empty
5028 5028 5027 5026 5027 - lea eax, [eax+eax*2]; lea eax, [eax+eax]
10064 10064 10065 10064 10064 -  mov edx, eax; shl eax, 1; add eax, edx; shl eax, 1
Press any key to continue ...

Itanium 2 (2002+), MMX, SSE2
1229 1226 1230 1229 1229 - empty
5028 5028 5028 5027 5027 - lea eax, [eax+eax*2]; lea eax, [eax+eax]
9067 9067 9068 9069 9068 - lea edx, [eax+eax*2]; lea eax, [edx*2]
Press any key to continue ...


(http://smiles.kolobok.us/light_skin/unknw.gif)
QuoteItanium 2 (2002+), MMX, SSE2
1236 1232 1253 1225 1234 - empty
1497 1511 1496 1494 1502 - lea     edx,[4*eax+eax]
3997 3995 3997 3996 3996 - lea     eax,[4*eax+eax]
Press any key to continue ...

Itanium 2 (2002+), MMX, SSE2
1226 1226 1234 1229 1235 - empty
8053 8056 8057 8054 8053 - lea eax, [4*EAX+EAX]; add edx, 1; lea eax, [2*EAX+ECX]
8052 8051 8051 8051 8049 - lea ebx, [4*EAX+EAX]; add edx, 1; lea eax, [2*EBX+ECX]
Press any key to continue ...

QuoteItanium 2 (2002+), MMX, SSE2
1227 1231 1228 1238 1233 - empty
8054 8056 8053 8054 8053 - lea eax, [4*EAX+EAX]; lea eax, [4*EAX+EAX]
2109 2111 2111 2113 2112 - lea eax, [4*EBX+ECX]; lea eax, [4*EDX+EDI]
Press any key to continue ...
(http://smiles.kolobok.us/light_skin/swoon.gif)
QuoteItanium 2 (2002+), MMX, SSE2
1233 1225 1238 1226 1226 - empty
8058 8064 8059 8055 8054 - lea eax, [4*EAX+EAX]; lea eax, [2*EAX+ECX]
8052 8049 8052 8055 8049 - lea eax, [4*EAX+EAX]; lea ebx, [esi+edi]; lea eax, [2*EAX+ECX]
Press any key to continue ...

QuoteItanium 2 (2002+), MMX, SSE2
1233 1224 1232 1230 1239 - empty
8054 8057 8057 8054 8054 - lea eax, [4*EAX+EAX]; lea eax, [2*EAX+ECX]
8051 8053 8060 8051 8054 - lea eax, [4*EAX+EAX]; lea ebx, value; lea eax, [2*EAX+ECX]
Press any key to continue ...

QuoteItanium 2 (2002+), MMX, SSE2
1229 1236 1231 1232 1239 - empty
8056 8057 8055 8055 8054 - lea eax, [4*EAX+EAX]; lea eax, [2*EAX+ECX]
9058 9058 9058 9058 9058 - lea ebx, [4*EAX+EAX]; mov edx, ebx; lea eax, [2*EDX+ECX] (http://s19.radikal.ru/i192/0912/a4/38f1287302ef.gif)
Press any key to continue ...

(http://smiles.kolobok.us/light_skin/scratch_one-s_head.gif)
Title: Re: LEA
Post by: hutch-- on July 09, 2011, 02:25:53 PM
bomz,

An Itanium 2 is not a native x86 architecture, are you running an Itanium in x86 emulation mode ?
Title: Re: LEA
Post by: bomz on July 09, 2011, 02:31:17 PM
QuoteItanium 2 (2002+), MMX, SSE2
1228 1228 1229 1231 1230 - empty
8054 8057 8055 8053 8053 - lea eax, [4*EAX+EAX]; lea eax, [2*EAX+ECX]
4034 4034 4034 4034 4034 - lea eax, [4*EAX+EAX]; mov ebx, eax; lea ebx, [2*EBX+ECX]
Press any key to continue ...
(http://smiles.kolobok.us/light_skin/dance4.gif)
Title: Re: LEA
Post by: bomz on July 09, 2011, 02:33:07 PM
I have Pentium 4. Program recognize it like Itanium I don't know why

(http://xmages.net/storage/10/1/0/6/c/upload/d6e953ff.png)

QuoteItanium 2 (2002+), MMX, SSE2
1227 1234 1230 1230 1226 - empty
8062 8070 8062 8063 8114 - lea eax, [4*EAX+EAX]; lea eax, [2*EAX+ECX]
9063 9062 9058 9059 9057 - lea eax, [4*EAX+EAX]; mov ebx, eax; lea eax, [2*EBX+ECX]
Press any key to continue ...

I can no longer see any regularity

Quotelea ebx, [4*EAX+EAX]
   mov edx, ebx
   lea ebx, [2*EDX+ECX]
- 3000
Title: Re: LEA
Post by: hutch-- on July 09, 2011, 03:24:20 PM
Your OS is giving the correct processor ID, it's a Northwood core PIV, not an Itanium. An Itanium is a 64 bit native processor with a different instruction set. They are mainly useful for large processor count parallel processing. I still have a 2.8 gig Northwood processor running and they were a bit easier to code for than the later Prescott cores.
Title: Re: LEA
Post by: bomz on July 09, 2011, 03:27:29 PM
I began testing the whole string-to-dword code. There is no difference whether lea is used or not. Any combination of registers gives the same result: converting "100" to 100 needs about 18000 program ticks
Title: Re: LEA
Post by: dedndave on July 09, 2011, 03:46:23 PM
Quoteconvert 100 to 100 need about 18000 program ticks

ouch !!!!

bomz - don't be concerned about the improper CPU identification
it is an old version of Jochen's ShowCpu function   :P
Title: Re: LEA
Post by: bomz on July 09, 2011, 03:55:10 PM
it needs 18000 program ticks to do it 1000 times  :lol

lea ebx, [4*EAX+EAX]
   mov edx, ebx
   lea ebx, [2*EDX+ECX] this needs 3000, but when put in the s2w loop it became 8000
Title: Re: LEA
Post by: jj2007 on July 09, 2011, 04:04:14 PM
Quote from: dedndave on July 09, 2011, 03:46:23 PM
bomz - don't be concerned about the improper CPU identification
it is an old version of Jochen's ShowCpu function   :P

Bomz, what CPU does it show with the attachment of reply #50?
Title: Re: LEA
Post by: bomz on July 09, 2011, 04:09:23 PM
with the attachment of reply #50?  - ???????????????
Title: Re: LEA
Post by: dedndave on July 09, 2011, 04:19:21 PM
reply #50...

http://www.masm32.com/board/index.php?topic=16988.msg141804#msg141804
Title: Re: LEA
Post by: bomz on July 09, 2011, 04:22:36 PM
(http://xmages.net/storage/10/1/0/2/3/upload/a6344494.png)
Title: Re: LEA
Post by: bomz on July 09, 2011, 06:42:43 PM
Quote   lea ebx, [4*EAX+EAX]
   mov edx, ebx
   lea ebx, [2*EDX+ECX]  - 3000

Quote   lea ebx, [4*EAX+EAX]
   mov edx, ebx
   lea ebx, [2*EDX+ECX]
   mov eax, ebx - 10000

(http://smiles.kolobok.us/light_skin/vava.gif)
Title: Re: LEA
Post by: dedndave on July 09, 2011, 08:22:06 PM
        lea     edx,[4*eax+eax]
        lea     ebx,[2*edx+ecx]
        mov     eax,ebx


it would help if you could place unrelated instructions in between..
        lea     edx,[4*eax+eax]
        mov     esi,offset SomeData
        lea     ebx,[2*edx+ecx]
        mov     edi,offset SomeOtherData
        mov     eax,ebx
Title: Re: LEA
Post by: bomz on July 09, 2011, 08:51:04 PM
I tried LEA, NOP, MOV, ADD (lea ebx, [ebx]).
this method doesn't steal registers from the loop
Title: Re: LEA
Post by: qWord on July 09, 2011, 08:58:18 PM
Quote from: bomz on July 09, 2011, 08:51:04 PM
I try LEA NOP MOV ADD.
you may try to test a useful algorithm!
Title: Re: LEA
Post by: bomz on July 09, 2011, 09:51:26 PM
(http://i071.radikal.ru/0903/41/dfb69c20a179.gif)(http://s02.radikal.ru/i175/0911/08/609553aad438.gif)

maybe somebody with good or native English could write a letter to Intel?
Title: Re: LEA
Post by: qWord on July 09, 2011, 10:17:24 PM
Quote from: bomz on July 09, 2011, 09:51:26 PMmay be somebody with good or native English write latter to Intel?
and whats the message for Intel?
Title: Re: LEA
Post by: bomz on July 09, 2011, 10:27:55 PM
(http://s39.radikal.ru/i083/0811/2e/02c543209105.gif) how lea work for arithmetic
Title: Re: LEA
Post by: qWord on July 09, 2011, 10:34:29 PM
Quote from: bomz on July 09, 2011, 10:27:55 PM
(http://s39.radikal.ru/i083/0811/2e/02c543209105.gif) how lea work for arithmetic
strange question in context to your previous posts - However, you should read Intel's Documentation (http://www.intel.com/products/processor/manuals/) before asking.
Title: Re: LEA
Post by: bomz on July 09, 2011, 10:58:03 PM
If the information about LEA were known, I would find it with Google.
Title: Re: LEA
Post by: jj2007 on July 09, 2011, 11:01:52 PM
Quote from: bomz on July 09, 2011, 09:51:26 PM
(http://i071.radikal.ru/0903/41/dfb69c20a179.gif)(http://s02.radikal.ru/i175/0911/08/609553aad438.gif)

may be somebody with good or native English write letter to Intel?

Hi Bomz,

I admire your icons, they are really cute and funny. Do you create them yourself?
:U
Title: Re: LEA
Post by: bomz on July 09, 2011, 11:18:33 PM
http://s50.radikal.ru/i130/0908/bb/dd7c8a96f9a4.gif
http://s48.radikal.ru/i122/0908/2f/75b2c3fd84e1.gif
http://s45.radikal.ru/i109/0908/0a/595bfd0094d3.gif
http://s61.radikal.ru/i171/0908/ea/31a906ad77e3.gif

Only this I do from other's.

http://www.kolobok.us/ Home Page
http://www.en.kolobok.us/
https://addons.mozilla.org/ru/firefox/addon/kolobok-smiles-for-firefox/   Mozillla addon

http://s61.radikal.ru/i172/1107/09/eadc51a23f19.gif

http://www2.cbox.ws/box/?boxid=1920984&boxtag=4466&sec=smilies
Title: Re: LEA
Post by: bomz on July 09, 2011, 11:34:33 PM
http://www.en.kolobok.us/download.php?view.15

Support:

KOLOBOK Smiles for Firefox
KOLOBOK Smiles for Opera 9
KOLOBOK Smiles for Internet Explorer
KOLOBOK Smiles for Google Chrome
Title: Re: LEA
Post by: redskull on July 10, 2011, 11:55:37 AM
I haven't looked at the actual code, but anytime you use the same registers for input and output, you'll get a dependency chain. Basically, it can't start the second lea until it knows the output from the first, and can't start the third until it's done with the second, all the way up to 100. With different registers, the value of eax doesn't change, so it doesn't have to wait on anything to do any of the lea instructions. But again, that's based on just a cursory look at the results.

-r
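
The two cases can be sketched like this (register choice arbitrary; this illustrates the dependency argument above and is not the code that was actually timed):

```asm
    ; dependent chain: every lea needs the eax produced by the one before it,
    ; so the latencies add up
    lea eax, [eax+eax*4]
    lea eax, [eax+eax*4]
    lea eax, [eax+eax*4]

    ; independent: all three only read eax and write different registers,
    ; so none of them has to wait for another
    lea ebx, [eax+eax*4]
    lea ecx, [eax+eax*2]
    lea edx, [eax*8]
```

In the string conversion loop the chain is unavoidable, because each new value of eax is built from the previous one.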
Title: Re: LEA
Post by: jj2007 on July 10, 2011, 03:28:37 PM
Quote from: redskull on July 10, 2011, 11:55:37 AM
I haven't looked at the actual code, but anytime you use the same registers for input and output, you'll get a dependency chain. Basically, it can't start the second lea until it knows the output from the first

red,

that sounds plausible, and bomz' P4 behaves like that. My Celeron M, in contrast, couldn't care less which regs are involved :bg
Title: Re: LEA
Post by: redskull on July 10, 2011, 03:57:01 PM
Quote from: jj2007 on July 10, 2011, 03:28:37 PM
that sounds plausible, and bomz' P4 behaves like that. My Celeron M, in contrast, couln't care less which regs are involved :bg

Like hutch said, breaking the LEA down into additions and multiplications is probably what does it.  Per the timings, an M only has a latency of 1 for LEA, but the PIV has a latency of 4.

-r
Title: Re: LEA
Post by: bomz on July 11, 2011, 01:13:58 AM
Quote from: redskull on July 10, 2011, 11:55:37 AM
I haven't looked at the actual code, but anytime you use the same registers for input and output, you'll get a dependency chain. Basically, it can't start the second lea until it knows the output from the first, and can't start the third until its done second, all the way up to 100. With different registers, the value of eax doesn't change, it doesn't have to wait on anything to do any of the lea instructions. But again, that's based on just a cursory look at the results.

-r

and how do I break this chain?

Quote   lea ebx, [4*EAX+EAX]
   mov edx, ebx
   lea ebx, [2*EDX+ECX]  - 3000

Quote   lea ebx, [4*EAX+EAX]
   mov edx, ebx
   lea ebx, [2*EDX+ECX]
   mov eax, ebx - 10000

Is this use of LEA meant for matrices?
Title: Re: LEA
Post by: dedndave on July 11, 2011, 03:14:24 AM
sometimes, you can't   :(
sometimes, you can interweave instructions for one operation with another
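
A sketch of what such interweaving could look like, assuming two independent values are being built at the same time: one in eax with its digit in ecx, one in ebx with its digit in edx. The second value is invented for the illustration:

```asm
    lea eax, [eax+eax*4]    ; chain A, step 1
    lea ebx, [ebx+ebx*4]    ; chain B, step 1 (does not wait for A)
    lea eax, [eax*2+ecx]    ; chain A, step 2
    lea ebx, [ebx*2+edx]    ; chain B, step 2
```

Each chain still waits on itself, but the CPU can work on one while the other is waiting. With only a single string to convert there is no second chain to fill the gaps.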