LEA

qWord · July 05, 2011, 07:41:32 PM

optimized for speed, but also anti-optimized for data validation :wink

bomz · July 05, 2011, 07:44:14 PM

Speed is important, and data is bigger than it really needs to be. The asm operating system Kolibri fits on 1 floppy disk. If Microsoft used asm more widely, all the HDD, CD, DVD, Flash .... manufacturers would lose their work.
Data size was important on the 8086 with its segments.

qWord · July 05, 2011, 07:50:50 PM

Data validation

bomz · July 05, 2011, 08:06:40 PM

Hm.
In the MASM examples:

Code Select

AsciiBase proc uses  esi InPut:DWORD
;INVOKE     AsciiBase, addr szBuff0

         xor     eax, eax
         mov     esi, InPut
         xor     ecx, ecx
         xor     edx, edx
         mov     al, [esi]
         inc     esi
      .while al != 0
            sub     al, '0'          ; Convert to bcd
            lea     ecx, [ecx+ecx*4] ; ecx = ecx * 5
            lea     ecx, [eax+ecx*2] ; ecx = eax + old ecx * 10
            mov     al, [esi]
            inc     esi
      .endw
         lea     eax, [ecx+edx]     ; Move to eax
         ret

AsciiBase endp

In Iczelion's tutorials:

Code Select

   String2Dword proc uses ecx edi edx esi String:DWORD
     LOCAL Result:DWORD

     mov Result,0
     mov edi,String
     invoke lstrlen,String
     .while eax!=0
       xor edx,edx
       mov dl,byte ptr [edi]
       sub dl,"0"
       mov esi,eax
       dec esi
       push eax
       mov eax,edx
       push ebx
       mov ebx,10
       .while esi > 0
         mul ebx
         dec esi
       .endw
       pop ebx
       add Result,eax
       pop eax
       inc edi
       dec eax
     .endw
     mov eax,Result
     ret
   String2Dword endp

This one costs both data and time...
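
For comparison, here is a rough C model of what the two routines above compute. It is only a sketch: it is written in C rather than MASM so it can be compiled and checked anywhere, and the names `horner_atou` and `powers_atou` are made up for illustration. The first function mirrors the two `lea` steps of AsciiBase (multiply by 5, then digit plus twice that), and the second mirrors the inner `mul ebx` loop of String2Dword. Like the originals, neither one validates its input.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of AsciiBase: a single pass over the string. */
uint32_t horner_atou(const char *s)
{
    uint32_t n = 0;
    for (; *s != 0; ++s) {
        uint32_t digit = (uint8_t)(*s - '0'); /* sub al, '0'            */
        uint32_t n5 = n + n * 4;              /* lea ecx, [ecx+ecx*4]   */
        n = digit + n5 * 2;                   /* lea ecx, [eax+ecx*2]   */
    }
    return n;
}

/* Hypothetical model of String2Dword: every digit is scaled by its own
   power of ten, so the work grows with the square of the length. */
uint32_t powers_atou(const char *s)
{
    uint32_t result = 0;
    size_t len = strlen(s);                    /* invoke lstrlen         */
    for (size_t i = 0; i < len; ++i) {
        uint32_t term = (uint8_t)(s[i] - '0'); /* sub dl, "0"            */
        for (size_t k = len - 1 - i; k > 0; --k)
            term *= 10;                        /* mul ebx                */
        result += term;                        /* add Result, eax        */
    }
    return result;
}
```

Both give the same value for a string of digits. The second needs a length call and an inner loop per digit, while the first needs only the two scaled additions.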

jj2007 · July 05, 2011, 08:43:25 PM

Quote from: bomz on July 05, 2011, 07:31:04 PM
lea edx, String
xor eax, eax
;xor ecx, ecx
@@:
movzx ecx, byte ptr[edx]
sub cl, 48
jc @F
lea eax, [4*EAX+EAX]
add edx, 1
lea eax, [2*EAX+ECX]

any code may be optimized

Looks clever but is 2% slower on my CPU, which is quite a lot. "inc slower than add" is valid for old CPUs only. And as qWord remarked already, it's not so wise to drop the data check.
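
On the data check: in the quoted loop, `sub cl, 48` followed by `jc` only rejects characters below '0', so anything above '9' (letters, for example) still gets multiplied in. One unsigned compare after the subtraction covers both ends; in assembly terms that would be a `cmp cl, 9` / `ja` after the `sub`. A small C sketch of the idea follows. The name `parse_digits` and the `used` out-parameter are assumptions for illustration, not something from the thread, and overflow is not checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parse leading decimal digits and stop at the first
   character that is not '0'..'9' (this includes the terminating NUL).
   Returns the value; *used receives the number of characters consumed. */
uint32_t parse_digits(const char *s, size_t *used)
{
    uint32_t n = 0;
    size_t i = 0;
    for (;; ++i) {
        uint8_t d = (uint8_t)(s[i] - '0'); /* wraps around for chars < '0' */
        if (d > 9)                         /* one unsigned compare, both ends */
            break;
        n = n * 10 + d;
    }
    *used = i;
    return n;
}
```

With this check, `"123abc"` yields 123 with 3 characters consumed, and `"12:34"` stops at the colon.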

bomz · July 05, 2011, 08:46:56 PM

My processor does inc slower, so I use add. As I read, inc is quicker only on the 386, maybe. About movzx I don't know.

dedndave · July 06, 2011, 03:28:20 AM

i am finding that DEC/JZ depends on the surrounding instructions
in some cases, it may be 1 cycle slower
in others, it may be 10 cycles slower
if you put an unrelated instruction in there, it may not be slower at all
if it makes enough difference to keep the loop under 128 bytes, it is probably faster
this is measured on my P4, which is old, nowadays

Code Select

        dec     ecx
        mov     edx,SomeValue
        jnz     top_of_loop

still, there is something to be said for optimizing on an old P4
there are a lot of them still in use
and code should run faster, in general, on newer machines

bomz · July 07, 2011, 12:06:57 PM

lea eax, [EAX*4] - two times slower than
lea eax, [EDX*4]

bomz · July 07, 2011, 12:08:19 PM

Quote from: bomz on July 07, 2011, 12:06:57 PM
lea eax, [EAX*4] - two times slower than
lea eax, [EDX*4]

Code Select

	mov edx, String
	xor eax, eax
@@:
	movzx ecx, byte ptr[edx]
	sub cl, 48
	jc @F
	lea ebx, [4*EAX+EAX]
	add edx, 1
	lea eax, [2*EBX+ECX]
	jmp @B
@@:

jj2007 · July 07, 2011, 01:23:54 PM

Quote from: bomz on July 07, 2011, 12:06:57 PM
lea eax, [EAX*4] - two times slower than
lea eax, [EDX*4]

That's miraculous :U
Which CPU?

Code Select

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
38      cycles for lea ebx, res=123456789
38      cycles for lea eax, res=123456789

bomz · July 07, 2011, 01:30:55 PM

I kept googling and can't find any example that uses the same register inside the brackets. Why? Maybe using a register in lea that was changed in the previous ticks may cause a processor pause.

Quote
Field                      Value
CPU Properties
CPU Type                   Intel Pentium 4, 2266 MHz (17 x 133)
CPU Alias                  Northwood
CPU Stepping               C1
Instruction Set            x86, MMX, SSE, SSE2
Original Clock             2266 MHz
Min / Max CPU Multiplier   17x / 17x
Engineering Sample         No
L1 Trace Cache             12K Instructions
L1 Data Cache              8 KB
L2 Cache                   512 KB (On-Die, ECC, ATC, Full-Speed)

bomz · July 07, 2011, 02:35:49 PM

dedndave · July 07, 2011, 02:59:10 PM

Code Select

Pentium 4 Prescott (2005+), MMX, SSE3
1659 1748 1665 1721 1697
1646 1629 1673 1650 1697
2159 2202 2168 2134 2119

bomz · July 07, 2011, 03:04:19 PM

"may cause processor stop" - I don't know how to translate it correctly, or what it means either

dedndave · July 07, 2011, 03:09:06 PM

it means the problem is worse on your itanium than on a P4
it is an interesting case
something i will watch for

in a project i am working on, i use...

Code Select

lea edx,[esi+edx+2]

notice - no multiplication, here
i used it because the registers are full - i may juggle things around a bit :P

note:
the empty loop stalls, so you can more or less ignore that set of numbers
