LEA

Started by bomz, July 04, 2011, 10:00:07 PM

qWord

optimized for speed, but also anti-optimized for data validation ;)
FPU in a trice: SmplMath
It's that simple!

bomz

Speed is important; data is bigger than it really needs to be. The assembly-language operating system Kolibri fits on one floppy disk. If Microsoft used asm more widely, all the HDD, CD, DVD, flash .... manufacturers would lose their business.
Data size was important on the 8086 with its segments.

bomz

Hm.
In the MASM examples:
AsciiBase proc uses  esi InPut:DWORD
;INVOKE     AsciiBase, addr szBuff0

         xor     eax, eax
         mov     esi, InPut
         xor     ecx, ecx
         xor     edx, edx
         mov     al, [esi]
         inc     esi
      .while al != 0
            sub     al, '0'          ; Convert to bcd
            lea     ecx, [ecx+ecx*4] ; ecx = ecx * 5
            lea     ecx, [eax+ecx*2] ; ecx = eax + old ecx * 10
            mov     al, [esi]
            inc     esi
      .endw
         lea     eax, [ecx+edx]     ; Move to eax
         ret

AsciiBase endp

In Iczelion's tutorials:
   String2Dword proc uses ecx edi edx esi String:DWORD
     LOCAL Result:DWORD

     mov Result,0
     mov edi,String
     invoke lstrlen,String
     .while eax!=0
       xor edx,edx
       mov dl,byte ptr [edi]
       sub dl,"0"
       mov esi,eax
       dec esi
       push eax
       mov eax,edx
       push ebx
       mov ebx,10
       .while esi > 0
         mul ebx
         dec esi
       .endw
       pop ebx
       add Result,eax
       pop eax
       inc edi
       dec eax
     .endw
     mov eax,Result
     ret
   String2Dword endp


Here both data and time are wasted...

jj2007

Quote from: bomz on July 05, 2011, 07:31:04 PM
lea edx, String
xor eax, eax
;xor ecx, ecx
@@:
movzx ecx, byte ptr[edx]
sub cl, 48
jc @F
lea eax, [4*EAX+EAX]
add edx, 1
lea eax, [2*EAX+ECX]


any code may be optimized

Looks clever but is 2% slower on my CPU, which is quite a lot. "inc slower than add" is valid for old CPUs only. And as qWord remarked already, it's not so wise to drop the data check.
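
For illustration, a rough sketch of what keeping the data check could look like while still using the two-LEA multiply by 10. AsciiToDword and pString are made-up names, not taken from the MASM32 examples or from any post in this thread. The sketch assumes a zero-terminated string, stops at the first character that is not a digit, and does not handle overflow past 32 bits.

```asm
.486
.model flat, stdcall
option casemap:none

.code

; Hypothetical routine, illustration only.
; In:  pString -> zero-terminated ASCII string
; Out: eax = value of the leading decimal digits (0 if there are none)
AsciiToDword proc uses esi pString:DWORD
    mov     esi, pString
    xor     eax, eax                ; running result
@@:
    movzx   ecx, byte ptr [esi]     ; next character, zero-extended
    sub     ecx, '0'                ; '0'..'9' becomes 0..9
    cmp     ecx, 9
    ja      @F                      ; unsigned compare rejects anything outside
                                    ; '0'..'9', including the terminating zero
    lea     eax, [eax+eax*4]        ; eax = eax * 5
    inc     esi
    lea     eax, [ecx+eax*2]        ; eax = digit + old eax * 10
    jmp     @B
@@:
    ret
AsciiToDword endp

end
```

The single unsigned compare covers both "below '0'" and "above '9'", so the check costs one cmp and one branch per character.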

bomz

My processor executes inc more slowly, so I use add. From what I have read, inc is faster only on the 386, maybe. About movzx I don't know.

dedndave

i am finding that DEC/JZ depends on the surrounding instructions
in some cases, it may be 1 cycle slower
in others, it may be 10 cycles slower
if you put an unrelated instruction in there, it may not be slower at all
if it makes enough difference to keep the loop under 128 bytes, it is probably faster
this is measured on my P4, which is old, nowadays
        dec     ecx
        mov     edx,SomeValue
        jnz     top_of_loop

still, there is something to be said for optimizing on an old P4
there are a lot of them still in use
and code should run faster, in general, on newer machines

bomz

lea eax, [EAX*4] - two times slower than
lea eax, [EDX*4]

bomz

Quote from: bomz on July 07, 2011, 12:06:57 PM
lea eax, [EAX*4] - two times slower than
lea eax, [EDX*4]

mov edx, String
xor eax, eax
@@:
movzx ecx, byte ptr[edx]
sub cl, 48
jc @F
lea ebx, [4*EAX+EAX]
add edx, 1
lea eax, [2*EBX+ECX]
jmp @B
@@:

jj2007

Quote from: bomz on July 07, 2011, 12:06:57 PM
lea eax, [EAX*4] - two times slower than
lea eax, [EDX*4]

That's miraculous :U
Which CPU?

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
38      cycles for lea ebx, res=123456789
38      cycles for lea eax, res=123456789

bomz

I kept Googling and can't find any example that uses the same register inside the brackets. Why? Maybe using, in lea, a register that was changed in the previous cycles may cause the processor to stall.

Quote
Field                       Value
CPU Properties
CPU Type                    Intel Pentium 4, 2266 MHz (17 x 133)
CPU Alias                   Northwood
CPU Stepping                C1
Instruction Set             x86, MMX, SSE, SSE2
Original Clock              2266 MHz
Min / Max CPU Multiplier    17x / 17x
Engineering Sample          No
L1 Trace Cache              12K Instructions
L1 Data Cache               8 KB
L2 Cache                    512 KB  (On-Die, ECC, ATC, Full-Speed)
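
To make the two variants under discussion easy to compare side by side, here is a small sketch with each one wrapped in its own routine. Step10A, Step10B, Value and Digit are made-up names for illustration only. In both variants the second lea reads the register written by the first, so the read-after-write chain has the same length; whether a particular CPU times them differently is something to measure, not something this sketch shows.

```asm
.486
.model flat, stdcall
option casemap:none

.code

; Variant A: eax is both source and destination of each lea.
; Returns eax = Value * 10 + Digit
Step10A proc Value:DWORD, Digit:DWORD
    mov     eax, Value
    mov     ecx, Digit
    lea     eax, [eax+eax*4]    ; eax = Value * 5
    lea     eax, [ecx+eax*2]    ; reads the eax written one line above
    ret
Step10A endp

; Variant B: ebx holds the intermediate product.
; Returns eax = Value * 10 + Digit
Step10B proc uses ebx Value:DWORD, Digit:DWORD
    mov     eax, Value
    mov     ecx, Digit
    lea     ebx, [eax+eax*4]    ; ebx = Value * 5
    lea     eax, [ecx+ebx*2]    ; reads the ebx written one line above
    ret
Step10B endp

end
```

Variant B costs an extra register that has to be preserved, which is why Step10B carries "uses ebx".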

dedndave

Pentium 4 Prescott (2005+), MMX, SSE3
1659 1748 1665 1721 1697
1646 1629 1673 1650 1697
2159 2202 2168 2134 2119

bomz

"may cause the processor to stop" - I don't know how to translate it correctly, or what it means either

dedndave

it means the problem is worse on your itanium than on a P4
it is an interesting case
something i will watch for

in a project i am working on, i use...
        lea     edx,[esi+edx+2]
notice - no multiplication, here
i used it because the registers are full - i may juggle things around a bit   :P

note:
the empty loop stalls, so you can more or less ignore that set of numbers