(http://smiles.kolobok.us/he_and_she/girl_impossible.gif)
Does anybody use it? Please say a few words about LEA.
String2Dword proc uses ecx edi ebx edx esi String:DWORD
mov esi, String
xor eax, eax
xor ecx, ecx
@@:
mov cl, byte ptr[esi]
cmp cl, 0
jz @F
mov ebx, eax
shl eax, 2
add eax, ebx
shl eax, 1
sub cl, 48
add eax, ecx
add esi, 1
jmp @B
@@:
ret
String2Dword endp
.386
.model flat, stdcall
option casemap :none
include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib
.data
form db "Number: %u", 0
String db "4294967295",0
.data?
buffer db 512 dup(?)
.code
start:
lea esi, String
xor eax, eax
xor ecx, ecx
@@:
mov edx, eax
mov cl, byte ptr[esi]
cmp cl, 0
jz @F
lea eax, [4*EAX+EDX] ;1 tick!!!
add esi, 1
lea eax, [2*EAX-48+ECX] ;1 tick!!!
jmp @B
@@:
invoke wsprintf,ADDR buffer,ADDR form,eax
invoke MessageBox,0,ADDR buffer,0,MB_ICONASTERISK
invoke ExitProcess,0
end start
That is about 1.5 times faster, measured in ticks.
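For readers who want to check the arithmetic away from the assembler, here is a minimal C sketch of what the two LEA lines in the loop compute. The names `lea_times5`, `lea_times2_plus_digit` and `string_to_dword` are made up for this illustration. It models only the 32-bit arithmetic, says nothing about timing, and, like the loop above, does no input validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models: mov edx, eax / lea eax, [4*eax+edx]  ->  eax*5 (wraps mod 2^32). */
static uint32_t lea_times5(uint32_t eax)
{
    return 4u * eax + eax;
}

/* Models: lea eax, [2*eax-48+ecx]  ->  eax*2 + character code - '0'. */
static uint32_t lea_times2_plus_digit(uint32_t eax, uint32_t ecx)
{
    return 2u * eax - 48u + ecx;
}

/* Hypothetical C counterpart of the loop: assumes s holds only '0'..'9'. */
uint32_t string_to_dword(const char *s)
{
    assert(s != NULL);
    uint32_t eax = 0;
    while (*s != '\0') {
        uint32_t ecx = (unsigned char)*s++;   /* mov cl, byte ptr [esi] */
        eax = lea_times5(eax);                /* first LEA:  *5         */
        eax = lea_times2_plus_digit(eax, ecx);/* second LEA: *2 + digit */
    }
    return eax;
}
```

Calling `string_to_dword("4294967295")` returns 4294967295, the largest value that fits in a dword; anything longer simply wraps around.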
mov esi, String
lea esi, String
not the same thing
mov esi, String
loads ESI with the dword value at String
lea esi, String
loads ESI with the address of String
mov esi, offset String
loads ESI with the address of String
This I know. I am talking about the fact that LEA can do, in one tick, one shift, two additions or subtractions, and move the result to a register. I never knew about this and am not sure how to use it. I tried it and it works, but is it correct?
Maybe not in one tick, but quickly.
lea eax, [2*EAX-48+ECX]
LEA is just a way to make the CPU perform its fancy effective address calculation, i.e. "mov eax, [displacement+base+index*scale]", without actually moving anything in or out of memory. It's most effective when the address requires the fancy calculations; without it, you would have to manually perform the additions and multiplications using the ADD and MUL instructions. If you are just talking about regular, non-stack, non-array, data section values, LEA is equivalent to a MOV OFFSET.
The "trick" is that if you have to perform two additions and a multiply (by the allowed scale values 1, 2, 4 or 8), you don't have to be "calculating" an address at all; you can use it to calculate any result you care to know, and use it as a "super adder" instruction.
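As a rough illustration of the "super adder" idea, the effective-address formula can be written as one C expression. This is only a sketch: `lea32` is a hypothetical helper name, and the assumption is plain 32-bit wraparound arithmetic with a scale of 1, 2, 4 or 8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of: lea dest, [base + index*scale + disp].
   Nothing is read from memory; the "address" itself is the result. */
uint32_t lea32(uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;   /* wraps mod 2^32 */
}
```

With this model, `lea eax, [2*EAX-48+ECX]` from the first post is `lea32(ecx, eax, 2, -48)`: one instruction standing in for a shift, an add and a subtract.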
-r
the situation becomes clear
AGI PPlain PMMX -- ????
Quote from: bomz on July 04, 2011, 11:35:53 PM
AGI PPlain PMMX -- ????
Are you really optimizing for P5? Either way, an "Address Generation Interlock" happens when the CPU needs the value of one of the registers to calculate the address, but the result isn't ready. Because P5s use "pairing" and not uOps, you obviously can't execute an instruction that calculates the value of a register at the same time as an instruction (i.e. LEA) that needs that value. Newer CPUs don't really have it (well, they just have it in other forms).
-r
Quote
AGI PPlain PMMX
Are you really optimizing for P5?
(http://smiles.kolobok.us/light_skin/unknw.gif)
I see it works and is more effective, but what this P5 AGI PPlain PMMX is, I don't know.
Does this not work on a Pentium 4? I have a Pentium 4.
lea eax, [2*EAX-48+ECX]
Does this make sense, or should I keep
mov ebx, eax
shl eax, 2
add eax, ebx
shl eax, 1
sub cl, 48
add eax, ecx
?????????????????????????
Quote from: bomz on July 05, 2011, 12:17:54 AM
I see it works and is more effective, but what this P5 AGI PPlain PMMX is, I don't know.
Does this not work on a Pentium 4? I have a Pentium 4.
P5 - CPU microarchitecture including the original Pentium chip (PPlain) and the Pentium with MMX extensions (PMMX). AGI is the Address Generation Interlock previously described.
Pentium 4 (PIV) is built on the "Netburst" microarchitecture, which also includes the Pentium D. I personally have very little experience with Netburst, so another member would be better equipped to answer the question. I would assume it doesn't, as I believe it uses the same basic uop setup as other P6-based chips.
-r
A few suggestions.
Instead of:
cmp cl, 0
jz @F
Do:
sub cl,48
jc @F
That would exit the conversion with any character having an ascii value lower than "0". If you ever intend that other unknown persons could use your app, you should also add the following for error checking so that conversion would stop with any non-numerical character input:
cmp cl,9
ja @F
Also, instead of:
mov edx, eax
lea eax, [4*EAX+EDX]
You can do:
lea eax,[eax*4+eax]
and, since cl would already be converted to binary, you would only need:
lea eax,[eax*2+ecx]
resulting in a reduction of code size by 4 bytes. :clap:
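The validation idea above can be sketched in C as well. This is an illustration only, with a hypothetical function name `parse_leading_digits`. In C a single unsigned comparison covers both checks, because a character below "0" wraps to a large value after the subtraction, just as it sets the carry flag in the assembly version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts the leading decimal digits of s and stops at the first
   character that is not '0'..'9' (the terminating 0 included). */
uint32_t parse_leading_digits(const char *s)
{
    assert(s != NULL);
    uint32_t value = 0;
    for (;;) {
        /* sub cl,48: anything below '0' wraps around to a value > 9 */
        uint8_t digit = (uint8_t)((unsigned char)*s - '0');
        if (digit > 9)                            /* cmp cl,9 / ja @F    */
            break;
        value = (value * 4u + value) * 2u + digit;/* the two LEA steps   */
        s++;
    }
    return value;
}
```

So `parse_leading_digits("12a3")` stops at the letter and returns 12 instead of folding the "a" into the result.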
Bytes... I have a 500 GB HDD and 2.5 GB of memory. What matters is how many ticks it needs. It's better to add 10 MB and do the same thing 10 times faster.
lea eax, [4*EAX+EAX]
- This works. I didn't know whether it is possible to use the same register, or whether it violates the rule that using a register changed in the previous ticks makes the processor pause. If you need to save bytes, use mul 10.
lea eax, [4*EAX+EAX] - my processor does this slightly faster than lea eax, [4*EAX+EDX], and, as after the shl version, the value of eax doesn't change
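lea edx, String ; edx -> first character
xor eax, eax ; eax = result
xor ecx, ecx
@@:
mov cl, byte ptr[edx]
sub cl, 48 ; ASCII digit -> value, carry set on the terminating 0
jc @F
lea eax, [4*EAX+EAX] ; eax = eax*5
add edx, 1
lea eax, [2*EAX+ECX] ; eax = eax*2 + digit
jmp @B
@@: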
For this code, the registers that Windows API calls usually destroy are enough, so in many cases it doesn't need push/pop.
Strange, but there is very little information about this use of LEA. As I understand it, the processor has an arithmetic unit, but the LEA instruction does similar work when it computes an effective address, so we can use it for some restricted arithmetic operations, and it is quick because, for example, it doesn't change the flags. I can't find any mention of using the same register as source and destination, but it really works. How it affects speed is unknown: using a register that was changed in the previous ticks may make the processor stop. Maybe this works only on my old P4.
Quote from: raymond on July 05, 2011, 03:22:20 AM
A few suggestions.
...
You can do:
lea eax,[eax*4+eax]
and, since cl would already be converted to binary, you would only need:
lea eax,[eax*2+ecx]
resulting in a reduction of code size by 4 bytes. :clap:
Ray,
That looks damn close to my favourite (read: fastest) ascii to float algo. Here is its first innermost loop:
Quote
IsDot1: inc esi
mov ecx, edx ; first zero, then dotpos, if any
align 8 ; this loop is align 8 by default
.Repeat
movzx ebx, byte ptr [esi] ; much faster than mov bl on P4 and Celeron M
cmp ebx, "."
je IsDot1
cmp ebx, "9" ; faster than cmp bl
ja Done
sub ebx, "0" ; could move up, saves one byte with test ebx, ebx below but is ca. 1% slower
js Done
lea eax, [eax+4*eax] ; *5 - imul much slower
inc edx ; dot pos count
lea eax, [2*eax+ebx] ; *2 (together *10), plus new digit (...+ebx-48 plus cmp instead of sub: slower on CM)
inc esi
.Until edx>=8 ; zero flag set
dec esi
Done: ...; follows FPU part
My P4 runs the version with lea 1.5 times faster than the version without it.
Quote
Field Value
CPU Properties
CPU Type Intel Pentium 4, 2266 MHz (17 x 133)
CPU Alias Northwood
CPU Stepping C1
Instruction Set x86, MMX, SSE, SSE2
Original Clock 2266 MHz
Min / Max CPU Multiplier 17x / 17x
Engineering Sample No
L1 Trace Cache 12K Instructions
L1 Data Cache 8 KB
L2 Cache 512 KB (On-Die, ECC, ATC, Full-Speed)
lea edx, String
xor eax, eax
;xor ecx, ecx
@@:
movzx ecx, byte ptr[edx]
sub cl, 48
jc @F
lea eax, [4*EAX+EAX]
add edx, 1
lea eax, [2*EAX+ECX]
any code may be optimized (http://smiles.kolobok.us/light_skin/girl_haha.gif)
optimized for speed, but also anti-optimized for data validation :wink
Speed is important; data is bigger than really needed. The asm operating system Kolibri needs 1 floppy disk. If Microsoft used asm more widely, all the HDD, CD, DVD, Flash... manufacturers would lose their work.
Data size was important on the 8086 with its segments.
Data validation (http://en.wikipedia.org/wiki/Data_validation)
Hm.
in masm examples
AsciiBase proc uses esi InPut:DWORD
;INVOKE AsciiBase, addr szBuff0
xor eax, eax
mov esi, InPut
xor ecx, ecx
xor edx, edx
mov al, [esi]
inc esi
.while al != 0
sub al, '0' ; Convert to bcd
lea ecx, [ecx+ecx*4] ; ecx = ecx * 5
lea ecx, [eax+ecx*2] ; ecx = eax + old ecx * 10
mov al, [esi]
inc esi
.endw
lea eax, [ecx+edx] ; Move to eax
ret
AsciiBase endp
in Iczelion's tutorials
String2Dword proc uses ecx edi edx esi String:DWORD
LOCAL Result:DWORD
mov Result,0
mov edi,String
invoke lstrlen,String
.while eax!=0
xor edx,edx
mov dl,byte ptr [edi]
sub dl,"0"
mov esi,eax
dec esi
push eax
mov eax,edx
push ebx
mov ebx,10
.while esi > 0
mul ebx
dec esi
.endw
pop ebx
add Result,eax
pop eax
inc edi
dec eax
.endw
mov eax,Result
ret
String2Dword endp
Here both data and time...
Quote from: bomz on July 05, 2011, 07:31:04 PM
lea edx, String
xor eax, eax
;xor ecx, ecx
@@:
movzx ecx, byte ptr[edx]
sub cl, 48
jc @F
lea eax, [4*EAX+EAX]
add edx, 1
lea eax, [2*EAX+ECX]
any code may be optimized (http://smiles.kolobok.us/light_skin/girl_haha.gif)
Looks clever but is 2% slower on my CPU, which is quite a lot. "inc slower than add" is valid for old CPUs only. And as qWord remarked already, it's not so wise to drop the data check.
My processor does inc slower, so I use add. As I read, inc is faster only on the 386, maybe. About movzx I don't know.
i am finding that DEC/JZ depends on the surrounding instructions
in some cases, it may be 1 cycle slower
in others, it may be 10 cycles slower
if you put an unrelated instruction in there, it may not be slower at all
if it makes enough difference to keep the loop under 128 bytes, it is probably faster
this is measured on my P4, which is old, nowadays
dec ecx
mov edx,SomeValue
jnz top_of_loop
still, there is something to be said for optimizing on an old P4
there are a lot of them still in use
and code should run faster, in general, on newer machines
lea eax, [EAX*4] - two times slower than
lea eax, [EDX*4]
Quote from: bomz on July 07, 2011, 12:06:57 PM
lea eax, [EAX*4] - two times slower than
lea eax, [EDX*4]
mov edx, String
xor eax, eax
@@:
movzx ecx, byte ptr[edx]
sub cl, 48
jc @F
lea ebx, [4*EAX+EAX]
add edx, 1
lea eax, [2*EBX+ECX]
jmp @B
@@:
Quote from: bomz on July 07, 2011, 12:06:57 PM
lea eax, [EAX*4] - two times slower than
lea eax, [EDX*4]
That's miraculous :U
Which CPU?
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
38 cycles for lea ebx, res=123456789
38 cycles for lea eax, res=123456789
I kept googling and can't find any example that uses the same register inside the brackets. Why? Maybe because using in lea a register which was changed in the previous ticks may cause the processor to pause.
Quote
Field Value
CPU Properties
CPU Type Intel Pentium 4, 2266 MHz (17 x 133)
CPU Alias Northwood
CPU Stepping C1
Instruction Set x86, MMX, SSE, SSE2
Original Clock 2266 MHz
Min / Max CPU Multiplier 17x / 17x
Engineering Sample No
L1 Trace Cache 12K Instructions
L1 Data Cache 8 KB
L2 Cache 512 KB (On-Die, ECC, ATC, Full-Speed)
(http://xmages.net/storage/10/1/0/b/0/upload/77603a71.png)
(http://xmages.net/storage/10/1/0/b/0/upload/2d6ed4e8.png)
(http://xmages.net/storage/10/1/0/b/0/upload/7dacb40b.png)
Pentium 4 Prescott (2005+), MMX, SSE3
1659 1748 1665 1721 1697
1646 1629 1673 1650 1697
2159 2202 2168 2134 2119
"May cause processor stop": I don't know how to translate it correctly, or what it means either.
(http://xmages.net/storage/10/1/0/6/e/upload/ca625841.png)
it means the problem is worse on your itanium than on a P4
it is an interesting case
something i will watch for
in a project i am working on, i use...
lea edx,[esi+edx+2]
notice - no multiplication, here
i used it because the registers are full - i may juggle things around a bit :P
note:
the empty loop stalls, so you can more or less ignore that set of numbers
try
lea edx,[edx+esi+2]
(http://xmages.net/storage/10/1/0/c/6/upload/5901a7fa.png)
(http://xmages.net/storage/10/1/0/c/6/upload/d81509ca.png)
(http://xmages.net/storage/10/1/0/c/6/upload/871beed3.png)
not use eax in lea
*
That's interesting.
Q6600:
Quote
Core Duo (2006+), MMX, SSE3
1033 1033 1034 1032 1033
1030 1032 1030 1032 1031
1031 1031 1032 1028 1032
Code must consider old processors first.
not everyone will agree with you on that :P
there are good and bad things, either way
mov<movzx ~2
Quote from: bomz on July 08, 2011, 12:35:17 PM
mov<movzx ~2
apples <> oranges
using movzx avoids problems with partial register accesses.
You will find this information in both, Intel's and AMD's optimization manuals.
xor ecx, ecx
@@:
mov cl, byte ptr [edx]
faster, even if the string is 1 character
bomz,
On most recent processors MOVZX is faster than XOR / MOV CL, [EDX]. You need to go back to a PIII to see MOVZX slower.
I would go back, but I have a P4.
xor is done 1 time, mov on each iteration
In my tests on my P3 movzx is ~1.5x faster.
;==============================================================================
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
;==============================================================================
printf MACRO format:REQ, args:VARARG
IFNB <args>
invoke crt_printf, cfm$(format), args
ELSE
invoke crt_printf, cfm$(format)
ENDIF
EXITM <>
ENDM
;==============================================================================
.data
x db 100 dup(0)
.code
;==============================================================================
start:
;==============================================================================
mov esi, OFFSET x
invoke Sleep, 3000
REPEAT 3
counter_begin 1000, HIGH_PRIORITY_CLASS
mov edi, 16
@@:
sub edi, 1
jnz @B
counter_end
printf( "%d cycles, loop only\n", eax )
counter_begin 1000, HIGH_PRIORITY_CLASS
mov edi, 16
xor eax, eax
xor ebx, ebx
xor ecx, ecx
xor edx, edx
@@:
mov al, BYTE PTR [esi+edi]
mov bl, BYTE PTR [esi+edi+1]
mov cl, BYTE PTR [esi+edi+2]
mov dl, BYTE PTR [esi+edi+3]
sub edi, 1
jnz @B
counter_end
printf( "%d cycles, xor + mov byte ptr\n", eax )
counter_begin 1000, HIGH_PRIORITY_CLASS
mov edi, 16
@@:
movzx eax, BYTE PTR [esi+edi]
movzx ebx, BYTE PTR [esi+edi+1]
movzx ecx, BYTE PTR [esi+edi+2]
movzx edx, BYTE PTR [esi+edi+3]
sub edi, 1
jnz @B
counter_end
printf( "%d cycles, movzx\n\n", eax )
ENDM
inkey "Press any key to exit..."
exit
;==============================================================================
end start
37 cycles, loop only
85 cycles, xor + mov byte ptr
68 cycles, movzx
37 cycles, loop only
85 cycles, xor + mov byte ptr
68 cycles, movzx
37 cycles, loop only
85 cycles, xor + mov byte ptr
68 cycles, movzx
xor is done 1 time, mov on each iteration
OK, so movzx is still ~1.5x faster.
mov (http://i034.radikal.ru/1107/98/9e069a538617.gif)
i had a case the other day.....
EDX was already 0 as a result of code at the bottom of the loop
i only had to zero it in loop init code
at the top of the loop, i needed the byte in CH extended to a dword, so...
mov dl,ch
it was faster to do this....
movzx edx,ch
My Celeron doesn't care...
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
48 cycles for REP100 mov
46 cycles for REP100 zx
172 cycles for 100*mov (with dec ecx & jne)
171 cycles for 100*zx
My results were wrong. Using dedndave's tick counter, movzx is also faster.
Quote
Itanium 2 (2002+), MMX, SSE2
1234 1237 1234 1231 1232 - empty cycle
1839 1835 1846 1845 1833 - mov
1533 1534 1536 1538 1530 - movzx
Press any key to continue ...
Quote
Itanium 2 (2002+), MMX, SSE2
1238 1231 1234 1239 1235 - empty cycle
1535 1535 1537 1531 1540 - sub cl, 48
1560 1558 1564 1553 1563 - sub ecx, 48
Press any key to continue ...
Quote
Itanium 2 (2002+), MMX, SSE2
1233 1228 1223 1231 1230 - empty
4021 4021 4020 4021 4020 - shl edx, 3
2114 2109 2114 2112 2113 - lea edx,[eax*8]
Press any key to continue ...
Itanium 2 (2002+), MMX, SSE2
1228 1232 1225 1227 1226 - empty
4021 4021 4020 4020 4020 - shl edx, 3
5027 5026 5027 5027 5027 - lea eax,[eax*8]
Press any key to continue ...
Itanium 2 (2002+), MMX, SSE2
1231 1234 1236 1222 1234 - empty
4021 4021 4020 4020 4020 - shl edx, 2
2112 2106 2116 2113 2115 - lea edx,[eax*4]
Press any key to continue ...
Quote
Itanium 2 (2002+), MMX, SSE2
1232 1231 1226 1229 1229 - empty
2031 2031 2028 2031 2027 - lea edi, [2*EAX-1]
6031 6030 6030 6030 6031 - shl edi, 1; sub edi, 1
Press any key to continue ...
Quote from: bomz on July 08, 2011, 11:11:41 PM
Itanium 2 (2002+), MMX, SSE2
wow! - Itanium 2 server or workstation as home PC?
Strange...
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
190 cycles for shl eax, 2
196 cycles for 100*lea eax, [4*eax]
205 cycles for 100*lea edx, [4*eax]
190 cycles for shl eax, 3
195 cycles for 100*lea eax, [8*eax]
205 cycles for 100*lea edx, [8*eax]
Quote from: qWord on July 08, 2011, 11:21:15 PM
Quote from: bomz on July 08, 2011, 11:11:41 PM
Itanium 2 (2002+), MMX, SSE2
wow! - Itanium 2 server or workstation as home PC?
Pentium 4, 9 years old. I don't know why the program recognizes it as an Itanium.
nothing strange
lea MAY cause a processor stop (pause) if it uses a register which was changed in the previous ticks (not a good translation)
Quote from: bomz on July 08, 2011, 11:30:01 PM
nothing strange
lea MAY cause a processor stop (pause) if it uses a register which was changed in the previous ticks (not a good translation)
the processor never stops :wink
Also: the test has no validity - the advantage (e.g. no flag stalls) of LEA may be only visible in context of a (sensfull) instruction-sequence.
I don't know how to translate it and can't find anything about LEA in English. In Russian a word is used which I can translate as ~ stop or pause.
maybe: wait?
stop + pause + wait /3
1. stop
2. halt
3. halting
4. arrester
5. dog
6. lock
Maybe LOCK is the best, or HALT.
And "in previous ticks" is used in the sense of "in the previous wave" (oscillation).
I am not a programmer, my English is very poor and I don't know specialized computer English, and this is a back translation eng-rus-eng.
"Stall". Though it keeps on executing other stuff
Google says "stall" means to lose speed.
redskull, you give the more professional comments, so you must know what using the same register is called without my back translation. Strange, I can't find anything in English about lea. In Russian there are only a few words, and the wording is the same on different sites.
At first I understood "at previous oscillation" as "in the previous instruction", so I did:
Quote
lea eax, [4*EAX+EAX]
add edx, 1
lea eax, [2*EAX+ECX]
but now I see that
lea eax, [2*EAX+ECX] also depends on the previous one
bomz,
LEA was only slow on a PIV, in some instances it was faster to do 2 adds than 1 LEA and this is from the Intel manual. Earlier and later Intel hardware were faster with LEA in most instances.
The main thing is how to use LEA right. I have now measured my latest code in dedndave's tick-count program and optimized it.
Quote
Itanium 2 (2002+), MMX, SSE2
1233 1227 1229 1235 1230 - empty
8872 8886 8885 8897 8877 - mov eax, value; mov value, eax
14120 14121 14118 14119 14119 - push value; pop value
Press any key to continue ...
bomz,
Try a few of these and you can experiment with more.
lea eax, [eax+eax] ; *2
lea eax, [eax+eax*2] ; *3
lea eax, [eax*4] ; *4
lea eax, [eax+eax*4] ; *5
lea eax, [eax+eax*2] ;
lea eax, [eax+eax] ; *6
mov ecx, eax
lea eax, [eax*8] ; *7
sub eax, ecx
lea eax, [eax*8] ; *8
lea eax, [eax+eax*8] ; *9
lea eax, [eax+eax*4] ;
lea eax, [eax+eax] ; *10
lea eax, [eax+eax*2] ;
lea eax, [eax*4] ; *12
lea eax, [eax+eax*4] ;
lea eax, [eax+eax*2] ; *15
lea eax, [eax*4] ;
lea eax, [eax*4] ; *16
lea eax, [eax+eax*8] ;
lea eax, [eax+eax] ; *18
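A quick way to convince yourself that the multi-step entries in the list above multiply by what their comments say is to model each LEA as `base + index*scale` in C. The sketch below does that for a few of the two-step chains. `lea_bis` and the `timesN` names are hypothetical, chosen only for this illustration, and the arithmetic wraps modulo 2^32 just as the 32-bit registers do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of: lea r, [base + index*scale]. */
static uint32_t lea_bis(uint32_t base, uint32_t index, uint32_t scale)
{
    return base + index * scale;
}

/* lea eax,[eax+eax*2] / lea eax,[eax+eax]            -> *3 then *2 */
uint32_t times6(uint32_t x)  { x = lea_bis(x, x, 2); return lea_bis(x, x, 1); }

/* mov ecx,eax / lea eax,[eax*8] / sub eax,ecx        -> *8 minus original */
uint32_t times7(uint32_t x)  { uint32_t c = x; x = lea_bis(0, x, 8); return x - c; }

/* lea eax,[eax+eax*4] / lea eax,[eax+eax]            -> *5 then *2 */
uint32_t times10(uint32_t x) { x = lea_bis(x, x, 4); return lea_bis(x, x, 1); }

/* lea eax,[eax+eax*4] / lea eax,[eax+eax*2]          -> *5 then *3 */
uint32_t times15(uint32_t x) { x = lea_bis(x, x, 4); return lea_bis(x, x, 2); }

/* lea eax,[eax+eax*8] / lea eax,[eax+eax]            -> *9 then *2 */
uint32_t times18(uint32_t x) { x = lea_bis(x, x, 8); return lea_bis(x, x, 1); }
```

The same pattern extends to any constant that factors into 2, 3, 4, 5, 8 and 9, at the cost of one dependent LEA per factor.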
Quote
Itanium 2 (2002+), MMX, SSE2
1231 1231 1229 1226 1229 - empty
5027 5028 5026 5027 5027 - lea eax, [eax+eax*2]; lea eax, [eax+eax]
5027 5027 5027 5027 5027 - lea edx, [eax+eax*2]; lea eax, [edx+edx]
Press any key to continue ...
Itanium 2 (2002+), MMX, SSE2
1228 1234 1231 1231 1230 - empty
5027 5028 5027 5027 5027 - lea eax, [eax+eax*2]; lea eax, [eax+eax]
5033 5032 5033 5032 5032 - mov edx, eax; lea edx, [edx+eax*2] ;lea eax, [edx+edx] ; *6
Press any key to continue ...
Itanium 2 (2002+), MMX, SSE2
1228 1226 1227 1227 1231 - empty
5028 5028 5028 5026 5026 - lea eax, [eax+eax*2]; lea eax, [eax+eax]
5663 5662 5659 5665 5663 - mov ecx, eax; lea edx, [ecx+eax*2]; mov ecx, edx; lea eax, [ecx+edx]
Press any key to continue ...
Itanium 2 (2002+), MMX, SSE2
1234 1228 1228 1234 1229 - empty
5028 5028 5027 5026 5027 - lea eax, [eax+eax*2]; lea eax, [eax+eax]
10064 10064 10065 10064 10064 - mov edx, eax; shl eax, 1; add eax, edx; shl eax, 1
Press any key to continue ...
Itanium 2 (2002+), MMX, SSE2
1229 1226 1230 1229 1229 - empty
5028 5028 5028 5027 5027 - lea eax, [eax+eax*2]; lea eax, [eax+eax]
9067 9067 9068 9069 9068 - lea edx, [eax+eax*2]; lea eax, [edx*2]
Press any key to continue ...
(http://smiles.kolobok.us/light_skin/unknw.gif)
Quote
Itanium 2 (2002+), MMX, SSE2
1236 1232 1253 1225 1234 - empty
1497 1511 1496 1494 1502 - lea edx,[4*eax+eax]
3997 3995 3997 3996 3996 - lea eax,[4*eax+eax]
Press any key to continue ...
Itanium 2 (2002+), MMX, SSE2
1226 1226 1234 1229 1235 - empty
8053 8056 8057 8054 8053 - lea eax, [4*EAX+EAX]; add edx, 1; lea eax, [2*EAX+ECX]
8052 8051 8051 8051 8049 - lea ebx, [4*EAX+EAX]; add edx, 1; lea eax, [2*EBX+ECX]
Press any key to continue ...
Quote
Itanium 2 (2002+), MMX, SSE2
1227 1231 1228 1238 1233 - empty
8054 8056 8053 8054 8053 - lea eax, [4*EAX+EAX]; lea eax, [4*EAX+EAX]
2109 2111 2111 2113 2112 - lea eax, [4*EBX+ECX]; lea eax, [4*EDX+EDI]
Press any key to continue ...
(http://smiles.kolobok.us/light_skin/swoon.gif)
Quote
Itanium 2 (2002+), MMX, SSE2
1233 1225 1238 1226 1226 - empty
8058 8064 8059 8055 8054 - lea eax, [4*EAX+EAX]; lea eax, [2*EAX+ECX]
8052 8049 8052 8055 8049 - lea eax, [4*EAX+EAX]; lea ebx, [esi+edi]; lea eax, [2*EAX+ECX]
Press any key to continue ...
Quote
Itanium 2 (2002+), MMX, SSE2
1233 1224 1232 1230 1239 - empty
8054 8057 8057 8054 8054 - lea eax, [4*EAX+EAX]; lea eax, [2*EAX+ECX]
8051 8053 8060 8051 8054 - lea eax, [4*EAX+EAX]; lea ebx, value; lea eax, [2*EAX+ECX]
Press any key to continue ...
Quote
Itanium 2 (2002+), MMX, SSE2
1229 1236 1231 1232 1239 - empty
8056 8057 8055 8055 8054 - lea eax, [4*EAX+EAX]; lea eax, [2*EAX+ECX]
9058 9058 9058 9058 9058 - lea ebx, [4*EAX+EAX]; mov edx, ebx; lea eax, [2*EDX+ECX] (http://s19.radikal.ru/i192/0912/a4/38f1287302ef.gif)
Press any key to continue ...
(http://smiles.kolobok.us/light_skin/scratch_one-s_head.gif)
bomz,
An Itanium 2 is not a native x86 architecture, are you running an Itanium in x86 emulation mode ?
Quote
Itanium 2 (2002+), MMX, SSE2
1228 1228 1229 1231 1230 - empty
8054 8057 8055 8053 8053 - lea eax, [4*EAX+EAX]; lea eax, [2*EAX+ECX]
4034 4034 4034 4034 4034 - lea eax, [4*EAX+EAX]; mov ebx, eax; lea ebx, [2*EBX+ECX]
Press any key to continue ...
(http://smiles.kolobok.us/light_skin/dance4.gif)
I have a Pentium 4. The program recognizes it as an Itanium, I don't know why.
(http://xmages.net/storage/10/1/0/6/c/upload/d6e953ff.png)
Quote
Itanium 2 (2002+), MMX, SSE2
1227 1234 1230 1230 1226 - empty
8062 8070 8062 8063 8114 - lea eax, [4*EAX+EAX]; lea eax, [2*EAX+ECX]
9063 9062 9058 9059 9057 - lea eax, [4*EAX+EAX]; mov ebx, eax; lea eax, [2*EBX+ECX]
Press any key to continue ...
I can no longer see any regularity.
Quote
lea ebx, [4*EAX+EAX]
mov edx, ebx
lea ebx, [2*EDX+ECX]
- 3000
Your OS is giving the correct processor ID, it's a Northwood core PIV, not an Itanium. An Itanium is a 64 bit native processor with a different instruction set. They are mainly useful for large processor count parallel processing. I still have a 2.8 gig Northwood processor running and they were a bit easier to code for than the later Prescott cores.
I began testing the whole string-to-dword code. There is no difference whether lea is used or not. Any combination of registers gives the same result: converting "100" to 100 needs about 18000 program ticks.
Quote
converting "100" to 100 needs about 18000 program ticks
ouch !!!!
bomz - don't be concerned about the improper CPU identification
it is an old version of Jochen's ShowCpu function :P
It needs 18000 program ticks for doing it 1000 times :lol
lea ebx, [4*EAX+EAX]
mov edx, ebx
lea ebx, [2*EDX+ECX] - this needs 3000, but when put into the s2w loop it becomes 8000
Quote from: dedndave on July 09, 2011, 03:46:23 PM
bomz - don't be concerned about the improper CPU identification
it is an old version of Jochen's ShowCpu function :P
Bomz, what CPU does it show with the attachment of reply #50?
with the attachment of reply #50? - ???????????????
reply #50...
http://www.masm32.com/board/index.php?topic=16988.msg141804#msg141804
(http://xmages.net/storage/10/1/0/2/3/upload/a6344494.png)
Quote lea ebx, [4*EAX+EAX]
mov edx, ebx
lea ebx, [2*EDX+ECX] - 3000
Quote lea ebx, [4*EAX+EAX]
mov edx, ebx
lea ebx, [2*EDX+ECX]
mov eax, ebx - 10000
(http://smiles.kolobok.us/light_skin/vava.gif)
lea edx,[4*eax+eax]
lea ebx,[2*edx+ecx]
mov eax,ebx
it would help if you could place unrelated instructions in between..
lea edx,[4*eax+eax]
mov esi,offset SomeData
lea ebx,[2*edx+ecx]
mov edi,offset SomeOtherData
mov eax,ebx
I tried LEA, NOP, MOV, ADD (lea ebx, [ebx]).
This method doesn't steal registers from the loop.
Quote from: bomz on July 09, 2011, 08:51:04 PM
I try LEA NOP MOV ADD.
you may try to test a useful algorithm!
(http://i071.radikal.ru/0903/41/dfb69c20a179.gif)(http://s02.radikal.ru/i175/0911/08/609553aad438.gif)
Maybe somebody with good or native English could write a letter to Intel?
Quote from: bomz on July 09, 2011, 09:51:26 PM
Maybe somebody with good or native English could write a letter to Intel?
and whats the message for Intel?
(http://s39.radikal.ru/i083/0811/2e/02c543209105.gif) how lea work for arithmetic
Quote from: bomz on July 09, 2011, 10:27:55 PM
(http://s39.radikal.ru/i083/0811/2e/02c543209105.gif) how lea work for arithmetic
strange question in context to your previous posts - However, you should read Intel's Documentation (http://www.intel.com/products/processor/manuals/) before asking.
If information about LEA were widely known, I would have found it with Google.
Quote from: bomz on July 09, 2011, 09:51:26 PM
(http://i071.radikal.ru/0903/41/dfb69c20a179.gif)(http://s02.radikal.ru/i175/0911/08/609553aad438.gif)
may be somebody with good or native English write letter to Intel?
Hi Bomz,
I admire your icons, they are really cute and funny. Do you create them yourself?
:U
http://s50.radikal.ru/i130/0908/bb/dd7c8a96f9a4.gif
http://s48.radikal.ru/i122/0908/2f/75b2c3fd84e1.gif
http://s45.radikal.ru/i109/0908/0a/595bfd0094d3.gif
http://s61.radikal.ru/i171/0908/ea/31a906ad77e3.gif
Only this I do from other's.
http://www.kolobok.us/ Home Page
http://www.en.kolobok.us/
https://addons.mozilla.org/ru/firefox/addon/kolobok-smiles-for-firefox/ Mozillla addon
http://s61.radikal.ru/i172/1107/09/eadc51a23f19.gif
http://www2.cbox.ws/box/?boxid=1920984&boxtag=4466&sec=smilies
http://www.en.kolobok.us/download.php?view.15
Support:
KOLOBOK Smiles for Firefox
KOLOBOK Smiles for Opera 9
KOLOBOK Smiles for Internet Explorer
KOLOBOK Smiles for Google Chrome
I haven't looked at the actual code, but anytime you use the same registers for input and output, you'll get a dependency chain. Basically, it can't start the second lea until it knows the output from the first, and can't start the third until it's done with the second, all the way up to 100. With different registers, the value of eax doesn't change, so it doesn't have to wait on anything to do any of the lea instructions. But again, that's based on just a cursory look at the results.
-r
Quote from: redskull on July 10, 2011, 11:55:37 AM
I haven't looked at the actual code, but anytime you use the same registers for input and output, you'll get a dependency chain. Basically, it can't start the second lea until it knows the output from the first
red,
that sounds plausible, and bomz' P4 behaves like that. My Celeron M, in contrast, couldn't care less which regs are involved :bg
Quote from: jj2007 on July 10, 2011, 03:28:37 PM
that sounds plausible, and bomz' P4 behaves like that. My Celeron M, in contrast, couldn't care less which regs are involved :bg
Like hutch said, breaking the LEA down into additions and multiplications is probably what does it. Per the timings, an M only has a latency of 1 for LEA, but the PIV has a latency of 4.
-r
Quote from: redskull on July 10, 2011, 11:55:37 AM
I haven't looked at the actual code, but anytime you use the same registers for input and output, you'll get a dependency chain. Basically, it can't start the second lea until it knows the output from the first, and can't start the third until it's done with the second, all the way up to 100. With different registers, the value of eax doesn't change, so it doesn't have to wait on anything to do any of the lea instructions. But again, that's based on just a cursory look at the results.
-r
And how do I break this chain?
Quote lea ebx, [4*EAX+EAX]
mov edx, ebx
lea ebx, [2*EDX+ECX] - 3000
Quote lea ebx, [4*EAX+EAX]
mov edx, ebx
lea ebx, [2*EDX+ECX]
mov eax, ebx - 10000
Is this use of LEA meant for matrices?
sometimes, you can't :(
sometimes, you can interweave instructions for one operation with another
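To make the interweaving idea concrete, here is a small C sketch that converts two strings in the same loop. Within one number the multiply-by-10 steps still depend on each other, but the two accumulators never feed each other, so they form two separate dependency chains. The names `two_values` and `parse_two` are hypothetical. Whether a given compiler or CPU actually overlaps the two chains is an assumption to be measured, not something this sketch guarantees, and it does no input validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct two_values {
    uint32_t a;
    uint32_t b;
};

/* Converts two strings of decimal digits in one interleaved loop. */
struct two_values parse_two(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    uint32_t va = 0, vb = 0;
    while (*a != '\0' || *b != '\0') {
        if (*a != '\0')   /* chain 1: depends only on va */
            va = (va * 4u + va) * 2u + (uint32_t)(*a++ - '0');
        if (*b != '\0')   /* chain 2: depends only on vb */
            vb = (vb * 4u + vb) * 2u + (uint32_t)(*b++ - '0');
    }
    struct two_values result = { va, vb };
    return result;
}
```

With a single string there is nothing to interleave with, which matches the remark that sometimes the chain cannot be broken.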