Testing an odd number

Started by frktons, August 27, 2010, 09:40:15 AM

dedndave

MUL is pretty fast on newer processors
and - this code has a lot of dependencies
        mov     ebx,eax
        shl     eax,2
        add     eax,ebx
        shl     eax,1

not to mention code bytes
in a loop, MUL might be the better choice

also interesting....
on a P4, MUL wants the register multiplier, EAX, and EDX "settled"
it's a good idea to perform some unrelated instruction just before the MUL
IMUL is not so sensitive - it even seems to like it   :P
also probably not true on newer cores
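
for reference, a minimal sketch of three ways to get EAX*10: the shift/add sequence above, plus an LEA/ADD pair and IMUL with an immediate (those two are not from the thread, just common alternatives). it assumes the MASM32 SDK is installed in \masm32 so that masm32rt.inc and its print, str$ and exit macros are available; the value 7 is only an illustration. no timing claims here, that depends on the CPU

```asm
; sketch only - assumes the MASM32 SDK at \masm32 (masm32rt.inc, print, str$, exit)
include \masm32\include\masm32rt.inc

.code
start:
        ; 1) shift/add form from the post above: ((x*4)+x)*2 = x*10
        mov     eax,7                   ; sample value, chosen for illustration
        mov     ebx,eax
        shl     eax,2                   ; eax = x*4
        add     eax,ebx                 ; eax = x*5
        shl     eax,1                   ; eax = x*10
        print   str$(eax),13,10

        ; 2) LEA/ADD form (not from the thread): no scratch register needed
        mov     eax,7                   ; reload, print does not preserve eax
        lea     eax,[eax+eax*4]         ; eax = x*5
        add     eax,eax                 ; eax = x*10
        print   str$(eax),13,10

        ; 3) IMUL with an immediate: low 32 bits only, EDX is not written
        mov     eax,7
        imul    eax,eax,10              ; eax = x*10
        print   str$(eax),13,10

        exit
end start
```

build it as a console app (ml /c /coff, then link /SUBSYSTEM:CONSOLE) so the print output is visible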

bomz

MUL is more universal. it doesn't only multiply EAX, it also puts the high part of the result in EDX. and it can multiply by any value, not only 10. but in many cases we don't need the EDX part, so it is possible to simplify the code

bomz

my code is not very accurate. the CLD and STD instructions measure 48 ticks. PUSHF and POPF also take more than 50. maybe that's true. of course CLD itself shouldn't need so many ticks, but maybe it calls some other process.

dedndave

yah - those instructions are slow
i think the OS has to check for privilege level changes or something for POPFD   :P
for the direction flag, it must keep track of the current state because APIs expect it to be cleared
PUSHFD isn't too bad, because it does not alter flags
but CLD, STD, and POPFD are slow - about 100 cycles on my machine
CLC and STC are fast, at least

bomz


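;std
;stosw          ; DF set: stores AX at [edi], then edi -= 2
;cld

;mov word ptr[edi], ax
;sub edi, 2

stosw           ; DF clear: stores AX at [edi], then edi += 2
sub edi, 4      ; net effect: edi ends up 2 below where it started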

dedndave

 :bg
        mov     [edi],ax

i suspect which is fastest depends on which generation of CPU it's running on

bomz

I think STOS uses a special line

bomz

word ptr - seems to make no difference on a Pentium 4

STOS is slower

bomz



strange that all the "combined" instructions are slower, because when the CPU executes code it never knows the next command... maybe my CPU is just very old


NEXT:
loop NEXT

is 25% slower than

NEXT:
sub ecx, 1
cmp ecx, 0
jnz NEXT


inc eax

is slower than

add eax, 1

in my code there was an interesting situation: LEA was quicker than SUB

scasb, lodsb, movsb, stosb, cmpsb - slow

dedndave

NEXT:
sub ecx, 1
cmp ecx, 0
jnz NEXT


no need to CMP
SUB will set or clear the zero flag   :P
NEXT:
sub ecx, 1
jnz NEXT


DEC works, too - and is a single-byte instruction for dword general registers
NEXT:
dec ecx
jnz NEXT
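
a minimal sketch of the DEC/JNZ loop with an actual body, summing 10 down to 1. it assumes the MASM32 SDK is installed in \masm32 so that masm32rt.inc and its print, str$ and exit macros are available; the count of 10 and the label name are made up for illustration. note that, like LOOP, it runs the body before testing the counter, so a count of 0 on entry wraps around instead of skipping the loop

```asm
; sketch only - assumes the MASM32 SDK at \masm32 (masm32rt.inc, print, str$, exit)
include \masm32\include\masm32rt.inc

.code
start:
        xor     eax,eax                 ; running sum
        mov     ecx,10                  ; hypothetical iteration count, must be nonzero
sum_loop:
        add     eax,ecx                 ; loop body: add the counter to the sum
        dec     ecx                     ; sets ZF when ecx reaches 0, leaves CF alone
        jnz     sum_loop                ; no CMP needed, DEC already set the flag
        print   str$(eax),13,10         ; 10+9+...+1 = 55

        exit
end start
```

if the body needs the carry flag to survive the counter update, DEC is the one to use, since SUB rewrites CF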

bomz