IMUL and flags

Started by jj2007, November 08, 2010, 09:25:39 PM

jj2007

Here is an algo that yields the share of source expressed in % - a variant of the Masm32 library GetPercent function discussed here.

My problem is that I do not understand why the two options below do not behave identically. If the result of the multiplication does not fit into eax alone, edx will be nonzero, and the carry and overflow flags should be set. That is my understanding of the not so clear documentation.

Where am I wrong?

Quote
GetPercentInt proc source:DWORD, percent:DWORD
   mov eax, [esp+8]   ; percent (arg2; no stack frame, so [esp+4] is arg1)
   imul dword ptr [esp+4]   ; multiply by source first, result in edx::eax, then div 100
   if 0
      jc @F   ; fails although carry flag should be set if edx!=0
   else
      test edx, edx   ; works fine
      jne @F
   endif
   inc eax
   mov edx, 0a3d70a3dh   ; Agner Fog
   mul edx
   shr edx, 6
   mov eax, edx
   ret 8
@@:
   push 100
   idiv dword ptr [esp]   ; yields correct result but slow
   pop edx
   ret 8
GetPercentInt endp
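
The fast path above replaces the division by 100 with a multiplication by a rounded-down reciprocal. A minimal C sketch of the same arithmetic, assuming that 0A3D70A3Dh is floor(2^38 / 100) and that taking the high dword followed by `shr edx, 6` amounts to a total right shift of 38 bits. The name `div100_fast` is made up for illustration and is not part of the Masm32 library:

```c
#include <stdint.h>

/* Hypothetical helper modelling the fast path:
   inc eax / mov edx, 0A3D70A3Dh / mul edx / shr edx, 6.
   x + 1 is formed in 64 bits here, so it cannot wrap around. */
uint32_t div100_fast(uint32_t x)
{
    const uint64_t m = 0xA3D70A3Du;          /* floor(2^38 / 100)        */
    uint64_t wide = ((uint64_t)x + 1u) * m;  /* edx:eax after the mul    */
    return (uint32_t)(wide >> 38);           /* high dword, then shr 6   */
}
```

One difference worth keeping in mind: in the assembly version `inc eax` wraps to zero when eax is 0FFFFFFFFh, a case this sketch avoids by widening before the increment.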

ToutEnMasm


To make a signed multiply you need to modify the proc like this:

GetPercentInt PROTO :Sdword,:Sdword

Another suggestion: MASM optimizes this sort of test very well; you can verify it by using:

.if ZERO?
...
.endif

redskull

don't you want esp+8 and esp+C, not 4 and 8?  +4 is the return address, which will never fit in just one register

EDIT - on second thought, it probably would.
-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

jj2007

red,
+4 is arg1. The proggie works fine, and is blazing fast, on a P4 faster than the SSE2 version, but I'd like to understand why jc @F does not work.

jj2007

Quote from: ToutEnMasm on November 08, 2010, 09:41:21 PM

To make a signed multiply you need to modify the proc like this:

GetPercentInt PROTO :Sdword,:Sdword


Yves, there is no stack frame.

theunknownguy

Oh, never mind, it is what ToutEnMasm gave in his description  :lol


ToutEnMasm

No proto? Don't be stopped by this:

Quote
GetPercentInt proc source:SDWORD, percent:SDWORD

The explanation of when the CF and OF flags are cleared is here:

Quote
Multiplies two signed operands. The number of operands determines the form of the instruction.
If a single operand is specified, the instruction multiplies the value in the specified general-purpose
register or memory location by the value in the AL, AX, EAX, or RAX register (depending on the
operand size) and stores the product in AX, DX:AX, EDX:EAX, or RDX:RAX, respectively.
If two operands are specified, the instruction multiplies the value in a general-purpose register (first
operand) by an immediate value or the value in a general-purpose register or memory location (second
operand) and stores the product in the first operand location.
If three operands are specified, the instruction multiplies the value in a general-purpose register or
memory location (second operand), by an immediate value (third operand) and stores the product in a
register (first operand).
The IMUL instruction sign-extends an immediate operand to the length of the other register/memory
operand.
The CF and OF flags are set if, due to integer overflow, the double-width multiplication result cannot
be represented in the half-width destination register. Otherwise the CF and OF flags are cleared.
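
The last sentence of that description can be modelled in a few lines of C. This is only a sketch of the one-operand 32-bit form, assuming the flags are set exactly when the full product lies outside the signed 32-bit range; the type and function names are made up for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

/* Result of a modelled one-operand 32-bit IMUL (illustrative names). */
typedef struct {
    uint32_t eax;    /* low half of the product              */
    uint32_t edx;    /* high half of the product             */
    bool     cf_of;  /* CF and OF always get the same value  */
} imul_result;

imul_result imul32(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    imul_result r;
    r.eax = (uint32_t)(uint64_t)product;
    r.edx = (uint32_t)((uint64_t)product >> 32);
    /* Set when the product does not fit a signed 32-bit value.
       This is not the same condition as edx != 0. */
    r.cf_of = (product < INT32_MIN || product > INT32_MAX);
    return r;
}
```

For example, `imul32(-1, 5)` leaves edx at 0FFFFFFFFh with the flags clear, while `imul32(0x40000000, 2)` leaves edx at 0 with the flags set, so `jc` and `test edx, edx` / `jne` branch differently for such inputs.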

Antariy

Quote from: jj2007 on November 08, 2010, 09:49:21 PM
blazing fast, on a P4 faster than the SSE2 version ... does not work.

Jochen, it is very slow??? On your CPU it is faster than the others? On mine it is slower than the others. Integer division is very slow anyway.



Alex

jj2007

Quote from: Antariy on November 08, 2010, 09:59:49 PM
Jochen, it is very slow??? On your CPU it is faster than the others? On mine it is slower than the others. Integer division is very slow anyway.

Just run the executable in post #1, Alex. It is damn fast.
Quote
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
11      cycles for GetPercentSSE
36      cycles for GetPercent
14      cycles for GetPercent2c
14      cycles for GetPercent2nc
15      cycles for GetPercentJJ1
14      cycles for GetPercentJJ2
12      cycles for GetPercentInt

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
21      cycles for GetPercentSSE
16      cycles for GetPercentInt

@theunknownguy: if 0 means conditional assembly, that's why you don't see it.

theunknownguy

Quote from: jj2007 on November 08, 2010, 10:03:50 PM
@theunknownguy: if 0 means conditional assembly, that's why you don't see it.

Rofl, sorry, I read it too fast (it's late here):

7FFFC350 * 0 = CF (0) OF (0)
7FFFC350 * 1 = CF (0) OF (0)
7FFFC350 * >1 = CF (1) OF (1)

The CF and OF flags are set if, due to integer overflow, the double-width multiplication result cannot
be represented in the half-width destination register. Otherwise the CF and OF flags are cleared.


ToutEnMasm explained it...

That's all I can do for today, lol. At this hour I am seeing little ASM goblins on my bed  :lol

Antariy

Quote from: jj2007 on November 08, 2010, 10:03:50 PM
Just run the executable in post #1, Alex. It is damn fast.
It is fast as long as the result fits into 32 bits after the imul, otherwise...

Antariy

Quote from: jj2007 on November 08, 2010, 10:03:50 PM
Just run the executable in post #1, Alex. It is damn fast.

Run it to get 50% of the number 85899346 ;)
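
Why this input matters: 85899346 * 50 = 4294967300, which is 4 more than 2^32, so edx is 1 after the imul and the idiv fallback runs. A small C sketch of that path choice, assuming the `test edx, edx` variant of GetPercentInt; the function name is hypothetical:

```c
#include <stdint.h>

/* Hypothetical model of GetPercentInt's path choice: returns 1 when
   the high dword of source * percent is nonzero (idiv fallback),
   0 when the reciprocal-multiply fast path would be taken. */
int takes_slow_path(int32_t source, int32_t percent)
{
    int64_t product = (int64_t)source * (int64_t)percent;
    uint32_t edx = (uint32_t)((uint64_t)product >> 32);
    return edx != 0;
}
```

With source one lower (85899345) the product stays below 2^32 and the fast path is taken again.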

I got this:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
22      cycles for GetPercentSSE
49      cycles for GetPercent
25      cycles for GetPercent2c
27      cycles for GetPercent2nc
25      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
53      cycles for GetPercentInt

22      cycles for GetPercentSSE
49      cycles for GetPercent
24      cycles for GetPercent2c
27      cycles for GetPercent2nc
25      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
53      cycles for GetPercentInt

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=-2147483648
41      bytes for GetPercent2c, result=-2147483648
45      bytes for GetPercent2nc, result=6790123
37      bytes for GetPercentJJ1, result=6790123
32      bytes for GetPercentJJ2, result=6790123
37      bytes for GetPercentInt, result=6790122


GetPercent and GetPercent2c have nice results.
Note: I have changed only the Algo macro, so the "result" shown under Code sizes is not affected.

Antariy

For this short algo I get 42 clocks with the numbers mentioned in the previous post:


align 16
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
AxGetPercent proc source:DWORD, percent:DWORD
mov ecx,100
mov eax,[esp+4]
imul dword ptr [esp+8]
idiv ecx
ret 8
AxGetPercent endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
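
For comparison, the imul/idiv pair corresponds to a widen, multiply, divide sequence in C. This is a sketch only: the function name and the status return are made up, and the range check stands in for the divide error that idiv raises when the quotient does not fit in 32 bits:

```c
#include <stdint.h>

/* Hypothetical C counterpart of AxGetPercent.
   Returns 1 and stores the result in *out on success.
   Returns 0 and leaves *out untouched when the quotient would not
   fit in 32 bits, the case in which idiv raises a divide error. */
int ax_get_percent(int32_t source, int32_t percent, int32_t *out)
{
    /* C division truncates toward zero, as idiv does. */
    int64_t q = ((int64_t)source * (int64_t)percent) / 100;
    if (q < INT32_MIN || q > INT32_MAX)
        return 0;
    *out = (int32_t)q;
    return 1;
}
```

As long as percent stays within 0..100 the quotient always fits, so the check only matters for larger percent values.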


Edited: code size is about 19-20 bytes, I guess.



Alex

jj2007

 :bg
AxGetPercent proc source:DWORD, percent:DWORD
AxVersion = 2
if AxVersion eq 1
   push 100   ; 17 bytes, 39 cycles, preserves ecx
   mov eax, [esp+8]
   imul dword ptr [esp+12]
   idiv dword ptr [esp]
   pop edx
   ret 8
elseif AxVersion eq 2
   pop ecx   ; only 13 bytes, 58 cycles, trashes ecx
   pop eax
   pop edx
   imul edx
   push 100
   idiv dword ptr [esp]
   pop edx
   jmp ecx
else
   mov ecx, 100   ; 18 bytes, 37 cycles, trashes ecx
   mov eax, [esp+4]
   imul dword ptr [esp+8]
   idiv ecx
   ret 8
endif
AxGetPercent endp

Antariy


elseif AxVersion eq 2
pop ecx ; only 13 bytes, 58 cycles, trashes ecx
pop eax
pop edx
push ecx
imul edx
mov ecx,100
idiv ecx
ret
else


~14 bytes, 3 clocks faster  :bg
but still many clocks slower than esp-frame based