magic number division for 64, 32 and 16 bit

Started by qWord, September 18, 2008, 12:07:32 PM

Tedd

Sorry, bad wording on my part.

"INC and DEC will usually cause a stall if the next instruction depends on the flags" -- e.g. a conditional jump.

It's an out-of-order execution effect. The conditional instruction can't complete until the state of the flags is known, and partially modifying the flags causes an uncertainty (the full state also depends on the outcome of previous instructions). So it's not something that can be worked around easily.
ADD and SUB fully modify the flags, so their outcome can be predicted, allowing the jump to be taken before the previous instructions have necessarily completed. So prefer ADD or SUB in place of INC or DEC when the next instruction is conditional.

jj2007

Tedd,

The theory sounds plausible, but the effect is 0.2 cycles on my P4 and 0.0 on my Celeron...

dedndave

i have seen cases where it is advantageous to put an unrelated instruction between the 2
        cmp     eax,5
        mov     SomePlace,SomeThing
        ja      SomeLabel

but, i find this applies to ADD, SUB, CMP, INC, TEST - any instruction where the flags are tested, really

hutch--

I have just finished reading this thread as I was after qWord's technique, but in relation to Tedd's last comment: Intel has been recommending ADD and SUB over INC and DEC in its optimisation manuals for a long time. Often it does not matter, but in enough places I have found their recommendation faster than INC and DEC that I rarely use INC or DEC any longer.

dedndave

don't toss them out altogether
they are great if the next instruction does not examine the flags (as a conditional branch does)
inc and dec do not alter the carry flag
that is sometimes a desirable feature
also - inc/dec on a dword register is a single byte   :P

KeepingRealBusy

Quote from: qWord on September 18, 2008, 12:07:32 PM
Hi all,

In the last few days I've written a tool for determining the "magic number" for unsigned and signed division of 64, 32 and 16 bit numbers.
The algorithm is based on C source that can be found in AMD's "Software Optimization Guide".

I've tested the program randomly and it looks (to me  :bg) like it works properly - but there is no guarantee at the moment.
I hope people here can help me improve the program - so feel free to post helpful suggestions, criticism and, of course, failure reports.

In the attached file you will find the executable and source.

regards, qWord

-----
28-09-2009: new version attached
02-10-2009: fixed some bugs - new version uploaded

qWord,

Thank you! Thank You! Thank You!

I had been using the old magic number routine and suddenly found that it failed for a value of 100, so I was faced with either working out what was wrong with the old one or getting your new one. Your new one works!

Dave.
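
For anyone reading this thread without the attachment, here is a minimal sketch in C of what a magic-number division looks like. It is not output from qWord's tool and not the routine Dave was using. The names div100_magic and check_div100 are hypothetical, the value 100 is assumed here to be the divisor, and the constant is the commonly cited multiplier ceil(2^37 / 100) for unsigned 32-bit operands.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit division by the constant 100
   using multiply-and-shift instead of DIV.
   0x51EB851F = ceil(2^37 / 100). Its rounding error (28) does not
   exceed 2^5, which keeps the result exact for every 32-bit n. */
static uint32_t div100_magic(uint32_t n)
{
    /* 32x32 -> 64-bit product, then drop the low 37 bits.
       On x86 the high half lands in EDX after MUL, so this
       corresponds to MUL followed by SHR EDX,5. */
    return (uint32_t)(((uint64_t)n * 0x51EB851Fu) >> 37);
}

/* Compare the sketch against the compiler's own division for one input. */
static void check_div100(uint32_t n)
{
    assert(div100_magic(n) == n / 100u);
}
```

Other divisors need their own multiplier and shift count, and some need an extra correction step, so a constant that works for one divisor cannot simply be reused for another.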