In my pgm I have to do a lot of divisions by 100.
I use the next instructions :
xor edx,edx
mov eax,myValue
mov ebx,100
div ebx :tdown
I know that it is very slow. So I would like to replace them with a mul or others instructions but I don't know which ?
Could someone :boohoo: help me.
Thanks
Hi
Use this snippet
MagicNumber = 2748779069
mov eax,X
mov edx, MagicNumber
inc eax
mul edx
SHR edx, 6
Use "Magic divider" (coded by The Svin) to generate other divisors .
Regards,
Biterider
Quote from: Biterider on September 29, 2006, 09:15:49 AMUse "Magic divider" (coded by The Svin) to generate other divisors .
Is there a current link to this that's not in russian?
Regards, P1 :8)
Hi,
... not that I know. If you google for "Magic divider" and "The Svin" you can find a link tu a russian page to download the app and the source.
Regards
Biterider
I'm sure I have a copy of it lying around somewhere -- maybe I'll dig it out.
Anyway, it employs a little trick that's mentioned in one of the intel manuals (or it may have been the amd manuals), so if you dig around in there you might be able to find it. I know I've definitely seen it, but as to which manual and which version is another matter; I'd start with the optimisation ones.
Quote from: Tedd on September 29, 2006, 08:21:21 PMI'm sure I have a copy of it lying around somewhere -- maybe I'll dig it out.
One of the Google link had a translate link. I have it now.
Thank Anyway :U
Regards, P1 :8)
One simple way to compute a "Magic Number" for any divisor is the following (using 100 as an example):
.data?
recip100 dd ?
.code
xor eax,eax
mov edx,1
mov ecx,100
div ecx
inc eax
mov recip100,eax
And then, whenever you need to divide any 32-bit integer by 100,
mov eax,whatever
mul recip100
;if your whatever can be positive or negative, use the imul opcode instead of mul
;your truncated answer is in the EDX register
The above will give you an answer just as accurate as with any full 32-bit magic number. And there's no need to worry about how much to shift the multiplication result. (For example, if you should use a full 32-bit magic number to divide by 25, you would have to shift the EDX register by 4 instead of 6 as with the 100 divisor.)
And the above is also more accurate than the suggested code from Biterider which increments the whatever before the multiplication instead of having incremented the magic number itself.
Raymond
The amd optimization guide also contains source code for a magic number generator ;)
not sure if this can outperform the imul instruction ::)
; x * 100
mov eax, x
mov ecx, eax
mov edx, eax
shl eax, 2
shl ecx, 5
shl edx, 6
add eax, ecx
add eax, edx
Edited: Uuups, forget about this post. Divide :dazzled:
Im afraid i dont have the original zip. :red
[attachment deleted by admin]
The problem with these methods is that they do not duplicate the results of DIV over the full range of values. In my tests, using only the divisor 100, the magic number code failed when the dividend reached 4294967295 (FFFFFFFFh), and Raymond's code failed when the dividend reached 1073741899. Agner Fog's DIVIDE_FIXED (reciprocal divisor) procedure duplicates the results of DIV over the full range of values, but at the cost of more code and slower execution (but still much faster than DIV).
[attachment deleted by admin]
Here is a version with source code that I saved. It's by The Svin / Ewayne.
[attachment deleted by admin]
I use this program to calculate the magic. Found the source code in the amd optimization manual, so I guess it has no errors and results in fast code.
Divide by 100:
MOV EAX, 0A3D70A3Dh
MUL dividend
ADD EAX, 0A3D70A3Dh
ADC EDX, 0
SHR EDX, 6
; result in EDX
Edit: I wanted to know the speed difference on my amd:
div: 35
reciprocal multiplication: 0
It's clearly fast.
ps. what's with the timing macro's? (timers.asm) I regularly get zero cycles, and sometimes even -1.
[attachment deleted by admin]
Thanks for everyone.
All of this helps me a lot.
I need to divide by 100 a value between 0 and 255.
Because I need to compute a percentage.
I had done a big array with percentage of all the numbers between 0 and 255.
And for each number I have the result in 100 values.
250 : 1%, 2%, 3%.. 100%
I made this before the post.
Now I have just a multiply and a "magic" divide.
Which is the best method ?
Merci
If you have the memory for it, then a table lookup is going to be faster. However, a large table will possibly have other issues concerning cache..
The question is whether you really need to be calculating percentages as fast as possible. Whether it takes 2 cycles or 10, is it that important? If not, then don't really waste your time on it.
Magic divide by 100 and then multiply should work well.
Multiply first, then divide -- you lose bits the other way :P
Simple example:
{integer maths}
(3 / 2) * 5 = 5
(3 * 5) / 2 = 7
(of course the real answer should be 7.5 in both cases, but you get truncation with intgers)
Yeah, of course. Anyway, don't worry about the extra multiplication, it should run parallel with magic multiplication.
If you are interested in the underlying math of the 'magic numbers', read following article:
"Integer Division Using Reciprocals" by Robert Alverson:
http://tinyurl.com/qc4qr
It is very understandable and not too long.
Kind regards
Eddy
HIME - Huge Integer Math and Encryption library
www.devotechs.com
Quote from: Stan Hebben on October 02, 2006, 07:27:01 PM
ps. what's with the timing macro's? (timers.asm) I regularly get zero cycles, and sometimes even -1.
The problem is that there is no way (that I know of) to isolate the timed instructions from the timing instructions. If you are trying to time a single relatively fast instruction and it executes in parallel with one or more of the timing instructions, then the cycle count for the timed instruction is lost in the timing overhead. And if, in addition, the presence of the timed instruction reduces the timing overhead for the timing loop, relative to the empty reference loop, then you can end up with a small negative cycle count. The newer macros in counter2.zip work better for timing single instructions, but you still cannot get reliable cycle counts for instructions that execute in less than a few cycles. The newer macros also do not require large samples (high loop counts), so the results are more repeatable, and in most cases available almost immediately.