have an app whose bottleneck, on a 32-bit Pentium4, is _lldiv, This is VS2003's standard "long long" division routine. That is, a signed 64-bit integer is divided by another signed 64-bit integer.
Can anyone point me to a faster routine that does the same thing? This code will always be run on machines with at least SSE, so SIMD instruction are OK.
I've attached MS's code for comparison.
Thanks.
[attachment deleted by admin]
If you only need the quotient of the division as the result, it can be handled very easily with floats, and almost as fast as a division by a 32-bit number.
It would be slightly more complicated and a bit slower if you also want the remainder as a 64-bit integer.
Let me know if you are interested in this approach.
Raymond
Hi
I have found an old snippet with a macro that implements a qword division. It has to be checked, since I'm not sure if it works.
Regards,
Biterider
; Macro: qqdiv
; Purpose: Divides 2 unsigned QuadWords. Uses edi, esi.
; Output: edx::eax = quotient of division of dividend by divisor
; ecx::ebx = remainder of division of dividend by divisor
; Arguments: edx::eax = dividend
; ecx::ebx = divisor
qqdiv macro
local @@Exit
or ecx, ecx ;;check if ecx is zero
jnz @F
qddiv ;;Perform qddiv
xor ecx, ecx ;;ecx = remainder-hi = zero
jmp @@Exit
@@:
push edx ;;save
push eax ;; dividend
mov esi, ebx ;;divisor now in
mov edi, ecx ;; edi:ebx and ecx::esi
shr edx, 1 ;;shift both
rcr eax, 1 ;; divisor and
ror edi, 1 ;; and dividend
rcr ebx, 1 ;; right by 1 bit
bsr ecx, ecx ;;ecx = number of remaining shifts
shrd ebx, edi, cl ;;scale down divisor and
shrd eax, edx, cl ;; dividend such that divisor
shr edx, cl ;; less than 2^32 (i.e. fits in ebx)
rol edi, 1 ;;restore original divisor (edi::esi)
div ebx ;;compute quotient
pop ebx ;;get dividend lo-word
mov ecx, eax ;;save quotient
imul edi, eax ;;quotient * divisor hi-word (low only)
mul esi ;;quotient * divisor lo-word
add edx, edi ;;edx:eax = quotient * divisor
sub ebx, eax ;;dividend-lo - (quot.*divisor)-lo
mov eax, ecx ;;get quotient
pop ecx ;;restore dividend hi-word
sbb ecx, edx ;;subtract divisor * quot. from dividend
sbb edx, edx ;;0 if remainder > 0, else FFFFFFFFh
and esi, edx ;;nothing to add
and edi, edx ;; back if remainder positive
add ebx, esi ;;correct remaider
adc ecx, edi ;; and quotient if
add eax, edx ;; necessary
xor edx, edx ;;clear hi-word of quot (eax<=FFFFFFFFh)
@@Exit:
endm
On further examination (this isn't my code) I see that that it is only a 32-bit signed quotient that is needed for output:
typedef struct point {
int32_t x;
int32_t y;
} point_t;
typedef struct line {
point_t p1;
point_t p2;
} line_t;
int32_t compute_x (line_t *line, int32_t y)
{
int32_t dx = line->p2.x - line->p1.x;
int64_t ex = (int64_t) (y - line->p1.y) * (int64_t) dx;
int32_t dy = line->p2.y - line->p1.y;
return ( line->p1.x + (ex / dy) );
}
So the real work is to divide a 64-bit signed int by a 32-bit signed int and return the 32-bit signed quotient.
In other words, the use of VS2003's _lldiv() is total overkill for this. The division that so dominates the profiling can be done with a single idiv instruction.
Sigh. Sorry for wasting everybody's time.