News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Pointer to faster 32-bit signed long divide?

Started by swsnyder, January 21, 2007, 04:28:39 AM

Previous topic - Next topic

swsnyder

 have an app whose bottleneck, on a 32-bit Pentium4, is _lldiv, This is VS2003's standard "long long" division routine.  That is, a signed 64-bit integer is divided by another signed 64-bit integer.

Can anyone point me to a faster routine that does the same thing?  This code will always be run on machines with at least SSE, so SIMD instruction are OK.

I've attached MS's code for comparison.

Thanks.


[attachment deleted by admin]

raymond

If you only need the quotient of the division as the result, it can be handled very easily with floats, and almost as fast as a division by a 32-bit number.
It would be slightly more complicated and a bit slower if you also want the remainder as a 64-bit integer.

Let me know if you are interested in this approach.

Raymond
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

Biterider

Hi
I have found an old snippet with a macro that implements a qword division. It has to be checked, since I'm not sure if it works.

Regards,

Biterider

; Macro:     qqdiv
; Purpose:   Divides 2 unsigned QuadWords. Uses edi, esi.
;            Output:  edx::eax = quotient of division of dividend by divisor
;                     ecx::ebx = remainder of division of dividend by divisor
; Arguments: edx::eax = dividend
;            ecx::ebx = divisor

qqdiv macro
    local @@Exit

    or ecx, ecx             ;;check if ecx is zero
    jnz @F
    qddiv                   ;;Perform qddiv
    xor ecx, ecx            ;;ecx = remainder-hi = zero
    jmp @@Exit
@@:
    push edx                ;;save
    push eax                ;; dividend
    mov esi, ebx            ;;divisor now in
    mov edi, ecx            ;; edi:ebx and ecx::esi
    shr edx, 1              ;;shift both
    rcr eax, 1              ;; divisor and
    ror edi, 1              ;;  and dividend
    rcr ebx, 1              ;;   right by 1 bit
    bsr ecx, ecx            ;;ecx = number of remaining shifts
    shrd ebx, edi, cl       ;;scale down divisor and
    shrd eax, edx, cl       ;;  dividend such that divisor
    shr edx, cl             ;;   less than 2^32 (i.e. fits in ebx)
    rol edi, 1              ;;restore original divisor (edi::esi)
    div ebx                 ;;compute quotient
    pop ebx                 ;;get dividend lo-word
    mov ecx, eax            ;;save quotient
    imul edi, eax           ;;quotient * divisor hi-word (low only)
    mul esi                 ;;quotient * divisor lo-word
    add edx, edi            ;;edx:eax = quotient * divisor
    sub ebx, eax            ;;dividend-lo - (quot.*divisor)-lo
    mov eax, ecx            ;;get quotient
    pop ecx                 ;;restore dividend hi-word
    sbb ecx, edx            ;;subtract divisor * quot. from dividend
    sbb edx, edx            ;;0 if remainder > 0, else FFFFFFFFh
    and esi, edx            ;;nothing to add
    and edi, edx            ;; back if remainder positive
    add ebx, esi            ;;correct remaider
    adc ecx, edi            ;; and quotient if
    add eax, edx            ;;  necessary
    xor edx, edx            ;;clear hi-word of quot (eax<=FFFFFFFFh)
@@Exit:
endm


swsnyder

On further examination (this isn't my code)  I see that that it is only a 32-bit signed quotient that is needed for output:


typedef struct point {
    int32_t x;
    int32_t y;
} point_t;

typedef struct line {
    point_t p1;
    point_t p2;
} line_t;

int32_t compute_x (line_t *line, int32_t y)
{
    int32_t dx = line->p2.x - line->p1.x;
    int64_t ex = (int64_t) (y - line->p1.y) * (int64_t) dx;
    int32_t dy = line->p2.y - line->p1.y;

    return ( line->p1.x + (ex / dy) );
}


So the real work is to divide a 64-bit signed int by a 32-bit signed int and return the 32-bit signed quotient.

In other words, the use of VS2003's _lldiv() is total overkill for this.  The division that so dominates the profiling can be done with a single idiv instruction.

Sigh.  Sorry for wasting everybody's time.