Print Page - fast find

Title: fast find
Post by: siddys on August 24, 2011, 11:13:14 PM

Hi everyone,
I am trying to speedup the following:

given a vector of values, I need to weed out values that are smaller than a low_limit
and higher than a hi_limit, and then get a new vector with only valid values (i.e ones
in the range). Basically, I am trying to re-implement the find() function of Matlab, for those
of you who are familiar with it. Consider my knowledge in assembly to be very minimal, but
aware :-) I use core duo Intel, that can support uptill SSE4.1.
I was wondering if someone may help me with some basic asm snippets / ideas that
will help me in writing a fast version of the above problem statement.
Thanks,
s.

Title: Re: fast find
Post by: dedndave on August 24, 2011, 11:40:35 PM

a while back, qWord posted some sse code that you may be able to adapt to your needs
we were scanning a string for a few characters, like carriage return, line feed, tab, null

with a little effort, you could use something similar...
http://www.masm32.com/board/index.php?topic=15474.msg134886#msg134886

my knowledge of sse is very limited :P

Title: Re: fast find
Post by: baltoro on August 25, 2011, 01:11:44 AM

SIDDYS,
When you say: "...given a vector of values,..."
What do you mean exactly ???
How is the data represented ???
"In physics, as well as mathematics, a vector is often identified with a tuple, or list of numbers, which depend on some auxiliary coordinate system or reference frame."

Title: Re: fast find
Post by: MichaelW on August 25, 2011, 03:45:56 AM

You could sort the values using some efficient sorting algorithm and then use some sort of binary search to find the target range.

Title: Re: fast find
Post by: siddys on August 25, 2011, 06:25:24 PM

By a vector of values, I mean that the data values are in a 1D array, and stores in byte
aligned contiguous memory locations.

Thanks a lot folks for the tips, I will try them, but please keep the
ideas flowing my way, as and when they come....

sid.

Title: Re: fast find
Post by: dedndave on August 25, 2011, 06:46:19 PM

assuming the array contains signed values...

Code Select

        push    edi
        push    esi
        mov     dh,Xmin
        mov     dl,Xmax
        mov     edi,offset VectorOut
        mov     esi,offset VectorIn
        mov     ecx,sizeof VectorIn

loop00: mov     al,[esi]
        inc     esi
        cmp     al,dl
        jg      loop01

        cmp     al,dh
        jl      loop01

        mov     [edi],al
        inc     edi

loop01: sub     ecx,1
        jnz     loop00

        xchg    eax,edi
        pop     esi
        sub     eax,offset VectorOut
        pop     edi

it's not SSE, and there are probably some speed-ups :P
for instance, if we knew more about the length of the array, you might be able to unroll the loop

Title: Re: fast find
Post by: siddys on August 25, 2011, 07:01:00 PM

Wow, thanks.... seems to be aligned with what I was looking for. Regarding the length
of the array, the maximum size can be 1024 x 1024 elements. The dimensions will always
be power of 2.
Thanks again guys... I should have registered on this group sooner.....
sid.

Title: Re: fast find
Post by: dedndave on August 26, 2011, 01:49:55 AM

well, if it's a power of 2, you can unroll the loop nicely
the above code can be modified to load words, for example and do 2 bytes per pass
the more you unroll the loop, the faster it is likely to be, within certain limitations
it is good to keep the total length of the loop under 128 bytes so that the branch back to the top is SHORT rather than NEAR
also - important to know that memory reads and writes are faster on 32-bit machines if they are aligned dwords

anyways, here is the above loop modified to do 2 bytes per pass

Code Select

        push    edi
        push    esi
        mov     dh,Xmin
        mov     dl,Xmax
        mov     edi,offset VectorOut
        mov     esi,offset VectorIn
        mov     ecx,sizeof VectorIn

loop00: mov     ax,[esi]
        add     esi,2
        cmp     al,dl
        jg      loop01

        cmp     al,dh
        jl      loop01

        mov     [edi],al
        inc     edi

loop01: cmp     ah,dl
        jg      loop02

        cmp     ah,dh
        jl      loop02

        mov     [edi],ah
        inc     edi

loop02: sub     ecx,2
        ja      loop00

        xchg    eax,edi
        pop     esi
        sub     eax,offset VectorOut
        pop     edi

found a typo in the first loop...
was

Code Select

xchg eax,esi
should be

Code Select

xchg eax,edi

The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: siddys on August 24, 2011, 11:13:14 PM