News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

fast find

Started by siddys, August 24, 2011, 11:13:14 PM

Previous topic - Next topic

siddys

Hi everyone,
   I am trying to speedup the following:

given a vector of values, I need to weed out values that are smaller than a low_limit
and higher than a hi_limit, and then get a new vector with only valid values (i.e ones
in the range). Basically, I am trying to re-implement the find() function of Matlab, for those
of you who are familiar with it. Consider my knowledge in assembly to be very minimal, but
aware :-) I use core duo Intel, that can support uptill SSE4.1.
      I was wondering if someone may help me with some basic asm snippets /  ideas that
will help me in writing a fast version of the above problem statement.
Thanks,
s.

dedndave

a while back, qWord posted some sse code that you may be able to adapt to your needs
we were scanning a string for a few characters, like carriage return, line feed, tab, null

with a little effort, you could use something similar...
http://www.masm32.com/board/index.php?topic=15474.msg134886#msg134886

my knowledge of sse is very limited   :P

baltoro

SIDDYS,   
When you say: "...given a vector of values,..."
What do you mean exactly ???
How is the data represented ???
"In physics, as well as mathematics, a vector is often identified with a tuple, or list of numbers, which depend on some auxiliary coordinate system or reference frame."
Baltoro

MichaelW

You could sort the values using some efficient sorting algorithm and then use some sort of binary search to find the target range.
eschew obfuscation

siddys

By a vector of values, I mean that the data values are in a 1D array, and stores in byte
aligned contiguous memory locations.

Thanks a lot folks for the tips, I will try them, but please keep the
ideas flowing my way, as and when they come....

sid.

dedndave

#5
assuming the array contains signed values...
        push    edi
        push    esi
        mov     dh,Xmin
        mov     dl,Xmax
        mov     edi,offset VectorOut
        mov     esi,offset VectorIn
        mov     ecx,sizeof VectorIn

loop00: mov     al,[esi]
        inc     esi
        cmp     al,dl
        jg      loop01

        cmp     al,dh
        jl      loop01

        mov     [edi],al
        inc     edi

loop01: sub     ecx,1
        jnz     loop00

        xchg    eax,edi
        pop     esi
        sub     eax,offset VectorOut
        pop     edi


it's not SSE, and there are probably some speed-ups   :P
for instance, if we knew more about the length of the array, you might be able to unroll the loop

siddys

Wow, thanks.... seems to be aligned with what I was looking for. Regarding the length
of the array, the maximum size can be 1024 x 1024 elements. The dimensions will always
be power of 2.
Thanks again guys... I should have registered on this group sooner.....
sid.

dedndave

well, if it's a power of 2, you can unroll the loop nicely
the above code can be modified to load words, for example and do 2 bytes per pass
the more you unroll the loop, the faster it is likely to be, within certain limitations
it is good to keep the total length of the loop under 128 bytes so that the branch back to the top is SHORT rather than NEAR
also - important to know that memory reads and writes are faster on 32-bit machines if they are aligned dwords

anyways, here is the above loop modified to do 2 bytes per pass
        push    edi
        push    esi
        mov     dh,Xmin
        mov     dl,Xmax
        mov     edi,offset VectorOut
        mov     esi,offset VectorIn
        mov     ecx,sizeof VectorIn

loop00: mov     ax,[esi]
        add     esi,2
        cmp     al,dl
        jg      loop01

        cmp     al,dh
        jl      loop01

        mov     [edi],al
        inc     edi

loop01: cmp     ah,dl
        jg      loop02

        cmp     ah,dh
        jl      loop02

        mov     [edi],ah
        inc     edi

loop02: sub     ecx,2
        ja      loop00

        xchg    eax,edi
        pop     esi
        sub     eax,offset VectorOut
        pop     edi


found a typo in the first loop...
was
        xchg    eax,esi
should be
        xchg    eax,edi