The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: James Ladd on April 05, 2009, 06:34:51 AM

Title: Find zero/empty slot quickly ...
Post by: James Ladd on April 05, 2009, 06:34:51 AM
I have the need to find a specific slot in an array quickly.

I could just iterate from one end to the other, but I'm wondering if some of you know a faster way, possibly even using MMX etc.

So given a sequence of 32bit values, how can I find the first zero the fastest ?

Rgs, James.

Title: Re: Find zero/empty slot quickly ...
Post by: jj2007 on April 05, 2009, 07:23:28 AM
James,
I don't have the time right now, but the answer is in that inner loop coming from the strlen32s proc in the strlen thread. You will need the  pcmpeqd variants.
lea eax, [edx+16] ; aligned pointer + 16 (first 0..15 dealt with by movups above)

@@: pcmpeqb xmm0, [eax] ; ---- inner loop inspired by Lingo, with adaptions -----
pcmpeqb xmm1, [eax+16] ; compare packed bytes in [m128] and xmm1 for equality
lea eax, [eax+32] ; len counter (moving up lea or add costs 3 cycles for the 191 byte string)
por xmm1, xmm0 ; or them: one of the mem locations may contain a nullbyte
pmovmskb edx, xmm1 ; set byte mask in edx
test edx, edx
jz @B
@@: sub eax, [esp-4] ; subtract original src pointer
Title: Re: Find zero/empty slot quickly ...
Post by: Tedd on April 05, 2009, 03:16:42 PM
Or, just keep track of where the next empty slot is.
Initially it will be the first, then second, and so on as the array fills up.
When you remove an item, it then becomes the next empty slot. The next slot after that will require some search, but only from that point onwards.
(Or even make it a linked list of empty slot pointers - each empty slot points to the next empty slot.)
Title: Re: Find zero/empty slot quickly ...
Post by: James Ladd on April 06, 2009, 07:29:02 AM
Thanks Ted & JJ,

I should have mentioned that the array of 32 bit values can have random slots made empty, so keeping a track of
next free isnt possible. However, Ill think about the free slot pointing to next free slot approach.

Rgs, James.
Title: Re: Find zero/empty slot quickly ...
Post by: Mark Jones on April 09, 2009, 02:14:23 AM
Good to see you're still alive James. :bg

I know it's not what you're looking for, but my vote is for the linked-list. It is powerful and fast. One modification could be made, in that since you'd only be concerned with whether or not a DWORD contains a value or not, then one bit could represent a DWORD in the array. Thus, the entire array "full flags" could fit in 2n bits, where n is the size of the array. (Thus the "list" could be very small in relation to the array.) Of course, this would take a little thinking to implement effectively - just an idea.
Title: Re: Find zero/empty slot quickly ...
Post by: redskull on April 09, 2009, 10:53:12 PM
REP SCASD, Baby!  It's probably not the fastest, but you don't get smaller than a single instruction (some of us still irresponsibly optimize for size)

-ac
Title: Re: Find zero/empty slot quickly ...
Post by: KeepingRealBusy on April 10, 2009, 03:26:14 AM
redskull,

QuoteREP SCASD, Baby!  It's probably not the fastest, but you don't get smaller than a single instruction (some of us still irresponsibly optimize for size)

All depends on whether the array is an array of DWORDS, and if the indexed entry just happens to be an actual value of 0. If the array is an array of QWORDS, a short (<32 bit) QWORD entry would give you a false stop on the second DWORD of the QWORD.

Use a linked list! It only needs a Root DWORD (initially 0). Adding an entry amounts to putting the Root value in the new entry to add, and saving the OFFSET of the new entry in Root. Removing an entry simply gets the Root value and uses it as the entry to use, then moves the first DWORD of that new entry to the Root.

Dave.
Title: Re: Find zero/empty slot quickly ...
Post by: NightWare on April 10, 2009, 04:01:28 AM
Quote from: KeepingRealBusy on April 10, 2009, 03:26:14 AM
a short (<32 bit) QWORD entry would give you a false stop on the second DWORD of the QWORD.
:P you know someone who will read the seconds dwords of a dword*2 list ? scasd should be avoided here, yes, but a dword read is still interesting...
Title: Re: Find zero/empty slot quickly ...
Post by: dedndave on April 10, 2009, 10:33:10 AM
you might keep a long binary value and set a bit whenever a slot is filled
use the low order bit for the first slot
SHR on the binary until the CF is clear
Title: Re: Find zero/empty slot quickly ...
Post by: jj2007 on April 10, 2009, 10:37:57 AM
Have fun :bg

              Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)


findslotSSE2   45095 kilocycles, result: slot 10000000
findslotScasd  47280 kilocycles, result: slot 10000000

findslotSSE2   4536 kilocycles, result: slot 1000000
findslotScasd  4712 kilocycles, result: slot 1000000

findslotSSE2   162 kilocycles, result: slot 100000
findslotScasd  395 kilocycles, result: slot 100000



[attachment deleted by admin]
Title: Re: Find zero/empty slot quickly ...
Post by: Mark Jones on April 10, 2009, 05:30:30 PM
AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ (SSE3)
findslotSSE2   32452 kilocycles, result: slot 10000000
findslotScasd  40024 kilocycles, result: slot 10000000

findslotSSE2   3155 kilocycles, result: slot 1000000
findslotScasd  3905 kilocycles, result: slot 1000000

findslotSSE2   208 kilocycles, result: slot 100000
findslotScasd  300 kilocycles, result: slot 100000
Title: Re: Find zero/empty slot quickly ...
Post by: jj2007 on April 10, 2009, 06:00:46 PM
So far the biggest gain for large slot tables:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

findslotSSE2   22805 kilocycles, result: slot 10000000
findslotScasd  41255 kilocycles, result: slot 10000000

findslotSSE2   2264 kilocycles, result: slot 1000000
findslotScasd  4127 kilocycles, result: slot 1000000

findslotSSE2   88 kilocycles, result: slot 100000
findslotScasd  388 kilocycles, result: slot 100000
Title: Re: Find zero/empty slot quickly ...
Post by: MichaelW on April 11, 2009, 02:04:08 AM
This is the best I could do without SSE2 capability on my P3:

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 16
FindSlot1 proc base:DWORD

    mov edx, [esp+4]
    mov eax, -2
    jmp @F
    align 16
  @@:
    add eax, 2
    mov ecx, -1
    and ecx, [edx+eax*4]
    test ecx, [edx+eax*4+4]
    jnz @B
    test DWORD PTR [edx+eax*4], -1
    jz  @F
    add eax, 1
  @@:
    ret 4

FindSlot1 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef


BTW, this code depends on a zeroed sentinel at the end, and returns a slot index instead of a byte index.

EDIT: To handle even as well as odd length arrays, I think there should be two zeroed sentinels at the end.
Title: Re: Find zero/empty slot quickly ...
Post by: jj2007 on April 11, 2009, 04:02:29 AM
Quote from: MichaelW on April 11, 2009, 02:04:08 AM
This is the best I could do without SSE2 capability on my P3:

Pretty fast, a big improvement over rep scasd:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

Code sizes:
findslotSSE2   66
findslotMW     48
findslotScasd  26

Timings:
findslotSSE2   22036   kilocycles, result: slot 10000000
findslotMW     25308   kilocycles, result: slot 10000000
findslotScasd  40990   kilocycles, result: slot 10000000

findslotSSE2   2194    kilocycles, result: slot 1000000
findslotMW     2534    kilocycles, result: slot 1000000
findslotScasd  4130    kilocycles, result: slot 1000000

findslotSSE2   88      kilocycles, result: slot 100000
findslotMW     152     kilocycles, result: slot 100000
findslotScasd  388     kilocycles, result: slot 100000

Quote

BTW, this code depends on a zeroed sentinel at the end, and returns a slot index instead of a byte index.


Same for the others in the new version.

[attachment deleted by admin]
Title: Re: Find zero/empty slot quickly ...
Post by: Mark Jones on April 11, 2009, 03:54:36 PM
AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ (SSE3)

Code sizes:
findslotSSE2   66
findslotMW     48
findslotScasd  26

Timings:
findslotSSE2   33843   kilocycles, result: slot 10000000
findslotMW     42792   kilocycles, result: slot 10000000
findslotScasd  42301   kilocycles, result: slot 10000000

findslotSSE2   3278    kilocycles, result: slot 1000000
findslotMW     4190    kilocycles, result: slot 1000000
findslotScasd  4097    kilocycles, result: slot 1000000

findslotSSE2   138     kilocycles, result: slot 100000
findslotMW     272     kilocycles, result: slot 100000
findslotScasd  258     kilocycles, result: slot 100000
Title: Re: Find zero/empty slot quickly ...
Post by: redskull on April 11, 2009, 04:54:18 PM

Intel(R) Core(TM)2 Duo CPU     E4500  @ 2.20GHz (SSE4)

Code sizes:
findslotSSE2   66
findslotMW     48
findslotScasd  26

Timings:
findslotSSE2   19536   kilocycles, result: slot 10000000
findslotMW     22199   kilocycles, result: slot 10000000
findslotScasd  44976   kilocycles, result: slot 10000000

findslotSSE2   1968    kilocycles, result: slot 1000000
findslotMW     2220    kilocycles, result: slot 1000000
findslotScasd  4247    kilocycles, result: slot 1000000

findslotSSE2   59      kilocycles, result: slot 100000
findslotMW     116     kilocycles, result: slot 100000
findslotScasd  391     kilocycles, result: slot 100000
Title: Re: Find zero/empty slot quickly ...
Post by: James Ladd on April 12, 2009, 12:43:37 AM
I think it is awesome that someone can post a simple question and get such great responses !
Thanks all.
Title: Re: Find zero/empty slot quickly ...
Post by: jj2007 on April 12, 2009, 06:00:06 PM
Quote from: redskull on April 09, 2009, 10:53:12 PM
REP SCASD, Baby!  It's probably not the fastest, but you don't get smaller than a single instruction (some of us still irresponsibly optimize for size)

Hard to believe, but you can get faster and smaller:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

Code sizes:
findslotSSE2   65
findslotMW     48
findslotJJ     18
findslotScasd  26

Timings:

findslotSSE2   86      cycles/100 slots, slot found at byte 500000
findslotMW     157     cycles/100 slots, slot found at byte 500000
findslotJJ     181     cycles/100 slots, slot found at byte 500000
findslotScasd  398     cycles/100 slots, slot found at byte 500000


James, perhaps it's time to reveal how big your array is... the timings show a strong dependency on whether the array fits into the cache ;-)

EDIT: A bugfix in my non-SSE algo, and 3 cycles less for the SSE2 version.

[attachment deleted by admin]
Title: Re: Find zero/empty slot quickly ...
Post by: PBrennick on April 15, 2009, 04:30:43 AM
The easiest way to find an empty slot in an array is via a mathematical progression. You continually add the current slot to the next slot and then compare the result to the first slot. If they are equal, then the next slot is empty. If not equal you update the array pointer and add again. Do this until the entire array is tested. You can also build an array of pointers to empty slots and when you use one of them you remove it from the second array. This will work for any type of array, ASCII or otherwise.

Paul