Started by yvansoftware, March 30, 2010, 07:40:20 PM

Quote: "coop" and "co-op" stay together within a sorted list

Hi Dave,
You tested the old version. Here is a new one, including a case-insensitive algo.

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
String comparison: short string 10 bytes, long string 5050
3533    cycles for SSE with null check, long string
4507    cycles for SSE with null check, long string, case-insensitive
6926    cycles for Lingo, long string
14194   cycles for crt_strcmp, long string
51312   cycles for crt__stricmp, long string, case-insensitive
33241   cycles for movzx check null, long string
20460   cycles for repe cmpsb, long string
154405  cycles for lstrcmp, long string

40      cycles for SSE with null check, 10 bytes
50      cycles for SSE with null check, 10 bytes, case-insensitive
23      cycles for Lingo, 10 bytes
41      cycles for crt_strcmp, 10 bytes
242     cycles for crt__stricmp, 10 bytes, case-insensitive

Edit: The corrected version has been removed, as it did not always return correct results. MasmBasic StringsDiffer and FilesDiffer are more stable.


Intel Pentium 4 Prescott CPU 3.00GHz (SSE3)
String comparison: short string 10 bytes, long string 5050
3564    cycles for SSE with null check, long string
4509    cycles for SSE with null check, long string, case-insensitive
72008   cycles for Lingo, long string
14371   cycles for crt_strcmp, long string
50830   cycles for crt__stricmp, long string, case-insensitive
32967   cycles for movzx check null, long string
-44643  cycles for repe cmpsb, long string
808120  cycles for lstrcmp, long string

40      cycles for SSE with null check, 10 bytes
52      cycles for SSE with null check, 10 bytes, case-insensitive
23      cycles for Lingo, 10 bytes
41      cycles for crt_strcmp, 10 bytes
244     cycles for crt__stricmp, 10 bytes, case-insensitive
100     cycles for movzx check null, 10 bytes
123     cycles for repe cmpsb, 10 bytes
740     cycles for lstrcmp, 10 bytes

3542    cycles for SSE with null check, long string
4568    cycles for SSE with null check, long string, case-insensitive
6953    cycles for Lingo, long string
14017   cycles for crt_strcmp, long string
115829  cycles for crt__stricmp, long string, case-insensitive
98018   cycles for movzx check null, long string
85415   cycles for repe cmpsb, long string
809285  cycles for lstrcmp, long string

43      cycles for SSE with null check, 10 bytes
55      cycles for SSE with null check, 10 bytes, case-insensitive
26      cycles for Lingo, 10 bytes
47      cycles for crt_strcmp, 10 bytes
246     cycles for crt__stricmp, 10 bytes, case-insensitive
101     cycles for movzx check null, 10 bytes
125     cycles for repe cmpsb, 10 bytes
745     cycles for lstrcmp, 10 bytes


i get crazy results, Jochen
ok - closed all other applications and ran it 10 times - something is amiss...


i am not sure i understand the Dummy1 var
doesn't that misalign the data strings ?


Yep, the dummies are for misaligning the strings.
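
As a rough illustration of what such a dummy does, here is a small C sketch. It is not the layout from the attachment: the struct, the field names and the helper `misalignment16` are made up for the example. One dummy byte in front of the string pushes it off the 16-byte boundary, which is the unfavourable case for the SSE loads.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical layout (an assumption, not the attachment's data section):
   a 16-byte aligned block with one dummy byte in front of the string. */
struct padded {
    alignas(16) char dummy1;   /* sits on the 16-byte boundary */
    char text[32];             /* starts one byte past the boundary */
};

/* Distance of p from the previous 16-byte boundary (0 means aligned). */
static unsigned misalignment16(const void *p)
{
    return (unsigned)((uintptr_t)p % 16);
}
```

For a variable `s` of type `struct padded`, `misalignment16(&s.dummy1)` is 0 and `misalignment16(s.text)` is 1.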

Your results are really crazy. You aren't running a P4 by any chance?  :wink
Try changing the invoke Sleep, 100 to invoke Sleep, 200 - on my P4 it helps. And take the updated version with 285 bytes size - you were the only downloader so far, so I swapped it silently.

here is my result for the new one - lol
30 diff at pos 4999 (zero-based)
looks like you fixed it  :bg  at least i get the same result every time

Oops, it seems I posted the one with the test for correctness activated :red
New one attached above.


i still get crazy data, JJ


Typical behaviour for a P4 - many outliers. The lowest values are supposed to be the correct ones; for me they are 3600, 4500, 6900 cycles for the first three. Later I will test it on my Celeron.
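
A rough sketch of that "take the lowest value" rule in C. The helper name `lowest_valid` and the choice to throw away zero or negative readings are assumptions for illustration, not part of the test bed.

```c
#include <stddef.h>

/* Hypothetical helper: returns the smallest positive cycle count in runs[],
   or -1 if no run produced a positive value. Non-positive readings are
   treated as measurement glitches and skipped (an assumption). */
static long lowest_valid(const long *runs, size_t n)
{
    long best = -1;
    for (size_t i = 0; i < n; i++) {
        if (runs[i] <= 0)
            continue;                       /* skip glitches such as -44643 */
        if (best < 0 || runs[i] < best)
            best = runs[i];
    }
    return best;
}
```

Fed with the two Lingo long-string readings posted above (72008 and 6953), it picks 6953.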


i dunno - i have never seen numbers like "-69000" before


Celeron is more stable, as usual:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
String comparison: short string 10 bytes, long string 5050
2902    cycles for SSE with null check, long string
3672    cycles for SSE with null check, long string, case-insensitive
5834    cycles for Lingo, long string
13933   cycles for crt_strcmp, long string
16508   cycles for crt__stricmp, long string, case-insensitive
30042   cycles for movzx check null, long string
20108   cycles for repe cmpsb, long string
89881   cycles for lstrcmp, long string

18      cycles for SSE with null check, 10 bytes
25      cycles for SSE with null check, 10 bytes, case-insensitive
16      cycles for Lingo, 10 bytes
32      cycles for crt_strcmp, 10 bytes
108     cycles for crt__stricmp, 10 bytes, case-insensitive
51      cycles for movzx check null, 10 bytes
88      cycles for repe cmpsb, 10 bytes
453     cycles for lstrcmp, 10 bytes


Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
1405    cycles for LingoCMP, 1000 bytes
519     cycles for frktonsCMP , 1000 bytes

--- ok ---

I have the strange idea that this method could be quite fast:

; compare_4bytes
; -------------------------------------------------------------------------------------------------------------
; comparing two 4-byte strings the fast way
; Method proposed by frktons
; 16 june 2010 - masmforum

include \masm32\include\

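.data
Src1    db "This", 0
Src2    db "This", 0

.code
start:
        mov     ecx, offset Src1
        mov     eax, offset Src2
        mov     edx, [ecx]      ; load the first 4 bytes of each string into edx and esi
        mov     esi, [eax]
        xor     edx, esi        ; result is zero only if all 4 bytes match
        jnz     end_check
        print   "The strings are identical",13,10
        jmp     end_of_game

end_check:
        print   "The strings are different",13,10

end_of_game:
        inkey   chr$(13, 10, "--- ok ---", 13)
        invoke  ExitProcess, 0  ; return to the OS instead of running past the end

end start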

If you put this method inside a loop, it would be
nice to see how it compares with the other methods.
To cover 1000 bytes, the loop has to run 125 times (8 bytes per
iteration), and the code should be modified like this:


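        mov     ecx, offset Src1 ; both buffers must hold at least 1000 bytes
        mov     eax, offset Src2
        mov     ebx, 125         ; 125 iterations x 8 bytes = 1000 bytes
@@:
        mov     edx, [ecx]       ; compare 4 bytes of each string via edx and esi
        mov     esi, [eax]
        xor     edx, esi
        jnz     @f               ; mismatch: leave the loop
        add     ecx, 4
        add     eax, 4
        mov     edx, [ecx]       ; second 4-byte compare of this iteration
        mov     esi, [eax]
        xor     edx, esi
        jnz     @f
        add     ecx, 4
        add     eax, 4
        dec     ebx
        jnz     @b
@@:                              ; zero flag set: identical, clear: different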


If anyone is kind enough to try it and let me know, I'm quite curious.  :P

Being an Assembly beginner, it could be that I've messed up things without
realizing it.  :lol
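
For anyone who wants to check the logic of the 4-bytes-at-a-time loop before timing it, here is a rough C sketch of the same idea. The function name `dwords_differ` is made up for the example, and like the assembly above it only reports equal or different for a fixed length: it does not look for the terminating zero and does not say which string sorts first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the thread: compares the first len bytes
   of a and b one 32-bit word at a time. len must be a multiple of 4.
   Returns 0 if all words match, 1 as soon as one differs. */
static int dwords_differ(const char *a, const char *b, size_t len)
{
    assert(len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        uint32_t x, y;
        memcpy(&x, a + i, 4);   /* unaligned-safe 4-byte load */
        memcpy(&y, b + i, 4);
        if (x ^ y)              /* same test as xor edx, esi / jnz */
            return 1;
    }
    return 0;
}
```

With two 1000-byte buffers filled with the same character it returns 0. After changing the last byte of one buffer it returns 1.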


Mind is like a parachute. You know what to do in order to use it :-)