What makes fail this code

untio · May 24, 2010, 02:25:11 PM

Hi,
I have an opensource project named 'duplicadoseraser2' that searches for duplicated files and it has been some time buggy. It is written in CPP but, until yesterday it had got the next assembly code, with others one, that compares two arrays of bytes:

Code Select

arraysofbytesarediferent proc c uses ecx edi esi, pointer1:DWORD, pointer2:DWORD, len:DWORD
mov esi, pointer1
mov edi, pointer2
mov ecx, len
xor eax, eax
cld
repe cmpsb
setne al
ret
arraysofbytesarediferent endp

This procedure should return 1 if the two arrays have got some different value. I have read this code an I think that is right. If it is not right, can you tell me were is the error.
And in the case that it is right, the problem may be that it is receiving badly the parameters. May be it needs to receive a far pointer. If this is the case, how can I implement the process of the far pointers.

Thanks in advance.

dedndave · May 24, 2010, 02:57:29 PM

it looks like it should work fine
you shouldn't need to use far pointers

qWord · May 24, 2010, 03:08:31 PM

In terms of speed this one is bad. Using the runtime library would surly more efficient.
You may want to switch to SSEx instructions, which allow to compare 16 Bytes per instruction.

hutch-- · May 24, 2010, 03:12:43 PM

It will work if the files are 2 gig or under but Windows NTFS suports files larger than 4 gig. If both will fit into 32 bit memory and straight memory compare will do the job but if the files are large you will need to have both files open and read the same size chunks from each doing the comparisons on the chunks.

You may be able to multithread the code by doing the file input in one thread while doing the comparison in the other.

untio · May 25, 2010, 02:05:12 PM

Hi and thanks for your answers,
I'll explain more: this code was in its own module. I compiled it with masm and the cpp code with gcc and I linked it with gcc linker. But it always returned FALSE (I think that false equals to zero). The program is opensource. You can find it at sourceforge. Now it is completely written in cpp.
And, when I did optimize the code in size or in speed the application crashed. May be they are fully incompatible, gcc and masm?

Thanks in advance.

hutch-- · May 25, 2010, 02:10:47 PM

Vaguely I remembber a few different versions of COFF, Microsoft spec their own but I think there are others that are not full compatible. With GCC I would tend to write the code in Intel mode GAS to be safe.

untio · May 25, 2010, 02:17:47 PM

Hi and thank you once more,
I'll try to rewrite it in gas, but now I have another doubt. If I write a dll in masm, it is safe to use it with any other compiler. Because, now, I have a lot of doubts.

And another question about mmx and fpu. Can you recommend me a good book to learn mmx and fpu programming.

Thanks in advance.

clive · May 25, 2010, 02:51:31 PM

The code you want to replace/optimize?

Code Select

BOOL arraydebytessondistintos(BYTE * pprimero, BYTE *psegundo, ULONG len)
{
    ULONG x;
    BYTE *prim;
    BYTE *sec;
    prim = pprimero;
    sec = psegundo;
    for(x = len; x > 0; x--)
    {
        if(*prim != *sec)
            return TRUE;
        ++prim;
        ++sec;
    }
    return FALSE;
}

Looks to have a standard calling convention, though the code is hideously inefficient, do you have optimization enabled?

Code Select

0040625A                    fn_0040625A:                ; Xref 0040612E
0040625A 55                     push    ebp
0040625B 89E5                   mov     ebp,esp
0040625D 83EC10                 sub     esp,10h
00406260 8B4508                 mov     eax,[ebp+8]
00406263 8945F8                 mov     [ebp-8],eax
00406266 8B450C                 mov     eax,[ebp+0Ch]
00406269 8945F4                 mov     [ebp-0Ch],eax
0040626C 8B4510                 mov     eax,[ebp+10h]
0040626F 8945FC                 mov     [ebp-4],eax
00406272                    loc_00406272:               ; Xref 004062A0
00406272 837DFC00               cmp     dword ptr [ebp-4],0
00406276 742A                   jz      loc_004062A2
00406278 8B45F8                 mov     eax,[ebp-8]
0040627B 0FB610                 movzx   edx,byte ptr [eax]
0040627E 8B45F4                 mov     eax,[ebp-0Ch]
00406281 0FB600                 movzx   eax,byte ptr [eax]
00406284 39C2                   cmp     edx,eax
00406286 7409                   jz      loc_00406291
00406288 C745F001000000         mov     dword ptr [ebp-10h],1
0040628F EB18                   jmp     loc_004062A9

00406291                    loc_00406291:               ; Xref 00406286
00406291 8D45F8                 lea     eax,[ebp-8]
00406294 FF00                   inc     dword ptr [eax]
00406296 8D45F4                 lea     eax,[ebp-0Ch]
00406299 FF00                   inc     dword ptr [eax]
0040629B 8D45FC                 lea     eax,[ebp-4]
0040629E FF08                   dec     dword ptr [eax]
004062A0 EBD0                   jmp     loc_00406272

004062A2                    loc_004062A2:               ; Xref 00406276
004062A2 C745F000000000         mov     dword ptr [ebp-10h],0
004062A9                    loc_004062A9:               ; Xref 0040628F
004062A9 8B45F0                 mov     eax,[ebp-10h]
004062AC C9                     leave
004062AD C3                     ret

C doesn't modify the callers parameters, so it is safe to use the parameters directly as they are thrown away.

Code Select

BOOL arraydebytessondistintosB(BYTE * prim, BYTE *sec, ULONG len)
{
    while(len--)
    {
        if(*prim++ != *sec++)
            return TRUE;
    }
    return FALSE;
}

Here is the code Microsoft C++ generates

Code Select

Disassembly

00000050                    _arraydebytessondistintos:
00000050 55                     push    ebp
00000051 8BEC                   mov     ebp,esp
00000053 8B4D10                 mov     ecx,[ebp+10h]
00000056 8B4508                 mov     eax,[ebp+8]
00000059 53                     push    ebx
0000005A 56                     push    esi
0000005B 85C9                   test    ecx,ecx
0000005D 8BF1                   mov     esi,ecx
0000005F 7612                   jbe     loc_00000073
00000061 8B4D0C                 mov     ecx,[ebp+0Ch]
00000064 2BC8                   sub     ecx,eax
00000066                    loc_00000066:               ; Xref 00000071
00000066 8A10                   mov     dl,[eax]
00000068 8A1C01                 mov     bl,[ecx+eax]
0000006B 3AD3                   cmp     dl,bl
0000006D 750A                   jnz     loc_00000079
0000006F 40                     inc     eax
00000070 4E                     dec     esi
00000071 75F3                   jnz     loc_00000066
00000073                    loc_00000073:               ; Xref 0000005F
00000073 5E                     pop     esi
00000074 33C0                   xor     eax,eax
00000076 5B                     pop     ebx
00000077 5D                     pop     ebp
00000078 C3                     ret

00000079                    loc_00000079:               ; Xref 0000006D
00000079 5E                     pop     esi
0000007A B801000000             mov     eax,1
0000007F 5B                     pop     ebx
00000080 5D                     pop     ebp
00000081 C3                     ret

00000082 90909090909090909090.. db      14 dup (090h)

00000090                    _arraydebytessondistintosB:
00000090 55                     push    ebp
00000091 8BEC                   mov     ebp,esp
00000093 56                     push    esi
00000094 57                     push    edi
00000095 8B7D10                 mov     edi,[ebp+10h]
00000098 8BC7                   mov     eax,edi
0000009A 4F                     dec     edi
0000009B 85C0                   test    eax,eax
0000009D 7417                   jz      loc_000000B6
0000009F 8B450C                 mov     eax,[ebp+0Ch]
000000A2 8B7508                 mov     esi,[ebp+8]
000000A5                    loc_000000A5:               ; Xref 000000B4
000000A5 8A10                   mov     dl,[eax]
000000A7 8A0E                   mov     cl,[esi]
000000A9 40                     inc     eax
000000AA 46                     inc     esi
000000AB 3ACA                   cmp     cl,dl
000000AD 750D                   jnz     loc_000000BC
000000AF 8BCF                   mov     ecx,edi
000000B1 4F                     dec     edi
000000B2 85C9                   test    ecx,ecx
000000B4 75EF                   jnz     loc_000000A5
000000B6                    loc_000000B6:               ; Xref 0000009D
000000B6 5F                     pop     edi
000000B7 33C0                   xor     eax,eax
000000B9 5E                     pop     esi
000000BA 5D                     pop     ebp
000000BB C3                     ret

000000BC                    loc_000000BC:               ; Xref 000000AD
000000BC 5F                     pop     edi
000000BD B801000000             mov     eax,1
000000C2 5E                     pop     esi
000000C3 5D                     pop     ebp
000000C4 C3                     ret

Your ASM code assembled with MASM (ml -c -coff -Fl testa.asm), not sure why you save ECX, not normally preserved by MSVC/WATCOM/INTEL

Code Select

00000000                    _arraysofbytesarediferent:
00000000 55                     push    ebp
00000001 8BEC                   mov     ebp,esp
00000003 51                     push    ecx
00000004 57                     push    edi
00000005 56                     push    esi
00000006 8B7508                 mov     esi,[ebp+8]
00000009 8B7D0C                 mov     edi,[ebp+0Ch]
0000000C 8B4D10                 mov     ecx,[ebp+10h]
0000000F 33C0                   xor     eax,eax
00000011 FC                     cld
00000012 F3A6                   rep     cmpsb
00000014 0F95C0                 setne   al
00000017 5E                     pop     esi
00000018 5F                     pop     edi
00000019 59                     pop     ecx
0000001A C9                     leave
0000001B C3                     ret

TESTA.ASM

Code Select

        .386
        .MODEL FLAT
        .CODE

arraysofbytesarediferent proc c uses ecx edi esi, pointer1:DWORD, pointer2:DWORD, len:DWORD
        mov esi, pointer1
        mov edi, pointer2
        mov ecx, len
        xor eax, eax
        cld
        repe cmpsb
        setne al
        ret
arraysofbytesarediferent endp

        END

You could avoid a lot of assembler grief if you just used memcmp(), ideally as an inline/intrinsic.

That said you should only bother to compare a file if the size and CRC matches (not sure how your doing it now), personally I'd probably use SHA/MD5 or some combination and completely forget about the byte-for-byte.

untio · May 25, 2010, 03:01:44 PM

Hi once more,
I have posted only a part of the file. The full code is:

Code Select

.486

.model flat, c
option casemap:none

.code

public arraydebytessondistintos
arraydebytessondistintos proc c uses ecx edi esi, p1:DWORD, p2:DWORD, lg:DWORD
mov esi, p1
mov edi, p2
mov ecx, lg
xor eax, eax
cld
repe cmpsb
setne al
ret
arraydebytessondistintos endp
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
public arraydedwordssondistintos
arraydedwordssondistintos proc c uses ecx edi esi,  p1:DWORD, p2:DWORD, lg:DWORD
mov esi, p1
mov edi, p2
mov ecx, lg
xor eax, eax
cld
repe cmpsd
setne al
ret
arraydedwordssondistintos endp
end

As you can see, there are two functions. One that compares double words and another that compares bytes. First I compare as double words and like bytes the rest.

I have recompiled the full program with Visual C++ express 2005 and it works perfect. Actually, compiled with gcc it has remained one day.

You are right. The problem is the different coff formats.

And not. I was so afraid that I did not set any optimization.

And talking about optimizations, can this assembly be optimized.

Thanks in advance.

clive · May 25, 2010, 03:07:38 PM

Yes, it can be optimized. Generally functions like memcmp() will deal with as much of the comparison as possible with DWORD's and then switch to BYTE's. It will also try to DWORD align one of the loads. That way you only need ONE function.

CMPSB/CMPSD are not hugely efficient.

This is a reasonably efficient version of your routine. It could be optimized further to handle alignment (of data, and branch targets), and could also be redone in MMX/SSE.

Code Select

compbuffer PROC C USES edi esi, pointer1:DWORD, pointer2:DWORD, len:DWORD

        mov     esi,pointer1
        mov     edi,pointer2
        sub     edi,esi         ; Dereference

        mov     eax,len         ; Byte length
        or      eax,eax
        jz      comp_40         ; Already Zero
        shr     eax,2           ; Convert BYTE count to DWORD count (len / 4)
        jz      comp_20         ; No DWORDs

comp_10:                        ; Process DWORDs
        mov     ecx,[esi]
        cmp     ecx,[edi+esi]
        jnz     comp_50         ; Different

        add     esi,4
        sub     eax,1
        jnz     comp_10

comp_20:
        mov     eax,len         ; Byte length
        and     eax,3           ; Mask to get byte remainder (len % 4)
        jz      comp_40

comp_30:                        ; Process BYTEs
        mov     cl,[esi]
        cmp     cl,[edi+esi]
        jnz     comp_50         ; Different

        add     esi,1
        sub     eax,1
        jnz     comp_30

comp_40:                        ; eax implicitly zero
        ret

comp_50:
        mov     eax,1           ; Different
        ret

compbuffer ENDP

untio · May 25, 2010, 03:39:23 PM

Hi and thank you so much,
I did not imagine that cmps instructions were so inefficient. I'll replace them taking your code as a master example.

Thanks for your time.

News:

What makes fail this code