Hi,
I have an opensource project named 'duplicadoseraser2' that searches for duplicated files and it has been some time buggy. It is written in CPP but, until yesterday it had got the next assembly code, with others one, that compares two arrays of bytes:
arraysofbytesarediferent proc c uses ecx edi esi, pointer1:DWORD, pointer2:DWORD, len:DWORD
mov esi, pointer1
mov edi, pointer2
mov ecx, len
xor eax, eax
cld
repe cmpsb
setne al
ret
arraysofbytesarediferent endp
This procedure should return 1 if the two arrays have got some different value. I have read this code an I think that is right. If it is not right, can you tell me were is the error.
And in the case that it is right, the problem may be that it is receiving badly the parameters. May be it needs to receive a far pointer. If this is the case, how can I implement the process of the far pointers.
Thanks in advance.
it looks like it should work fine
you shouldn't need to use far pointers
In terms of speed this one is bad. Using the runtime library would surly more efficient.
You may want to switch to SSEx instructions, which allow to compare 16 Bytes per instruction.
It will work if the files are 2 gig or under but Windows NTFS suports files larger than 4 gig. If both will fit into 32 bit memory and straight memory compare will do the job but if the files are large you will need to have both files open and read the same size chunks from each doing the comparisons on the chunks.
You may be able to multithread the code by doing the file input in one thread while doing the comparison in the other.
Hi and thanks for your answers,
I'll explain more: this code was in its own module. I compiled it with masm and the cpp code with gcc and I linked it with gcc linker. But it always returned FALSE (I think that false equals to zero). The program is opensource. You can find it at sourceforge. Now it is completely written in cpp.
And, when I did optimize the code in size or in speed the application crashed. May be they are fully incompatible, gcc and masm?
Thanks in advance.
Vaguely I remembber a few different versions of COFF, Microsoft spec their own but I think there are others that are not full compatible. With GCC I would tend to write the code in Intel mode GAS to be safe.
Hi and thank you once more,
I'll try to rewrite it in gas, but now I have another doubt. If I write a dll in masm, it is safe to use it with any other compiler. Because, now, I have a lot of doubts.
And another question about mmx and fpu. Can you recommend me a good book to learn mmx and fpu programming.
Thanks in advance.
The code you want to replace/optimize?
BOOL arraydebytessondistintos(BYTE * pprimero, BYTE *psegundo, ULONG len)
{
ULONG x;
BYTE *prim;
BYTE *sec;
prim = pprimero;
sec = psegundo;
for(x = len; x > 0; x--)
{
if(*prim != *sec)
return TRUE;
++prim;
++sec;
}
return FALSE;
}
Looks to have a standard calling convention, though the code is hideously inefficient, do you have optimization enabled?
0040625A fn_0040625A: ; Xref 0040612E
0040625A 55 push ebp
0040625B 89E5 mov ebp,esp
0040625D 83EC10 sub esp,10h
00406260 8B4508 mov eax,[ebp+8]
00406263 8945F8 mov [ebp-8],eax
00406266 8B450C mov eax,[ebp+0Ch]
00406269 8945F4 mov [ebp-0Ch],eax
0040626C 8B4510 mov eax,[ebp+10h]
0040626F 8945FC mov [ebp-4],eax
00406272 loc_00406272: ; Xref 004062A0
00406272 837DFC00 cmp dword ptr [ebp-4],0
00406276 742A jz loc_004062A2
00406278 8B45F8 mov eax,[ebp-8]
0040627B 0FB610 movzx edx,byte ptr [eax]
0040627E 8B45F4 mov eax,[ebp-0Ch]
00406281 0FB600 movzx eax,byte ptr [eax]
00406284 39C2 cmp edx,eax
00406286 7409 jz loc_00406291
00406288 C745F001000000 mov dword ptr [ebp-10h],1
0040628F EB18 jmp loc_004062A9
00406291 loc_00406291: ; Xref 00406286
00406291 8D45F8 lea eax,[ebp-8]
00406294 FF00 inc dword ptr [eax]
00406296 8D45F4 lea eax,[ebp-0Ch]
00406299 FF00 inc dword ptr [eax]
0040629B 8D45FC lea eax,[ebp-4]
0040629E FF08 dec dword ptr [eax]
004062A0 EBD0 jmp loc_00406272
004062A2 loc_004062A2: ; Xref 00406276
004062A2 C745F000000000 mov dword ptr [ebp-10h],0
004062A9 loc_004062A9: ; Xref 0040628F
004062A9 8B45F0 mov eax,[ebp-10h]
004062AC C9 leave
004062AD C3 ret
C doesn't modify the callers parameters, so it is safe to use the parameters directly as they are thrown away.
BOOL arraydebytessondistintosB(BYTE * prim, BYTE *sec, ULONG len)
{
while(len--)
{
if(*prim++ != *sec++)
return TRUE;
}
return FALSE;
}
Here is the code Microsoft C++ generates
Disassembly
00000050 _arraydebytessondistintos:
00000050 55 push ebp
00000051 8BEC mov ebp,esp
00000053 8B4D10 mov ecx,[ebp+10h]
00000056 8B4508 mov eax,[ebp+8]
00000059 53 push ebx
0000005A 56 push esi
0000005B 85C9 test ecx,ecx
0000005D 8BF1 mov esi,ecx
0000005F 7612 jbe loc_00000073
00000061 8B4D0C mov ecx,[ebp+0Ch]
00000064 2BC8 sub ecx,eax
00000066 loc_00000066: ; Xref 00000071
00000066 8A10 mov dl,[eax]
00000068 8A1C01 mov bl,[ecx+eax]
0000006B 3AD3 cmp dl,bl
0000006D 750A jnz loc_00000079
0000006F 40 inc eax
00000070 4E dec esi
00000071 75F3 jnz loc_00000066
00000073 loc_00000073: ; Xref 0000005F
00000073 5E pop esi
00000074 33C0 xor eax,eax
00000076 5B pop ebx
00000077 5D pop ebp
00000078 C3 ret
00000079 loc_00000079: ; Xref 0000006D
00000079 5E pop esi
0000007A B801000000 mov eax,1
0000007F 5B pop ebx
00000080 5D pop ebp
00000081 C3 ret
00000082 90909090909090909090.. db 14 dup (090h)
00000090 _arraydebytessondistintosB:
00000090 55 push ebp
00000091 8BEC mov ebp,esp
00000093 56 push esi
00000094 57 push edi
00000095 8B7D10 mov edi,[ebp+10h]
00000098 8BC7 mov eax,edi
0000009A 4F dec edi
0000009B 85C0 test eax,eax
0000009D 7417 jz loc_000000B6
0000009F 8B450C mov eax,[ebp+0Ch]
000000A2 8B7508 mov esi,[ebp+8]
000000A5 loc_000000A5: ; Xref 000000B4
000000A5 8A10 mov dl,[eax]
000000A7 8A0E mov cl,[esi]
000000A9 40 inc eax
000000AA 46 inc esi
000000AB 3ACA cmp cl,dl
000000AD 750D jnz loc_000000BC
000000AF 8BCF mov ecx,edi
000000B1 4F dec edi
000000B2 85C9 test ecx,ecx
000000B4 75EF jnz loc_000000A5
000000B6 loc_000000B6: ; Xref 0000009D
000000B6 5F pop edi
000000B7 33C0 xor eax,eax
000000B9 5E pop esi
000000BA 5D pop ebp
000000BB C3 ret
000000BC loc_000000BC: ; Xref 000000AD
000000BC 5F pop edi
000000BD B801000000 mov eax,1
000000C2 5E pop esi
000000C3 5D pop ebp
000000C4 C3 ret
Your ASM code assembled with MASM (ml -c -coff -Fl testa.asm), not sure why you save ECX, not normally preserved by MSVC/WATCOM/INTEL
00000000 _arraysofbytesarediferent:
00000000 55 push ebp
00000001 8BEC mov ebp,esp
00000003 51 push ecx
00000004 57 push edi
00000005 56 push esi
00000006 8B7508 mov esi,[ebp+8]
00000009 8B7D0C mov edi,[ebp+0Ch]
0000000C 8B4D10 mov ecx,[ebp+10h]
0000000F 33C0 xor eax,eax
00000011 FC cld
00000012 F3A6 rep cmpsb
00000014 0F95C0 setne al
00000017 5E pop esi
00000018 5F pop edi
00000019 59 pop ecx
0000001A C9 leave
0000001B C3 ret
TESTA.ASM
.386
.MODEL FLAT
.CODE
arraysofbytesarediferent proc c uses ecx edi esi, pointer1:DWORD, pointer2:DWORD, len:DWORD
mov esi, pointer1
mov edi, pointer2
mov ecx, len
xor eax, eax
cld
repe cmpsb
setne al
ret
arraysofbytesarediferent endp
END
You could avoid a lot of assembler grief if you just used memcmp(), ideally as an inline/intrinsic.
That said you should only bother to compare a file if the size and CRC matches (not sure how your doing it now), personally I'd probably use SHA/MD5 or some combination and completely forget about the byte-for-byte.
Hi once more,
I have posted only a part of the file. The full code is:
.486
.model flat, c
option casemap:none
.code
public arraydebytessondistintos
arraydebytessondistintos proc c uses ecx edi esi, p1:DWORD, p2:DWORD, lg:DWORD
mov esi, p1
mov edi, p2
mov ecx, lg
xor eax, eax
cld
repe cmpsb
setne al
ret
arraydebytessondistintos endp
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
public arraydedwordssondistintos
arraydedwordssondistintos proc c uses ecx edi esi, p1:DWORD, p2:DWORD, lg:DWORD
mov esi, p1
mov edi, p2
mov ecx, lg
xor eax, eax
cld
repe cmpsd
setne al
ret
arraydedwordssondistintos endp
end
As you can see, there are two functions. One that compares double words and another that compares bytes. First I compare as double words and like bytes the rest.
I have recompiled the full program with Visual C++ express 2005 and it works perfect. Actually, compiled with gcc it has remained one day.
You are right. The problem is the different coff formats.
And not. I was so afraid that I did not set any optimization.
And talking about optimizations, can this assembly be optimized.
Thanks in advance.
Yes, it can be optimized. Generally functions like memcmp() will deal with as much of the comparison as possible with DWORD's and then switch to BYTE's. It will also try to DWORD align one of the loads. That way you only need ONE function.
CMPSB/CMPSD are not hugely efficient.
This is a reasonably efficient version of your routine. It could be optimized further to handle alignment (of data, and branch targets), and could also be redone in MMX/SSE.
compbuffer PROC C USES edi esi, pointer1:DWORD, pointer2:DWORD, len:DWORD
mov esi,pointer1
mov edi,pointer2
sub edi,esi ; Dereference
mov eax,len ; Byte length
or eax,eax
jz comp_40 ; Already Zero
shr eax,2 ; Convert BYTE count to DWORD count (len / 4)
jz comp_20 ; No DWORDs
comp_10: ; Process DWORDs
mov ecx,[esi]
cmp ecx,[edi+esi]
jnz comp_50 ; Different
add esi,4
sub eax,1
jnz comp_10
comp_20:
mov eax,len ; Byte length
and eax,3 ; Mask to get byte remainder (len % 4)
jz comp_40
comp_30: ; Process BYTEs
mov cl,[esi]
cmp cl,[edi+esi]
jnz comp_50 ; Different
add esi,1
sub eax,1
jnz comp_30
comp_40: ; eax implicitly zero
ret
comp_50:
mov eax,1 ; Different
ret
compbuffer ENDP
Hi and thank you so much,
I did not imagine that cmps instructions were so inefficient. I'll replace them taking your code as a master example.
Thanks for your time.