This is a little toy I've been playing around with in visual basic. Bascially it takes 2 different byte arrays (msg, key) and xors the msg array. Do you have any ideas on improving the speed of this?
SAFEARRAYBOUND struct
cElements DWORD ? ; Number of Elements
lLbound DWORD ? ; Lower Boundary
SAFEARRAYBOUND ends
OLE_SAFEARRAY struct
cDims WORD ? ; Number of dimensions
fFeatures WORD ? ; Bitfield indicating attributes
cbElements DWORD ? ; size of an element of the array
cLocks DWORD ? ; lock counter 0=Locked
pvData DWORD ? ; Pointer to data
rgsabound SAFEARRAYBOUND <> ; Contains info for dimensions
OLE_SAFEARRAY ends
xorbytes proc msg:DWORD, key:DWORD
push ebx
push esi
push edi
push ebp
mov ebx, msg ; Message Array
mov edx, key ; Key Array
mov eax, [ebx] ; reference Message Array
mov ecx, [edx] ; reference Key Array
; EBX = Len of Key
; ESI = Pointer to key data.
; EDX = Len of msg
; EDI = Pointer to msg data.
mov ebx, (OLE_SAFEARRAY ptr [ecx]).rgsabound.cElements
mov esi, (OLE_SAFEARRAY ptr [ecx]).pvData
mov edx, (OLE_SAFEARRAY ptr [eax]).rgsabound.cElements
mov edi, (OLE_SAFEARRAY ptr [eax]).pvData
cmp edx, 0
je done
cmp ebx, 0
je done
xor ecx, ecx
xor ebp, ebp
; EBX = Len of Key
; ESI = Pointer to key data.
; EDX = Len of msg
; EDI = Pointer to msg data.
; EBP = Counter
; ECX = Counter
For_Loop:
mov al, [esi + ebp]
xor byte ptr [edi + ecx], al
cmp ebp, ebx
jne L1
xor ebp, ebp
jmp L2
L1:
add ebp, 1
L2:
add ecx, 1
cmp ecx, edx
jne For_Loop
done:
pop ebp
pop edi
pop esi
pop ebx
ret
xorbytes endp
xorbytes proc msg:DWORD, key:DWORD
push ebx
push esi
push edi
push ebp
mov ebx, msg ; Message Array
mov edx, key ; Key Array
mov eax, [ebx] ; reference Message Array
mov ecx, [edx] ; reference Key Array
; EBX = Len of Key
; ESI = Pointer to key data.
; EDX = Len of msg
; EDI = Pointer to msg data.
mov ebx, (OLE_SAFEARRAY ptr [ecx]).rgsabound.cElements
mov esi, (OLE_SAFEARRAY ptr [ecx]).pvData
mov edx, (OLE_SAFEARRAY ptr [eax]).rgsabound.cElements
mov edi, (OLE_SAFEARRAY ptr [eax]).pvData
cmp edx, 0
je done
cmp ebx, 0
je done
xor ecx, ecx
xor ebp, ebp
; EBX = Len of Key
; ESI = Pointer to key data.
; EDX = Len of msg
; EDI = Pointer to msg data.
; EBP = Counter
; ECX = Counter
For_Loop:
mov al, [esi + ebp]
xor byte ptr [edi + ecx], al
cmp ebp, ebx
jne L1
xor ebp, ebp
jmp L2
L1:
add ebp, 1
L2:
add ecx, 1
cmp ecx, edx
jne For_Loop
done:
pop ebp
pop edi
pop esi
pop ebx
ret
xorbytes endp
quite a bit, but here are some things to get you started.
1) move to dwords, instead of bytes, You can XOR a whole dword at a time.
2) Make sure your data is 4 byte aligned when you do number 1
3) Non-p4 systems, you can try aligning the labels.
4) extend number 1 to use MMX or SSE2 ( look at the MOVQ/PXOR and MOVDQA/PXOR commands respectively)
5) if your data is greater than 4KB look at doing TLB priming, it might give you a speed up
6) try reading groups of data, XORing them with memory to break up depedencies and promote parallelism ( you'd need to free up more registers to do this). The example following combines 1 and 6 ( using a dword at a time)
mov eax, [esi + ebp]
mov ebx, [esi + ebp + 4] ;assuming EBX has been freed up
xor [edi + ecx], eax
xor [edi + ecx + 4], ebx ;assuming EBX has been freed up
7) look at the nrandom example in our code to see a great way to free up an extra register. You can use that extra register for #6.
8) there are other things you can do to speed it up, but this will get you started.
Take a look at my optimization webpage for further ideas. I have a copy on Mant's website and also a local copy on masmforum.
http://www.visionx.com/markl/optimization_tips.htm
http://masmforum.com/website/mark/index.htm
Is that doable when the key can be any size? The problem I run into with that is keeping up with the key as it can be any size.
example inputs:
msg = "This is the msg to xor"
key = "gumby"
odd number of bytes, you simply XOR the first value of each string, and then do it in pairs. Standard trick ;) You can also pad a 0 ( or multiple 0's) to the end of the msg. I usually do the second trick, because I don't have to do any CMP instruction to see if the number of characters is already divisible. And I don't have to do two sets of code. One to handle a byte at a time, and one to handle pairs of dwords at a time.
Quote from: Mark_Larson on December 24, 2004, 02:34:21 PM
odd number of bytes, you simply XOR the first value of each string, and then do it in pairs. Standard trick ;) You can also pad a 0 ( or multiple 0's) to the end of the msg. I usually do the second trick, because I don't have to do any CMP instruction to see if the number of characters is already divisible. And I don't have to do two sets of code. One to handle a byte at a time, and one to handle pairs of dwords at a time.
Thanks mark