News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

xor string

Started by Maven, December 23, 2004, 10:37:10 PM

Previous topic - Next topic

Maven

This is a little toy I've been playing around with in visual basic. Bascially it takes 2 different byte arrays (msg, key)  and xors the msg array. Do you have any ideas on improving the speed of this?




      SAFEARRAYBOUND struct
            cElements       DWORD ?         ; Number of Elements
            lLbound       DWORD ?         ; Lower Boundary
      SAFEARRAYBOUND ends

      OLE_SAFEARRAY struct
      cDims             WORD ?         ; Number of dimensions
      fFeatures       WORD ?         ; Bitfield indicating attributes
      cbElements       DWORD ?         ; size of an element of the array
          cLocks       DWORD ?         ; lock counter 0=Locked
      pvData       DWORD ?         ; Pointer to data
      rgsabound             SAFEARRAYBOUND <>   ; Contains info for dimensions
      OLE_SAFEARRAY ends



xorbytes proc msg:DWORD, key:DWORD
push ebx
push esi
push edi
push ebp

mov ebx, msg ; Message Array
mov edx, key ; Key Array

mov eax, [ebx] ; reference Message Array
mov ecx, [edx] ; reference Key Array

; EBX = Len of Key
; ESI = Pointer to key data.
; EDX = Len of msg
; EDI = Pointer to msg data.

mov ebx, (OLE_SAFEARRAY ptr [ecx]).rgsabound.cElements
mov esi, (OLE_SAFEARRAY ptr [ecx]).pvData
mov edx, (OLE_SAFEARRAY ptr [eax]).rgsabound.cElements
mov edi, (OLE_SAFEARRAY ptr [eax]).pvData

cmp edx, 0
je done

cmp ebx, 0
je done
     
xor ecx, ecx
xor ebp, ebp

; EBX = Len of Key
; ESI = Pointer to key data.
; EDX = Len of msg
; EDI = Pointer to msg data.
; EBP = Counter
; ECX = Counter

For_Loop:
mov al, [esi + ebp]
xor byte ptr [edi + ecx], al

cmp ebp, ebx
jne L1

xor ebp, ebp

jmp L2

L1:

add ebp, 1

L2:

add ecx, 1
cmp ecx, edx
jne For_Loop

done:
    pop ebp
    pop edi
    pop esi
    pop ebx
    ret
xorbytes endp

Mark_Larson

#1

xorbytes proc msg:DWORD, key:DWORD
push ebx
push esi
push edi
push ebp

mov ebx, msg ; Message Array
mov edx, key ; Key Array

mov eax, [ebx] ; reference Message Array
mov ecx, [edx] ; reference Key Array

; EBX = Len of Key
; ESI = Pointer to key data.
; EDX = Len of msg
; EDI = Pointer to msg data.

mov ebx, (OLE_SAFEARRAY ptr [ecx]).rgsabound.cElements
mov esi, (OLE_SAFEARRAY ptr [ecx]).pvData
mov edx, (OLE_SAFEARRAY ptr [eax]).rgsabound.cElements
mov edi, (OLE_SAFEARRAY ptr [eax]).pvData

cmp edx, 0
je done

cmp ebx, 0
je done
     
xor ecx, ecx
xor ebp, ebp

; EBX = Len of Key
; ESI = Pointer to key data.
; EDX = Len of msg
; EDI = Pointer to msg data.
; EBP = Counter
; ECX = Counter

For_Loop:
mov al, [esi + ebp]
xor byte ptr [edi + ecx], al

cmp ebp, ebx
jne L1

xor ebp, ebp

jmp L2

L1:

add ebp, 1

L2:

add ecx, 1
cmp ecx, edx
jne For_Loop

done:
    pop ebp
    pop edi
    pop esi
    pop ebx
    ret
xorbytes endp


quite a bit, but here are some things to get you started.

1) move to dwords, instead of bytes,  You can XOR a whole dword at a time.

2) Make sure your data is 4 byte aligned when you do number 1

3) Non-p4 systems, you can try aligning the labels.

4) extend number 1 to use MMX or SSE2 ( look at the MOVQ/PXOR and MOVDQA/PXOR commands respectively)

5) if your data is greater than 4KB look at doing TLB priming, it might give you a speed up

6) try reading groups of data, XORing them with memory to break up depedencies and promote parallelism ( you'd need to free up more registers to do this).  The example following combines 1 and 6 ( using a dword at a time)

mov eax, [esi + ebp]
mov ebx, [esi + ebp + 4]          ;assuming EBX has been freed up
xor [edi + ecx], eax
xor [edi + ecx + 4], ebx            ;assuming EBX has been freed up


7) look at the nrandom example in our code to see a great way to free up an extra register.  You can use that extra register for #6.

8) there are other things you can do to speed it up, but this will get you started.

Take a look at my optimization webpage for further ideas.  I have a copy on Mant's website and also a local copy on masmforum.

http://www.visionx.com/markl/optimization_tips.htm

http://masmforum.com/website/mark/index.htm
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

Maven

Is that doable when the key can be any size? The problem I run into with that is keeping up with the key as it can be any size.

example inputs:

msg = "This is the msg to xor"
key = "gumby"


Mark_Larson


odd number of bytes, you simply XOR the first value of each string, and then do it in pairs.  Standard trick ;)  You can also pad a 0 ( or multiple 0's) to the end of the msg.  I usually do the second trick, because I don't have to do any CMP instruction to see if the number of characters is already divisible.  And I don't have to do two sets of code.  One to handle a byte at a time, and one to handle pairs of dwords at a time. 
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

Maven

Quote from: Mark_Larson on December 24, 2004, 02:34:21 PM

odd number of bytes, you simply XOR the first value of each string, and then do it in pairs.  Standard trick ;)  You can also pad a 0 ( or multiple 0's) to the end of the msg.  I usually do the second trick, because I don't have to do any CMP instruction to see if the number of characters is already divisible.  And I don't have to do two sets of code.  One to handle a byte at a time, and one to handle pairs of dwords at a time. 

Thanks mark