Print Page - Optimizing 128 bit linear feedback shift register

Title: Optimizing 128 bit linear feedback shift register
Post by: akane on October 02, 2008, 02:52:50 PM

Hello, I wonder if this function could be optimized. I need to XOR 4 bits from one byte and RCL the result to 16 byte buffer.

I started with single AND instruction and used the parity flag, equal to XNOR result:

Code Select

// bits to xor: 7,6,5,0
mov al, somebyte
and al, 11100001 // PF = al.7 xnor al.6 xnor al.5 xnor al.0

// rotate PF flag to CF flag
pushf
shr byte[esp],2
popf

// rotate the buffer with carry
rcl dword [esi],1 ...

It was slow, because of the additional stack operations. Summary 432ms for 18 mega loops, on centrino M 1.86GHz
First optimization: replace push/pop with stc + jp + clc. Much better: 184ms.
Second optimization: current version with 4*xor; 157ms

I see that every instruction inside __lfsr_128 must wait for previous instruction to complete. Any ideas?

Code Select

main:
	// required subsystem: console
	// press any key to quit

	sub esp, 20 ; 128 bit buffer for LFSR + 4 bytes for counter

	; seed the buffer
	mov [esp   ],esp
	mov [esp+ 4],esp
	mov [esp+ 8],esp
	mov [esp+12],esp

	; enter main loop
	invoke GetCurrentThread
	invoke SetThreadPriority, eax, THREAD_PRIORITY_TIME_CRITICAL

	align 4

mainloop:
	invoke _kbhit
	and    al,al
	jnz    quit

	invoke GetTickCount
	mov    [esp+16],eax

	; number of loops equal to 10 mega bytes
	mov    ecx,1024*1024*8

childloop:
	call   __lfsr_128, esp
	loop   childloop

	invoke  GetTickCount
	sub     eax,[esp+16]

	cinvoke printf, "time: %d ms\n", eax
	invoke  Sleep, 1
	jmp     mainloop

quit:
	invoke GetCurrentThread
	invoke SetThreadPriority, eax, THREAD_PRIORITY_NORMAL
	add    esp, 20
	ret


;------------------------
align 4

__lfsr_128:
	; 128 bit linear feedback shift register
	; Feedback bits: 127, 126, 125, 120
	
	mov edx,[esp+4] ; m_bufer ; AL bits   7    6    5         0
	mov al,[edx+15] ; m_bufer[3];  bits 127, 126, 125, ..., 120

	mov ah,al ; ah=765....0
	add ah,ah ; ah=65....0.
	xor al,ah ;             al = al.7 xor al.6
	add ah,ah ; ah=5....0..
	xor al,ah ;             al = al.7 xor al.6 xor al.5
	shl ah,5  ; ah=0.......
	xor al,ah ;             al = al.7 xor al.6 xor al.5 xor al.0
	add al,al ; put al.7 into CY

	; now shift the 128 bits and append the bit from XOR result
	;                   ___
	; m_bufer[3].31 -> |XOR|     ___
	; m_bufer[3].30 -> |___| -> |XOR| ->--------------->-----+
	; m_bufer[3].29 -> |XOR| -> |___|                        |
	; m_bufer[3].24 -> |___|                                 |
	;                                                        |
	; m_bufer[3] <- m_bufer[2] <- m_bufer[1] <- m_bufer[0] <-+

	rcl dword [edx   ],1
	rcl dword [edx+ 4],1
	rcl dword [edx+ 8],1
	rcl dword [edx+12],1

	ret 4

Title: Re: Optimizing 128 bit linear feedback shift register
Post by: Mirno on October 02, 2008, 05:01:18 PM

I'm not entirely sure what you're doing here - but your idea of using the parity flag is a good one. The method you use to move the flag is the problem...
Secondly I'm not entirely sure why you want to do this long string of rcls. The order is deterministic, the first bit you put in will end up in bit 127, and the last will end up in bit 0... Why not put them there to start with?

Code Select


   mov ecx, 080000000h
   mov edx, 3

@@:
   xor eax, eax
   test somebyte, 0E1h
   cmovp eax, ecx
   or result[edx * 4], eax

   ror ecx, 1
   sbb edx, 0
   jl @B

Also I would look at the timing code others have posted on the board, GetTickCount isn't very good as it has too high a margin of error.

Mirno

The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: akane on October 02, 2008, 02:52:50 PM