News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

a counter of lines in mmx instructions

Started by ToutEnMasm, March 18, 2009, 09:20:18 AM

Previous topic - Next topic

ToutEnMasm

Hello,
Thinks this one could be a good explain on mmx use and perhaps could be optimize.
There is comments to follow the works in the registers.

Quote
;pmem,pointer on a file loaded in memory
;Taille ,size of this file
;return eax ,numbers of lines
;tested on windows.inc 22274 lines
;each 13,10 is tested,he couldn't have 10 alone or 13 alone (HTML..)
;---------------------------------------------------------------------
;can be used also for a string in data,alignment of 16 is not needed
;but there is little interest in this,simple method exist
;must be compile with at least ml 6.15 (Iczelion site)
;.586 and .xmm in the declarations
;------- mask for bytes ------------
;sse2entree    db 13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,0
;sse2line      db 10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,0
;################################################################
CompteurLignes PROC uses ebx edi esi pmem:DWORD,taille:DWORD
         local  theEnd:dword,retenue:DWORD
         Local  retour:DWORD
   ;init of various register
   mov retenue,0
   lea eax,sse2entree
   movups xmm6,[eax]
   lea eax,sse2line
   movups xmm7,[eax]
   mov eax,pmem
   add eax,taille
   mov theEnd,eax
   mov edx,pmem
   mov eax,0   ;line counter
   ;ready
   NewBloc:
   mov edx,pmem
   movups xmm1,[edx]         ;load 16 bytes
   movups xmm2,[edx]         ;charger les mêmes 16 bytes
   pcmpeqb  xmm1,xmm6      ;13 ,xmm1 modified
   ;ff 00 ff 00 ff 00 ff 00-00 00 00 00 00 00 00 00  ;example of modified xmm1 in the dugger
   pcmpeqb  xmm2,xmm7      ;10 ,xmm2 modified   
   ;00 ff 00 ff 00 ff 00 ff-00 00 00 00 00 00 00 00  ;modified xmm2
   ;--------- debug ---------
   ;lea edx,TamponVisu1      ;just here to view the result in the debugger
   ;movups [edx],xmm1
   ;lea edx,TamponVisu2   
   ;movups [edx],xmm2      
   ;-------- debug ------------------
   ;the ;ff 00 ff .... must be read right to left
   ;ecx and edx must be read left to right
   pmovmskb ecx, xmm1 ;1010101   ;result for 13 in ecx 
   pmovmskb edx, xmm2 ;10101010   ;result for 10 in edx
   ;each 13 are now only  a  bit in ecx
   ;each 10 are now only a bit in edx
   ;---------- search for bits and there positions -------
   ;instructions are internal and very fast
   ;ecx and edx have now her bits set to 1 as the same position
   
   .if retenue == 1
      ;the last 16 bytes loaded was ended by a 13,bit 15
      ;Test if bit zero is one (there is a 10)
      bsf  edi,edx      ;search the 10
      .if edi == 0 && edx != 0   
         btr edx,edi         ;delet the bit find in edx,10
         inc eax         ;inc the number of lines
      .endif            
   .endif
   NewEntree:
   bsf   ebx,ecx         ;give the position of one bit at 1
   .if ebx == 0 && ecx != 0
      ;le bit zero est a 1
      jmp bitzero13a1
   .endif
   .if ebx            ;not zero ,on a un 13         
      bitzero13a1:
      btr ecx,ebx         ;efface le bit trouvé dans ecx
      .if ebx != 15
         inc ebx            ;incrémente sa positon pour la comparer au 10
      .elseif
         mov retenue,1   ;le 10 va se trouver dans le chargement suivant
         jmp NewSeriede16
         ;suite a voir
      .endif
      
      traiteretenue:
      bsf  edi,edx      ;cherche le 0A
      .if edi == 0 && edx != 0
         ;le bit zero des 10 est a 1
         jmp bitzero10a1
      .endif
      .if  edi
         bitzero10a1:
         btr edx,edi         ;efface le bit trouvé dans edx,10         
         .if ebx == edi   ;un 10 suit immédiatement le 13      
            inc eax   ;////////// inc the number of lines /////////////////
         .elseif ebx >edi
            ;this 10 is alone
            ;c'est un 10 isolé avant le 13 ,HTML
            mov edi,0
            jmp traiteretenue
         .endif
      .endif
      ;------ conditions de rebouclage ---------------
      .if ecx != 0
         mov ebx,0      ;avoid an undeterminated state
         mov edi,0
         jmp NewEntree
      .endif
   .endif
   NewSeriede16:
   mov edx,pmem
   add edx,16   
   mov pmem,edx
   .if edx < theEnd
      jmp NewBloc
   .endif

FindeCompteurLignes:
         ret
CompteurLignes endp

jj2007

Pas mal mais on pourrait optimiser un peu :wink

QuoteCounting lines of \masm32\include\windows.inc (P4):

markl_CountFileLines:
515431  cycles for 22272 lines, 849759 bytes

CompteurLignes:
5803740 cycles for 22272 lines, 849759 bytes

[attachment deleted by admin]

ToutEnMasm


The one of Markl dont do the same job.
I need to be certain that a 13,10 is here,not only a 10,not only a 13.
This grant access to html and rtf text files without problems.
I have made further applications that use them.
The Markl  just read the 0a,not enough for me.

UtillMasm

C:>set path=\masm32\bin

C:>ml /c /coff CountLinesSSE2.asm
Microsoft (R) Macro Assembler Version 6.14.8444
Copyright (C) Microsoft Corp 1981-1997.  All rights reserved.

Assembling: CountLinesSSE2.asm
CountLinesSSE2.asm(3) : fatal error A1000: cannot open file : \masm32\macros\tim
ers.asm

C:>pause
Press any key to continue . . .
---------------------------------------------------
I needs your file.
Thanks.

jj2007

Quote from: UtillMasm on March 18, 2009, 12:56:44 PM
CountLinesSSE2.asm(3) : fatal error A1000: cannot open file : \masm32\macros\timers.asm
---------------------------------------------------
I needs your file.
Thanks.

Sorry, I thought everybody here in the forum has them :green
See code timing macros (downloaded 1226 times)

UtillMasm


PBrennick

The GeneSys Project is available from:
The Repository or My crappy website

ToutEnMasm


I need also a good explain on how the algo of markl work,timer.asm is on the masm 10 package.
I will make a little experiments and made a new post

jj2007

Quote from: ToutEnMasm on March 18, 2009, 12:49:32 PM

The one of Markl dont do the same job.
I need to be certain that a 13,10 is here,not only a 10,not only a 13.
This grant access to html and rtf text files without problems.

In rtf files, a CrLf is defined as \pard or \par
In html files, a CrLf is defined as <p> or <br>

Can you explain how your code counts lines in these files?

ToutEnMasm

I have made another compile of the optimize sample of jj2007 (thanks for it) with a minimal requirement and a little secure (don't find ..).

How my algo works,simple
Load 16 bytes,xmm1
load 16 bytes , xmm2    the same 16 bytes
The mask of 13(xmm6) is apply to xmm1
the mask  of 10 (xmm7) is apply to xmm2
then each masked value is put in 32 bits register, 1 byte to FF = 1 bit to 1 at same position

Then we have ecx,edx filled with the result
bsf search the first bit at 1 and return his position
btr destroy this bit that we have counted

the test is              position  13         ;we find a bit a 1 in ecx
                            position + 1  must be  a 10

If it is                    eax = eax + 1

we must take care that a 13 can be the 15 byte in xmm1 or the 15 bit in ecx
                     in this case we must find the 10 in the next bloc of 16 bytes retenue=1

continue until no mor bit a 1 in ecx
Special     bit 0 a 1 return  0      ;it is is rank , if it was bit 15 ,return 15



[attachment deleted by admin]

UtillMasm

I run the EXE file, and crashed.
WinVista SP1 32bit.

ToutEnMasm


To UtillMasm,
Which exe ?
There is mine countli.exe who answer "couldn't find" in XP and the one of jj2007 who isn't protected ?

jj2007

Quote from: ToutEnMasm on March 18, 2009, 03:39:49 PM
we must take care that a 13 can be the 15 byte in xmm1 or the 15 bit in ecx
                     in this case we must find the 10 in the next bloc of 16 bytes retenue=1
movups xmm1,[edx]         ;load 16 bytes, e..g
   movups xmm2,[edx+1]      ;charger (presque) les mêmes 16 bytes ;-)

Ascii 10 follows Ascii 13...

ToutEnMasm

Fran-glish

Quote
we must take care that the 15° bit couldn't be follow by 16° bit
                     The 16° bit don't exist so:
                     in this case we must find the 10 in the next bloc of 16 bytes retenue=1


But i have found something more simple  and faster to do the same thing:
Quote
   NewBloc:
   movups xmm1,[edx]         ;load 16 bytes
   movups xmm2,[edx+1]      ;decale of 1
   pcmpeqb  xmm1,xmm6      ;13 ,xmm1 modified
   pcmpeqb  xmm2,xmm7      ;10 ,xmm2 modified
   pand     xmm1,xmm2      ;et logique FF et 00 = 0,result in xmm1
   pmovmskb ecx, xmm1 ; result for 13,10 in ecx
   @@:
   mov edi,0         ;take care with undeterminate state
   bsf  edi,ecx      ;search the 13,10
   .if edi == 0 && ecx != 0   
      btr ecx,edi         ;delet the bit find in edx,10
      inc eax         ;inc the number of lines
   .elseif edi != 0
      btr ecx,edi         ;delet the bit find in edx,10      
      inc eax
   .endif
   .if ecx != 0
      jmp @B
   .endif               
   ;3 702 880 cycles for 22274 lines, 849788 bytes   

If i had an instruction that can count the number of bit to 1 in ecx,i can be more faster.
With a few number of lines,i have a crash with markl_CountFileLines
This one take 1024 bytes in one pass,me only 16









jj2007

Quote from: ToutEnMasm on March 18, 2009, 07:12:29 PM
But i have found something more simple

   movups xmm1,[edx]         ;load 16 bytes
   movups xmm2,[edx+1]      ;decale of 1
Congrats :bg

Quote
If i had an instruction that can count the number of bit to 1 in ecx,i can be more faster.

You need the popcount instruction