
a counter of lines in mmx instructions

Started by ToutEnMasm, March 18, 2009, 09:20:18 AM


Antariy

Another thing: I suspect that with short unaligned strings the code will not exit after the first unaligned (really aligned, but adjusted) read. I'll fix this in addition to the unaligned PSHUFD.

Antariy

Quote from: drizz on December 18, 2010, 12:47:34 AM
Yes that's the point, it checks for either 13 or 10, not 13+10 pair. As far as i can see only getlines2 algo does this.

Well, I think it is safe to rely on, for example, LF as the only delimiter. It doesn't really matter: if the data is specially crafted to be wrong, nothing will prevent that.

If CR+LF is made the line delimiter, then Unix-style line feeds would not be supported. Further, Unicode text files tend to use only one char (LF or CR) as the line break (I'm talking about Unicode with some small changes taken into account). So, I'll use my algo :P


In short, the other variations that check only one type of line feed (and my variation) are much more flexible... and fast :green2
As you can see, the inner loop of my current algo seems to be limited only by system bus / memory subsystem bandwidth.
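To make the trade-off above concrete, here is a minimal scalar sketch in C (not any poster's actual code). The names `count_single` and `count_crlf_pairs` are hypothetical. It only illustrates that a single-delimiter counter handles both DOS and Unix text, while a strict CR+LF counter sees nothing in Unix text.

```c
#include <stddef.h>

/* Count occurrences of one delimiter byte (e.g. 10 for LF or 13 for CR). */
size_t count_single(const char *buf, size_t len, char delim)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == delim)
            n++;
    return n;
}

/* Count only strict CR+LF pairs; a lone CR or a lone LF is ignored. */
size_t count_crlf_pairs(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < len; i++)
        if (buf[i] == '\r' && buf[i + 1] == '\n')
            n++;
    return n;
}
```

For a DOS-style buffer both functions agree; for a Unix-style buffer only `count_single(buf, len, '\n')` finds the line breaks.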

drizz

Don't get me wrong i like your algo  :wink
The truth cannot be learned ... it can only be recognized.

Antariy

Many thanks to everyone who ran the test! :bg

Antariy

Quote from: drizz on December 18, 2010, 01:01:49 AM
Don't get me wrong i like your algo  :wink

No problems :bg

By the way, it uses the popcnt algo you posted :bg

In general, this algo can be implemented as a pop-counting algo. Initially I implemented it that way (with a LUT), in MMX.
You just need to take the pop-count of the register that PMOVMSKB wrote its result to.
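The PMOVMSKB + pop-count idea can be sketched with SSE2 intrinsics in C as follows. This is an illustration under assumptions, not the code from the attached archive: the function name `count_delim_movemask` is made up, the loads are unaligned for simplicity, and a portable bit-clearing loop stands in for the LUT or POPCNT instruction mentioned above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Portable pop-count of a 16-bit mask (a LUT or POPCNT could replace this). */
static unsigned popcount16(unsigned m)
{
    unsigned n = 0;
    while (m) {
        m &= m - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper: number of bytes equal to delim in buf[0..len). */
size_t count_delim_movemask(const unsigned char *buf, size_t len,
                            unsigned char delim)
{
    const __m128i needle = _mm_set1_epi8((char)delim);
    size_t n = 0, i = 0;

    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i)); /* movdqu   */
        __m128i eq = _mm_cmpeq_epi8(chunk, needle);                   /* pcmpeqb  */
        n += popcount16((unsigned)_mm_movemask_epi8(eq));             /* pmovmskb */
    }
    for (; i < len; i++)          /* scalar tail, fewer than 16 bytes */
        n += (buf[i] == delim);
    return n;
}
```

Each set bit in the mask corresponds to one matching byte, so the pop-count of the mask is the number of delimiters in that 16-byte chunk.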

drizz

Also, is "pand" really necessary? Can't you do:

pxor xmm0,xmm0
@@:
movdqu xmm2,[....]
pcmpeqb xmm2,xmm1 ; xmm1=0a0a0a....
paddb xmm0,xmm2 ; always negative: each match adds 0FFh (-1)

loop @B

pxor xmm2,xmm2
pxor xmm3,xmm3
psubb xmm2,xmm0 ; <- change sign
psadbw xmm2,xmm3 ; add up the byte counters
The truth cannot be learned ... it can only be recognized.
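For readers who prefer intrinsics, the PADDB / PSADBW idea above could look roughly like this in C. It is a sketch, not drizz's or Antariy's code: the function name `count_delim_psadbw` is hypothetical, and the batching of at most 255 chunks is an addition of this sketch, needed because each byte-wide counter wraps to zero after 256 matches.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: number of bytes equal to delim in buf[0..len). */
size_t count_delim_psadbw(const unsigned char *buf, size_t len,
                          unsigned char delim)
{
    const __m128i needle = _mm_set1_epi8((char)delim);
    const __m128i zero = _mm_setzero_si128();
    size_t total = 0, i = 0;

    while (i + 16 <= len) {
        __m128i acc = zero;   /* 16 byte-wide (negative) counters */

        /* At most 255 chunks per batch, so no byte counter can wrap. */
        for (int k = 0; k < 255 && i + 16 <= len; k++, i += 16) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
            __m128i eq = _mm_cmpeq_epi8(chunk, needle); /* 0xFF (-1) per match */
            acc = _mm_add_epi8(acc, eq);                /* paddb */
        }

        __m128i pos = _mm_sub_epi8(zero, acc);    /* psubb: change sign */
        __m128i sums = _mm_sad_epu8(pos, zero);   /* psadbw: two partial sums */
        total += (size_t)_mm_cvtsi128_si32(sums);
        total += (size_t)_mm_cvtsi128_si32(_mm_srli_si128(sums, 8));
    }
    for (; i < len; i++)      /* scalar tail, fewer than 16 bytes */
        total += (buf[i] == delim);
    return total;
}
```

The inner loop contains only a load, a compare and an add, which is why no PAND or PMOVMSKB is needed per chunk; the horizontal sum is paid once per batch.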

Antariy

Quote from: drizz on December 18, 2010, 01:27:31 AM
Also, is "pand" really necessary? Can't you do:

pxor xmm0,xmm0
@@:
movdqu xmm2,[....]
pcmpeqb xmm2,xmm1 ; xmm1=0a0a0a....
paddb xmm0,xmm2 ; always negative: each match adds 0FFh (-1)

loop @B

pxor xmm2,xmm2
pxor xmm3,xmm3
psubb xmm2,xmm0 ; <- change sign
psadbw xmm2,xmm3 ; add up the byte counters


Oh, I didn't notice that the thread was updated...

That's a good solution. It is possible to do something like this:

psubb xmm3,xmm1
jnz @B


instead of

pand xmm1,xmm5

paddd xmm3,xmm1
jnz @B


As I said already, this is a first rough version which seems to work properly with unaligned strings and to check the tail of the strings properly.

I wonder how fast it could be: at the moment, as far as I can see, the memory bus is the only real limiter. With software prefetching uncommented, it is only slightly faster on my system.

Antariy

Hi!

Here is an update of the code.

Fixed the XMM PSHUFD for the line delimiter by replacing it with a "GPR PSHUFB" :lol Thus it is no longer necessary to pass the delimiter as a fully specified DWORD (i.e. 0d0d0d0dh); you only need to pass the char code cast to a DWORD. I.e., just:

invoke AxCountLines, offset gltestA, len(offset gltestA),13 ; or 10
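How AxCountLines replicates the char code internally is not shown in this post. One common way to broadcast a byte inside a general-purpose register is a multiplication by 01010101h, sketched here in C with a hypothetical function name:

```c
#include <stdint.h>

/* Replicate the low byte of c into all four bytes of a 32-bit word,
   e.g. 0x0D -> 0x0D0D0D0D. Any higher bits of c are ignored. */
uint32_t broadcast_byte(uint32_t c)
{
    return (c & 0xFFu) * 0x01010101u;
}
```

With something like this, the caller can pass 13 or 10 directly, and the routine can build the 0d0d0d0dh or 0a0a0a0ah pattern itself before moving it into an XMM register.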


Also fixed an issue with small unaligned strings, when the length of the string + the misalignment factor is smaller than 16 bytes.

Also, in the inner loop the AND is removed and (pun :) replaced with a reversed SUB. I did this in a slightly different manner, but drizz drew attention to the AND being needless :wink

All other things are changed only where they depend on the fixed issues, or for alignment reasons.

Please test this fixed one.

Here are my results:

Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:

jj2007=         99 / 100 / 11 lines
Lingo=          99 / 100 / 11 lines
Lingo2=         100 / 100 / 11 lines
ToutEnMasm=     100 / 100 / 11 lines
AxCountLines=   100 / 100 / 11 lines

Codesizes:
getlinesJJ =            153
getlines Lingo =        217
getlines Lingo2=        278
CompteurLignes =        238
AxCountLines =          262

Counting lines of \masm32\include\winextra.inc:

getlinesJJ: (jj2007)
13741 kiloLAMPs, 1110 kilocycles for 20025 lines, 807877 bytes

getlines (Lingo):
15995 kiloLAMPs, 1085 kilocycles for 20025 lines, 807877 bytes

getlines2 (Lingo):
15041 kiloLAMPs, 902 kilocycles for 20025 lines, 807877 bytes

CompteurLignes: (ToutEnMasm)
19366 kiloLAMPs, 1255 kilocycles for 20025 lines, 807877 bytes

AxCountLines:
13708 kiloLAMPs, 846 kilocycles for 20025 lines, 807877 bytes

LAMPs = Lean And Mean Points = cycles * sqrt(codesize)
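As a worked check of that formula, take the AxCountLines figures printed above (846 kilocycles, 262 bytes of code):

```latex
\[
\text{LAMPs} = \text{cycles} \times \sqrt{\text{codesize}}
\]
\[
846 \times \sqrt{262} \approx 846 \times 16.186 \approx 13694 \ \text{kiloLAMPs}
\]
```

The program printed 13708 kiloLAMPs for this run. The small gap would be explained if the printed kilocycle figure is truncated (about 846.9 rather than 846.0), but that is only a guess from the numbers.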


The program is in the attached archive.

dedndave

if i run it on a different drive, i get c0000005
if i run it on my masm32 drive...

prescott w/htt
getlinesJJ: (jj2007)
11646 kiloLAMPs, 941 kilocycles for 20025 lines, 807877 bytes

getlines (Lingo):
14042 kiloLAMPs, 953 kilocycles for 20025 lines, 807877 bytes

getlines2 (Lingo):
10848 kiloLAMPs, 650 kilocycles for 20025 lines, 807877 bytes

CompteurLignes: (ToutEnMasm)
18985 kiloLAMPs, 1230 kilocycles for 20025 lines, 807877 bytes

AxCountLines:
4660 kiloLAMPs, 287 kilocycles for 20025 lines, 807877 bytes

ToutEnMasm

An old subject. Here is the algo I use in many programs:
pmem = pointer to the memory block with lines (ended by 13,10, not only by 10)
taille = size of the memory block
returns eax = number of lines ended by 13,10 + one line (if it exists) not ended by 13,10
Quote
CompteurLignes PROC uses ebx edi esi pmem:DWORD,taille:DWORD
         Local  Nblines:DWORD,count,reste
         local  theEnd:dword
   ;init
   mov Nblines,0
   mov reste,0   
   mov edx,pmem
   add edx,taille
   mov theEnd,edx
   mov edx,pmem
   mov esi,edx   
   and edx,0Fh
   .if edx != 0
      ;count lines in the unaligned head of the block
      mov ecx,16
      sub ecx,edx
      @@:
      .if byte ptr [esi] != 0
         .if word ptr [esi] == 0A0Dh
            inc Nblines         
         .endif
      .else
         mov eax,Nblines
         jmp FindeCompteurLignes
      .endif
      inc esi
      dec ecx
      jnz @B      
   .endif
   ;esi now points to 16-byte-aligned memory
   ;count the number of 32-byte parts
   mov edx,0
   mov eax,theEnd
   sub eax,esi
   .if eax == 0
      mov eax,Nblines
      jmp FindeCompteurLignes      
   .endif
   .if eax < 32
      mov reste,eax
      mov eax,Nblines      
      jmp EndNonaligned
   .endif
   mov edx,0
   mov ecx,32
   div ecx
   mov count,eax      
   mov reste,edx
   ;--------------------------  search in aligned part -------------      
   ;init of various registers
   mov eax, 0d0d0d0dh   ; ASCII 13, carriage return, in all four bytes
   movd xmm6, eax
   pshufd xmm6, xmm6, 0   ; broadcast: 16 carriage returns for comparison in xmm6
   mov eax,Nblines   ;line counter
   ;ready
   NewBloc:
      ;------ align 16 needed ----------
      ;1731187 cycles for 22274 lines
      movdqa xmm1,xmm6         ;load 13 (CR)
      movdqa xmm2,xmm6         ;load 13 (CR)
      pcmpeqb  xmm1,[esi]      ;compare with memory, 16-aligned
      pcmpeqb  xmm2,[esi+16]      ;compare with memory, 16-aligned
      pmovmskb ecx, xmm1 ; result in ecx
      pmovmskb edx, xmm2 ; result for +16 in edx
      shl edx,16
      add ecx,edx
      jz suite
      NbLineBreak:
      bsf   edx,   ecx
      jz suite
      .if    word   ptr [edx+esi] == 0A0Dh
         inc   eax
      .endif
      btr   ecx,   edx
      jmp NbLineBreak
   suite:
   lea esi,[esi+32]
   dec count
   jnz NewBloc
   
EndNonaligned:   
   .if reste != 0
      mov ecx,reste
      @@:
      .if word ptr [esi] == 0A0Dh
         inc eax
      .endif   
      inc esi
      dec ecx
      jnz @B                  
   .endif
   
FindeCompteurLignes:
ret
CompteurLignes endp
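The BSF / BTR loop in the middle of this routine walks over the set bits of the 32-bit CR mask and checks whether each CR is followed by an LF. A scalar C sketch of that step follows. The function names are hypothetical, and the `avail` bound check is an addition of this sketch; the assembly above reads the word at [edx+esi] without such a check.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the lowest set bit; mask must be non-zero (stand-in for BSF). */
static unsigned lowest_set_bit(uint32_t mask)
{
    unsigned i = 0;
    while (!(mask & 1u)) {
        mask >>= 1;
        i++;
    }
    return i;
}

/* block:   start of a 32-byte block
   cr_mask: bit i is set if block[i] == 13 (as PCMPEQB + PMOVMSKB would give)
   avail:   number of readable bytes starting at block */
unsigned count_crlf_from_mask(const unsigned char *block, uint32_t cr_mask,
                              size_t avail)
{
    unsigned lines = 0;
    while (cr_mask != 0) {
        unsigned i = lowest_set_bit(cr_mask);        /* bsf edx, ecx */
        if ((size_t)i + 1 < avail && block[i + 1] == 10)
            lines++;                                 /* CR followed by LF */
        cr_mask &= cr_mask - 1;                      /* btr ecx, edx */
    }
    return lines;
}
```

Because the loop runs once per CR found, its cost grows with the number of line breaks in the block, unlike the pop-count and PSADBW variants discussed earlier in the thread.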

clive

Should really check if file exists before crashing (division by 0?)

Atom N450 (1.66 GHz, 512KB L2)


Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:

jj2007=         99 / 100 / 11 lines
Lingo=          99 / 100 / 11 lines
Lingo2=         100 / 100 / 11 lines
ToutEnMasm=     100 / 100 / 11 lines
AxCountLines=   100 / 100 / 11 lines

Codesizes:
getlinesJJ =            153
getlines Lingo =        217
getlines Lingo2=        278
CompteurLignes =        238
AxCountLines =          262

Counting lines of \masm32\include\winextra.inc:

getlinesJJ: (jj2007)
16686 kiloLAMPs, 1349 kilocycles for 20025 lines, 807877 bytes

getlines (Lingo):
15128 kiloLAMPs, 1026 kilocycles for 20025 lines, 807877 bytes

getlines2 (Lingo):
15152 kiloLAMPs, 908 kilocycles for 20025 lines, 807877 bytes

CompteurLignes: (ToutEnMasm)
23702 kiloLAMPs, 1536 kilocycles for 20025 lines, 807877 bytes

AxCountLines:
9001 kiloLAMPs, 556 kilocycles for 20025 lines, 807877 bytes

LAMPs = Lean And Mean Points = cycles * sqrt(codesize)
It could be a random act of randomness. Those happen a lot as well.

dedndave

no Clive - i get access violation - not div/0   :P

Antariy

Quote from: clive on December 18, 2010, 01:21:29 PM
Should really check if file exists before crashing (division by 0?)

The algo itself does not check for the file: it does not allocate memory or read the file. That is done in the code of the main function, which is not mine. I guess that for a simple testbed, crashing is a good enough way to say that something is wrong :lol

jj2007

Quote from: ToutEnMasm on December 18, 2010, 01:10:56 PM
An old subject. Here is the algo I use in many programs:
pmem = pointer to the memory block with lines (ended by 13,10, not only by 10)

Want some more options?
Quote
   Recall "\masm32\include\winextra.inc", MyRec$()
   mov lc, Min(eax, 20)
   Print Str$("%i lines found", lc)
   For_ n=0 To lc-1
      Print Str$("\nRec %i\t", n)
      Print Left$(MyRec$(n), 50)
   Next

   Recall "MyFile.csv", MyRec$(), csv   ; loads a spreadsheet in comma separated values format
   Recall "MyFile.txt", MyRec$(), tab   ; loads a spreadsheet in tab-delimited format
Rem   - returns lines in eax (null if an error occurred), and the number of total bytes read in edx
   - see Store below for saving arrays
   - with the spreadsheet variants, single cells can be accessed via Let My$=MyRec$(row, column);

sinsi

Win7 pro x64, quad q6600

Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:

jj2007=         99 / 100 / 11 lines
Lingo=          99 / 100 / 11 lines
Lingo2=         100 / 100 / 11 lines
ToutEnMasm=     100 / 100 / 11 lines
AxCountLines=   100 / 100 / 11 lines

Codesizes:
getlinesJJ =            153
getlines Lingo =        217
getlines Lingo2=        278
CompteurLignes =        238
AxCountLines =          262

Counting lines of \masm32\include\winextra.inc:

getlinesJJ: (jj2007)
3874 kiloLAMPs, 313 kilocycles for 20025 lines, 807877 bytes

getlines (Lingo):
4005 kiloLAMPs, 271 kilocycles for 20025 lines, 807877 bytes

getlines2 (Lingo):
5466 kiloLAMPs, 327 kilocycles for 20025 lines, 807877 bytes

CompteurLignes: (ToutEnMasm)
12880 kiloLAMPs, 834 kilocycles for 20025 lines, 807877 bytes

AxCountLines:
2358 kiloLAMPs, 145 kilocycles for 20025 lines, 807877 bytes

LAMPs = Lean And Mean Points = cycles * sqrt(codesize)

Light travels faster than sound, that's why some people seem bright until you hear them.