Hello,
I think this could be a good explanation of MMX use, and perhaps it could be optimized.
There are comments to follow the work done in the registers.
Quote
;pmem: pointer to a file loaded in memory
;taille: size of this file
;returns eax = number of lines
;tested on windows.inc, 22274 lines
;only 13,10 pairs are counted; a lone 10 or a lone 13 (HTML..) is not
;---------------------------------------------------------------------
;can also be used on a string in data; 16-byte alignment is not needed
;but there is little point in that, since simpler methods exist
;must be assembled with at least ml 6.15 (Iczelion's site)
;needs .586 and .xmm in the declarations
;------- mask for bytes ------------
;sse2entree db 13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,0
;sse2line db 10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,0
;################################################################
CompteurLignes PROC uses ebx edi esi pmem:DWORD,taille:DWORD
local theEnd:dword,retenue:DWORD
Local retour:DWORD
;init of various register
mov retenue,0
lea eax,sse2entree
movups xmm6,[eax]
lea eax,sse2line
movups xmm7,[eax]
mov eax,pmem
add eax,taille
mov theEnd,eax
mov edx,pmem
mov eax,0 ;line counter
;ready
NewBloc:
mov edx,pmem
movups xmm1,[edx] ;load 16 bytes
movups xmm2,[edx] ;load the same 16 bytes
pcmpeqb xmm1,xmm6 ;13 ,xmm1 modified
;ff 00 ff 00 ff 00 ff 00-00 00 00 00 00 00 00 00 ;example of modified xmm1 in the debugger
pcmpeqb xmm2,xmm7 ;10 ,xmm2 modified
;00 ff 00 ff 00 ff 00 ff-00 00 00 00 00 00 00 00 ;modified xmm2
;--------- debug ---------
;lea edx,TamponVisu1 ;just here to view the result in the debugger
;movups [edx],xmm1
;lea edx,TamponVisu2
;movups [edx],xmm2
;-------- debug ------------------
;the ff 00 ff .... dump must be read right to left
;ecx and edx must be read left to right
pmovmskb ecx, xmm1 ;1010101 ;result for 13 in ecx
pmovmskb edx, xmm2 ;10101010 ;result for 10 in edx
;each 13 is now a single bit in ecx
;each 10 is now a single bit in edx
;---------- search for bits and their positions -------
;these instructions work on registers only and are very fast
;ecx and edx now have a bit set to 1 at the position of each matching byte
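.if retenue == 1
mov retenue,0 ;the carry is consumed here, otherwise it would stay set for every later block
;the previous 16-byte block ended with a 13 (bit 15)
;test whether bit zero is set (this block starts with a 10)
bsf edi,edx ;search for the 10
.if edi == 0 && edx != 0
btr edx,edi ;clear the bit found in edx (the 10)
inc eax ;inc the number of lines
.endif
.endif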
NewEntree:
bsf ebx,ecx ;gives the position of the lowest bit set to 1
.if ebx == 0 && ecx != 0
;bit zero is set
jmp bitzero13a1
.endif
.if ebx ;not zero, so a 13 was found
bitzero13a1:
btr ecx,ebx ;clear the bit found in ecx
.if ebx != 15
inc ebx ;increment its position to compare it with that of the 10
.else
mov retenue,1 ;the 10 will be in the next block loaded
jmp NewSeriede16
;to be continued
.endif
traiteretenue:
bsf edi,edx ;search for the 0A
.if edi == 0 && edx != 0
;bit zero of the 10 mask is set
jmp bitzero10a1
.endif
.if edi
bitzero10a1:
btr edx,edi ;clear the bit found in edx (the 10)
.if ebx == edi ;a 10 immediately follows the 13
inc eax ;////////// inc the number of lines /////////////////
.elseif ebx >edi
;this 10 is alone: a lone 10 before the 13 (HTML)
mov edi,0
jmp traiteretenue
.endif
.endif
;------ loop-back conditions ---------------
.if ecx != 0
mov ebx,0 ;avoid an undefined state
mov edi,0
jmp NewEntree
.endif
.endif
NewSeriede16:
mov edx,pmem
add edx,16
mov pmem,edx
.if edx < theEnd
jmp NewBloc
.endif
FindeCompteurLignes:
ret
CompteurLignes endp
Not bad, but it could be optimised a bit :wink
Quote
Counting lines of \masm32\include\windows.inc (P4):
markl_CountFileLines (http://www.masm32.com/board/index.php?topic=5434.msg40666#msg40666):
515431 cycles for 22272 lines, 849759 bytes
CompteurLignes:
5803740 cycles for 22272 lines, 849759 bytes
[attachment deleted by admin]
Markl's routine doesn't do the same job.
I need to be certain that a 13,10 pair is there, not just a 10 and not just a 13.
This gives access to HTML and RTF text files without problems.
I have written further applications that rely on this.
Markl's code only reads the 0Ah, which is not enough for me.
C:>set path=\masm32\bin
C:>ml /c /coff CountLinesSSE2.asm
Microsoft (R) Macro Assembler Version 6.14.8444
Copyright (C) Microsoft Corp 1981-1997. All rights reserved.
Assembling: CountLinesSSE2.asm
CountLinesSSE2.asm(3) : fatal error A1000: cannot open file : \masm32\macros\timers.asm
C:>pause
Press any key to continue . . .
---------------------------------------------------
I need your file.
Thanks.
Quote from: UtillMasm on March 18, 2009, 12:56:44 PM
CountLinesSSE2.asm(3) : fatal error A1000: cannot open file : \masm32\macros\timers.asm
---------------------------------------------------
I need your file.
Thanks.
Sorry, I thought everybody here in the forum has them :green
See code timing macros (downloaded 1226 times) (http://www.masm32.com/board/index.php?topic=770.msg5281#msg5281)
Thank you!
You're welcome!
I also need a good explanation of how markl's algo works; timers.asm is in the masm 10 package.
I will do a few experiments and make a new post.
Quote from: ToutEnMasm on March 18, 2009, 12:49:32 PM
Markl's routine doesn't do the same job.
I need to be certain that a 13,10 pair is there, not just a 10 and not just a 13.
This gives access to HTML and RTF text files without problems.
In rtf files, a CrLf is defined as
\pard or
\par
In html files, a CrLf is defined as
<p> or
<br>
Can you explain how your code counts lines in these files?
I have built another version of jj2007's optimised sample (thanks for it) with minimal requirements and a little safety (file not found ..).
How my algo works, simply:
Load 16 bytes into xmm1.
Load the same 16 bytes into xmm2.
The mask of 13s (xmm6) is applied to xmm1.
The mask of 10s (xmm7) is applied to xmm2.
Then each compare result is moved to a 32-bit register: 1 byte at FF = 1 bit at 1 in the same position.
Now ecx and edx hold the results.
bsf searches for the first bit at 1 and returns its position.
btr clears that bit once we have counted it.
The test: we find a bit at 1 in ecx, which is the position of a 13;
position + 1 must be a 10.
If it is, eax = eax + 1.
We must take care that a 13 can be byte 15 of xmm1, i.e. bit 15 of ecx;
in that case the 10 must be looked for in the next block of 16 bytes (retenue=1).
Continue until there are no more bits at 1 in ecx.
Special case: if bit 0 is at 1, bsf returns 0, which is its rank; if it were bit 15, it would return 15.
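The steps above can be modelled in a few lines of portable C. This is only a sketch, not the MASM routine itself: `byte_mask16` and `count_crlf_blocks` are hypothetical names, the mask helper is a scalar stand-in for pcmpeqb + pmovmskb, and the pair test uses a shift and an AND of the two masks instead of comparing bsf positions one by one.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for pcmpeqb + pmovmskb: bit i is set when block[i] == value. */
static uint32_t byte_mask16(const unsigned char *block, size_t len, unsigned char value)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < len && i < 16; i++)
        if (block[i] == value)
            mask |= 1u << i;
    return mask;
}

/* Count 13,10 pairs block by block, carrying a trailing 13 into the next block. */
size_t count_crlf_blocks(const unsigned char *buf, size_t size)
{
    size_t lines = 0;
    int carry = 0;                          /* plays the role of "retenue" */
    for (size_t off = 0; off < size; off += 16) {
        size_t len = size - off < 16 ? size - off : 16;
        uint32_t cr = byte_mask16(buf + off, len, 13);
        uint32_t lf = byte_mask16(buf + off, len, 10);
        if (carry && (lf & 1u))             /* previous block ended with 13, this one starts with 10 */
            lines++;
        carry = (int)((cr >> 15) & 1u);     /* a 13 in byte 15 has to wait for the next block */
        uint32_t pairs = (cr << 1) & lf;    /* 13 at position i and 10 at position i+1 */
        while (pairs) {                     /* one iteration per pair, like the bsf/btr loop */
            pairs &= pairs - 1;             /* clear the lowest set bit */
            lines++;
        }
    }
    return lines;
}
```

A lone 10 or a lone 13 never sets a bit in `pairs`, so only real 13,10 pairs are counted, including a pair that straddles two blocks.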
[attachment deleted by admin]
I ran the EXE file, and it crashed.
WinVista SP1 32bit.
To UtillMasm,
Which exe ?
There is mine, countli.exe, which answers "couldn't find" on XP, and the one from jj2007, which isn't protected?
Quote from: ToutEnMasm on March 18, 2009, 03:39:49 PM
we must take care that a 13 can be the 15 byte in xmm1 or the 15 bit in ecx
in this case we must find the 10 in the next bloc of 16 bytes retenue=1
movups xmm1,[edx] ;load 16 bytes, e.g.
movups xmm2,[edx+1] ;load (almost) the same 16 bytes ;-)
Ascii 10 follows Ascii 13...
Fran-glish
Quote
we must take care that bit 15 cannot be followed by a bit 16
Bit 16 doesn't exist, so:
in this case we must find the 10 in the next block of 16 bytes (retenue=1)
But I have found something simpler and faster that does the same thing:
Quote
NewBloc:
movups xmm1,[edx] ;load 16 bytes
movups xmm2,[edx+1] ;the same bytes shifted by 1
pcmpeqb xmm1,xmm6 ;13 ,xmm1 modified
pcmpeqb xmm2,xmm7 ;10 ,xmm2 modified
pand xmm1,xmm2 ;logical AND (FF and 00 = 00), result in xmm1
pmovmskb ecx, xmm1 ; result for 13,10 in ecx
@@:
mov edi,0 ;avoid an undefined state
bsf edi,ecx ;search for the 13,10
.if edi == 0 && ecx != 0
btr ecx,edi ;clear the bit found in ecx (13,10)
inc eax ;inc the number of lines
.elseif edi != 0
btr ecx,edi ;clear the bit found in ecx (13,10)
inc eax
.endif
.if ecx != 0
jmp @B
.endif
;3 702 880 cycles for 22274 lines, 849788 bytes
If I had an instruction that could count the number of bits set to 1 in ecx, it could be even faster.
With a small number of lines, markl_CountFileLines crashes for me.
It takes 1024 bytes in one pass; mine takes only 16.
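The shifted-load idea in the NewBloc code quoted above can be modelled in scalar C. A rough sketch, with the hypothetical name `crlf_pair_mask`: it builds the same 16-bit mask that the two pcmpeqb, the pand and the pmovmskb would produce, without using SSE2 itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit i of the result is set when p[i] == 13 and p[i+1] == 10, for i < n (at most 16).
   Scalar model of: pcmpeqb on [p], pcmpeqb on [p+1], pand, pmovmskb.
   The caller must make sure p[n] is readable, just as [edx+1] reads one byte further. */
uint32_t crlf_pair_mask(const unsigned char *p, size_t n)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n && i < 16; i++)
        if (p[i] == 13 && p[i + 1] == 10)
            mask |= 1u << i;
    return mask;
}
```

Because the second load starts one byte later, a 13 in byte 15 is paired with the first byte of the following block, so no carry flag is needed; the price is that the last block reads one byte beyond its own 16.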
Quote from: ToutEnMasm on March 18, 2009, 07:12:29 PM
But i have found something more simple
movups xmm1,[edx] ;load 16 bytes
movups xmm2,[edx+1] ;decale of 1
Congrats :bg
Quote
If I had an instruction that could count the number of bits set to 1 in ecx, it could be even faster.
You need the popcount instruction (http://popcnt.org/2007/09/magic-popcount-popcnt-command.html)
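For CPUs without a popcount instruction, the bit count of the mask can also be computed with a few shifts and adds. The function below is a well-known bit-twiddling sketch in C (the name `popcount32` is only illustrative); it replaces the bsf/btr loop with a fixed number of operations.

```c
#include <stdint.h>

/* Number of bits set to 1 in x, computed without a loop. */
uint32_t popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums */
    return (x * 0x01010101u) >> 24;                   /* add the four bytes */
}
```

Applied to the 13,10 mask in ecx, the result is the number of line breaks found in the block, so `eax` can be increased once per block instead of once per bit.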
Thanks,
now
;CompteurLignes:
;3615541 cycles for 22274 lines, 849788 bytes
;Scrutation esi edi:
;5727865 cycles for 22274 lines, 849788 bytes
CountLi.exe Crash
@@:
pxor xmm5, xmm5
;i think it is the unrolling that is hosing things.
i = 0
WHILE i LT (16*128)
movdqu xmm0, [edx + i + 0]
;Memory access violation
ECX 00000001
movdqu xmm1, [edx + i + 16]
movdqu xmm2, [edx + i + 32]
movdqu xmm3, [edx + i + 48]
pcmpeqb xmm0, xmm7
pcmpeqb xmm1, xmm7
pcmpeqb xmm2, xmm7
pcmpeqb xmm3, xmm7
paddb xmm0, xmm1
paddb xmm2, xmm3
paddb xmm0, xmm2
; can't you do this step outside the loop? I am pretty sure you can.
psubb xmm5, xmm0 ; total 128*8 max = 1K
i=i+16*4
ENDM
; unpack MM5 to get sum in 1K block
pxor xmm0, xmm0
psadbw xmm5, xmm0
paddd xmm6, xmm5
dec ecx
lea edx, [edx + 128*16]
jne @B
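As far as the quoted loop can be read, it keeps sixteen byte-sized counters (one per lane) and only sums them after a limited number of blocks, so that no lane overflows. A scalar C sketch of that reading follows; `count_byte_lanes` is a hypothetical name, and the limit of 128 blocks per pass is taken from the 16*128 bound in the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Count bytes equal to `value`, keeping 16 per-lane byte counters between flushes. */
size_t count_byte_lanes(const unsigned char *buf, size_t size, unsigned char value)
{
    size_t total = 0;
    size_t off = 0;
    while (size - off >= 16) {
        uint8_t lanes[16] = {0};                /* xmm5 after pxor */
        size_t blocks = 0;
        /* at most 128 blocks per pass, so a lane holds at most 128 and cannot overflow */
        while (blocks < 128 && size - off >= 16) {
            for (int i = 0; i < 16; i++)
                if (buf[off + i] == value)      /* pcmpeqb gives -1, psubb turns it into +1 */
                    lanes[i]++;
            off += 16;
            blocks++;
        }
        for (int i = 0; i < 16; i++)            /* psadbw against zero: horizontal sum */
            total += lanes[i];
    }
    for (; off < size; off++)                   /* leftover bytes, fewer than 16 */
        if (buf[off] == value)
            total++;
    return total;
}
```

Note that this model never reads past `size`, whereas the unrolled loop always reads a full pass, which may be related to the access violation reported above on small inputs.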
ToutEnMasm,
You can try my code too: :wink
option prologue:none
option epilogue:none
align 16
db 8Dh, 0A4h, 24h, 0,0,0,0, 8Dh, 0A4h, 24h, 0, 0, 0,0,0
getlines proc pBuffer : dword, nSize:dword
pop ecx ; ecx -> return address
mov edx, 0a0a0a0ah ; 0Ah->second byte to look for
pop eax ; eax->pBuffer-> eax->16 bytes aligned !!!
movd xmm0, edx ; edx = 0a0a0a0ah
mov [esp-4],esi ; preserve esi register
pop esi ; esi -> size of buffer
mov [esp-4], ebx ; preserve ebx register
add esi, eax ;
pshufd xmm0, xmm0, 0 ; xmm0=0a0a0a0a0a0a0a.....
movdqa xmm1, xmm0 ; xmm1=xmm0=0a0a0a0a0a0a0a.....
xor ebx, ebx ; ebx->counter=0
;Fast part - 1-> look for the 0Ah symbol in the next 16 bytes
LoopA:
cmp eax, esi ; Is it End of Buffer?
jg ExLoop ; Yes, exit
pcmpeqb xmm0, [eax] ; [eax]->16bytes aligned!!!
add eax, 16 ; next 16 bytes
pmovmskb ecx, xmm0 ; test mask for zero
movdqa xmm0, xmm1 ; xmm0=xmm1=0a0a0a0a0a0a0a.....
test ecx, ecx ; if zero loop again
jnz Loli
;Fast part - 2
cmp eax, esi ; Is it End of Buffer?
jg ExLoop ; Yes, exit
pcmpeqb xmm0, [eax] ; [eax]->16bytes aligned!!!
add eax, 16 ; next 16 bytes
pmovmskb ecx, xmm0 ; test mask for zero
movdqa xmm0, xmm1 ; xmm0=xmm1=0a0a0a0a0a0a0a.....
test ecx, ecx ; if zero loop again
jz LoopA ; if not zero, a 0Ah symbol was found
;Slow part-2
bsf edx, ecx
cmp word ptr [edx+eax-17], 0A0Dh
jne @f
add ebx, 1 ; inc counter
@@:
btr ecx, edx ; clear bit
test ecx, ecx ; is there more bits?
je LoopA ; no, loop again
bsf edx, ecx ; Yes, more...
cmp word ptr [edx+eax-17], 0A0Dh
jne @b
add ebx, 1 ; inc counter
jne @b
align 16
;Slow part-1
Loli:
bsf edx, ecx
cmp word ptr [edx+eax-17], 0A0Dh
jne @f
add ebx, 1 ; inc counter
@@:
btr ecx, edx ; clear bit
test ecx, ecx ; is there more bits?
je LoopA ; no, loop again
bsf edx, ecx ; Yes, more...
cmp word ptr [edx+eax-17], 0A0Dh
jne @b
add ebx, 1 ; inc counter
jne @b
;End
align 16
ExLoop:
mov esi, [esp-8]
mov eax, ebx
mov ebx, [esp-4]
jmp dword ptr [esp-12]
getlines endp
option prologue:prologuedef
option epilogue:epiloguedef
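The idea behind getlines, as far as it can be read from the listing, is to search only for 0Ah and then check whether the byte just before it is 0Dh. A scalar C sketch of that idea, under the hypothetical name `count_crlf_via_lf`; unlike the assembly it works on unaligned buffers and refuses to look in front of the first byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Find each 10 in 16-byte blocks, then count it only if the previous byte is 13. */
size_t count_crlf_via_lf(const unsigned char *buf, size_t size)
{
    size_t lines = 0;
    for (size_t off = 0; off < size; off += 16) {
        size_t len = size - off < 16 ? size - off : 16;
        uint32_t lf = 0;
        for (size_t i = 0; i < len; i++)        /* stands in for pcmpeqb + pmovmskb */
            if (buf[off + i] == 10)
                lf |= 1u << i;
        if (lf == 0)
            continue;                           /* fast part: no 10 in this block */
        while (lf) {                            /* slow part: inspect each 10 */
            unsigned pos = 0;
            while (!((lf >> pos) & 1u))         /* bsf */
                pos++;
            lf &= ~(1u << pos);                 /* btr */
            size_t at = off + pos;
            if (at > 0 && buf[at - 1] == 13)    /* like cmp word ptr [...], 0A0Dh */
                lines++;
        }
    }
    return lines;
}
```

Most blocks of a typical source file contain no line break at all, which is why the fast part pays off.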
Contrary to what is frequently stated in the Campus in replies to noobs, there
are mindreaders among us. Lingo, how dare you steal my ideas :naughty:
At least, when you copy from your own code in other threads, do it properly:
Quote
pmovmskb ecx, xmm0 ; test mask for zero 0a
Jokes apart, this is nice code, and getting faster
(EDIT: on a P4):
Counting lines of \masm32\include\windows.inc:
markl_CountFileLines (Mark Larson):
582 kilocycles for 22272 lines, 849759 bytes
getlines (Lingo):
1412 kilocycles for 22272 lines, 849759 bytes
CompteurLignes: (ToutEnMasm)
5361 kilocycles for 22272 lines, 849759 bytes
Unfortunately, I was not able to get ToutEnMasm's implementation of...
movups xmm1,[edx] ;load 16 bytes, e.g.
movups xmm2,[edx+1] ;load (almost) the same 16 bytes ;-)
...running - see the "if 0 ; new version, to be completed" branch in attachment.
[attachment deleted by admin]
The best result I have is:
Quote
2 639 Kcycles for 22274 lines, 849788 bytes
Guaranteed: no crash in the tests.
Memory is not read outside the loaded file.
Only 13,10 pairs (line breaks) are counted; a lone 10 or a lone 13 is not counted.
I have added a plain byte-scan test using esi/edi; the speed gain is about a factor of two.
Of course, without searching for the 13,10 pair the speed is better.
[attachment deleted by admin]
heh, never seen lingo get beat in a speed test before, but Mark Larson did write that optimization article so I guess it's not so shocking. I think Mark Larson is one of those quiet geniuses, in that he doesn't post code very often but when he does it's golden. I have high hopes for agner too but haven't seen anything yet. :bg
This one is riskier, but faster.
It uses Lingo's method:
pcmpeqb xmm1,[esi] ; dangerous, needs 16-byte alignment
Quote
;1709746 cycles for 22274 lines, 849788 bytes
;align 16
CompteurLignes PROC uses ebx edi esi pmem:DWORD,taille:DWORD
Local valuemmx:QWORD
local theEnd:dword,GroupCount,decalage
Local retour:DWORD
;init of various register
mov GroupCount,0
mov decalage,0
lea eax,sse2entree
movups xmm6,[eax]
lea eax,sse2line
movups xmm7,[eax]
mov eax,taille ;the size is a multiple of 16
add eax,pmem ;
mov theEnd,eax
mov esi,pmem
mov eax,0 ;line counter
;ready
NewBloc:
movdqa xmm1,xmm6 ;load the 13s
pcmpeqb xmm1,[esi] ;13 ,xmm1 modified
pmovmskb ecx, xmm1 ; result for 13 in ecx
test ecx,ecx ;no line ?,continue
jz suite
NewSeriede16:
bsf edx, ecx
cmp word ptr [edx+esi], 0A0Dh
jne @f
inc eax
@@:
btr ecx, edx ; clear bit
test ecx, ecx ; is there more bits?
je suite ; no, loop again
bsf edx, ecx ; Yes, more...
cmp word ptr [edx+esi], 0A0Dh
jne @b
inc eax ;inc counter
jne @b
suite:
lea esi,[esi+16]
.if esi < theEnd
jmp NewBloc
.endif
FindeCompteurLignes:
ret
CompteurLignes endp
E^cube,
"but Mark Larson did write that optimization article so I guess it's not so shocking"
1. this code is from bitRAKE:
http://www.asmcommunity.net/board/index.php?topic=13727.0
2. I wonder why it is included in the test program because it count just 0Ah bytes in the
buffer rather than 0D+0A bytes. It is easy to look for and count just 1 byte rather than 2 bytes
Quote from: lingo on March 19, 2009, 12:51:15 PM
E^cube,
"but Mark Larson did write that optimization article so I guess it's not so shocking"
1. this code is from bitRAKE:
http://www.asmcommunity.net/board/index.php?topic=13727.0
bitRAKE's full code (not yet SSE2) can be seen here (http://www.asmcommunity.net/board/index.php?topic=13727.msg106307#msg106307). Mark apparently adapted it to SSE2.
Quote
2. I wonder why it is included in the test program because it count just 0Ah bytes in the
buffer rather than 0D+0A bytes. It is easy to look for and count just 1 byte rather than 2 bytes
It's included because for all text files on Windows and Linux, it does the job. But that question has to be negotiated with the guy who started this thread :wink
EDIT: It's included for historical reasons. In fact, it fails miserably for winextra.inc, see here (http://www.masm32.com/board/index.php?topic=11061.msg81702#msg81702).
Hello,
Quote
It's included because for all text files on Windows and Linux, it does the job. But that question has to be negotiated with the guy who started this thread
It seems I recognise myself. I have a tool like helphelp that writes a help index by reading the titles of HTML pages, another that reads RTF files, another that reads header files; perhaps I will write yet another tool that reads another type of text file ....
Each kind of text file has some peculiarity that makes it unreadable without a rule.
The rule is simple: a line break is 13,10, that is, a carriage return followed by a line feed, or in C++ .. (I have forgotten what they call it). The format is given in hexadecimal or decimal: 13,10 = 0Dh,0Ah.
That way we can read HTML, RTF and header files without bad surprises such as stopping in the middle of a line or in the middle of a file.
I had forgotten Linux, which has its own text format.
Speed isn't everything; useful code is also something to consider.
With this one you can choose, dangerous or not dangerous; see the number of cycles.
Quote
ALIGN16 equ
CompteurLignes PROC uses ebx edi esi pmem:DWORD,taille:DWORD
local theEnd:dword
Local retour:DWORD
;init of various register
lea eax,sse2entree
movups xmm6,[eax]
mov eax,taille ;the size is a multiple of 16
add eax,pmem ;
mov theEnd,eax
mov esi,pmem
mov eax,0 ;line counter
;ready
NewBloc:
IFDEF ALIGN16
;------ align 16 needed ----------
;1731187 cycles for 22274 lines
movdqa xmm1,xmm6 ;load the 13s
pcmpeqb xmm1,[esi] ;compare with memory, xmm1 modified, needs align 16
ELSE
;1828067 cycles for 22274 lines, mem align 16
;2312768 cycles for 22274 lines, mem non align
movdqu xmm1,[esi] ;load 16 bytes, no alignment needed
pcmpeqb xmm1,xmm6 ;compare with the 13s, xmm1 modified
ENDIF
pmovmskb ecx, xmm1 ; result for 13 in ecx
NbLineBreak:
bsf edx, ecx
jz suite
.if word ptr [edx+esi] == 0A0Dh
inc eax
.endif
btr ecx, edx
jmp NbLineBreak
suite:
lea esi,[esi+16]
.if esi < theEnd
jmp NewBloc
.endif
FindeCompteurLignes:
ret
CompteurLignes endp
There is still some room for improvement :wink
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Tests for correctness - 2*100 lines expected:
Mark Larson= - / - (throws exception)
jj2007= 100 / 100 lines
Lingo= 105 / 102 lines
ToutEnMasm= 105 / 102 lines
Counting lines of \masm32\include\windows.inc:
markl_CountFileLines (Mark Larson):
248 kilocycles for 22274 lines, 849788 bytes
getlinesJJ: (jj2007)
525 kilocycles for 22274 lines, 849788 bytes
getlines (Lingo):
683 kilocycles for 22274 lines, 849788 bytes
CompteurLignes: (ToutEnMasm)
1956 kilocycles for 22274 lines, 849788 bytes
[attachment deleted by admin]
a small related question, is the MMX instruction set present in all CPUs after its introduction ?
t y
I don't get the same results as jj2007 on my machine with the same executable
Quote
getlinesJJ: (jj2007)
1225 kilocycles for 22274 lines, 849788 bytes
getlines (Lingo):
1376 kilocycles for 22274 lines, 849788 bytes
Quote
CompteurLignes:
1378 Kcycles for 22274 lines, 849788 bytes
It seems there is nothing really new this time.
I have gained a little time just by doing this:
the compare is done on 32 bytes instead of 16.
16-byte alignment of the memory is provided by GlobalAlloc, so it is not necessary to realign it as getlinesJJ does.
The minimum size of the memory must be 32 bytes, otherwise memory is read outside the buffer,
and the memory allocation must be rounded up to a multiple of 32 bytes.
Quote
;1378 Kcycles
ALIGN16 equ
CompteurLignes PROC uses ebx edi esi pmem:DWORD,taille:DWORD
local theEnd:dword
Local retour:DWORD
;init of various register
lea eax,sse2entree
movups xmm6,[eax]
mov eax,taille ;the size is a multiple of 16
add eax,pmem ;
mov theEnd,eax
mov esi,pmem
mov eax,0 ;line counter
;ready
NewBloc:
IFDEF ALIGN16
;------ align 16 needed ----------
movdqa xmm1,xmm6 ;load the 13s
movdqa xmm2,xmm6 ;load the 13s
pcmpeqb xmm1,[esi] ;cmp with memory align 16
pcmpeqb xmm2,[esi+16] ;cmp with memory ,align 16
ELSE
movdqu xmm1,[esi] ;load 16 bytes, no alignment needed
movdqu xmm2,[esi+16] ;load the next 16 bytes
pcmpeqb xmm1,xmm6 ;compare with the 13s, xmm1 modified
pcmpeqb xmm2,xmm6 ;compare with the 13s, xmm2 modified
ENDIF
pmovmskb ecx, xmm1 ; result in ecx
pmovmskb edx, xmm2 ; result +16 edx
shl edx,16
add ecx,edx
NbLineBreak:
bsf edx, ecx
jz suite
.if word ptr [edx+esi] == 0A0Dh
inc eax
.endif
btr ecx, edx
jmp NbLineBreak
suite:
lea esi,[esi+32]
.if esi < theEnd
jmp NewBloc
.endif
FindeCompteurLignes:
ret
CompteurLignes endp
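The 32-byte variant above merges two 16-bit masks into one 32-bit register before walking the bits. A small C sketch of that step, with hypothetical helper names; the routine uses add where this sketch uses OR, which gives the same result because the two halves never overlap.

```c
#include <stdint.h>

/* Merge the mask of the first 16 bytes (low) and of the next 16 bytes (high). */
uint32_t combine_masks(uint32_t low16, uint32_t high16)
{
    return (low16 & 0xFFFFu) | ((high16 & 0xFFFFu) << 16);
}

/* Index of the lowest set bit, like bsf; returns -1 when the mask is empty. */
int lowest_set_bit(uint32_t mask)
{
    if (mask == 0)
        return -1;
    int pos = 0;
    while (!(mask & 1u)) {
        mask >>= 1;
        pos++;
    }
    return pos;
}
```

A bit at position p in the merged mask corresponds to the byte at `[esi+p]`, so the same `word ptr [edx+esi]` test works for both halves.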
Not looking at every post, I am wondering about the title (MMX) and jj (SSE2), but anyway...
Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (SSE4)
Tests for correctness - 2*100 lines expected:
Mark Larson= - / - (throws exception)
jj2007= 100 / 100 lines
Lingo= 105 / 102 lines
ToutEnMasm= 105 / 102 lines
Counting lines of \masm32\include\windows.inc:
markl_CountFileLines (Mark Larson):
189 kilocycles for 22274 lines, 849788 bytes
getlinesJJ: (jj2007)
383 kilocycles for 22274 lines, 849788 bytes
getlines (Lingo):
493 kilocycles for 22274 lines, 849788 bytes
CompteurLignes: (ToutEnMasm)
1664 kilocycles for 22274 lines, 849788 bytes
edit: "there are mindreaders among us" ouch! that hits home jj. :bg
Quote from: ToutEnMasm on March 20, 2009, 08:12:39 AM
I have gain a little time ,just making that
compare is made on 32 bytes,instead of 16
align 16 of memory is made by globalAlloc and it is not necessary to relign it as getlinesJJ do.
The minimum size of memory must be 32 bytes or there is read memory outside the buffer
And memory allocation must rounded by 32 bytes
The timing looks better now, only 4% slower than mine on my old P4. However, with your (and Lingo's) code you must add one more condition:
- There must not be any remainders from previous files in the buffer.
In other words, you need a fresh buffer for each file. Otherwise, the line count will be wrong if the second file is shorter and there are some CrLf's left after the zero terminator, as shown in the gltestA and gltestB strings.
Re GlobalAlloc, MSDN (http://msdn.microsoft.com/en-us/library/aa366574(VS.85).aspx): Memory allocated with this function is guaranteed to be aligned on an 8-byte boundary.
Since your code needs 16-byte alignment, this means some extra work, i.e. you must:
- allocate more space than the file length requires
- align the pointer to 16-bytes before loading the file and
- keep a copy of the original pointer for GlobalFree.
My code does not require any of these conditions.
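The three steps listed above (over-allocate, align, keep the original pointer) can be sketched in C. This is an illustration only: `malloc`/`free` stand in for GlobalAlloc/GlobalFree, and `aligned_buf`, `aligned_buf_alloc` and `aligned_buf_free` are made-up names.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    void *raw;                  /* pointer returned by the allocator, needed to free it */
    unsigned char *aligned;     /* first 16-byte aligned address inside the block */
} aligned_buf;

/* Allocate size bytes plus 15 spare bytes, then round the pointer up to a multiple of 16. */
int aligned_buf_alloc(aligned_buf *b, size_t size)
{
    b->raw = malloc(size + 15);
    if (b->raw == NULL) {
        b->aligned = NULL;
        return 0;
    }
    uintptr_t addr = (uintptr_t)b->raw;
    b->aligned = (unsigned char *)((addr + 15) & ~(uintptr_t)15);
    return 1;
}

/* Always free the original pointer, never the aligned one. */
void aligned_buf_free(aligned_buf *b)
{
    free(b->raw);
    b->raw = NULL;
    b->aligned = NULL;
}
```

The file would then be loaded at `b.aligned`, and the spare 15 bytes guarantee that `size` bytes still fit after rounding up.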
No problem with a second file being reloaded into the same memory.
I never reuse the same allocated memory for another file, and I put a zero at the end of the file.
I don't want to use this in any other case. Staying inside the buffer can only be guaranteed if the size of the memory is known.
That is not always the case.
This is, I hope, the last version.
It can be used anywhere, without risk of an out-of-bounds read and a stupid crash.
Quote
CompteurLignes PROC uses ebx edi esi pmem:DWORD,taille:DWORD
Local Nblines:DWORD,count,reste
local theEnd:dword
;init
mov Nblines,0
mov reste,0
mov edx,pmem
add edx,taille
mov theEnd,edx
mov edx,pmem
mov esi,edx
and edx,0Fh
.if edx != 0
;search lines in the non align memory
mov ecx,16
sub ecx,edx
@@:
.if byte ptr [esi] != 0
.if word ptr [esi] == 0A0Dh
inc Nblines
.endif
.else
mov eax,Nblines
jmp FindeCompteurLignes
.endif
inc esi
dec ecx
jnz @B
.endif
;esi point on a 16 aligned memory
;count the number of 32 bytes parts
mov edx,0
mov eax,theEnd
sub eax,esi
.if eax == 0
mov eax,Nblines
jmp FindeCompteurLignes
.endif
.if eax < 32
mov reste,eax
mov eax,Nblines
jmp EndNonaligned
.endif
mov edx,0
mov ecx,32
div ecx
mov count,eax
mov reste,edx
;-------------------------- search in aligned part -------------
;init of various register
mov eax, 0d0d0d0dh ; Ascii 13, carriage return
movd xmm6, eax
pshufd xmm6, xmm6, 0 ; carriage returns for comparison in xmm6
mov eax,Nblines ;line counter
;ready
NewBloc:
;------ align 16 needed ----------
;1731187 cycles for 22274 lines
movdqa xmm1,xmm6 ;load the 13s
movdqa xmm2,xmm6 ;load the 13s
pcmpeqb xmm1,[esi] ;cmp with memory align 16
pcmpeqb xmm2,[esi+16] ;cmp with memory ,align 16
pmovmskb ecx, xmm1 ; result in ecx
pmovmskb edx, xmm2 ; result +16 edx
shl edx,16
add ecx,edx
jz suite
NbLineBreak:
bsf edx, ecx
jz suite
.if word ptr [edx+esi] == 0A0Dh
inc eax
.endif
btr ecx, edx
jmp NbLineBreak
suite:
lea esi,[esi+32]
dec count
jnz NewBloc
EndNonaligned:
.if reste != 0
mov ecx,reste
@@:
.if word ptr [esi] == 0A0Dh
inc eax
.endif
inc esi
dec ecx
jnz @B
.endif
FindeCompteurLignes:
ret
CompteurLignes endp
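The routine above splits the buffer into an unaligned head scanned byte by byte, a body of whole 32-byte blocks scanned with SSE2, and a byte-scanned tail. The arithmetic of that split can be sketched in C as follows; `split3` and `split_for_alignment` are hypothetical names, and clamping the head to the buffer size is an addition of this sketch (the assembly relies on the zero terminator instead).

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t head;    /* bytes before the first 16-byte boundary */
    size_t body;    /* bytes covered by whole 32-byte blocks */
    size_t tail;    /* remaining bytes, fewer than 32 */
} split3;

/* Split a buffer starting at address addr with the given size into head, body and tail. */
split3 split_for_alignment(uintptr_t addr, size_t size)
{
    split3 s;
    size_t mis = (size_t)(addr & 15u);      /* like: and edx,0Fh */
    s.head = mis ? 16 - mis : 0;
    if (s.head > size)
        s.head = size;                      /* short buffer: everything is head */
    size_t rest = size - s.head;
    s.body = rest / 32 * 32;                /* count = rest / 32 blocks */
    s.tail = rest % 32;                     /* reste */
    return s;
}
```

For example, a 100-byte buffer starting 5 bytes past a 16-byte boundary gives an 11-byte head, a 64-byte body and a 25-byte tail.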
Hi jj2007,
C:\ml /c /coff /nologo CountLinesSSE2.asm
Assembling: CountLinesSSE2.asm
CountLinesSSE2.asm(3) : fatal error A1000: cannot open file : \masm32\include\CpuId.inc
=====================
I need your file.
Help me!!!
Quote from: UtillMasm on March 20, 2009, 12:35:18 PM
fatal error A1000: cannot open file : \masm32\include\CpuId.inc
I need your file.
Attached, together with an update that includes ToutEnMasm's latest version (it works fine, congrats).
100+2 means 2 malformed strings found, i.e. LF only.
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
Tests for correctness - 100+2/100+6 lines expected,
first string 5-byte misaligned:
Mark Larson= - / - (throws exception)
jj2007= 100+2 / 100+6 lines
Lingo= - / 102 lines
ToutEnMasm= 100 / 100 lines
Codesizes:
Mark Larson = 2104
getlinesJJ = 177
getlines Lingo = 191
CompteurLignes = 237
Counting lines of \masm32\include\windows.inc:
markl_CountFileLines (Mark Larson):
437 kilocycles for 22272 lines, 849759 bytes
getlinesJJ: (jj2007)
1120 kilocycles for 22272 lines, 849759 bytes
getlines (Lingo):
1306 kilocycles for 22272 lines, 849759 bytes
CompteurLignes: (ToutEnMasm)
1344 kilocycles for 22272 lines, 849759 bytes
[attachment deleted by admin]
Just for fun, I tried WinExtra.inc instead of Windows.inc, and found one more bug.
Somewhat polished code attached. Both ToutEnMasm's and my version seem to work just fine.
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
Tests for correctness - 100 / 100 lines expected,
first string 5-byte misaligned:
Mark Larson= --- / --- (throws exception)
jj2007= 100 / 100 lines
Lingo= --- / 102 lines
ToutEnMasm= 100 / 100 lines
Codesizes:
Mark Larson = 2104
getlinesJJ = 155
getlines Lingo = 191
CompteurLignes = 237
Counting lines of \masm32\include\winextra.inc:
markl_CountFileLines (Mark Larson):
372 kilocycles for 20001 lines, 807877 bytes <------------- INCORRECT COUNT ------
getlinesJJ: (jj2007)
903 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
1083 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
1120 kilocycles for 20025 lines, 807877 bytes
[attachment deleted by admin]
The worst text format I know is header files (.H).
They are filled with extra characters.
I have also run some tests.
After searching for speed, I took a moment for a crash test:
test with sznull db 0 ;len 1
test with this text
Quote
Texte db 13,10,13,10,13,10,13,10
db " windowsinc FichMem <>",13,10
db "InfosFichiers WIN32_FIND_DATA <>",13,10
db 13,10
db " ;procéder à des essais sous surveillances",13,10
db " ;en cas d'exception,si la pile n'est pas détruite",13,10
db " ;------- ajouter des feuilles SDI,utiliser menu CODE --> créer SDI",13,10
db " invoke ChargerFichierMem,SADR(\masm32\include\windows.inc),addr windowsinc",13,10
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
code:
Quote
invoke lstrlen,addr Texte
mov edx,eax
invoke getlinesJJ,addr Texte,edx
My last version passes all tests without problems; I took care of this.
The others need some small changes.
Quote from: ToutEnMasm on March 20, 2009, 04:19:23 PM
test with a sznull db 0 ;len 1
You mean len 0? No problem.
Quote
test with this texte
Quote
Texte db 13,10,13,10,13,10,13,10
db " windowsinc FichMem <>",13,10
db "InfosFichiers WIN32_FIND_DATA <>",13,10
db 13,10
db " ;procéder à des essais sous surveillances",13,10
db " ;en cas d'exception,si la pile n'est pas détruite",13,10
db " ;------- ajouter des feuilles SDI,utiliser menu CODE --> créer SDI",13,10
db " invoke ChargerFichierMem,SADR(\masm32\include\windows.inc),addr windowsinc",13,10
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
code:
Quote
invoke lstrlen,addr Texte
mov edx,eax
invoke getlinesJJ,addr Texte,edx
My last version pass all tests without problem,i have take care of this.
Others need some little changes.
Which are the others that need changes? Lingo's code crashes if it is not 16-byte aligned, but otherwise his code yields the same result as yours and mine:
Texte, getlinesJJ= 11 lines
Texte, getlines Lingo= 11 lines
Texte, CompteurLignes= 11 lines
Run more tests, adding some data before the text.
Quote
for the text:
STACK_TEXT:
0012ffb4 00401057 00404411 00000145 7c817067 minus!getlinesJJ+0x68 [F:\lignes\sse2.inc @ 318]
0012fff0 00000000 00401040 00000000 78746341 minus!start+0x17 [F:\lignes\minus.asm @ 83]
FAULTING_SOURCE_CODE:
314: pcmpeqb xmm0, [edi] ; compare packed bytes in [m128] and xmm0 for equality
315: pmovmskb edx, xmm0 ; set byte mask in edx for first 16 byte chunk
316:
317: movdqa xmm0, xmm2 ; linefeeds in xmm0 & xmm1
> 318: pcmpeqb xmm0, [edi+16] ; compare packed bytes in [m128] and xmm0 for equality
319: pmovmskb ecx, xmm0 ; set byte mask in edx for second 16 byte chunk
320:
321: lea edi, [edi+32] ; point to next chunk
322: cmp edi, esi ; test boundary
323: jae L1
For sznull, len 0 or 1, same thing:
Quote
(d0c.c20): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=7ffd5000 ecx=00000000 edx=00000000 esi=00531047 edi=00132ff0
eip=00401b66 esp=0012ffa0 ebp=0012ffb4 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
*** WARNING: Unable to verify checksum for minus.exe
minus!getlinesJJ+0x68:
00401b66 660f744710 pcmpeqb xmm0,xmmword ptr [edi+10h] ds:0023:00133000=????
0:000> !analyze -v
FAULTING_IP:
minus!getlinesJJ+68 [F:\lignes\sse2.inc @ 318]
00401b66 660f744710 pcmpeqb xmm0,xmmword ptr [edi+10h]
EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 00401b66 (minus!getlinesJJ+0x00000068)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 00133000
Attempt to read from address 00133000
FAULTING_THREAD: 00000c20
DEFAULT_BUCKET_ID: INVALID_POINTER_READ
PROCESS_NAME: minus.exe
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".
EXCEPTION_PARAMETER1: 00000000
EXCEPTION_PARAMETER2: 00133000
READ_ADDRESS: 00133000
FOLLOWUP_IP:
minus!getlinesJJ+68 [F:\lignes\sse2.inc @ 318]
00401b66 660f744710 pcmpeqb xmm0,xmmword ptr [edi+10h]
NTGLOBALFLAG: 70
APPLICATION_VERIFIER_FLAGS: 0
PRIMARY_PROBLEM_CLASS: INVALID_POINTER_READ
BUGCHECK_STR: APPLICATION_FAULT_INVALID_POINTER_READ
LAST_CONTROL_TRANSFER: from 00401057 to 00401b66
STACK_TEXT:
0012ffb4 00401057 00404410 00000000 7c817067 minus!getlinesJJ+0x68 [F:\lignes\sse2.inc @ 318]
0012fff0 00000000 00401040 00000000 78746341 minus!start+0x17 [F:\lignes\minus.asm @ 83]
FAULTING_SOURCE_CODE:
314: pcmpeqb xmm0, [edi] ; compare packed bytes in [m128] and xmm0 for equality
315: pmovmskb edx, xmm0 ; set byte mask in edx for first 16 byte chunk
316:
317: movdqa xmm0, xmm2 ; linefeeds in xmm0 & xmm1
> 318: pcmpeqb xmm0, [edi+16] ; compare packed bytes in [m128] and xmm0 for equality
319: pmovmskb ecx, xmm0 ; set byte mask in edx for second 16 byte chunk
320:
321: lea edi, [edi+32] ; point to next chunk
322: cmp edi, esi ; test boundary
323: jae L1
SYMBOL_STACK_INDEX: 0
SYMBOL_NAME: minus!getlinesJJ+68
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: minus
IMAGE_NAME: minus.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 49c3d30d
STACK_COMMAND: ~0s ; kb
FAILURE_BUCKET_ID: INVALID_POINTER_READ_c0000005_minus.exe!getlinesJJ
BUCKET_ID: APPLICATION_FAULT_INVALID_POINTER_READ_minus!getlinesJJ+68
WATSON_STAGEONE_URL: http://watson.microsoft.com/StageOne/minus_exe/0_25_5_2005/49c3d30d/minus_exe/0_25_5_2005/49c3d30d/c0000005/00001b66.htm?Retriage=1
Followup: MachineOwner
---------
Quote from: ToutEnMasm on March 20, 2009, 05:36:13 PM
Make more test adiing some data before
Please make that test with the
current version posted some hours ago. Or post the string table, with alignment info, so that I can run it myself. Or even better, put source and executable into a zip file and post it here.
Found it: I have a problem with the
Quote
OPTION PROLOGUE:none
OPTION EPILOGUE:none
I just lost time because of that.
I only use standard procs, and this was not stated anywhere.
Quote from: ToutEnMasm on March 20, 2009, 06:34:28 PM
Found , I have a problem with the
Quote
OPTION PROLOGUE:none
OPTION EPILOGUE:none
Just a lost of time to write that.
I use only standard proc and it was not write.
Yeah, I know it's a bad habit to remove the stack frame. But with such a simple algo, I couldn't resist. And it has its advantages, too :bg
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Codesizes:
getlinesJJ = 155
CompteurLignes = 237
Counting lines of \masm32\include\winextra.inc:
getlinesJJ: (jj2007)
455 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
804 kilocycles for 20025 lines, 807877 bytes
Looking at the results posted above:
Quote
getlinesJJ: (jj2007)
903 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
1083 kilocycles for 20025 lines, 807877 bytes
Do two different machines really give such different results?
Quote from: ToutEnMasm on March 20, 2009, 06:58:26 PM
Looking at the results posted above:
Quote
getlinesJJ: (jj2007)
903 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
1083 kilocycles for 20025 lines, 807877 bytes
Do two different machines really give such different results?
Yes, that's not unusual. The P4 is a lot slower, and relative differences are smaller. My Celeron M runs getlinesJJ at 450, and Lingo's version at 600 kilocycles. It's a Core (not: Core 2) CPU. Lingo's AMD might favour his own algo again - in the szLen thread, his code was marginally (1%) slower for very long strings on my Celeron but significantly (20%+) faster on several other CPU's. These differences make optimisation increasingly difficult.
Hi jj2007:
The results on my computer.
Intel(R) Core(TM)2 Duo CPU E4600 @ 2.40GHz (SSE4)
Tests for correctness - 100+2/100+6 lines expected,
first string 5-byte misaligned:
Mark Larson= - / - (throws exception)
jj2007= 100+2 / 100+6 lines
Lingo= - / 102 lines
ToutEnMasm= 100 / 100 lines
Codesizes:
Mark Larson = 2104
getlinesJJ = 177
getlines Lingo = 191
CompteurLignes = 237
Counting lines of \masm32\include\windows.inc:
markl_CountFileLines (Mark Larson):
190 kilocycles for 22272 lines, 849759 bytes
getlinesJJ: (jj2007)
367 kilocycles for 22272 lines, 849759 bytes
getlines (Lingo):
498 kilocycles for 22272 lines, 849759 bytes
CompteurLignes: (ToutEnMasm)
653 kilocycles for 22272 lines, 849759 bytes
Hit any key
Regards herge
Thanks, Herge. Here is the (hopefully) last version, with minor improvements in codesize and speed. I included the Texte string of ToutEnMasm, see / 11 lines.
The switch CheckBadLines = 0 can be used to test if malformed strings like "part A", LF, "part B" are present. Code size increases from 153 to 168 bytes, speed is identical.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
Mark Larson= --- / --- (throws exception)
jj2007= 100 / 100 / 11 lines
Lingo= --- / 102 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
Codesizes:
Mark Larson = 2104
getlinesJJ = 153
getlines Lingo = 191
CompteurLignes = 237
Counting lines of \masm32\include\winextra.inc:
markl_CountFileLines (Mark Larson):
239 kilocycles for 20001 lines, 807877 bytes
getlinesJJ: (jj2007)
450 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
637 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
802 kilocycles for 20025 lines, 807877 bytes
EDIT: Make sure you launch the exe from the same drive as Masm32, otherwise it will throw an exception because it doesn't find winextra.inc
EDIT(2): Changed one line in CompteurLignes:
mov eax, theEnd
sub eax, esi
; .if eax == 0 ; threw exception for negative byte count
.if sdword ptr eax <= 0 ; foolproof ;-)
The exception happened when a file is not found, but read anyway (result: -1 bytes read), and the algo is being called anyway. No checking is certainly not a recommended way of doing things, but now at least the algo, rather than throwing an exception, returns 0 lines - which is kind of an error check, too.
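To make the difference between the two checks concrete, here is a minimal C sketch. The function names are hypothetical and only mirror the two MASM conditions; the value -1 stands for the failed read described above.

```c
#include <assert.h>
#include <stdint.h>

/* Old check, like ".if eax == 0": only an exact zero stops the scan. */
static int should_scan_old(uint32_t remaining)
{
    return remaining != 0;
}

/* New check, like ".if sdword ptr eax <= 0": the same register is
   reinterpreted as signed, so zero and negative counts both stop the scan. */
static int should_scan_new(uint32_t remaining)
{
    return (int32_t)remaining > 0;
}

/* Usage: a failed read reported as -1 bytes. */
void demo_byte_count_check(void)
{
    uint32_t failed_read = (uint32_t)-1;   /* 0FFFFFFFFh in eax */
    assert(should_scan_old(failed_read));  /* old code goes on to scan ~4 GB */
    assert(!should_scan_new(failed_read)); /* new code returns 0 lines */
    assert(should_scan_new(807877u));      /* a normal file is still scanned */
}
```

The bit pattern in the register is identical in both cases; only the interpretation of the comparison changes.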
[attachment deleted by admin]
Why are my results so different? I am afraid of the answer. :red
Quote
Intel(R) Celeron(R) CPU 1.70GHz (SSE2)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 100 / 100 / 11 lines
Lingo= --- / 102 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 191
CompteurLignes = 238
Counting lines of \masm32\include\winextra.inc:
getlinesJJ: (jj2007)
1169 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
1187 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
1191 kilocycles for 20025 lines, 807877 bytes
Paul
Quote from: PBrennick on March 21, 2009, 09:49:53 PM
Why are my results so different? I am afraid of the answer. :red
Quote
Intel(R) Celeron(R) CPU 1.70GHz (SSE2)
Paul, I am not an expert in CPU architecture, but it seems Intel changed something when introducing SSE3 - not relevant for the functioning of the code (SSE2 only), but it affects the speed. Compare to previous postings...
I edited the test program from "mov eax, alloc$(10000000)" to "mov eax, alloc$(100000000)"
to make the buffer big enough for my copy of windows.inc, and I have new numbers... :lol
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz (SSE4)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 100 / 100 / 11 lines
Lingo= 100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 217
CompteurLignes = 238
Counting lines of \masm32\include\windows.inc:
getlinesJJ: (jj2007)
517 kilocycles for 30762 lines, 1127718 bytes
getlines (Lingo):
451 kilocycles for 30762 lines, 1127718 bytes
CompteurLignes: (ToutEnMasm)
970 kilocycles for 30762 lines, 1127718 bytes
Hit any key
[attachment deleted by admin]
Thanks to all for the help.
Now I have an application "cherche" ("search" in English) which needs only 3 seconds to find a word in all the examples of masm32.
For example, searching for richedit:
Result:
Quote
F:\masm32\examples\exampl10\shuflarr\unique_riched\ectrl.asm
12 : szText EditMl,"RICHEDIT"
F:\masm32\examples\exampl10\shuflarr\unique_riched\idat.asm
1 : szDisplayName db "MASM32 Richedit",0
F:\masm32\examples\exampl09\maketbl\maketbl.asm
436 : fn CreateWindowEx,WS_EX_STATICEDGE,"RICHEDIT",0, \
F:\masm32\examples\exampl05\qeplugin\qeplugin.asm
22 : ; being read from the editor with richedit selection. If you need to
F:\masm32\examples\exampl06\regdemo\regdemo.asm
35 : RichEdit db 'RichEdit20A',0
228 : INVOKE CreateWindowEx, NULL, addr RichEdit, NULL,\
F:\masm32\examples\advanced\wrep\result.asm
7 : include Richedit.inc ; local includes for this file
11 : ; uncomment for richedit version 1 or
12 : ; comment out for richedit version 2
417 : szText RichEd,"MASM RichEdit"
580 : szText EditMl,"RICHEDIT"
F:\masm32\examples\exampl05\riched\richedit.asm
7 : include Richedit.inc ; local includes for this file
11 : ; uncomment for richedit version 1 or
12 : ; comment out for richedit version 2
417 : szText RichEd,"MASM RichEdit"
580 : szText EditMl,"RICHEDIT"
F:\masm32\examples\poasm\riched\richedit.asm
10 : include Richedit.inc ; local includes for this file
14 : ; uncomment for richedit version 1 or
15 : ; comment out for richedit version 2
409 : szText RichEd,"POASM RichEdit"
571 : szText EditMl,"RICHEDIT"
F:\masm32\examples\exampl10\shuflarr\unique_riched\richedit.asm
11 : include Richedit.inc ; local includes for this file
15 : ; uncomment for richedit version 1 or
16 : ; comment out for richedit version 2
424 : szText RichEd,"MASM RichEdit"
F:\masm32\examples\advanced\wrep\richedit.asm
7 : include Richedit.inc ; local includes for this file
11 : ; uncomment for richedit version 1 or
12 : ; comment out for richedit version 2
417 : szText RichEd,"MASM RichEdit"
580 : szText EditMl,"RICHEDIT"
F:\masm32\examples\poasm\riched\richedit.inc
81 : szDisplayName db "MASM32 Richedit",0
F:\masm32\examples\exampl05\riched\richedit.inc
82 : szDisplayName db "MASM32 Richedit",0
F:\masm32\examples\exampl06\treedemo\treedemo.asm
43 : RichEdit db 'RichEdit20A',0
223 : INVOKE CreateWindowEx, WS_EX_CLIENTEDGE, addr RichEdit, NULL,\
In all the header files of the SDK, this takes 10 s.
THIS MEANS THAT THERE ARE NOT ENOUGH SAMPLES IN MASM32
I think that you have included all header files in your SDK whereas windows.inc just includes the ones that are most used. I do not feel this is a limitation. Personally, I very seldom find anything lacking in windows.inc as it does include all that I need to do what I do. I feel this is the correct barometer for deciding whether the contents are sufficient or not. They are sufficient for me.
Paul
To PBrennick,
That's open to question, but don't take my little provocation too seriously.
It was just to show how happy I am to have this.
Hi lingo:
Results from my computer.
Intel(R) Core(TM)2 Duo CPU E4600 @ 2.40GHz (SSE4)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 100 / 100 / 11 lines
Lingo= 100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 217
CompteurLignes = 238
Counting lines of \masm32\include\windows.inc:
getlinesJJ: (jj2007)
386 kilocycles for 22272 lines, 849759 bytes
getlines (Lingo):
325 kilocycles for 22272 lines, 849759 bytes
CompteurLignes: (ToutEnMasm)
668 kilocycles for 22272 lines, 849759 bytes
Hit any key
Regards herge
Hi jj2007:
Yet again more results:
Intel(R) Core(TM)2 Duo CPU E4600 @ 2.40GHz (SSE4)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 100 / 100 / 11 lines
Lingo= --- / 102 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 191
CompteurLignes = 238
Counting lines of \masm32\include\winextra.inc:
getlinesJJ: (jj2007)
315 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
413 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
618 kilocycles for 20025 lines, 807877 bytes
Hit any key
Regards herge
Can you run my proc too, i haven't tested it much...
.code
.mmx
; Notes:
; - PLAIN MMX, does not use SSE123
; - lpBuffer must be padded with extra 8 bytes
; - Does not handle nasty remainders! (simply zero out 8 bytes after length)
CountCRLF proc lpBuffer:ptr byte, ccLength:dword
mov eax,0A0D0A0Dh
pxor mm1,mm1
movd mm7,eax
pxor mm3,mm3
punpckldq mm7,mm7
mov eax,lpBuffer
mov edx,ccLength
xor ecx,ecx
.repeat
movq mm4,[eax+ecx]
movq mm5,[eax+ecx+1]
add ecx,8
pcmpeqw mm4,mm7
pcmpeqw mm5,mm7
paddsw mm5,mm4
pxor mm4,mm4
psubsw mm4,mm5
punpckldq mm5,mm4
paddsw mm5,mm4
punpckhwd mm5,mm3
punpckldq mm4,mm5
paddd mm5,mm4
paddd mm1,mm5
.until ecx >= edx
psrlq mm1,32 ; the total sits in the high dword
movd eax,mm1
emms ; clear the MMX state so that later FPU code works
ret
CountCRLF endp
drizz,
I like the fact that it is just .mmx, it will run on more systems.
Paul
Quote from: lingo on March 22, 2009, 03:21:36 PM
I edited the test program from "mov eax, alloc$(10000000)" to "mov eax, alloc$(100000000)"
The branch predictor slows your code down by 8 cycles, but otherwise I like it; a bit bloated but fast :bg.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 100 / 100 / 11 lines
Lingo= 100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 217
CompteurLignes = 238
Counting lines of \masm32\include\winextra.inc:
getlinesJJ: (jj2007)
458 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
420 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
795 kilocycles for 20025 lines, 807877 bytes
[attachment deleted by admin]
"a bit bloated but fast"
I have more code with more speed in getlines2-SSE2.. :wink
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz (SSE4)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 100 / 100 / 11 lines
Lingo= 100 / 100 / 11 lines
Lingo2= 100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 217
getlines2 Lingo2 = 278
CompteurLignes = 238
Counting lines of \masm32\include\windows.inc:
getlinesJJ: (jj2007)
519 kilocycles for 30762 lines, 1127718 bytes
getlines (Lingo):
452 kilocycles for 30762 lines, 1127718 bytes
getlines2 (Lingo2):
439 kilocycles for 30762 lines, 1127718 bytes
CompteurLignes: (ToutEnMasm)
970 kilocycles for 30762 lines, 1127718 bytes
Hit any key
[attachment deleted by admin]
Quote from: lingo on March 22, 2009, 10:58:59 PM
"a loooot bloated but fast"
Counting lines of \masm32\include\winextra.inc:
getlinesJJ: (jj2007)
12760 kiloLAMPs, 1031 kilocycles for 20025 lines, 807877 bytes :dance:
getlines (Lingo):
14013 kiloLAMPs, 951 kilocycles for 20025 lines, 807877 bytes :naughty:
getlines2 (Lingo):
10946 kiloLAMPs, 656 kilocycles for 20025 lines, 807877 bytes :clap:
CompteurLignes: (ToutEnMasm)
17966 kiloLAMPs, 1164 kilocycles for 20025 lines, 807877 bytes :snooty:
LAMPs = Lean And Mean Points = cycles * sqrt(codesize)
(old politician's motto: if you can't win under these rules, just change the rules :bg)
EDIT: Changing the rules was not successful. Lingo wins this round :toothy
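As a worked example of the LAMPs formula, here is the getlinesJJ figure recomputed by hand. This is only a sketch: the code size of 153 bytes is taken from the earlier Codesizes tables, and it is an assumption that this build has the same size.

```latex
\[
\text{LAMPs} = \text{cycles} \times \sqrt{\text{codesize}}
\qquad\Rightarrow\qquad
1031\ \text{kilocycles} \times \sqrt{153} \approx 1031 \times 12.37 \approx 12753\ \text{kiloLAMPs}
\]
```

The posted 12760 is slightly higher, presumably because the program multiplies the unrounded cycle count. Doubling the code size costs only a factor of about 1.41, so the metric still rewards speed far more than size.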
[attachment deleted by admin]
Interesting... WinXP 32-bit.
AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ (SSE3)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 100 / 100 / 11 lines
Lingo= 100 / 100 / 11 lines
Lingo2= 100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 217
getlines2 Lingo2 = 278
CompteurLignes = 238
Counting lines of \masm32\include\windows.inc:
getlinesJJ: (jj2007)
1087 kilocycles for 22274 lines, 849788 bytes
getlines (Lingo):
955 kilocycles for 22274 lines, 849788 bytes
getlines2 (Lingo2):
843 kilocycles for 22274 lines, 849788 bytes
CompteurLignes: (ToutEnMasm)
1143 kilocycles for 22274 lines, 849788 bytes
Ahh yes LAMPs, was just going to say that there was a name for the ratio of code size to execution speed. :bg
Oh, I see this is the place where bloated algos are being tested?
May I join the party with a fat one of mine?
Nothing very interesting, just an algo that does not seem to make funny errors on badly formatted strings :green2
So, I ask you to test this. Remember that the program must be placed on the drive where MASM32 is installed, because the test file is one of the include files.
The difference in this algo is that it takes the line delimiter as a third parameter, so it supports Windows/DOS/Unix text files. Also, a lot of the work in the algo is just alignment handling and multiple checks to work properly with "badly formatted" strings. At least, it should work properly with badly formatted strings... maybe... :green2
So, don't swear too much at the spaghetti-style code :P
Here are the timings:
Quote
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 99 / 100 / 11 lines
Lingo= 99 / 100 / 11 lines
Lingo2= 100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
AxCountLines= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 217
getlines Lingo2= 278
CompteurLignes = 238
AxCountLines = 262
Counting lines of \masm32\include\winextra.inc:
getlinesJJ: (jj2007)
13768 kiloLAMPs, 1113 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
15942 kiloLAMPs, 1082 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
15038 kiloLAMPs, 901 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
19351 kiloLAMPs, 1254 kilocycles for 20025 lines, 807877 bytes
AxCountLines: 13991 kiloLAMPs, 864 kilocycles for 20025 lines, 807877 bytes
LAMPs = Lean And Mean Points = cycles * sqrt(codesize)
Program and sources are in the attached archive.
I used the old testbed, since Jochen did a big job on correctness testing.
prescott w/htt
getlinesJJ: (jj2007)
11828 kiloLAMPs, 956 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
13750 kiloLAMPs, 933 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
10250 kiloLAMPs, 614 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
17835 kiloLAMPs, 1156 kilocycles for 20025 lines, 807877 bytes
AxCountLines:
5350 kiloLAMPs, 330 kilocycles for 20025 lines, 807877 bytes
i haven't seen the term "kilocycles" for years - brings back memories :P
Dammit what's my computer? :lol AMD Sempron 1800+
getlinesJJ: (jj2007)
12357 kiloLAMPs, 999 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
12094 kiloLAMPs, 821 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
12362 kiloLAMPs, 741 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
15156 kiloLAMPs, 982 kilocycles for 20025 lines, 807877 bytes
AxCountLines: 10955 kiloLAMPs, 676 kilocycles for 20025 lines, 807877 bytes
LAMPs = Lean And Mean Points = cycles * sqrt(codesize)
Crashes with an access violation in win7pro x64 at 0000106b.
Prescott P4
getlinesJJ: (jj2007)
12348 kiloLAMPs, 998 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
14325 kiloLAMPs, 972 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
10848 kiloLAMPs, 650 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
17664 kiloLAMPs, 1145 kilocycles for 20025 lines, 807877 bytes
AxCountLines:
5795 kiloLAMPs, 358 kilocycles for 20025 lines, 807877 bytes
Atom N270
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 99 / 100 / 11 lines
Lingo= 99 / 100 / 11 lines
Lingo2= 100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
AxCountLines= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 217
getlines Lingo2= 278
CompteurLignes = 238
AxCountLines = 262
Counting lines of \masm32\include\winextra.inc:
getlinesJJ: (jj2007)
16757 kiloLAMPs, 1354 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
14778 kiloLAMPs, 1003 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
13690 kiloLAMPs, 821 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
20978 kiloLAMPs, 1359 kilocycles for 20025 lines, 807877 bytes
AxCountLines: 9275 kiloLAMPs, 573 kilocycles for 20025 lines, 807877 bytes
LAMPs = Lean And Mean Points = cycles * sqrt(codesize)
Alex,
I get an access violation in AxCountLines at line 121: 0xC0000005: Access violation reading location 0xffffffff.
This is the instruction: pshufd xmm0,[esp+16],0
I am running Windows Vista 32-bit on this laptop.
pshufd needs 16-byte alignment. Otherwise it's a fantastic algo:
Celeron M
getlinesJJ: (jj2007)
5555 kiloLAMPs, 449 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
6237 kiloLAMPs, 423 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
8086 kiloLAMPs, 485 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
15304 kiloLAMPs, 992 kilocycles for 20025 lines, 807877 bytes
AxCountLines: 5397 kiloLAMPs, 333 kilocycles for 20025 lines, 807877 bytes
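On the alignment fault mentioned above: the memory-operand form of pshufd requires a 16-byte aligned address, and [esp+16] is not guaranteed to be aligned. One way to sidestep this is to build the delimiter mask in registers only (movd followed by the register form of pshufd). The sketch below shows that idea with SSE2 intrinsics; the helper names are hypothetical, and replicating the byte with a multiplication by 01010101h is an assumption, not necessarily what AxCountLines does.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Broadcast one byte to all 16 lanes without touching memory. */
static __m128i broadcast_byte(unsigned char c)
{
    uint32_t dw = (uint32_t)c * 0x01010101u;  /* e.g. 13 -> 0D0D0D0Dh */
    __m128i v = _mm_cvtsi32_si128((int)dw);   /* movd xmm, r32 */
    return _mm_shuffle_epi32(v, 0);           /* pshufd xmm, xmm, 0 */
}

/* Checking helper: returns 1 if every lane of v equals c. */
static int all_lanes_equal(__m128i v, unsigned char c)
{
    unsigned char out[16];
    _mm_storeu_si128((__m128i *)out, v);      /* unaligned store is fine */
    for (int i = 0; i < 16; i++)
        if (out[i] != c)
            return 0;
    return 1;
}

/* Usage: build the CR and LF masks. */
void demo_broadcast(void)
{
    assert(all_lanes_equal(broadcast_byte(13), 13));
    assert(all_lanes_equal(broadcast_byte(10), 10));
}
```

With this approach the caller can pass the plain character code, and no stack alignment is needed.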
test this:
db "aaaaaaaaaaaaaaa",13,10,"aaaaaaaaaaaaaaa",0
Quote from: drizz on December 17, 2010, 11:59:36 PM
test this:
db "aaaaaaaaaaaaaaa",13,10,"aaaaaaaaaaaaaaa",0
I tried this, both aligned and misaligned, and the line count is 1. What do you want to say?
Well my question is :
Is this "db 'a',10,'b',10,0" two lines or zero?
Quote from: jj2007 on December 17, 2010, 09:53:37 PM
pshufd needs 16-byte alignment. Otherwise it's a fantastic algo:
Yes, that shows what being in a hurry does every time :green2
Quote from: drizz on December 18, 2010, 12:34:10 AM
Well my question is :
Is this "db 'a',10,'b',10,0" two lines or zero?
Well, my answer is: the algo supports specifying the line delimiter. If you know that the text is in Unix format, then just specify LF as the delimiter.
Quote
invoke AxCountLines, offset gltestA, len(offset gltestA),0d0d0d0dh
Maybe LF as the default delimiter is a better choice.
Yes that's the point, it checks for either 13 or 10, not 13+10 pair. As far as i can see only getlines2 algo does this.
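The difference between the two counting rules can be shown with a small scalar C sketch. The function names are hypothetical; the code only illustrates "count one delimiter byte" versus "count 13+10 pairs" on the test string from the question above.

```c
#include <assert.h>
#include <stddef.h>

/* Count every occurrence of a single delimiter byte (13 or 10). */
static size_t count_single(const char *s, size_t len, char delim)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == delim)
            n++;
    return n;
}

/* Count only CR immediately followed by LF. */
static size_t count_crlf_pairs(const char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < len; i++)
        if (s[i] == '\r' && s[i + 1] == '\n')
            n++;
    return n;
}

/* Usage: db 'a',10,'b',10 gives two lines by one rule, zero by the other. */
void demo_delimiters(void)
{
    const char unix_text[] = "a\nb\n";
    const char dos_text[] = "a\r\nb\r\n";
    assert(count_single(unix_text, 4, '\n') == 2);
    assert(count_crlf_pairs(unix_text, 4) == 0);
    assert(count_single(dos_text, 6, '\n') == 2);
    assert(count_crlf_pairs(dos_text, 6) == 2);
}
```

For well-formed DOS/Windows text both rules agree; they differ only on Unix text or on malformed input.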
The other thing: I suspect that with short unaligned strings the code will not exit after the first unaligned (really: aligned, but fixed up) read. I'll fix this in addition to the unaligned PSHUFD.
Quote from: drizz on December 18, 2010, 12:47:34 AM
Yes that's the point, it checks for either 13 or 10, not 13+10 pair. As far as i can see only getlines2 algo does this.
Well, I guess it is reliable enough to rely on, for example, LF as the only delimiter. It does not matter anyway: if the data is specially crafted to be wrong, nothing will prevent that.
If CR+LF is made the line delimiter, then Unix-style line feeds would not be supported. Further, Unicode text files tend to use only one char (LF or CR) as the line feed (I am talking about Unicode with some small changes in mind). So, I'll use my algo :P
In short, the other variations that check only one type of line feed (and my variation) are much more flexible... and fast :green2
As you can see, the inner loop of my current algo seems to be limited only by system bus / memory subsystem bandwidth.
Don't get me wrong i like your algo :wink
Many thanks to all the people who ran the test! :bg
Quote from: drizz on December 18, 2010, 01:01:49 AM
Don't get me wrong i like your algo :wink
No problems :bg
By the way, it uses the popcnt algo posted by you :bg
In general, this algo can be implemented as a pop-counting algo. Initially I implemented it that way (with a LUT), in MMX.
It just needs the pop-count of the register that PMOVMSKB writes to.
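Also, is "pand" really necessary? Can't you do:
pxor xmm0,xmm0
@@:
movdqu xmm2,[....]
pcmpeqb xmm2,xmm1 ; xmm1=0a0a0a....
paddb xmm0,xmm2 ; each matching byte adds -1, so the counters are always <= 0
loop @B ; byte counters wrap after 255 matches per lane: flush at least every 255 iterations
pxor xmm2,xmm2
pxor xmm3,xmm3
psubb xmm2,xmm0 ; change sign
psadbw xmm2,xmm3 ; sum the byte counters (one sum per 8-byte half)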
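The accumulate-then-sum idea can be sketched with SSE2 intrinsics as follows. This is not the code of any poster: the function name is hypothetical, it counts a single delimiter byte, and it subtracts the compare mask directly instead of negating at the end. The one point that needs care is that each byte counter holds at most 255, so the sketch flushes with psadbw after at most 255 blocks.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Count occurrences of delim in buf[0..len). */
static uint64_t count_byte_sse2(const unsigned char *buf, size_t len,
                                unsigned char delim)
{
    assert(buf != NULL || len == 0);
    const __m128i needle = _mm_set1_epi8((char)delim);
    const __m128i zero = _mm_setzero_si128();
    __m128i total = zero;               /* two 64-bit partial sums */
    size_t i = 0;

    while (i + 16 <= len) {
        __m128i acc = zero;             /* sixteen 8-bit counters */
        size_t blocks = (len - i) / 16;
        if (blocks > 255)
            blocks = 255;               /* no counter can exceed 255 */
        for (size_t b = 0; b < blocks; b++, i += 16) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
            __m128i eq = _mm_cmpeq_epi8(chunk, needle); /* -1 per match */
            acc = _mm_sub_epi8(acc, eq);                /* acc - (-1) */
        }
        /* psadbw against zero adds up each group of 8 byte counters */
        total = _mm_add_epi64(total, _mm_sad_epu8(acc, zero));
    }

    uint64_t parts[2];
    _mm_storeu_si128((__m128i *)parts, total);
    uint64_t count = parts[0] + parts[1];

    for (; i < len; i++)                /* scalar tail, fewer than 16 bytes */
        count += (buf[i] == delim);
    return count;
}
```

The inner loop has no branch on the data and no pmovmskb, which is why this style tends to end up limited by memory bandwidth rather than by the instruction count.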
Quote from: drizz on December 18, 2010, 01:27:31 AM
Also is "pand" really neccessary? can't you do:
pxor xmm0,xmm0
@@:
movdqu xmm2,[....]
pcmpeqb xmm2,xmm1 ; xmm1=0a0a0a....
paddb xmm0,xmm2; allways negative
loop @@
pxor xmm2,xmm2
pxor xmm3,xmm3
psubb xmm2,xmm0;<- change sign
psadbw xmm2,xmm3; add
Oh, I had not noticed that the thread was updated...
That's a good solution. It is possible to do something like this:
psubb xmm3,xmm1
jnz @B
instead of
pand xmm1,xmm5
paddd xmm3,xmm1
jnz @B
As I said already, this is a first rough version which seemed to work properly with unaligned strings and to check the tail of the strings properly.
I wonder how fast it could be; at the moment only the memory bus is really the limiter, as far as I can see. With software prefetching uncommented, it is only slightly faster on my system.
Hi!
Here is an update of the code.
Fixed the XMM PSHUFD for the line delimiter, and replaced it with a GPR "PSHUFB" :lol Thus it is no longer necessary to pass the delimiter as a fully
specified DWORD (i.e. 0d0d0d0dh); it is enough to pass the char code cast to DWORD. I.e., just:
invoke AxCountLines, offset gltestA, len(offset gltestA),13 ; or 10
Also fixed the issue with a small unaligned string, when the length of the string plus the misalignment is smaller than 16 bytes.
Also removed the AND in the inner loop and (pun :) replaced it with a reversed SUB. I did this in a slightly different manner, but Drizz drew attention to the AND being needless :wink
All other things are changed only where they depend on the fixed issues, or for alignment reasons.
Please test this fixed version.
Here are my results:
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 99 / 100 / 11 lines
Lingo= 99 / 100 / 11 lines
Lingo2= 100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
AxCountLines= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 217
getlines Lingo2= 278
CompteurLignes = 238
AxCountLines = 262
Counting lines of \masm32\include\winextra.inc:
getlinesJJ: (jj2007)
13741 kiloLAMPs, 1110 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
15995 kiloLAMPs, 1085 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
15041 kiloLAMPs, 902 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
19366 kiloLAMPs, 1255 kilocycles for 20025 lines, 807877 bytes
AxCountLines:
13708 kiloLAMPs, 846 kilocycles for 20025 lines, 807877 bytes
LAMPs = Lean And Mean Points = cycles * sqrt(codesize)
Program in attached archive.
if i run it on a different drive, i get c0000005
if i run it on my masm32 drive...
prescott w/htt
getlinesJJ: (jj2007)
11646 kiloLAMPs, 941 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
14042 kiloLAMPs, 953 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
10848 kiloLAMPs, 650 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
18985 kiloLAMPs, 1230 kilocycles for 20025 lines, 807877 bytes
AxCountLines:
4660 kiloLAMPs, 287 kilocycles for 20025 lines, 807877 bytes
An old subject. Here is the algo I use in many programs:
pmem = pointer to a memory block with lines (ended by 13,10 and not only 10)
Taille = size of the memory block
return eax = number of lines ended by 13,10, plus one line (if it exists) not ended by 13,10
Quote
CompteurLignes PROC uses ebx edi esi pmem:DWORD,taille:DWORD
Local Nblines:DWORD,count,reste
local theEnd:dword
;init
mov Nblines,0
mov reste,0
mov edx,pmem
add edx,taille
mov theEnd,edx
mov edx,pmem
mov esi,edx
and edx,0Fh
.if edx != 0
;search lines in the non align memory
mov ecx,16
sub ecx,edx
@@:
.if byte ptr [esi] != 0
.if word ptr [esi] == 0A0Dh
inc Nblines
.endif
.else
mov eax,Nblines
jmp FindeCompteurLignes
.endif
inc esi
dec ecx
jnz @B
.endif
;esi point on a 16 aligned memory
;count the number of 32 bytes parts
mov edx,0
mov eax,theEnd
sub eax,esi
.if eax == 0
mov eax,Nblines
jmp FindeCompteurLignes
.endif
.if eax < 32
mov reste,eax
mov eax,Nblines
jmp EndNonaligned
.endif
mov edx,0
mov ecx,32
div ecx
mov count,eax
mov reste,edx
;-------------------------- search in aligned part -------------
;init of various register
mov eax, 0d0d0d0dh ; ASCII 13, carriage return
movd xmm6, eax
pshufd xmm6, xmm6, 0 ; carriage returns for comparison in xmm6
mov eax,Nblines ;line counter
;ready
NewBloc:
;------ 16-byte alignment needed ----------
;1731187 cycles for 22274 lines
movdqa xmm1,xmm6 ;load the CR (13) mask
movdqa xmm2,xmm6 ;load the CR (13) mask
pcmpeqb xmm1,[esi] ;cmp with memory align 16
pcmpeqb xmm2,[esi+16] ;cmp with memory ,align 16
pmovmskb ecx, xmm1 ; result in ecx
pmovmskb edx, xmm2 ; result +16 edx
shl edx,16
add ecx,edx
jz suite
NbLineBreak:
bsf edx, ecx
jz suite
.if word ptr [edx+esi] == 0A0Dh
inc eax
.endif
btr ecx, edx
jmp NbLineBreak
suite:
lea esi,[esi+32]
dec count
jnz NewBloc
EndNonaligned:
.if reste != 0
mov ecx,reste
@@:
.if word ptr [esi] == 0A0Dh
inc eax
.endif
inc esi
dec ecx
jnz @B
.endif
FindeCompteurLignes:
ret
CompteurLignes endp
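For readers who want to experiment with the structure of this routine outside MASM, here is a rough C sketch of the same three phases: scalar head until the pointer is 16-byte aligned, aligned SSE2 body that finds CR candidates and checks the following byte, and scalar tail. It is not a translation of CompteurLignes: the function name is hypothetical, it handles 16 bytes per step instead of 32, it bounds-checks the byte after each CR, and `__builtin_ctz` is a GCC/Clang builtin standing in for bsf.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Count CR+LF pairs in buf[0..len). */
static size_t count_crlf_sse2(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    const __m128i cr = _mm_set1_epi8(13);
    size_t count = 0;
    size_t i = 0;

    /* Scalar head: advance until buf + i is 16-byte aligned. */
    while (i < len && ((uintptr_t)(buf + i) & 15u) != 0) {
        if (buf[i] == 13 && i + 1 < len && buf[i + 1] == 10)
            count++;
        i++;
    }

    /* Aligned body: one pcmpeqb/pmovmskb per 16 bytes. */
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_load_si128((const __m128i *)(buf + i));
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, cr));
        while (mask != 0) {
            size_t pos = i + (size_t)__builtin_ctz(mask); /* like bsf */
            if (pos + 1 < len && buf[pos + 1] == 10)
                count++;
            mask &= mask - 1;           /* clear lowest set bit, like btr */
        }
    }

    /* Scalar tail: fewer than 16 bytes left. */
    for (; i < len; i++)
        if (buf[i] == 13 && i + 1 < len && buf[i + 1] == 10)
            count++;
    return count;
}
```

A CR in the last byte of a 16-byte block is handled by reading the following byte from memory, so pairs that straddle a block boundary are still counted.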
Should really check if file exists before crashing (division by 0?)
Atom N450 (1.66 GHz, 512KB L2)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 99 / 100 / 11 lines
Lingo= 99 / 100 / 11 lines
Lingo2= 100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
AxCountLines= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 217
getlines Lingo2= 278
CompteurLignes = 238
AxCountLines = 262
Counting lines of \masm32\include\winextra.inc:
getlinesJJ: (jj2007)
16686 kiloLAMPs, 1349 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
15128 kiloLAMPs, 1026 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
15152 kiloLAMPs, 908 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
23702 kiloLAMPs, 1536 kilocycles for 20025 lines, 807877 bytes
AxCountLines:
9001 kiloLAMPs, 556 kilocycles for 20025 lines, 807877 bytes
LAMPs = Lean And Mean Points = cycles * sqrt(codesize)
no Clive - i get access violation - not div/0 :P
Quote from: clive on December 18, 2010, 01:21:29 PM
Should really check if file exists before crashing (division by 0?)
The algo itself does not check for the file; it does not allocate memory or read the file. This is done in the code of the main function, which is not mine. I guess that for a simple testbed, crashing is a good enough way to say that something is wrong :lol
Quote from: ToutEnMasm on December 18, 2010, 01:10:56 PM
An old subject,Here is the algo i use in many prog:
pmem = pointer on memory block with lines (13,10 ended and not only 10 )
Want some more options?
Quote
Recall "\masm32\include\winextra.inc", MyRec$()
mov lc, Min(eax, 20)
Print Str$("%i lines found", lc)
For_ n=0 To lc-1
Print Str$("\nRec %i\t", n)
Print Left$(MyRec$(n), 50)
Next
Recall "MyFile.csv", MyRec$(), csv ; loads a spreadsheet in comma separated values format
Recall "MyFile.txt", MyRec$(), tab ; loads a spreadsheet in tab-delimited format
Rem - returns lines in eax (null if an error occurred), and the number of total bytes read in edx
- see Store below for saving arrays
- with the spreadsheet variants, single cells can be accessed via Let My$=MyRec$(row, column);
Win7 pro x64, quad q6600
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 99 / 100 / 11 lines
Lingo= 99 / 100 / 11 lines
Lingo2= 100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
AxCountLines= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 217
getlines Lingo2= 278
CompteurLignes = 238
AxCountLines = 262
Counting lines of \masm32\include\winextra.inc:
getlinesJJ: (jj2007)
3874 kiloLAMPs, 313 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
4005 kiloLAMPs, 271 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
5466 kiloLAMPs, 327 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
12880 kiloLAMPs, 834 kilocycles for 20025 lines, 807877 bytes
AxCountLines:
2358 kiloLAMPs, 145 kilocycles for 20025 lines, 807877 bytes
LAMPs = Lean And Mean Points = cycles * sqrt(codesize)
Quote from: sinsi on December 18, 2010, 11:52:46 PM
Win7 pro x64, quad q6600
Thank you!
Intel(R) Core(TM)2 Duo CPU T5750 @ 2.00GHz
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 99 / 100 / 11 lines
Lingo= 99 / 100 / 11 lines
Lingo2= 100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
AxCountLines= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 217
getlines Lingo2= 278
CompteurLignes = 238
AxCountLines = 262
Counting lines of \masm32\include\winextra.inc:
getlinesJJ: (jj2007)
3989 kiloLAMPs, 322 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
4047 kiloLAMPs, 274 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
5529 kiloLAMPs, 331 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
10623 kiloLAMPs, 688 kilocycles for 20025 lines, 807877 bytes
AxCountLines:
2379 kiloLAMPs, 146 kilocycles for 20025 lines, 807877 bytes
LAMPs = Lean And Mean Points = cycles * sqrt(codesize)
Win7/64 Ultimate/Core Duo 2 - 2.6 Ghz:
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:
jj2007= 99 / 100 / 11 lines
Lingo= 99 / 100 / 11 lines
Lingo2= 100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines
AxCountLines= 100 / 100 / 11 lines
Codesizes:
getlinesJJ = 153
getlines Lingo = 217
getlines Lingo2= 278
CompteurLignes = 238
AxCountLines = 262
Counting lines of \masm32\include\winextra.inc:
getlinesJJ: (jj2007)
3874 kiloLAMPs, 313 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
4006 kiloLAMPs, 271 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
5449 kiloLAMPs, 326 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
12865 kiloLAMPs, 833 kilocycles for 20025 lines, 807877 bytes
AxCountLines:
2358 kiloLAMPs, 145 kilocycles for 20025 lines, 807877 bytes
LAMPs = Lean And Mean Points = cycles * sqrt(codesize)
Thanks Greg! Thanks Frank!
getlinesJJ: (jj2007)
12431 kiloLAMPs, 1004 kilocycles for 20025 lines, 807877 bytes
getlines (Lingo):
12121 kiloLAMPs, 822 kilocycles for 20025 lines, 807877 bytes
getlines2 (Lingo):
12332 kiloLAMPs, 739 kilocycles for 20025 lines, 807877 bytes
CompteurLignes: (ToutEnMasm)
15094 kiloLAMPs, 978 kilocycles for 20025 lines, 807877 bytes
AxCountLines:
9739 kiloLAMPs, 601 kilocycles for 20025 lines, 807877 bytes
Thanks, Peter!
"An old subject,Here is the algo i use in many prog:
pmem = pointer on memory block with lines (13,10 ended and not only 10 )"
ToutEnMasm,
As you can see, nobody read your post carefully or checked "Tubeteikin's" code
to catch the fraud: he compares apples with oranges, because "his" code counts just 1 byte (0Dh)
against the other algos, which count the MANDATORY 2 bytes (13,10). Hence, "his" code does not belong in this thread! :naughty:
Quote from: lingo on December 19, 2010, 05:21:37 AM
"An old subject,Here is the algo i use in many prog:
pmem = pointer on memory block with lines (13,10 ended and not only 10 )"
ToutEnMasm,
As you can see, nobody read your post carefully or checked "Tubeteikin's" code
to catch the fraud: he compares apples with oranges, because "his" code counts just 1 byte (0Dh)
against the other algos, which count the MANDATORY 2 bytes (13,10). Hence, "his" code does not belong in this thread! :naughty:
You contradict yourself: "http://www.masm32.com/board/index.php?topic=14438.msg116374#msg116374"
If the data is specially prepared to be wrong, nothing will prevent this except big, slow checking code. As you like to say: this is fast code without checking. :P
"I don't care because these are just a speed optimized functions"
No, just as usual, you, lingo, are a lamer who tries to "optimize" others' algos. And when your timings are bad, you can only insult, not produce a new algo.
Try to beat the concept of delayed accumulation which I used in my algo. And try to use your own ideas, instead of great speeches about your "friends" and the use of their algos.
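For readers who have not met the term: one common reading of "delayed accumulation" is to keep per-byte hit counters inside an XMM register and only sum them horizontally once in a while, instead of doing pmovmskb plus a bit count on every 16-byte block. The sketch below is only an illustration of that idea in C with SSE2 intrinsics; it is an assumption about the technique, not the AxCountLines source, and the name count_byte is made up for this example.

```c
#include <stddef.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Hypothetical helper: count occurrences of `delim` in buf[0..len). */
size_t count_byte(const unsigned char *buf, size_t len, unsigned char delim)
{
    size_t total = 0, i = 0;
#ifdef HAVE_SSE2
    const __m128i needle = _mm_set1_epi8((char)delim);
    const __m128i zero = _mm_setzero_si128();
    while (len - i >= 16) {
        /* Each byte lane can hold at most 255 hits, so flush every 255 blocks. */
        __m128i acc = zero;
        size_t blocks = (len - i) / 16;
        if (blocks > 255) blocks = 255;
        for (size_t b = 0; b < blocks; ++b, i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
            /* pcmpeqb gives 0xFF (-1) per match; subtracting it adds 1 to that lane. */
            acc = _mm_sub_epi8(acc, _mm_cmpeq_epi8(v, needle));
        }
        /* The delayed step: psadbw against zero sums the lanes into two partial sums. */
        __m128i sums = _mm_sad_epu8(acc, zero);
        total += (size_t)_mm_cvtsi128_si32(sums)
               + (size_t)_mm_cvtsi128_si32(_mm_srli_si128(sums, 8));
    }
#endif
    /* Scalar tail (or the whole buffer when SSE2 is not available). */
    for (; i < len; ++i)
        total += (buf[i] == delim);
    return total;
}
```

Calling count_byte(pmem, taille, 13) counts every byte 13, so it agrees with a 13,10 counter only when each 13 is followed by a 10.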
If this code is to be universal you must be able to handle more combinations of ASCII 13,10.
MS DOS / Windows 13 10
UNIX 10
MAC 13 (Richedit 2 and later)
OLD DOS 13 13 10 (Extra CR to make the printer work correctly.)
and - don't forget 10,13 - lol
i have seen it plenty of times
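To make the list above concrete, here is a plain scalar sketch in C that treats each of the listed forms as one line break. It is not one of the timed procs; the name count_breaks_any and the rule that 13 13 10 and 10 13 each count as a single break are assumptions made for this example, and mixed files can still be ambiguous.

```c
#include <stddef.h>

/* Hypothetical helper. Each of these sequences counts as ONE break:
   13 10, 13 13 10, 10 13, a lone 10, a lone 13. */
size_t count_breaks_any(const unsigned char *s, size_t n)
{
    size_t count = 0, i = 0;
    while (i < n) {
        if (s[i] == 13) {
            if (i + 1 < n && s[i + 1] == 10)
                i += 2;                                   /* 13 10 (DOS / Windows) */
            else if (i + 2 < n && s[i + 1] == 13 && s[i + 2] == 10)
                i += 3;                                   /* 13 13 10 (old DOS) */
            else
                i += 1;                                   /* lone 13 (MAC) */
            ++count;
        } else if (s[i] == 10) {
            i += (i + 1 < n && s[i + 1] == 13) ? 2 : 1;   /* 10 13, or lone 10 (UNIX) */
            ++count;
        } else {
            ++i;
        }
    }
    return count;
}
```

A byte-at-a-time loop like this will be far slower than the SSE2 procs in the timings; it only shows what "universal" would have to decide.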
Quote from: hutch-- on December 19, 2010, 11:34:36 PM
If this code is to be universal you must be able to handle more combinations of ASCII 13,10.
MS DOS / Windows 13 10
UNIX 10
MAC 13 (Richedit 2 and later)
OLD DOS 13 13 10 (Extra CR to make the printer work correctly.)
Yes, Hutch. With my proc you can specify the line delimiter you want to use: 13 or 10, or anything else (for IBM EBCDIC encoding, for example).
The others' code uses a hard-coded line delimiter and is much less flexible.
"and - don't forget 10,13 - lol"
He can compare&count just 10 OR 13 OR every other BYTE at the same time but can't compare&count two different and consecutive bytes like 10 AND 13 OR every other WORD at the same time!
Quote
He can compare&count just 10 OR 13 OR every other BYTE at the same time but can't compare&count two different and consecutive bytes like 10 AND 13 OR every other WORD at the same time!
??????????????????????????????????????
To make it universal, a few changes are needed:
one version for one-byte line breaks and one for two-byte line breaks.
I have some files from old DOS (using 13,10); what hutch calls old DOS (13,13,10) must be very, very old.
With two versions of the proc (byte line break and word line break), it is enough to use two constants for each form of line break.
WinBreakLine equ 0D0Ah
MaskXmm equ 0d0d0d0dh
;other types of break lines
XBreakLine equ 0A0Dh
XMaskXmm equ 0A0A0A0Ah
The same thing can be done for a one-byte line break.
Finally, a parameter can be added to the function with the value of the line break.
That's all.
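As a rough sketch of the "word line break" version with the break passed as a parameter, here is the same mask idea written in C with SSE2 intrinsics. The name count_break_pairs is made up for this example, and it assumes the high byte of the constant is the first byte in the file (so 0D0Ah means 13 then 10). A carry bit plays the role of a pair split across two 16-byte blocks.

```c
#include <stddef.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Count the set bits of a 16-bit mask. */
static unsigned popcount16(unsigned x)
{
    unsigned c = 0;
    while (x) { x &= x - 1; ++c; }
    return c;
}

/* Hypothetical helper: count two-byte breaks, e.g. break_word = 0x0D0A. */
size_t count_break_pairs(const unsigned char *buf, size_t len, unsigned break_word)
{
    const unsigned char first = (unsigned char)(break_word >> 8);
    const unsigned char second = (unsigned char)(break_word & 0xFF);
    size_t total = 0, i = 0;
    unsigned carry = 0;   /* 1 when the previous byte was `first` */
#ifdef HAVE_SSE2
    const __m128i vfirst = _mm_set1_epi8((char)first);
    const __m128i vsecond = _mm_set1_epi8((char)second);
    for (; len - i >= 16; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        /* pcmpeqb + pmovmskb: one bit per byte, bit 0 = lowest address. */
        unsigned m1 = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, vfirst));
        unsigned m2 = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, vsecond));
        /* Shift the `first` mask up one position so it lines up with `second`. */
        total += popcount16(((m1 << 1) | carry) & m2);
        carry = (m1 >> 15) & 1u;
    }
#endif
    /* Scalar tail (or the whole buffer when SSE2 is not available). */
    for (; i < len; ++i) {
        if (carry && buf[i] == second) ++total;
        carry = (buf[i] == first);
    }
    return total;
}
```

With this shape, WinBreakLine and XBreakLine become plain arguments: count_break_pairs(pmem, taille, 0x0D0A) or count_break_pairs(pmem, taille, 0x0A0D).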
Quote
;-------------------------- search in aligned part -------------
;init of various register
mov eax, 0d0d0d0dh ; ASCII 13, carriage return
Another way is to convert the line breaks and write the result to another file.
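A minimal sketch of such a conversion in C, assuming the goal is to rewrite every lone 10 or lone 13 as 13,10 so that a 13,10-only counter can be used afterwards. The name to_crlf is made up for this example, and the destination buffer is assumed to have room for twice the source size.

```c
#include <stddef.h>

/* Hypothetical helper: copy src[0..n) to dst, turning every lone 10, lone 13
   or existing 13 10 into exactly 13 10. dst must hold up to 2*n bytes.
   Returns the number of bytes written. */
size_t to_crlf(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ++i) {
        if (src[i] == 13) {
            if (i + 1 < n && src[i + 1] == 10)
                ++i;                      /* already 13 10: consume both bytes */
            dst[o++] = 13;
            dst[o++] = 10;
        } else if (src[i] == 10) {
            dst[o++] = 13;                /* lone 10 becomes 13 10 */
            dst[o++] = 10;
        } else {
            dst[o++] = src[i];
        }
    }
    return o;
}
```

The extra pass over the file costs time of its own, so it only pays off if the converted copy is kept and counted more than once.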