a counter of lines in mmx instructions

Started by ToutEnMasm, March 18, 2009, 09:20:18 AM

herge

 Hi jj2007:

The results on my computer.
Intel(R) Core(TM)2 Duo CPU     E4600  @ 2.40GHz (SSE4)
Tests for correctness - 100+2/100+6 lines expected,
first string 5-byte misaligned:
Mark Larson= - /  - (throws exception)
jj2007=      100+2 / 100+6 lines
Lingo=      - / 102 lines
ToutEnMasm= 100 / 100 lines

Codesizes:
Mark Larson =        2104
getlinesJJ =          177
getlines Lingo = 191
CompteurLignes = 237

Counting lines of \masm32\include\windows.inc:

markl_CountFileLines (Mark Larson):
190 kilocycles for 22272 lines, 849759 bytes

getlinesJJ: (jj2007)
367 kilocycles for 22272 lines, 849759 bytes

getlines (Lingo):
498 kilocycles for 22272 lines, 849759 bytes

CompteurLignes: (ToutEnMasm)
653 kilocycles for 22272 lines, 849759 bytes

Hit any key


Regards herge

// Herge born  Brussels, Belgium May 22, 1907
// Died March 3, 1983
// Cartoonist of Tintin and Snowy

jj2007

#46
Thanks, Herge. Here is what is hopefully the last version, with minor improvements in code size and speed. I included ToutEnMasm's Texte string; see the third column (/ 11 lines).
The switch CheckBadLines = 0 can be used to test whether malformed strings like "part A", LF, "part B" are present. With the check, code size increases from 153 to 168 bytes; speed is identical.
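
To illustrate what a "malformed string" check of this kind looks for, here is a minimal scalar sketch in C. It is not the code behind CheckBadLines (that lives in the attachment); the function name count_lone_lf and the rule "an LF not preceded by a CR is bad" are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the attachment:
   counts LF bytes that are NOT preceded by a CR byte. */
static size_t count_lone_lf(const char *buf, size_t len)
{
    size_t lone = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n' && (i == 0 || buf[i - 1] != '\r'))
            lone++;
    }
    return lone;
}
```

A buffer such as "part A", LF, "part B", CR, LF yields one lone LF under this rule, while a buffer that uses CR LF throughout yields zero.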


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:

Mark Larson=    --- / --- (throws exception)
jj2007=         100 / 100 / 11 lines
Lingo=          --- / 102 / 11 lines
ToutEnMasm=     100 / 100 / 11 lines

Codesizes:
Mark Larson =           2104
getlinesJJ =            153
getlines Lingo =        191
CompteurLignes =        237

Counting lines of \masm32\include\winextra.inc:

markl_CountFileLines (Mark Larson):
239     kilocycles for 20001 lines, 807877 bytes

getlinesJJ: (jj2007)
450     kilocycles for 20025 lines, 807877 bytes

getlines (Lingo):
637     kilocycles for 20025 lines, 807877 bytes

CompteurLignes: (ToutEnMasm)
802     kilocycles for 20025 lines, 807877 bytes


EDIT: Make sure you launch the exe from the same drive as Masm32; otherwise it will throw an exception because it cannot find winextra.inc.

EDIT(2): Changed one line in CompteurLignes:

   mov eax, theEnd
   sub eax, esi
   ; .if eax == 0               ; threw exception for negative byte count
   .if sdword ptr eax <= 0      ; foolproof ;-)

The exception happened when a file was not found but read anyway (result: -1 bytes read) and the algo was then called regardless. Skipping the check is certainly not a recommended way of doing things, but now the algo at least returns 0 lines instead of throwing an exception, which is a kind of error check, too.
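
The difference between the two .if lines can be sketched in C. This is only an illustration of the comparison, not part of CompteurLignes; the function names are hypothetical, and the 32-bit byte count stands in for the value left in eax after the subtraction.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors ".if eax == 0": only an exact zero is treated as "nothing to do".
   A count of -1 (0FFFFFFFFh) slips through and looks like a huge buffer. */
static int rejected_by_zero_check(uint32_t byte_count)
{
    return byte_count == 0;
}

/* Mirrors ".if sdword ptr eax <= 0": zero OR a set sign bit is rejected.
   Testing the sign bit avoids relying on unsigned-to-signed conversion. */
static int rejected_by_signed_check(uint32_t byte_count)
{
    return byte_count == 0 || (byte_count & 0x80000000u) != 0;
}
```

With a byte count of -1, the first check lets the call proceed while the second one rejects it, which is the case described above.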

[attachment deleted by admin]

PBrennick

Why are my results so different? I am afraid of the answer.  :red

Quote
                Intel(R) Celeron(R) CPU 1.70GHz (SSE2)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:

jj2007=         100 / 100 / 11 lines
Lingo=          --- / 102 / 11 lines
ToutEnMasm=     100 / 100 / 11 lines

Codesizes:
getlinesJJ =            153
getlines Lingo =        191
CompteurLignes =        238

Counting lines of \masm32\include\winextra.inc:

getlinesJJ: (jj2007)
1169    kilocycles for 20025 lines, 807877 bytes

getlines (Lingo):
1187    kilocycles for 20025 lines, 807877 bytes

CompteurLignes: (ToutEnMasm)
1191    kilocycles for 20025 lines, 807877 bytes

Paul
The GeneSys Project is available from:
The Repository or My crappy website

jj2007

Quote from: PBrennick on March 21, 2009, 09:49:53 PM
Why are my results so different? I am afraid of the answer.  :red

Quote
                Intel(R) Celeron(R) CPU 1.70GHz (SSE2)

Paul, I am not an expert in CPU architecture, but it seems Intel changed something when introducing SSE3. That is not relevant for the functioning of the code (SSE2 only), but it affects the speed. Compare with previous postings...

lingo

I changed the test program from "mov eax, alloc$(10000000)" to "mov eax, alloc$(100000000)"
so that the buffer is large enough for my copy of windows.inc, and I have new numbers... :lol
Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:

jj2007=         100 / 100 / 11 lines
Lingo=          100 / 100 / 11 lines
ToutEnMasm=     100 / 100 / 11 lines

Codesizes:
getlinesJJ =            153
getlines Lingo =        217
CompteurLignes =        238

Counting lines of \masm32\include\windows.inc:

getlinesJJ: (jj2007)
517     kilocycles for 30762 lines, 1127718 bytes

getlines (Lingo):
451     kilocycles for 30762 lines, 1127718 bytes

CompteurLignes: (ToutEnMasm)
970     kilocycles for 30762 lines, 1127718 bytes

Hit any key

[attachment deleted by admin]

ToutEnMasm


Thanks to all for the help.
Now I have an application, "cherche" ("search" in English), that needs only 3 seconds to find a word in all the MASM32 examples.
For example, searching for richedit gives this result:
Quote
F:\masm32\examples\exampl10\shuflarr\unique_riched\ectrl.asm
12   : szText EditMl,"RICHEDIT"

F:\masm32\examples\exampl10\shuflarr\unique_riched\idat.asm
1   : szDisplayName db "MASM32 Richedit",0

F:\masm32\examples\exampl09\maketbl\maketbl.asm
436   : fn CreateWindowEx,WS_EX_STATICEDGE,"RICHEDIT",0, \

F:\masm32\examples\exampl05\qeplugin\qeplugin.asm
22   : ; being read from the editor with richedit selection. If you need to

F:\masm32\examples\exampl06\regdemo\regdemo.asm
35   : RichEdit         db  'RichEdit20A',0
228   : INVOKE     CreateWindowEx, NULL, addr RichEdit, NULL,\

F:\masm32\examples\advanced\wrep\result.asm
7   : include Richedit.inc      ; local includes for this file
11   : ; uncomment for richedit version 1 or
12   : ; comment out for richedit version 2
417   : szText RichEd,"MASM RichEdit"
580   : szText EditMl,"RICHEDIT"

F:\masm32\examples\exampl05\riched\richedit.asm
7   : include Richedit.inc      ; local includes for this file
11   : ; uncomment for richedit version 1 or
12   : ; comment out for richedit version 2
417   : szText RichEd,"MASM RichEdit"
580   : szText EditMl,"RICHEDIT"

F:\masm32\examples\poasm\riched\richedit.asm
10   : include Richedit.inc      ; local includes for this file
14   : ; uncomment for richedit version 1 or
15   : ; comment out for richedit version 2
409   : szText RichEd,"POASM RichEdit"
571   : szText EditMl,"RICHEDIT"

F:\masm32\examples\exampl10\shuflarr\unique_riched\richedit.asm
11   : include Richedit.inc      ; local includes for this file
15   : ; uncomment for richedit version 1 or
16   : ; comment out for richedit version 2
424   : szText RichEd,"MASM RichEdit"

F:\masm32\examples\advanced\wrep\richedit.asm
7   : include Richedit.inc      ; local includes for this file
11   : ; uncomment for richedit version 1 or
12   : ; comment out for richedit version 2
417   : szText RichEd,"MASM RichEdit"
580   : szText EditMl,"RICHEDIT"

F:\masm32\examples\poasm\riched\richedit.inc
81   : szDisplayName db "MASM32 Richedit",0

F:\masm32\examples\exampl05\riched\richedit.inc
82   : szDisplayName db "MASM32 Richedit",0

F:\masm32\examples\exampl06\treedemo\treedemo.asm
43   : RichEdit       db  'RichEdit20A',0
223   : INVOKE     CreateWindowEx, WS_EX_CLIENTEDGE, addr RichEdit, NULL,\

Searching all the header files of the SDK takes 10 s.

THIS MEANS THAT THERE ARE NOT ENOUGH SAMPLES IN MASM32

PBrennick

I think that you have included all header files in your SDK whereas windows.inc just includes the ones that are most used. I do not feel this is a limitation. Personally, I very seldom find anything lacking in windows.inc as it does include all that I need to do what I do. I feel this is the correct barometer for deciding whether the contents are sufficient or not. They are sufficient for me.

Paul

ToutEnMasm

To PBrennick,
That's debatable, but don't take my little provocation too seriously.
It was just to show how happy I am to have this.

herge

 Hi  lingo:

Results from my computer.

Intel(R) Core(TM)2 Duo CPU     E4600  @ 2.40GHz (SSE4)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:

jj2007=      100 / 100 / 11 lines
Lingo=      100 / 100 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines

Codesizes:
getlinesJJ =          153
getlines Lingo = 217
CompteurLignes = 238

Counting lines of \masm32\include\windows.inc:

getlinesJJ: (jj2007)
386 kilocycles for 22272 lines, 849759 bytes

getlines (Lingo):
325 kilocycles for 22272 lines, 849759 bytes

CompteurLignes: (ToutEnMasm)
668 kilocycles for 22272 lines, 849759 bytes

Hit any key


Regards herge

herge

 Hi jj2007:

Yet again more results:

Intel(R) Core(TM)2 Duo CPU     E4600  @ 2.40GHz (SSE4)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:

jj2007=      100 / 100 / 11 lines
Lingo=      --- / 102 / 11 lines
ToutEnMasm= 100 / 100 / 11 lines

Codesizes:
getlinesJJ =          153
getlines Lingo = 191
CompteurLignes = 238

Counting lines of \masm32\include\winextra.inc:

getlinesJJ: (jj2007)
315 kilocycles for 20025 lines, 807877 bytes

getlines (Lingo):
413 kilocycles for 20025 lines, 807877 bytes

CompteurLignes: (ToutEnMasm)
618 kilocycles for 20025 lines, 807877 bytes

Hit any key


Regards herge

drizz

Can you run my proc too? I haven't tested it much...


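.code
.mmx
; Notes:
; - PLAIN MMX, does not use SSE123
; - lpBuffer must be padded with extra 8 bytes
; - Does not handle nasty remainders! (simply zero out 8 bytes after length)
CountCRLF proc lpBuffer:ptr byte, ccLength:dword
    mov eax,0A0D0A0Dh           ; CR,LF byte pair (word 0A0Dh), twice
    pxor mm1,mm1                ; mm1 = running total (kept in the high dword)
    movd mm7,eax
    pxor mm3,mm3                ; mm3 = 0, used for zero-extension
    punpckldq mm7,mm7           ; mm7 = four CRLF words
    mov eax,lpBuffer
    mov edx,ccLength
    xor ecx,ecx                 ; ecx = byte offset into the buffer
    .repeat
        movq mm4,[eax+ecx]      ; words starting at even offsets
        movq mm5,[eax+ecx+1]    ; words starting at odd offsets
        add ecx,8
        pcmpeqw mm4,mm7         ; 0FFFFh where a word equals CRLF
        pcmpeqw mm5,mm7
        paddsw mm5,mm4          ; merge both masks: 0 or -1 per word
        pxor mm4,mm4
        psubsw mm4,mm5          ; negate: 0 or 1 per word
        punpckldq mm5,mm4       ; fold the four word counts ...
        paddsw mm5,mm4
        punpckhwd mm5,mm3
        punpckldq mm4,mm5
        paddd mm5,mm4           ; ... high dword = matches in this block
        paddd mm1,mm5           ; accumulate
    .until ecx >= edx
    psrlq mm1,32                ; move the total down to the low dword
    movd eax,mm1
    ret
CountCRLF endp

The truth cannot be learned ... it can only be recognized.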
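
For anyone who wants to cross-check the result of CountCRLF on small buffers, a plain scalar reference can help. The sketch below is in C and is not drizz's code; the name count_crlf_reference is hypothetical. It counts CR LF pairs that lie entirely inside the given length, which should match the MMX proc as long as the 8 bytes after the buffer are zeroed, as the notes above require.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference: counts positions i where
   buf[i] is CR (0Dh) and buf[i+1] is LF (0Ah). */
static size_t count_crlf_reference(const unsigned char *buf, size_t len)
{
    size_t count = 0;

    assert(buf != NULL);
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0x0D && buf[i + 1] == 0x0A)
            count++;
    }
    return count;
}
```

The MMX loop reaches the same pairs by comparing 16-bit words at offset ecx (pairs starting on even bytes) and at offset ecx+1 (pairs starting on odd bytes), so a pair that straddles two 8-byte blocks is still seen once.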

PBrennick

drizz,

I like the fact that it is just .mmx; it will run on more systems.

Paul

jj2007

Quote from: lingo on March 22, 2009, 03:21:36 PM
I edited the test program from "mov eax, alloc$(10000000)" to "mov eax, alloc$(100000000)"


The branch predictor slows your code down by 8 cycles, but otherwise I like it; a bit bloated but fast :bg.
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:

jj2007=         100 / 100 / 11 lines
Lingo=          100 / 100 / 11 lines
ToutEnMasm=     100 / 100 / 11 lines

Codesizes:
getlinesJJ =            153
getlines Lingo =        217
CompteurLignes =        238

Counting lines of \masm32\include\winextra.inc:

getlinesJJ: (jj2007)
458     kilocycles for 20025 lines, 807877 bytes

getlines (Lingo):
420     kilocycles for 20025 lines, 807877 bytes

CompteurLignes: (ToutEnMasm)
795     kilocycles for 20025 lines, 807877 bytes

[attachment deleted by admin]

lingo

"a bit bloated but fast"

I have more code with more speed in getlines2-SSE2.. :wink

Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
Tests for correctness - 100 / 100 / 11 lines
expected, first string 5-byte misaligned:

jj2007=         100 / 100 / 11 lines
Lingo=          100 / 100 / 11 lines
Lingo2=         100 / 100 / 11 lines
ToutEnMasm=     100 / 100 / 11 lines

Codesizes:
getlinesJJ =            153
getlines Lingo =        217
getlines2 Lingo2 =      278
CompteurLignes =        238

Counting lines of \masm32\include\windows.inc:

getlinesJJ: (jj2007)
519     kilocycles for 30762 lines, 1127718 bytes

getlines (Lingo):
452     kilocycles for 30762 lines, 1127718 bytes

getlines2 (Lingo2):
439     kilocycles for 30762 lines, 1127718 bytes

CompteurLignes: (ToutEnMasm)
970     kilocycles for 30762 lines, 1127718 bytes

Hit any key

[attachment deleted by admin]

jj2007

#59
Quote from: lingo on March 22, 2009, 10:58:59 PM
"a loooot bloated but fast"

Counting lines of \masm32\include\winextra.inc:

getlinesJJ: (jj2007)
12760 kiloLAMPs, 1031 kilocycles for 20025 lines, 807877 bytes :dance:

getlines (Lingo):
14013 kiloLAMPs, 951 kilocycles for 20025 lines, 807877 bytes :naughty:

getlines2 (Lingo):
10946 kiloLAMPs, 656 kilocycles for 20025 lines, 807877 bytes :clap:

CompteurLignes: (ToutEnMasm)
17966 kiloLAMPs, 1164 kilocycles for 20025 lines, 807877 bytes :snooty:

LAMPs = Lean And Mean Points = cycles * sqrt(codesize)

(old politician's motto: if you can't win under these rules, just change the rules :bg)

EDIT: Changing the rules was not successful. Lingo wins this round :toothy
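
As a rough check of the LAMPs formula, the arithmetic can be worked through by hand. The code sizes used here are an assumption: this post does not list them, so they are taken from Lingo's preceding post (153, 217, 278 and 238 bytes).

```latex
\begin{align*}
\text{LAMPs} &= \text{cycles} \times \sqrt{\text{codesize}} \\
\text{getlinesJJ:} \quad     & 1031 \times \sqrt{153} \approx 12753 \text{ kiloLAMPs} \\
\text{getlines:} \quad       &  951 \times \sqrt{217} \approx 14009 \text{ kiloLAMPs} \\
\text{getlines2:} \quad      &  656 \times \sqrt{278} \approx 10938 \text{ kiloLAMPs} \\
\text{CompteurLignes:} \quad & 1164 \times \sqrt{238} \approx 17957 \text{ kiloLAMPs}
\end{align*}
```

These values land within about 0.1% of the posted figures (12760, 14013, 10946, 17966). The small gap is presumably due to the kilocycle counts being rounded for display, and the ranking is the same either way.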

[attachment deleted by admin]