szLen optimize...

Started by denise_amiga, May 31, 2005, 07:42:44 PM

jj2007

Quote from: Jimg on March 17, 2009, 06:07:48 PM
Sure-

Thanks. Remarkably fast, and remarkably incorrect :green

ToutEnMasm


Looking at the post, I see a possible problem with the location of the proc.
A proc with SSE code must be in a separate module to work.
I experimented with this and found it to be the only solution:
put all the SSE code in the slensse2.inc.
I don't know why it is like that (I have ml 9.0), but I am certain that this is the problem.

A useful macro could also be added to the slensse2.inc.

Quote
numeroversion equ < @Version>
IF numeroversion LT 615
   %ECHO MASM numeroversion cannot assemble SSE2
   .ERR  <Version Masm must be at least 6.15 to compile SSE2>
ENDIF

jj2007

Quote from: ToutEnMasm on March 17, 2009, 08:00:53 PM

Looking at the post, I see a possible problem with the location of the proc.
A proc with SSE code must be in a separate module to work.
I experimented with this and found it to be the only solution:
put all the SSE code in the slensse2.inc.
I don't know why it is like that (I have ml 9.0), but I am certain that this is the problem.

strlenSSE2.asc uses SSE2 code both in the main module and in slenSSE2.inc. JimG has an old SSE1 CPU. I tried to dig out my oldest puter, 6 years old, but it already has SSE2. Nonetheless I have a suspicion that the algo could work with SSE1, but I cannot test it...

Quote
A useful macro could also be added to the slensse2.inc.

Quote
numeroversion equ < @Version>
IF numeroversion LT 615
   %ECHO MASM numeroversion cannot assemble SSE2
   .ERR  <Version Masm must be at least 6.15 to compile SSE2>
ENDIF

From the package (downloaded 10 times so far):

TestMasmVersion MACRO
  ifidn @Version, <614>
echo ####################################################
echo
echo You cannot use the SSE2 library with ml.exe version 614, sorry
echo
echo ####################################################
.err
  endif
ENDM
...
.code
TestMasmVersion


But thanks anyway, ToutEnMasm :U

Jimg

Is this better?
Timings for strlen32s:

25      cycles for len=3
29      cycles for len=3


Timings for Masm32lib szLen:

27      cycles for len=15
24882   cycles for len=16384

                                -- hit any key --

herge

 
Hi All:

A picture of some slow response on my computer while debugging with WinDbg. It appears to be hanging on the CPUID instruction, and taking its sweet time.

See the pretty picture. Note the very large cycle times. Also note strslensse2 and WinDbg CPU time usage.

Regards herge

[attachment deleted by admin]
// Herge born  Brussels, Belgium May 22, 1907
// Died March 3, 1983
// Cartoonist of Tintin and Snowy

jj2007

Quote from: Jimg on March 17, 2009, 08:44:49 PM
Is this better?


I am afraid the string lengths should be the same as for szLen...
Still trying to find a reliable database showing which SSE version each instruction belongs to. This file documents NASM, but it's not that clear. ::)

PBrennick

JJ,

Take a look at my Opcode Database Project. SSE2 instructions are listed as SSE2; SSE1 instructions are listed as SSE. It is not a fancy app, but it has the info you need.

hth,
Paul


[attachment deleted by admin]
The GeneSys Project is available from:
The Repository or My crappy website

jj2007

Quote from: PBrennick on March 17, 2009, 10:35:24 PM
JJ,

Take a look at my Opcode Database Project. SSE2 instructions are listed as SSE2; SSE1 instructions are listed as SSE. It is not a fancy app, but it has the info you need.

hth,
Paul


Thanks, Paul, much appreciated. The problem is indeed that pcmpeqb and pmovmskb exist in both an MMX-register form and an SSE2 form. This means JimG is out of luck: his SSE1 CPU does not throw an exception, but it cannot interpret the 66h prefix... sorry!
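
To make the prefix issue concrete, here is a minimal sketch (not taken from slenSSE2.inc) that puts the two forms of each instruction side by side. The proc name EncodingDemo is made up for illustration, and the byte values in the comments are the encodings given in the Intel opcode tables; assembling with a listing (ml /c /coff /Fl) should let anyone compare them.

```masm
; Hypothetical demo module: assemble only, it is not meant to be called.
.686
.mmx
.xmm
.model flat, stdcall

.code
EncodingDemo proc
    pcmpeqb  mm0, mm1       ; 0F 74 C1      64-bit MMX form
    pcmpeqb  xmm0, xmm1     ; 66 0F 74 C1   128-bit SSE2 form
    pmovmskb eax, mm0       ; 0F D7 C0      MMX-register form
    pmovmskb eax, xmm0      ; 66 0F D7 C0   128-bit SSE2 form
    emms                    ; leave MMX state clean for later FPU code
    ret
EncodingDemo endp
end
```

Within each pair the only difference is the 66h prefix, which fits the observation that an SSE1 CPU raises no exception and still does not do what the SSE2 code intends.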

ToutEnMasm


Be careful with the entry point of your code.
Having slenSSE2.inc repeat .686, .data and so on is a bad thing.

jj2007

Quote from: ToutEnMasm on March 18, 2009, 07:32:31 AM

Be careful with the entry point of your code.
Having slenSSE2.inc repeat .686, .data and so on is a bad thing.


Why?

ToutEnMasm


This explains why the function gives bad results.
It also explains why the program seems to run slowly.
I have tested this with WinDbg. Put your include file with the code in the code section (slenSSE2.inc was in the declaration section), without repeating .686 and so on, and you will have code that runs faster and doesn't give random results.
It is simply a matter of an undetermined entry point, which can be resolved differently, at random, on various machines.

BlackVortex

Quote from: herge on March 17, 2009, 09:50:13 PM

Hi All:

A picture of some slow response on my computer while debugging with WinDbg. It appears to be hanging on the CPUID instruction, and taking its sweet time.

See the pretty picture. Note the very large cycle times. Also note strslensse2 and WinDbg CPU time usage.

Regards herge

Maybe use OllyDbg? :U

herge

 Hi  BlackVortex:

I tried to download ollydebug three times and all you get is a corrupt Zip file. If you can't download it in one piece, it's not going to be used!

Regards herge


BlackVortex

Quote from: herge on March 18, 2009, 08:55:59 AM
Hi  BlackVortex:

I tried to download ollydebug three times and all you get is a corrupt Zip file. If you can't download it in one piece, it's not going to be used!

Regards herge


http://www.ollydbg.de/odbg110.zip
This link ?  It works fine.

jj2007

Quote from: herge on March 18, 2009, 08:55:59 AM
I tried to download ollydebug three times and all you get is a corrupt Zip file.

Try this link with Firefox and IE. For me, it always works fine (with both browsers).

Just for fun, I also downloaded WinDbg, >17 MB, and tried it. The user interface is disgusting, but it has no problem with the CPUID opcode. Googling for WinDbg CPUID is not very successful, either, so it might be something specific to your CPU ::)

Re entry points etc:

include \masm32\include\masm32rt.inc
include \masm32\include\slenSSE2.inc
txt50 equ <"Just some stüpid text containing exäctly 50 bytes ">
.data
szTest_1 db "My short string", 0
...
.code
start:
tmp$ CATSTR <chr$("Timings for >, StrLenAlgo$, <:")>
print tmp$, 13, 10, 10


I cannot see what could possibly be wrong here, and more specifically I would like to see an example where the entry point is determined by the machine rather than by the code. What I see in Olly is that the SSE2 code starts at 00401000 (start of the code section), while execution starts at 004010AC, called <ModuleEntryPoint>. That works fine; many coders put procedures before start in order to save the PROTOs.
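
A minimal sketch of that layout, assuming a standard MASM32 install (masm32rt.inc with its print, str$ and exit macros). AddOne is a hypothetical helper, not part of the package. The proc sits at the very start of the code section, yet execution begins at start because the END directive names that label.

```masm
include \masm32\include\masm32rt.inc

.code
; Hypothetical helper placed before the entry label.
; Because it is defined before its first use, invoke needs no PROTO.
AddOne proc n:DWORD
    mov eax, n
    inc eax                 ; return n + 1 in eax
    ret
AddOne endp

start:                      ; execution begins here, not at AddOne
    invoke AddOne, 41
    print str$(eax), 13, 10 ; show the returned value
    exit

end start                   ; the END operand records the entry point
```

The label given to END is written into the object file as the entry point, so where the procs sit relative to start should not matter.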

Reviewing the posts above, it seems that the slenSSE2.inc works fine unless
a) you have a CPU that does not support SSE2 or
b) you use WinDbg and get hung at the CPUID instruction.

Is that correct? Is there any case where the code did not work properly on an SSE2 machine in normal (non-debugged) execution?
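
For case a), a run-time guard is one option. The sketch below is an assumption about how such a check could look, not code from the package. HasSSE2 is a hypothetical name, and the check relies on CPUID leaf 1 reporting SSE2 in EDX bit 26 (SSE1 is bit 25). It also assumes the CPU supports CPUID at all, which holds for anything Pentium-class or newer.

```masm
include \masm32\include\masm32rt.inc
.686                        ; masm32rt.inc selects .486; cpuid needs .586 or later

.code
; Hypothetical helper: returns eax = 1 if the CPU reports SSE2, else 0.
HasSSE2 proc uses ebx       ; cpuid overwrites ebx, which must be preserved
    mov eax, 1              ; leaf 1: feature flags
    cpuid
    xor eax, eax
    bt edx, 26              ; EDX bit 26 = SSE2
    setc al                 ; carry flag holds the tested bit
    ret
HasSSE2 endp

start:
    call HasSSE2
    .if eax
        print "SSE2 available: the SSE2 strlen can be used", 13, 10
    .else
        print "No SSE2: fall back to szLen", 13, 10
    .endif
    exit

end start
```

A check like this would let the same executable pick szLen on an SSE1 machine instead of returning wrong lengths.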