szLen optimize...

Started by denise_amiga, May 31, 2005, 07:42:44 PM

herge

 Hi jj2007:

Volume 2A:
Instruction Set Reference, A-M

NOTE: The Intel 64 and IA-32 Architectures Software Developer's Manual
consists of five volumes: Basic Architecture, Order Number 253665;
Instruction Set Reference A-M, Order Number 253666; Instruction Set
Reference N-Z, Order Number 253667; System Programming Guide,
Part 1, Order Number 253668; System Programming Guide, Part 2,
Order Number 253669. Refer to all five volumes when evaluating your
design needs.

Order Number: 253666-029US

CPUID—CPU Identification
Description
The ID flag (bit 21) in the EFLAGS register indicates support for the CPUID instruction.
If a software procedure can set and clear this flag, the processor executing the
procedure supports the CPUID instruction. This instruction operates the same in
non-64-bit modes and 64-bit mode.
CPUID returns processor identification and feature information in the EAX, EBX, ECX,
and EDX registers. The instruction's output is dependent on the contents of the EAX
register upon execution (in some cases, ECX as well). For example, the following
pseudocode loads EAX with 00H and causes CPUID to return a Maximum Return
Value and the Vendor Identification String in the appropriate registers:
MOV EAX, 00H
CPUID

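A minimal C sketch of the two checks the quoted passage describes. It works on register values passed in as arguments instead of executing anything, so it is portable and only illustrates the logic; the function names are hypothetical. `cpuid_supported` applies the EFLAGS test (flip bit 21, read EFLAGS back, see whether the bit really changed), and `cpuid_vendor_string` assembles the 12-byte vendor string that leaf 0 returns in EBX, EDX and ECX, in that order.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_ID (1u << 21)  /* ID flag, bit 21 of EFLAGS */

/* Hypothetical helper: 'before' is EFLAGS as first read, 'after' is EFLAGS
   read back after writing (before ^ EFLAGS_ID). If the bit actually changed,
   software was able to modify the ID flag, which indicates CPUID support. */
bool cpuid_supported(uint32_t before, uint32_t after)
{
    return ((before ^ after) & EFLAGS_ID) != 0;
}

/* Hypothetical helper: build the vendor string from CPUID leaf 0 output.
   The byte order is EBX, EDX, ECX (not EBX, ECX, EDX); each register holds
   four ASCII characters, least significant byte first. */
void cpuid_vendor_string(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFFu);
    out[12] = '\0';  /* terminate the 12-character string */
}
```

On a real machine the register values could come from a compiler intrinsic, for example `__get_cpuid(0, &eax, &ebx, &ecx, &edx)` from GCC's `<cpuid.h>` or `__cpuid(info, 0)` from MSVC's `<intrin.h>`; the EAX value from that same call is the maximum basic leaf.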
Go to Intel!

You want the CPUID entry, PDF pages 228 through 261
(Vol. 2A pages 3-180 to 3-213).



It's about thirty-three pages long and this
manual has 812 pages.

You also need Adobe Reader to read it.

Regards herge
// Herge born  Brussels, Belgium May 22, 1907
// Died March 3, 1983
// Cartoonist of Tintin and Snowy

jj2007

Herge, there is no problem with the CPUID instruction. It seems you have a very specific problem with your machine. Can you run the sample posted here (SSE2, but totally unrelated to szLen) under WinDbg and, just for fun, insert the CPUID code to see whether it makes any difference?

start:
   pushad
   push 1
   pop eax
   db 0Fh, 0A2h   ; cpuid 1
   xor eax, eax
   xor esi, esi
   bt edx, 25      ; edx bit 25, SSE1
   adc eax, esi
   bt edx, 26      ; edx bit 26, SSE2
   adc eax, esi
   bt ecx, esi      ; ecx bit 0, SSE3 (esi=0)
   adc eax, esi
   bt ecx, 9      ; ecx bit 9, SSSE3 (SSE4.1 is bit 19)
   adc eax, esi
   mov Win$, alloc$(1000000)
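The bt/adc pairs above add up how many of four CPUID leaf 1 feature bits are set, leaving the count in EAX. A C sketch of the same arithmetic, taking the EDX and ECX values from CPUID with EAX = 1 as plain arguments (the function names are hypothetical), makes two points easy to see: the result is a count rather than a level, and in Intel's bit assignments ECX bit 9 is SSSE3, while SSE4.1 is ECX bit 19.

```c
#include <stdint.h>

/* CPUID leaf 1 feature bits tested by the bt/adc sequence. */
#define EDX_SSE    25
#define EDX_SSE2   26
#define ECX_SSE3    0
#define ECX_SSSE3   9   /* the assembly comment calls this SSE4; SSE4.1 is ECX bit 19 */

/* Return 1 if bit number 'bit' of 'reg' is set, else 0 (what bt + adc adds). */
static int bit_is_set(uint32_t reg, unsigned bit)
{
    return (int)((reg >> bit) & 1u);
}

/* Hypothetical C equivalent of the count the snippet leaves in EAX. */
int sse_feature_count(uint32_t edx, uint32_t ecx)
{
    return bit_is_set(edx, EDX_SSE) + bit_is_set(edx, EDX_SSE2)
         + bit_is_set(ecx, ECX_SSE3) + bit_is_set(ecx, ECX_SSSE3);
}

/* Direct test for SSE2, independent of the other three bits. */
int has_sse2(uint32_t edx)
{
    return bit_is_set(edx, EDX_SSE2);
}
```

Reading "count >= 2" as "SSE2 present" relies on CPUs that report a later extension also reporting the earlier ones, which is what real processors do in practice; testing EDX bit 26 directly, as `has_sse2` does, avoids depending on that.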

herge

 Hi jj2007:

It's got something to do with WinDbg and my computer.

It does not crash, but it is slow if you single-step a CPUID
instruction with t or p. It would take most of the day to run.

All cycle times are seven digits long.

The g (go) command works great!


00401325 6880000000       push    0x80
0040132a 50               push    eax
0040132b e82e320000       call    strslensse2!SetPriorityClass (0040455e)
strslensse2!start+0x1ab [C:\Program Files\Microsoft Visual Studio 8\VC\bin\strslensse2.asm @ 235]:
00401330 33c0            xor     eax,eax
00401332 0fa2            cpuid           ;; <<< don't t or p a CPUID
00401334 0f31            rdtsc
00401336 52              push    edx
00401337 50              push    eax
00401338 c705dcb7400020a10700 mov dword ptr [strslensse2!__counter__loop__counter__ (0040b7dc)],7A120h
00401342 33c0            xor     eax,eax
00401344 0fa2            cpuid



Regards herge

herge

 Hi jj2007:

I had no problems with CountLinesSSE2.exe
in WinDbg. I have two versions of WinDbg and
both choke on CPUID.

Regards herge

herge

 Hi All:

Well, we finally got OllyDbg running with Firefox and my internet radio off.
Olly ran strslensse2.exe with no problem.
So it's either my computer or WinDbg, or it's some
software I am running.

Regards herge

ToutEnMasm


To jj2007,
Quote
and more specifically I would like to see an example where the entry point is being determined by the machine rather than the code.

Perhaps you are looking for a machine to write the code for you?
The entry point is always fixed by the code.

I will repeat my earlier post about the solution:

Quote
You can't include code in the declarations section;
the includelib is just read by the linker, and the code is added at link time.

No need of special debugger or special machine to run SSE2 instructions.


Console applications are irrelevant because they are finished before they start when launched from Windows. Yours doesn't do that because there is bad writing in it.

I use WinDbg and it works perfectly with well-written code.

herge


Hi All:

And the loser at 04.75 hrs is strslensse2.exe with WinDbg.

See Attachment.

Regards herge

[attachment deleted by admin]

BlackVortex

@ herge

Why do you insist on using WinDbg? It's next to useless (except maybe as a system debugger).

jj2007

Quote from: ToutEnMasm on March 18, 2009, 12:38:30 PM

To jj2007,
Quote
and more specifically I would like to see an example where the entry point is being determined by the machine rather than the code.

Perhaps you are looking for a machine to write the code for you?
The entry point is always fixed by the code.

YES, that's perfectly correct. I knew that before, but a certain ToutEnMasm insisted that machines might fumble with the entry point:

Quote from: ToutEnMasm on March 18, 2009, 08:06:08 AM
That is only due to an indeterminate entry point, which can be resolved randomly on various machines.

Quote
No need of special debugger or special machine to run SSE2 instructions.
You need a "special machine" that is capable of SSE2. If you had read the source, you would have discovered the macro that throws an error if you try to assemble it on an SSE1 machine. And there is a run-time check in my code that falls back to crt_strlen if only SSE1 is detected. That should be foolproof, right?

Quote
Console applications are irrelevant because they are finished before they start when launched from Windows. Yours doesn't do that because there is bad writing in it.

Your phrase does not make sense at all, probably a language problem. Please explain, and use a code example.

SlenSSE2.inc works perfectly with console and GUI applications. The only change I had to make to my 9,500-line RichMasm source was one added line, the second of these:
include \masm32\Gfa2Masm\Gfa2Masm.inc
include \masm32\include\slenSSE2.inc

Quote
I use WinDbg and it works perfectly with well-written code.
Me too. It works perfectly well with all my code.

jj2007

Quote from: herge on March 18, 2009, 01:57:55 PM
And the loser at 04.75 hrs is strslensse2.exe with WinDbg.

Impressive :bg

And I am very glad that for the 100-byte strings my algo is over 1000 cycles faster than Lingo's :cheekygreen:

herge

 Hi jj2007:

A small suggestion,
to protect against operator stupidity:


   ENDM
@@:    
   inkey chr$(9, 9, 9, 9, 9, "-- Hit X Key --")
   cmp AL,"X"      ; uppercase X only; any other key loops back
   jnz @B
   exit


But first make sure the inkey MACRO in C:\masm32\macros\macros.asm
is updated as follows:


inkey MACRO user_text:VARARG
     IFDIF <user_text>,<NULL>                  ;; if user text not "NULL"
       IFNB <user_text>                        ;; if user text not blank
         print user_text                       ;; print user defined text
       ELSE                                    ;; else
         print "Press any key to continue ..." ;; print default text
       ENDIF
     ENDIF
     call wait_key
     push eax                                  ;; < Note push: save the key code returned by wait_key
     print chr$(13,10)
     pop eax                                   ;; < Note pop: restore it, since print overwrites EAX
   ENDM



Note the push and pop. Without them, inkey was
always returning 2, the length of CRLF, which
explains why a CMP AL,? was always
failing after an inkey call.

Regards herge

jj2007

Quote from: herge on March 18, 2009, 07:46:05 PM
Hi jj2007:
A small suggestion:

Access violation when reading [herge's suggestion] - Shift+Run/Step to pass exception to the owner of the Masm32 macros :wink

herge

 
Hi jj2007:

I know the inkey MACRO has nothing to do with you.
But my suggestion will not work with the current
inkey MACRO, which I suspect is not working
correctly.

I did mention it in another forum.

Regards herge

PBrennick

herge,

The inkey macro works correctly within the boundaries of what it was designed to do. It should not be expected to behave like a polcat-style routine that returns the key that was pressed.

Use the getkey macro, which calls ret_key, if you expect to receive a value.

JJ,
About CPUID, not all parameters of this instruction are supported on all CPUs. CPUID should be called with EAX = 0 first, as this will return the highest calling parameter that the CPU supports. To obtain extended function information, CPUID should be called with bit 31 of EAX set. To determine the highest extended function calling parameter, call CPUID with EAX = 80000000h. If the particular parameter you are trying to use is higher than that number, then report this to the user and do not use the instruction.
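The rule Paul describes amounts to a range check against two maxima. This C sketch assumes the caller has already executed CPUID with EAX = 0 and with EAX = 80000000h and passes the returned EAX values in as `max_basic` and `max_ext`; the function name is hypothetical. Requiring bit 31 to be set in `max_ext` before trusting it is an extra sanity check added by this sketch, not part of Paul's description.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EXT_BASE 0x80000000u  /* first extended leaf; bit 31 set */

/* Hypothetical helper. max_basic: EAX returned by CPUID with EAX = 0.
   max_ext: EAX returned by CPUID with EAX = 80000000h.
   Returns true if 'leaf' lies within the range the CPU reports. */
bool cpuid_leaf_supported(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
    if (leaf & CPUID_EXT_BASE) {
        /* Extended leaf: extended leaves must exist at all (max_ext has
           bit 31 set), and the requested leaf must not exceed max_ext. */
        return (max_ext & CPUID_EXT_BASE) != 0 && leaf <= max_ext;
    }
    /* Basic leaf: compare against the maximum reported by leaf 0. */
    return leaf <= max_basic;
}
```

Leaf 1, the only leaf ChkSSE2 uses, is in the basic range, so under this rule it would be rejected only if CPUID with EAX = 0 reported a maximum basic leaf of 0.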

You reported in an earlier posting that you had trouble with CPUID; now you know why.

Paul
The GeneSys Project is available from:
The Repository or My crappy website

jj2007

Quote from: PBrennick on March 18, 2009, 09:40:33 PM
JJ,
About CPUID, not all parameters of this instruction are supported on all CPUs.

You reported in an earlier posting that you had trouble with CPUID; now you know why.

Paul

Paul,

1. Open \masm32\include\slenSSE2.inc in GeneSys.exe
2. Search for CPUID
3. All you will find is:
ChkSSE2 proc         ; exactly 40 bytes
   pushad
   push 1
   pop eax
   db 0Fh, 0A2h   ; cpuid 1
   xor eax, eax
   xor esi, esi
   bt edx, 25      ; edx bit 25, SSE1
   adc eax, esi
   bt edx, 26      ; edx bit 26, SSE2
   adc eax, esi
   bt ecx, esi      ; ecx bit 0, SSE3 (esi=0)
   adc eax, esi
   bt ecx, 9      ; ecx bit 9, SSSE3 (SSE4.1 is bit 19)
   adc eax, esi
   mov MbSSE2, eax
   popad
   ret
ChkSSE2 endp

As you can see, none of the extended functions are being used.

4. Herge's program works perfectly when launched normally. It's WinDbg that has a problem, not my code.