This is too slow

Started by frktons, November 18, 2010, 03:10:21 AM

Antariy

Quote from: dedndave on November 22, 2010, 01:07:21 AM
well - i count 6 spaces at the end of that string
in the pasted text, i only count 5   :P

let me see if i can find it

It would be better to calculate the string length dynamically, as I did in my version of the manager.

Of course do the search  :U

But I guess I have found the culprit, or at least the main reason.  :lol There is something strange in how the console handles the results on different systems: on my system, it seems, a zero byte is replaced with a space by the system; on yours it is left as is and terminates the string.
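
A minimal sketch of what computing the length at run time could look like, instead of relying on a hard-coded size. StrLength and pStr are hypothetical names used only for illustration; they are not taken from the manager's source.

```asm
; Hypothetical helper: returns in EAX the length of the zero-terminated
; string at pStr, not counting the terminator.
StrLength PROC USES edi pStr:DWORD
    mov   edi, pStr
    xor   eax, eax          ; AL = 0, the byte to search for
    mov   ecx, -1           ; no practical limit on the scan
    cld                     ; scan forward
    repne scasb             ; stops just after the terminating zero
    not   ecx               ; ECX = length + 1
    lea   eax, [ecx-1]      ; EAX = length
    ret
StrLength ENDP
```

Note that a scan like this stops at the first zero byte, so a stray zero inside the buffer would still cut the text short.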

dedndave

this may not be the end solution, but it does solve the problem
should help find the bug
CopyScreenText PROC

   lea  esi, TopCode
   lea  edi, TextResults

   movq mm7, qword ptr [esi]        ; copy the first 8 bytes of TopCode
   movq qword ptr [edi], mm7

   add  edi, Eight
   lea  esi, SavedScreen

   mov  eax, AlgoRow
   imul eax, MaxCols

   mov  ecx, eax                    ; ecx = AlgoRow * MaxCols

   xor  eax, eax
   xor  ebx, ebx

NextCharText:

   mov  eax, [esi]
   cmp  al, 0                       ; zero byte?
   jnz  around1

   mov  al, 20h                     ; yes - replace it with a space

around1:
   mov  BYTE PTR [edi], al

frktons

Dave, did you try it? Does it work?
Mind is like a parachute. You know what to do in order to use it :-)

dedndave

yes - using the "C" command.....

┌─────────────────────────────────────────────────────────────[22-Nov-2010 at 01:21 GMT]─┐
│OS  : Microsoft Windows XP Professional Service Pack 2 (build 2600)                     │
│CPU : Intel(R) Pentium(R) 4 CPU 3.00GHz with 2 logical core(s) with SSE3                │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   75,602 │   72,022 │   73,863 │   71,714 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   72,385 │   71,391 │   71,918 │   70,983 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   90,708 │   88,303 │   91,428 │   88,650 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    8,627 │    8,596 │    8,448 │    8,441 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    3,353 │    3,607 │    3,697 │    3,357 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06 Hutch ustr$ + format algo      │   159   │   11,877 │   11,919 │   12,173 │   12,370 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


GregL

Another suggestion: the following code in TestBed.asm could cause problems. Why not just test for SSE2?


SSE2          EQU  ON ; If your CPU is SSE2 capable set this var to ON


Procedure(s) to test for SSE2.

;---------------------------------------
ChkSSE2 PROC USES ebx
    ; returns: TRUE  (1) if SSE2 is supported or
    ;          FALSE (0) if SSE2 is not supported
    ; ebx is preserved because cpuid overwrites it
    call ChkCPUID
    test eax, eax
    jz @F           ; CPUID not supported
    xor eax, eax
    cpuid
    test eax, eax
    jz @F           ; function 1 not supported
    mov eax, 1
    cpuid
    xor eax, eax    ; set up for return of FALSE
    bt edx, 26      ; SSE2 supported?
    jnc @F          ; return FALSE
    mov eax, 1      ; return TRUE
  @@:
    ret
ChkSSE2 ENDP
;---------------------------------------
ChkCPUID PROC USES ebx
   ; Return: True (1) if CPUID supported
   ;         False(0) if CPUID not supported
   pushfd
   pop     eax
   btc     eax, 21                ; check if CPUID bit can toggle
   push    eax
   popfd
   pushfd
   pop     ebx
   xor     ebx, eax
   xor     eax, eax               ; set up to return FALSE
   bt      ebx, 21
   jc      @F                     ; CPUID not supported, return FALSE
   mov     eax, 1                 ; CPUID supported, return TRUE
 @@:    
   ret
ChkCPUID ENDP
;---------------------------------------


Antariy

Frank, have a look into AssignStr ;)

frktons

OK, let's see if it works on my system as well  :P

Well you have posted 20 corrections. Now which one is the good one?
All secondary points can be solved later...  :P

Antariy

Quote from: GregL on November 22, 2010, 01:23:24 AM
Another suggestion: the following code in TestBed.asm could cause problems. Why not just test for SSE2?

My old manager handles that issue at runtime :lol

dedndave

while we are discussing SSE   :bg

in 2 places, the CopyScreenText routine uses
    movq mm7, qword ptr [esi]
    movq qword ptr [edi], mm7


is that necessary ?
i mean - sure it is a little faster, but they are executed only once
that code makes the program incompatible with a P3 machine
i am sure you have several other similar pieces of code in there

i would think it might be better to use non-SSE code, unless speed is an issue
that way, you have fewer pieces of code to write replacement routines for if SSE is not available

oex

I would recommend a global struct something like

CPUFeatures STRUCT
MMX
SSE
SSE2
ETC                                   ;<---- Awesome feature if you have got it :lol
CPUFeatures ENDS

Check for all features at once and then just check the global struct when needed....
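
A rough sketch of the "check once, then test a global" idea. It keeps the flags as bits in one dword rather than as separate struct fields, and it calls the ChkCPUID procedure posted earlier in the thread. CPU_MMX, CPU_SSE, CPU_SSE2, CpuFeatures and DetectFeatures are hypothetical names, not taken from TestBed.asm.

```asm
CPU_MMX   EQU 1
CPU_SSE   EQU 2
CPU_SSE2  EQU 4

.data?
CpuFeatures dd ?            ; hypothetical global, filled once at startup

.code
DetectFeatures PROC USES ebx
    mov  CpuFeatures, 0     ; default: no optional features
    call ChkCPUID           ; procedure from the earlier post
    test eax, eax
    jz   done               ; CPUID not supported
    xor  eax, eax
    cpuid
    test eax, eax
    jz   done               ; function 1 not supported
    mov  eax, 1
    cpuid                   ; EDX = feature flags
    xor  eax, eax
    bt   edx, 23            ; MMX
    jnc  @F
    or   eax, CPU_MMX
  @@:
    bt   edx, 25            ; SSE
    jnc  @F
    or   eax, CPU_SSE
  @@:
    bt   edx, 26            ; SSE2
    jnc  @F
    or   eax, CPU_SSE2
  @@:
    mov  CpuFeatures, eax
  done:
    mov  eax, CpuFeatures   ; return the flags as well
    ret
DetectFeatures ENDP
```

After one call at startup, a routine would only need something like test CpuFeatures, CPU_SSE2 followed by a conditional jump to pick the fast or the plain path.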

Best always to code everything with basic x86 commands first and only optimise where needed at the end (Obviously I used Dave's post as a reference :lol)
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

frktons

One thing at a time.  :P

1] implemented workaround proposed by Dave.
2] I'm going to implement a check for SSE2-capable code; this will also solve:

Quote from: dedndave on November 22, 2010, 01:30:33 AM
while we are discussing SSE   :bg

in 2 places, the CopyScreenText routine uses
    movq mm7, qword ptr [esi]
    movq qword ptr [edi], mm7


is that necessary ?
i mean - sure it is a little faster, but they are executed only once
that code makes the program incompatible with a P3 machine
i am sure you have several other similar pieces of code in there

i would think it might be better to use non-SSE code, unless speed is an issue
that way, you have fewer pieces of code to write replacement routines for if SSE is not available

Because if the system doesn't support MMX (which the P3 probably does support), it will run/compile the alternative code.

dedndave

i suggest this code
http://www.masm32.com/board/index.php?topic=15338.msg125149#msg125149
the value can be stored in a single dword (byte, actually)

Antariy

Dave, does this EXE run properly?

The problem is that the string size is defined in many places, and those definitions do not match the real sizes.



Alex

frktons


┌─────────────────────────────────────────────────────────────[22-Nov-2010 at 01:37 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit (build 7600)                         │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │
├──────────────────────────────────┬─────────┬──────────┬──────────┬──────────┬──────────┤
│        Algorithm notes           │Proc Size│ Test # 1 │ Test # 2 │ Test # 3 │ Test # 4 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│01 ustrv$ + GetNumberFormat       │    95   │   44.100 │   43.737 │   43.864 │   43.768 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│02 udw2str + GetNumberFormat      │    65   │   43.794 │   43.569 │   43.541 │   43.612 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│03 wsprintf + GetNumberFormat     │    73   │   50.065 │   49.866 │   50.556 │   50.546 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│04 Clive - IDIV and Stack         │   120   │    3.134 │    3.142 │    3.149 │    3.113 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│05 Clive - reciprocal IMUL        │   157   │    1.981 │    1.966 │    1.994 │    1.960 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│06 Hutch ustr$ + format algo      │   159   │    5.576 │    5.640 │    5.607 │    5.592 │
├──────────────────────────────────┼─────────┼──────────┼──────────┼──────────┼──────────┤


On my pc almost everything works.  :lol

dedndave

no Alex - same problem

Frank - perhaps this is related to the CPU, itself
you have a newer one than i do   :'(
you don't have any SSE4 code in there, do you ?

also - i noticed in Hutch's thread that you have common controls disabled or something ???
that makes your machine different than everyone else's