Is Negative

Started by herge, May 20, 2008, 02:04:45 PM

MichaelW

#15
Quote from: Rockoon on May 22, 2008, 05:00:27 AM
Alternating signs isn't a very good test of anything useful.

The best way to test things like this is to make a list of typical inputs, shuffle them, then perform your test on each item in the list in the now randomly ordered sequence. Anything else biases in favor of branching versions because branch predictors love patterns, and those patterns aren't typical of real-world usage of such a function.

Good point. I changed the code so it now uses a random sequence of inputs, repeating the same sequence for each of the tests. On a P3 the cycle counts for the macros did not change significantly, and the ranking remained the same, but the random inputs caused the cycle counts for the CRT function to more than double.

800 cycles, abs0
450 cycles, abs1
444 cycles, abs2
516 cycles, abs3
401 cycles, abs4
3853 cycles, crt_abs


I have updated the attachment.
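
As a rough illustration of the shuffled-input idea discussed above, here is a minimal C sketch (it is not the code from the attachment). It shuffles a list of inputs with a seeded generator, so re-seeding with the same value reproduces the same order for every macro under test. The names `lcg_seed`, `lcg_next` and `shuffle_inputs`, and the generator constants, are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Small linear congruential generator so the sequence is reproducible.
   The constants are a commonly quoted pair; any generator with a
   resettable seed would do (assumption: this is NOT the generator
   used in the thread's test program). */
static uint32_t lcg_state;

static void lcg_seed(uint32_t seed)
{
    lcg_state = seed;
}

static uint32_t lcg_next(void)
{
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return lcg_state;
}

/* Fisher-Yates shuffle driven by the generator above.
   The modulo step has a slight bias, which is acceptable for a sketch. */
static void shuffle_inputs(int32_t *v, size_t n, uint32_t seed)
{
    lcg_seed(seed);
    for (size_t i = n; i > 1; --i) {
        size_t j = (size_t)((lcg_next() >> 16) % i);  /* 0 <= j < i */
        int32_t t = v[i - 1];
        v[i - 1] = v[j];
        v[j] = t;
    }
}
```

Calling `shuffle_inputs` with the same seed before each timing loop gives every candidate the same pattern-free sequence.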

EDIT: Removed code that had absolutely nothing to do with the subject at hand.
eschew obfuscation

jj2007

Sizewise the abs4 is also a clear winner. The high level macro (.if eax!>80000000h) scores not too bad, either - but I notice a certain volatility of timings.

1130 cycles, abs0
500 cycles, abs1      ; 9 bytes
523 cycles, abs2      ; 9 bytes
487 cycles, abs3      ; 9 bytes
407 cycles, abs4      ; 5 bytes
375 cycles, abs5      ; 9 bytes
207 cycles, abs6 = nop
3715 cycles, crt_abs   ; 10 bytes

crt_abs means a lot of work ;-)


00401EB8  |. B8 00000000   mov eax, 0
00401EBD  |. 50            push eax                    ; /x = 0
00401EBE  |. FF15 20804000 call near dword ptr [<&msvc>; \labs

labs       8BFF            mov edi, edi                ; ntdll.7C910738
77C36BD2   55              push ebp
77C36BD3   8BEC            mov ebp, esp
77C36BD5   8B45 08         mov eax, dword ptr [ebp+8]
77C36BD8   85C0            test eax, eax
77C36BDA   7D 02           jge short msvcrt.77C36BDE
77C36BDC   F7D8            neg eax                     ; not taken
77C36BDE   5D              pop ebp
77C36BDF   C3              retn

00401EC4  |. 83C4 04       add esp, 4

MichaelW

QuoteThe high level macro (.if eax!>80000000h) scores not too bad, either - but I notice a certain volatility of timings.

How would .if eax!>80000000h be used, and what is in abs5? By volatility I assume you mean that the counts vary from run to run. There will always be some variation, and the longer the test the larger the absolute variation. Running on a P3, if I reduce the repeat count to 20, I get the following cycle counts for 4 consecutive runs:

74 74 73 74
43 42 42 43
42 42 42 42
47 47 46 47
38 38 38 38
291 291 291 290


I expect other processors will show more variation.


herge


Hi All:

Macro    Cycles
abs0     802
abs1     511
abs2     436
abs3     532
abs4     411
crt_abs  4898

It appears abs4 beats all.


    abs4 MACRO; AbsEAX
      cdq
      xor eax,edx
      sub eax,edx
    ENDM
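
For readers who do not read assembly, the following is a minimal C model of what the `cdq` / `xor` / `sub` sequence computes. It is a sketch under stated assumptions, not anyone's posted code: the function name `abs_branchless` is made up for this example, and the result is returned as an unsigned value so that the most negative input stays well defined in C.

```c
#include <stdint.h>

/* C model of: cdq / xor eax,edx / sub eax,edx
   (hypothetical helper name; returns the magnitude as unsigned). */
static uint32_t abs_branchless(int32_t x)
{
    uint32_t ux   = (uint32_t)x;
    /* cdq fills edx with copies of eax's sign bit: all ones or all zeros */
    uint32_t mask = (x < 0) ? 0xFFFFFFFFu : 0u;
    /* xor flips every bit when negative; subtracting the mask (-1) adds 1,
       which together is a two's complement negation */
    return (ux ^ mask) - mask;
}
```

For non-negative inputs the mask is zero and the value passes through unchanged. For -2147483648 the result is 2147483648 as an unsigned value, which is the same bit pattern the macros leave in eax.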


Thanks again drizz.

// Herge born  Brussels, Belgium May 22, 1907
// Died March 3, 1983
// Cartoonist of Tintin and Snowy

hutch--

If someone has the time, could they plug this into the benchmark? It seems to work OK. It has 5 instructions but no slow ones, so it may perform OK.


    mov eax, 100
    mov ecx, eax
    neg ecx
    test eax, eax
    cmovs eax, ecx

    print str$(eax),13,10

    mov eax, -100
    mov ecx, eax
    neg ecx
    test eax, eax
    cmovs eax, ecx

    print str$(eax),13,10


this is the output.


100
100
Press any key to continue ...


LATER: I added the idea into Michael's latest test piece but it's not fast enough, at least on my old Northwood.


540 cycles, abs0    ; <<<< I used this macro for it.
503 cycles, abs1
537 cycles, abs2
393 cycles, abs3
489 cycles, abs4
4171 cycles, crt_abs

Press any key to exit...


This is the substitute macro.


    abs0 MACRO
      mov ecx, eax
      neg ecx
      test eax, eax
      cmovs eax, ecx
    ENDM


Here is the result on my 3.2 GHz Prescott.


585 cycles, abs0
485 cycles, abs1
512 cycles, abs2
442 cycles, abs3
344 cycles, abs4
3825 cycles, crt_abs

Press any key to exit...


All of these algos are subject to hardware variation, it would seem.

LATER AGAIN: This seems to have a bit more legs but is not the fastest on either PIV.


    abs0 MACRO
      add eax, 0
      js @F
      neg eax
    @@:
    ENDM


Old Northwood PIV.


442 cycles, abs0
535 cycles, abs1
475 cycles, abs2
393 cycles, abs3
444 cycles, abs4
4314 cycles, crt_abs

Press any key to exit...


and on the 3.2 GHz Prescott,


433 cycles, abs0
519 cycles, abs1
504 cycles, abs2
444 cycles, abs3
344 cycles, abs4
3825 cycles, crt_abs

Press any key to exit...


Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Jimg

#20
Note: this test is totally invalid for Intel users-
Would someone with an AMD try this?  I vowed to stay out of these cycle wars because AMD just doesn't time the same, but I couldn't resist.

I've run this 10 times with the same general results-

Code:
0  0  0  0  0  0  0

1  1  1  1  1  1  1

2147483647  2147483647  2147483647  2147483647  2147483647  2147483647  2147483647

1  1  1  1  1  1  1

2147483647  2147483647  2147483647  2147483647  2147483647  2147483647  2147483647

-2147483648  -2147483648  -2147483648  -2147483648  -2147483648  -2147483648  -2147483648

764 cycles, abs0
400 cycles, abs1
402 cycles, abs2
400 cycles, abs3
385 cycles, abs4
347 cycles, abs5
2499 cycles, crt_abs

881 cycles, abs0
396 cycles, abs1
407 cycles, abs2
400 cycles, abs3
386 cycles, abs4
349 cycles, abs5
2482 cycles, crt_abs

852 cycles, abs0
394 cycles, abs1
404 cycles, abs2
404 cycles, abs3
382 cycles, abs4
349 cycles, abs5
2470 cycles, crt_abs

732 cycles, abs0
403 cycles, abs1
395 cycles, abs2
412 cycles, abs3
373 cycles, abs4
357 cycles, abs5
2486 cycles, crt_abs

733 cycles, abs0
398 cycles, abs1
392 cycles, abs2
404 cycles, abs3
376 cycles, abs4
345 cycles, abs5
2486 cycles, crt_abs

Press any key to exit...

What am I doing wrong here?

MichaelW

I corrected a problem with the random number generator not being re-seeded between the abs4 and abs5 tests, which was causing the abs5 test to use a different input sequence than the other tests. I also added Hutch's macros (note that the first one fails the 1 and -1 function tests), added the user names to the results, and reduced the repeat count to 20 to shorten the duration of each test loop (the idea being fewer opportunities for interruptions).

Typical results on my P3, Windows 2000 system:

74 cycles, abs0, herge
42 cycles, abs1, evlcrn8
54 cycles, abs2, jimg
47 cycles, abs3, rockoon
38 cycles, abs4, drizz
40 cycles, abs5, jj2007
60 cycles, abs6, hutch1
37 cycles, abs7, hutch2
291 cycles, crt_abs

74 cycles, abs0, herge
52 cycles, abs1, evlcrn8
42 cycles, abs2, jimg
47 cycles, abs3, rockoon
38 cycles, abs4, drizz
40 cycles, abs5, jj2007
60 cycles, abs6, hutch1
37 cycles, abs7, hutch2
291 cycles, crt_abs

74 cycles, abs0, herge
42 cycles, abs1, evlcrn8
42 cycles, abs2, jimg
47 cycles, abs3, rockoon
38 cycles, abs4, drizz
40 cycles, abs5, jj2007
60 cycles, abs6, hutch1
37 cycles, abs7, hutch2
291 cycles, crt_abs

86 cycles, abs0, herge
42 cycles, abs1, evlcrn8
42 cycles, abs2, jimg
47 cycles, abs3, rockoon
38 cycles, abs4, drizz
40 cycles, abs5, jj2007
60 cycles, abs6, hutch1
38 cycles, abs7, hutch2
297 cycles, crt_abs

74 cycles, abs0, herge
42 cycles, abs1, evlcrn8
43 cycles, abs2, jimg
47 cycles, abs3, rockoon
38 cycles, abs4, drizz
40 cycles, abs5, jj2007
60 cycles, abs6, hutch1
37 cycles, abs7, hutch2
291 cycles, crt_abs


Another possibility, if you were willing to accept some risk of buggy code crashing Windows, would be REALTIME_PRIORITY_CLASS. This did not significantly improve the consistency of my results.



[attachment deleted by admin]

jj2007

Quote from: MichaelW on May 22, 2008, 10:48:56 AM
QuoteThe high level macro (.if eax!>80000000h) scores not too bad, either - but I notice a certain volatility of timings.

How would .if eax!>80000000h be used, and what is in abs5? By volatility I assume you mean that the counts vary from run to run


I meant the simple
  .if eax>=80000000h
       neg eax
  .endif
...which translates, if I remember correctly, to
cmp eax, 80000000h
jb @F
neg eax
@@:
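
The `.if eax>=80000000h` form works because an unsigned comparison against 80000000h is true exactly when bit 31, the sign bit, is set. A minimal C sketch of that equivalence follows; the two function names are made up for this example.

```c
#include <stdint.h>

/* Unsigned view: true when the 32-bit pattern is >= 80000000h,
   i.e. when bit 31 is set (hypothetical helper name). */
static int is_negative_unsigned_cmp(int32_t x)
{
    return (uint32_t)x >= 0x80000000u;
}

/* Signed view of the same question (hypothetical helper name). */
static int is_negative_signed_cmp(int32_t x)
{
    return x < 0;
}
```

Both functions agree for every 32-bit input, so the comparison has to be treated as unsigned for the test to select the negative values.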

Jimg

#23
Hi Michael-
When you set the one up with my name, you copied abs1 into it instead of the code I presented.  Here's the same one with the correct code-
0  0  0  0  0  0  0  0  0

1  1  1  1  1  1  1  -1  1

2147483647  2147483647  2147483647  2147483647  2147483647  2147483647  2147483647  -2147483647  2147483647

1  1  1  1  1  1  1  -1  1

2147483647  2147483647  2147483647  2147483647  2147483647  2147483647  2147483647  -2147483647  2147483647

-2147483648  -2147483648  -2147483648  -2147483648  -2147483648  -2147483648  -2147483648  -2147483648  -2147483648

64 cycles, abs0, herge
37 cycles, abs1, evlcrn8
34 cycles, abs2, jimg
44 cycles, abs3, rockoon
33 cycles, abs4, drizz
34 cycles, abs5, jj2007
40 cycles, abs6, hutch1
31 cycles, abs7, hutch2
185 cycles, crt_abs

64 cycles, abs0, herge
37 cycles, abs1, evlcrn8
48 cycles, abs2, jimg
37 cycles, abs3, rockoon
33 cycles, abs4, drizz
34 cycles, abs5, jj2007
40 cycles, abs6, hutch1
31 cycles, abs7, hutch2
195 cycles, crt_abs

64 cycles, abs0, herge
37 cycles, abs1, evlcrn8
33 cycles, abs2, jimg
37 cycles, abs3, rockoon
33 cycles, abs4, drizz
33 cycles, abs5, jj2007
40 cycles, abs6, hutch1
31 cycles, abs7, hutch2
185 cycles, crt_abs

65 cycles, abs0, herge
51 cycles, abs1, evlcrn8
34 cycles, abs2, jimg
37 cycles, abs3, rockoon
33 cycles, abs4, drizz
33 cycles, abs5, jj2007
49 cycles, abs6, hutch1
31 cycles, abs7, hutch2
185 cycles, crt_abs

64 cycles, abs0, herge
37 cycles, abs1, evlcrn8
33 cycles, abs2, jimg
37 cycles, abs3, rockoon
33 cycles, abs4, drizz
34 cycles, abs5, jj2007
40 cycles, abs6, hutch1
31 cycles, abs7, hutch2
195 cycles, crt_abs

Press any key to exit...

NightWare

hi,
2 more to test :
abs8 MACRO _Operand_:REQ
      test _Operand_,_Operand_
      jns @F
      neg _Operand_
    @@:
    ENDM

abs9 MACRO _Operand_:REQ
      bt _Operand_,31
      jnc @F
      neg _Operand_
    @@:
    ENDM


hutch, js @F (avoid neg if signed ?)
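
To make the `js` versus `jns` point concrete, here is a small C model of both branch directions. It is only a sketch: `neg32`, `abs_with_js` and `abs_with_jns` are names invented for this example, and `neg32` mimics the wrap-around of `neg eax` for the most negative value without relying on signed overflow.

```c
#include <stdint.h>

/* Mirrors neg eax: the most negative value maps to itself. */
static int32_t neg32(int32_t x)
{
    return (x == INT32_MIN) ? x : -x;
}

/* add eax,0 / js @F / neg eax
   The jump is taken when the sign flag is set, so negative inputs
   skip the negation and non-negative inputs get negated. */
static int32_t abs_with_js(int32_t x)
{
    if (x < 0)
        return x;
    return neg32(x);
}

/* add eax,0 / jns @F / neg eax
   The jump is taken for non-negative inputs, so only negative
   inputs are negated. */
static int32_t abs_with_jns(int32_t x)
{
    if (x >= 0)
        return x;
    return neg32(x);
}
```

The `js` model returns -1 for both 1 and -1, and -2147483647 for both 2147483647 and -2147483647, which matches the odd column in the function-test output posted earlier in the thread.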

jimg, abs2 = abs5 in your test (but different result...)

Jimg

Drat.   Ok, I'll let Michael sort it out.  I just knew the one with my name on it was the wrong one.

hutch--

I don't know why but I am getting very large variations in the timings in the last test piece.


80 cycles, abs0, herge
51 cycles, abs1, evlcrn8
69 cycles, abs2, jimg
38 cycles, abs3, rockoon
32 cycles, abs4, drizz
12 cycles, abs5, jj2007
50 cycles, abs6, hutch1
29 cycles, abs7, hutch2
390 cycles, crt_abs

71 cycles, abs0, herge
71 cycles, abs1, evlcrn8
41 cycles, abs2, jimg
38 cycles, abs3, rockoon
32 cycles, abs4, drizz
41 cycles, abs5, jj2007
38 cycles, abs6, hutch1
29 cycles, abs7, hutch2
378 cycles, crt_abs

68 cycles, abs0, herge
20 cycles, abs1, evlcrn8
37 cycles, abs2, jimg
28 cycles, abs3, rockoon
32 cycles, abs4, drizz
13 cycles, abs5, jj2007
38 cycles, abs6, hutch1
14 cycles, abs7, hutch2
377 cycles, crt_abs

103 cycles, abs0, herge
45 cycles, abs1, evlcrn8
37 cycles, abs2, jimg
28 cycles, abs3, rockoon
32 cycles, abs4, drizz
13 cycles, abs5, jj2007
43 cycles, abs6, hutch1
14 cycles, abs7, hutch2
390 cycles, crt_abs

69 cycles, abs0, herge
45 cycles, abs1, evlcrn8
12 cycles, abs2, jimg
40 cycles, abs3, rockoon
32 cycles, abs4, drizz
13 cycles, abs5, jj2007
48 cycles, abs6, hutch1
33 cycles, abs7, hutch2
390 cycles, crt_abs

Press any key to exit...

jj2007

Quote from: hutch-- on May 23, 2008, 06:52:55 AM
I don't know why but I am getting very large variations in the timings in the last test piece.

First, the good news: On Hutch's average, my code beats all the others. The bad news is that it isn't my code, since I actually proposed the rather ordinary
    .if eax>=80000000h
       neg eax
    .endif

Never mind, I'll bear the false honour with dignity. Joking aside, the variations are indeed very significant. I added the two by NightWare...


    abs4 MACRO ; Drizz 5 bytes
      cdq
      xor eax,edx
      sub eax,edx
    ENDM

    abs5 Macro ; jimg 6 bytes
    or eax, eax
    .if sign? ; jns @F
        neg eax
    .endif 
    endm

    abs5jj Macro ; jj 9 bytes
    .if eax>=80000000h
       neg eax
    .endif
    endm

    abs6 MACRO ; Hutch1 9 bytes
      mov ecx, eax
      neg ecx
      test eax, eax
      cmovs eax, ecx
    ENDM

    abs7 MACRO ; Hutch2 7 bytes
      add eax, 0
      js @F
      neg eax
    @@:
    ENDM

    abs8 Macro ; Nightware 6 bytes
    test eax, eax
    .if sign? ; jns @F
        neg eax
    .endif 
    endm

    abs9 Macro ; Nightware 6 bytes
    bt eax, 31
    jnc @F
    neg eax
  @@:
    endm



... and get these benchmarks:

71 cycles, abs0, herge
41 cycles, abs1, evlcrn8
41 cycles, abs2, jimg
41 cycles, abs3, rockoon
32 cycles, abs4, drizz
13 cycles, abs5
45 cycles, abs5, jj2007
39 cycles, abs6, hutch1
31 cycles, abs7, hutch2
13 cycles, abs8
51 cycles, abs9
390 cycles, crt_abs

69 cycles, abs0, herge
41 cycles, abs1, evlcrn8
41 cycles, abs2, jimg
29 cycles, abs3, rockoon
32 cycles, abs4, drizz
37 cycles, abs5
45 cycles, abs5, jj2007
38 cycles, abs6, hutch1
33 cycles, abs7, hutch2
37 cycles, abs8
37 cycles, abs9
380 cycles, crt_abs

69 cycles, abs0, herge
33 cycles, abs1, evlcrn8
13 cycles, abs2, jimg
38 cycles, abs3, rockoon
33 cycles, abs4, drizz
13 cycles, abs5
37 cycles, abs5, jj2007
38 cycles, abs6, hutch1
16 cycles, abs7, hutch2
41 cycles, abs8
47 cycles, abs9
378 cycles, crt_abs

75 cycles, abs0, herge
55 cycles, abs1, evlcrn8
41 cycles, abs2, jimg
28 cycles, abs3, rockoon
32 cycles, abs4, drizz
37 cycles, abs5
51 cycles, abs5, jj2007
39 cycles, abs6, hutch1
32 cycles, abs7, hutch2
38 cycles, abs8
37 cycles, abs9
380 cycles, crt_abs

71 cycles, abs0, herge
45 cycles, abs1, evlcrn8
13 cycles, abs2, jimg
38 cycles, abs3, rockoon
32 cycles, abs4, drizz
41 cycles, abs5
41 cycles, abs5, jj2007
38 cycles, abs6, hutch1
33 cycles, abs7, hutch2
51 cycles, abs8
47 cycles, abs9
384 cycles, crt_abs


jimg made it in only 13 cycles, but not for long ;-)

In the absence of a clear winner, let's go for a "diplomatic" solution: take the product of size * speed... congrats, drizz.
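
The size * speed product can be worked through with the figures posted above. The sketch below is one possible reading, not a stated method: it assumes the byte counts from the comments in the macro listing and sums the five posted runs for each macro. The struct and function names are invented for this example, and only the macros with a byte count in the listing are included.

```c
#include <stddef.h>

struct candidate {
    const char *name;
    int bytes;       /* size as annotated in the macro listing */
    int cycles[5];   /* the five runs posted above */
};

static const struct candidate candidates[] = {
    { "abs4",   5, { 32, 32, 33, 32, 32 } },
    { "abs5",   6, { 13, 37, 13, 37, 41 } },
    { "abs5jj", 9, { 45, 45, 37, 51, 41 } },
    { "abs6",   9, { 39, 38, 38, 39, 38 } },
    { "abs7",   7, { 31, 33, 16, 32, 33 } },
    { "abs8",   6, { 13, 37, 41, 38, 51 } },
    { "abs9",   6, { 51, 37, 47, 37, 47 } },
};

/* bytes * (sum of the five cycle counts); lower is better */
static int score(const struct candidate *c)
{
    int sum = 0;
    for (int i = 0; i < 5; ++i)
        sum += c->cycles[i];
    return c->bytes * sum;
}

/* Index of the candidate with the lowest score. */
static size_t best_candidate(void)
{
    size_t best = 0;
    size_t n = sizeof candidates / sizeof candidates[0];
    for (size_t i = 1; i < n; ++i)
        if (score(&candidates[i]) < score(&candidates[best]))
            best = i;
    return best;
}
```

With these figures abs4 scores 5 * 161 = 805, just ahead of abs5 at 6 * 141 = 846, so under this reading the product does favour the `cdq` version.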

MichaelW

#28
Because I doubt that this could be any more confusing than it is now, I reworked the test to use the second set of macros, the ones that start a new time slice at the start of the loops and capture the lowest cycle count that occurs in any loop. I also went back over the thread and attempted to get the names straight. Running on my P3 with the repeat count set to 20 the repeatability for the macro code is near perfect.

abs0    0 1 2147483647 1 2147
abs1    0 1 2147483647 1 2147
abs2    0 1 2147483647 1 2147
abs3t   0 1 2147483647 1 2147
abs4    0 1 2147483647 1 2147
abs5    0 1 2147483647 1 2147
abs6    0 1 2147483647 1 2147
abs7    0 1 2147483647 1 2147
abs8    0 1 2147483647 1 2147
abs9    0 1 2147483647 1 2147
crtabs  0 1 2147483647 1 2147

80 cycles, abs0 (herge)
47 cycles, abs1 (evlncrn8)
45 cycles, abs2 (jimg)
55 cycles, abs3 (Rockoon)
45 cycles, abs4 (drizz)
47 cycles, abs5 (jj2007)
70 cycles, abs6 (hutch1)
45 cycles, abs7 (hutch2)
45 cycles, abs8 (NightWare1)
48 cycles, abs9 (NightWare2)
228 cycles, crt_abs

80 cycles, abs0 (herge)
47 cycles, abs1 (evlncrn8)
45 cycles, abs2 (jimg)
55 cycles, abs3 (Rockoon)
45 cycles, abs4 (drizz)
47 cycles, abs5 (jj2007)
70 cycles, abs6 (hutch1)
45 cycles, abs7 (hutch2)
45 cycles, abs8 (NightWare1)
48 cycles, abs9 (NightWare2)
282 cycles, crt_abs

80 cycles, abs0 (herge)
47 cycles, abs1 (evlncrn8)
45 cycles, abs2 (jimg)
55 cycles, abs3 (Rockoon)
45 cycles, abs4 (drizz)
47 cycles, abs5 (jj2007)
70 cycles, abs6 (hutch1)
45 cycles, abs7 (hutch2)
45 cycles, abs8 (NightWare1)
48 cycles, abs9 (NightWare2)
282 cycles, crt_abs

80 cycles, abs0 (herge)
47 cycles, abs1 (evlncrn8)
45 cycles, abs2 (jimg)
55 cycles, abs3 (Rockoon)
45 cycles, abs4 (drizz)
47 cycles, abs5 (jj2007)
70 cycles, abs6 (hutch1)
45 cycles, abs7 (hutch2)
45 cycles, abs8 (NightWare1)
48 cycles, abs9 (NightWare2)
282 cycles, crt_abs

80 cycles, abs0 (herge)
47 cycles, abs1 (evlncrn8)
45 cycles, abs2 (jimg)
55 cycles, abs3 (Rockoon)
45 cycles, abs4 (drizz)
47 cycles, abs5 (jj2007)
70 cycles, abs6 (hutch1)
45 cycles, abs7 (hutch2)
45 cycles, abs8 (NightWare1)
48 cycles, abs9 (NightWare2)
282 cycles, crt_abs


On a P4 I expect the cycle counts will always be a multiple of 4.
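
The idea of keeping the lowest cycle count seen in any loop, described at the top of this post, can be sketched in C as follows. This is an illustration under assumptions, not the timing macros themselves: the counter is passed in as a function pointer, the names are invented for this example, and the step of starting a fresh time slice before each loop is left out. A scripted counter is included only so the logic can be exercised deterministically.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*counter_fn)(void);  /* hypothetical cycle counter */
typedef void (*work_fn)(void);         /* code under test */

/* Times `work` once per loop and keeps the smallest elapsed count,
   on the theory that interruptions can only make a loop slower. */
static uint64_t lowest_cycle_count(counter_fn now, work_fn work, int loops)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < loops; ++i) {
        uint64_t start = now();
        work();
        uint64_t elapsed = now() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Scripted counter for demonstration: successive readings give
   elapsed counts of 50, 45 and 90 over three loops. */
static const uint64_t script[] = { 0, 50, 100, 145, 200, 290 };
static size_t script_pos;

static uint64_t scripted_counter(void)
{
    return script[script_pos++];
}

static void no_work(void)
{
}
```

With the scripted readings, three loops report 45, the smallest of the three elapsed values. A real run would pass a function that reads the processor's cycle counter instead.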


[attachment deleted by admin]

hutch--

What have I done wrong here? This is the test piece I used, which returned 1 for both 1 and -1.


    mov eax, 1
    add eax, 0
    jns @F
    neg eax
  @@:
    print str$(eax),13,10

    mov eax, -1
    add eax, 0
    jns @F
    neg eax
  @@:
    print str$(eax),13,10


Result


1
1
Press any key to continue ...