The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: herge on May 20, 2008, 02:04:45 PM

Title: Is Negative
Post by: herge on May 20, 2008, 02:04:45 PM
 Hi All:


  push eax
  and eax,80000000h ; is high bit on?
  cmp eax,80000000h
  pop eax
  jnz @f
  neg eax ; is Negative so flip it!
@@:



Is there an easy way to test if a signed number
is negative? Or is there a jump on a condition
to do it?
Title: Re: Is Negative
Post by: evlncrn8 on May 20, 2008, 02:30:02 PM
test eax, 080000000h possibly?
Title: Re: Is Negative
Post by: Jimg on May 20, 2008, 03:26:44 PM
or eax,eax
jns @f
neg eax
@@:
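
For readers who think in C, the sign test and the conditional negate above can be modeled roughly as follows. This is only a sketch: the function names are made up for illustration, and a `uint32_t` stands in for eax so that the arithmetic wraps the way the register does.

```c
#include <stdint.h>

/* Models "test eax, 80000000h": nonzero when the high bit is set,
   i.e. when the value is negative if read as a signed number. */
int is_negative(uint32_t eax)
{
    return (eax & 0x80000000u) != 0;
}

/* Models "or eax,eax / jns @f / neg eax / @@:".
   0u - eax is the two's complement negation, same as neg eax. */
uint32_t abs_branch(uint32_t eax)
{
    if (is_negative(eax))
        eax = 0u - eax;
    return eax;
}
```

Note that 80000000h has no positive counterpart in 32 bits, so negating it gives 80000000h back.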
Title: Re: Is Negative
Post by: herge on May 20, 2008, 04:22:34 PM

Hi Jimg:

It works for me!

Thank you.
Title: Re: Is Negative
Post by: jj2007 on May 20, 2008, 09:32:01 PM
For unsigned integers such as the well-known eax:

IsNeg   EQU 80000000h   ; eax>= means: eax is negative
IsNegW   EQU 8000h   ; GetKeystate needs a WORD

  .if eax>=IsNeg
   ; negative
  .endif

  .if eax<IsNeg
   ; positive
  .endif
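
The comparison trick above works because every bit pattern with the high bit set is at least 80000000h when read as unsigned. A small C sketch of the same idea, with hypothetical function names and fixed-width types standing in for the register and the WORD:

```c
#include <stdint.h>

/* Unsigned view of the register: eax >= 80000000h */
int neg_by_unsigned_compare(uint32_t eax)
{
    return eax >= 0x80000000u;
}

/* Signed view of a value: x < 0 */
int neg_by_signed_compare(int32_t x)
{
    return x < 0;
}

/* 16-bit variant, matching IsNegW (8000h) */
int word_is_negative(uint16_t w)
{
    return w >= 0x8000u;
}
```

Both 32-bit functions should agree whenever they are handed the same bit pattern.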
Title: Re: Is Negative
Post by: Rockoon on May 21, 2008, 04:21:48 AM
What about

input in eax

mov ebx, eax
sar eax, 31
add ebx, eax
xor ebx, eax

output in ebx

no branching, but more operations.. so profile.
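
A rough C model of the four instructions above, for anyone who wants to step through the arithmetic. The function name is invented, and the mask is built from a logical shift plus a negation so the sketch does not depend on how a C compiler shifts negative signed values.

```c
#include <stdint.h>

/* Models: mov ebx,eax / sar eax,31 / add ebx,eax / xor ebx,eax */
uint32_t abs_sar_add_xor(uint32_t eax)
{
    uint32_t ebx  = eax;
    uint32_t mask = 0u - (eax >> 31); /* sar eax,31: all 1s if negative, else 0 */
    ebx += mask;                      /* subtracts 1 when negative */
    ebx ^= mask;                      /* NOT when negative */
    return ebx;
}
```

For a negative input this is "subtract 1, then NOT", which is one of the two ways to negate in two's complement.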
Title: Re: Is Negative
Post by: herge on May 21, 2008, 08:17:22 AM

Hi Rockoon:

Yes that works as well.

Thank you!
Title: Re: Is Negative
Post by: drizz on May 21, 2008, 06:10:49 PM
some more :)
Abs macro __rm:req
.repeat
neg __rm
.until !sign?   ; note: never exits for 80000000h, since neg leaves that value unchanged
endm

AbsEAX macro
cdq
xor eax,edx
sub eax,edx
endm
Title: Re: Is Negative
Post by: Rockoon on May 21, 2008, 07:36:07 PM
Quote from: drizz on May 21, 2008, 06:10:49 PM
some more :)

AbsEAX macro
cdq
xor eax,edx
sub eax,edx
endm


Very humbling. This one is clearly superior to all of the others posted thus far. No benchmarking required. Even uses eax as much as possible for shorter instruction lengths.

Title: Re: Is Negative
Post by: herge on May 21, 2008, 07:44:46 PM

Hi drizz:

The AbsEAX macro works.
But I don't need 64 bits yet.

Thanks.
Title: Re: Is Negative
Post by: hutch-- on May 22, 2008, 01:02:27 AM
 :bg

Like it. Compliments drizz.  :U


AbsEAX macro
cdq
xor eax,edx
sub eax,edx
endm
Title: Re: Is Negative
Post by: MichaelW on May 22, 2008, 04:37:45 AM
The attachment is a quick test of the code here, based on the assumption that the goal is to convert the number in eax to its absolute value. The cycle counts in the results below are for a total of 200 conversions, of alternating positive and negative values, running on a P3. Considering the call overhead, even the CRT function is surprisingly fast.

0  0  0  0  0  0

1  1  1  1  1  1

2147483647  2147483647  2147483647  2147483647  2147483647  2147483647

1  1  1  1  1  1

2147483647  2147483647  2147483647  2147483647  2147483647  2147483647

2147483648  2147483648  2147483648  2147483648  2147483648  2147483648

804 cycles
432 cycles
426 cycles
510 cycles
406 cycles
1605 cycles




[attachment deleted by admin]
Title: Re: Is Negative
Post by: Rockoon on May 22, 2008, 04:46:16 AM
Quote from: herge on May 21, 2008, 07:44:46 PM

Hi drizz:

The AbsEAX macro works.
But I don't need 64 bits yet.

Thanks.


It doesn't do 64-bit values.

It just uses the property of the cdq instruction of sign extending all the way through the edx register, creating an all 1's mask in edx if eax is negative, or an all 0's mask if eax is positive.

Negation in twos complement:

take the NOT of the value, and then add 1.

or

subtract 1 from the value, then take the NOT


The mask can be used for both NOTing and adding/subtracting 1, conditionally, based on the state of eax. (An all 1's mask is equivalent to the value '-1', and XORing with the all 1's mask is equivalent to a NOT.)

My methodology is the same, 'cept I wasn't exploiting the CDQ instruction (instead I was making a copy of the input and then doing an arithmetic shift by 31 to create the mask). I think I've been spending too much time in HLLs.
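
The two negation identities and the mask trick described in this post can be checked with a short C sketch. The names are hypothetical, and `uint32_t` is used so that everything wraps modulo 2^32 like the registers do.

```c
#include <stdint.h>

/* Negation as "NOT, then add 1" */
uint32_t neg_not_then_add(uint32_t v)
{
    return ~v + 1u;
}

/* Negation as "subtract 1, then NOT" */
uint32_t neg_sub_then_not(uint32_t v)
{
    return ~(v - 1u);
}

/* Models: cdq / xor eax,edx / sub eax,edx */
uint32_t abs_cdq(uint32_t eax)
{
    uint32_t edx = 0u - (eax >> 31); /* cdq: edx = all 1s if eax is negative, else 0 */
    eax ^= edx;                      /* NOT when negative, no change otherwise */
    eax -= edx;                      /* subtracting -1 adds 1; subtracting 0 does nothing */
    return eax;
}
```

With a zero mask both steps are no-ops, which is why no branch is needed.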

Title: Re: Is Negative
Post by: Rockoon on May 22, 2008, 05:00:27 AM
Quote from: MichaelW on May 22, 2008, 04:37:45 AM
The cycle counts in the results below are for a total of 200 conversions, of alternating positive and negative values, running on a P3.

Alternating signs isn't a very good test of anything useful.

The best way to test things like this is to make a list of typical inputs, then shuffle them all, then perform your test on each item in the list in the now randomly ordered sequence. Anything else biases in favor of branching versions because branch predictors love patterns, patterns that aren't typical in real-world usage of such a function.
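
One possible way to build such a shuffled input list, sketched in C. The helper names and the choice of generator are assumptions for illustration, not anything taken from the test attachment; a fixed seed is used so that every candidate can be timed on exactly the same sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Small linear congruential generator (constants from Numerical Recipes).
   Good enough for ordering test inputs, not for anything statistical. */
uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Fisher-Yates shuffle: the same seed always gives the same order. */
void shuffle_inputs(int32_t *v, size_t n, uint32_t seed)
{
    uint32_t state = seed;
    for (size_t i = n; i > 1; --i) {
        /* use the upper bits; the modulo leaves a slight bias that does not matter here */
        size_t j = (size_t)(lcg_next(&state) >> 16) % i;
        int32_t tmp = v[i - 1];
        v[i - 1] = v[j];
        v[j] = tmp;
    }
}
```

Calling `shuffle_inputs` with the same seed before each timed loop keeps the comparison fair between candidates.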
Title: Re: Is Negative
Post by: hutch-- on May 22, 2008, 05:02:29 AM
This is what I got with Michaels test. I added the macro name for each so I knew what was what.


0  0  0  0  0  0

1  1  1  1  1  1

2147483647  2147483647  2147483647  2147483647  2147483647  2147483647

1  1  1  1  1  1

2147483647  2147483647  2147483647  2147483647  2147483647  2147483647

2147483648  2147483648  2147483648  2147483648  2147483648  2147483648

761 cycles abs0
364 cycles abs1
403 cycles abs2
395 cycles abs3
455 cycles abs4
3601 cycles crt_abs

Press any key to exit...


Now all I wonder is if there would be a time difference with the TEST/Jxx code if the jump was taken or not.
Title: Re: Is Negative
Post by: MichaelW on May 22, 2008, 06:55:51 AM
Quote from: Rockoon on May 22, 2008, 05:00:27 AM
Alternating signs isn't a very good test of anything useful.

The best way to test things like this is to make a list of typical inputs, then shuffle them all, then perform your test on each item in the list in the now randomly ordered sequence. Anything else biases in favor of branching versions because branch predictors love patterns, patterns that aren't typical in real-world usage of such a function.

Good point. I changed the code so it now uses a random sequence of inputs, repeating the same sequence for each of the tests. On a P3 the cycle counts for the macros did not change significantly, and the ranking remained the same, but the random inputs caused the cycle counts for the CRT function to more than double.

800 cycles, abs0
450 cycles, abs1
444 cycles, abs2
516 cycles, abs3
401 cycles, abs4
3853 cycles, crt_abs


I have updated the attachment.

EDIT: Removed code that had absolutely nothing to do with the subject at hand  :red
Title: Re: Is Negative
Post by: jj2007 on May 22, 2008, 09:48:22 AM
Sizewise the abs4 is also a clear winner. The high level macro (.if eax!>80000000h) scores not too bad, either - but I notice a certain volatility of timings.

1130 cycles, abs0
500 cycles, abs1      ; 9 bytes
523 cycles, abs2      ; 9 bytes
487 cycles, abs3      ; 9 bytes
407 cycles, abs4      ; 5 bytes
375 cycles, abs5      ; 9 bytes
207 cycles, abs6 = nop
3715 cycles, crt_abs   ; 10 bytes

crt_abs means a lot of work ;-)


00401EB8  |. B8 00000000   mov eax, 0
00401EBD  |. 50            push eax                    ; /x = 0
00401EBE  |. FF15 20804000 call near dword ptr [<&msvc>; \labs

labs       8BFF            mov edi, edi                ; ntdll.7C910738
77C36BD2   55              push ebp
77C36BD3   8BEC            mov ebp, esp
77C36BD5   8B45 08         mov eax, dword ptr [ebp+8]
77C36BD8   85C0            test eax, eax
77C36BDA   7D 02           jge short msvcrt.77C36BDE
77C36BDC   F7D8            neg eax ; not taken
77C36BDE   5D              pop ebp
77C36BDF   C3              retn

00401EC4  |. 83C4 04       add esp, 4
Title: Re: Is Negative
Post by: MichaelW on May 22, 2008, 10:48:56 AM
Quote: The high level macro (.if eax!>80000000h) scores not too bad, either - but I notice a certain volatility of timings.

How would .if eax!>80000000h be used, and what is in abs5? By volatility I assume you mean that the counts vary from run to run. There will always be some variation, and the longer the test the larger the absolute variation. Running on a P3, if I reduce the repeat count to 20, I get the following cycle counts for 4 consecutive runs:

74 74 73 74
43 42 42 43
42 42 42 42
47 47 46 47
38 38 38 38
291 291 291 290


I expect other processors will show more variation.

Title: Re: Is Negative
Post by: herge on May 22, 2008, 12:03:55 PM

Hi All:

Mac Cycle
abs0 802
abs1 511
abs2 436
abs3 532
abs4 411
crt_ 4898

It appears abs4 beats all.


    abs4 MACRO; AbsEAX
      cdq
      xor eax,edx
      sub eax,edx
    ENDM


Thanks again drizz.

Title: Re: Is Negative
Post by: hutch-- on May 22, 2008, 12:22:08 PM
If someone has the time, could they plug this into the benchmark? It seems to work OK. It has 5 instructions but no slow ones so it may perform OK.


    mov eax, 100
    mov ecx, eax
    neg ecx
    test eax, eax
    cmovs eax, ecx

    print str$(eax),13,10

    mov eax, -100
    mov ecx, eax
    neg ecx
    test eax, eax
    cmovs eax, ecx

    print str$(eax),13,10


this is the output.


100
100
Press any key to continue ...


LATER: I added the idea into Michael's latest test piece but it's not fast enough, at least on my old Northwood.


540 cycles, abs0    ; <<<< I used this macro for it.
503 cycles, abs1
537 cycles, abs2
393 cycles, abs3
489 cycles, abs4
4171 cycles, crt_abs

Press any key to exit...


This is the substitute macro.


    abs0 MACRO
      mov ecx, eax
      neg ecx
      test eax, eax
      cmovs eax, ecx
    ENDM


Here is the result on my 3.2 gig Prescott.


585 cycles, abs0
485 cycles, abs1
512 cycles, abs2
442 cycles, abs3
344 cycles, abs4
3825 cycles, crt_abs

Press any key to exit...


All of these algos are subject to hardware variation, it would seem.

LATER AGAIN: This seems to have a bit more legs but is not the fastest on either PIV.


    abs0 MACRO
      add eax, 0
      js @F
      neg eax
    @@:


Old Northwood PIV.


442 cycles, abs0
535 cycles, abs1
475 cycles, abs2
393 cycles, abs3
444 cycles, abs4
4314 cycles, crt_abs

Press any key to exit...


and on the 3.2 gig Prescott,


433 cycles, abs0
519 cycles, abs1
504 cycles, abs2
444 cycles, abs3
344 cycles, abs4
3825 cycles, crt_abs

Press any key to exit...


Title: Re: Is Negative
Post by: Jimg on May 22, 2008, 01:36:41 PM
Note: this test is totally invalid for Intel users-
Would someone with an AMD try this?  I vowed to stay out of these cycle wars because AMD just doesn't time the same, but I couldn't resist.

I've run this 10 times with the same general results-

Code:
0  0  0  0  0  0  0

1  1  1  1  1  1  1

2147483647  2147483647  2147483647  2147483647  2147483647  2147483647  2147483647

1  1  1  1  1  1  1

2147483647  2147483647  2147483647  2147483647  2147483647  2147483647  2147483647

-2147483648  -2147483648  -2147483648  -2147483648  -2147483648  -2147483648  -2147483648

764 cycles, abs0
400 cycles, abs1
402 cycles, abs2
400 cycles, abs3
385 cycles, abs4
347 cycles, abs5
2499 cycles, crt_abs

881 cycles, abs0
396 cycles, abs1
407 cycles, abs2
400 cycles, abs3
386 cycles, abs4
349 cycles, abs5
2482 cycles, crt_abs

852 cycles, abs0
394 cycles, abs1
404 cycles, abs2
404 cycles, abs3
382 cycles, abs4
349 cycles, abs5
2470 cycles, crt_abs

732 cycles, abs0
403 cycles, abs1
395 cycles, abs2
412 cycles, abs3
373 cycles, abs4
357 cycles, abs5
2486 cycles, crt_abs

733 cycles, abs0
398 cycles, abs1
392 cycles, abs2
404 cycles, abs3
376 cycles, abs4
345 cycles, abs5
2486 cycles, crt_abs

Press any key to exit...

What am I doing wrong here?
Title: Re: Is Negative
Post by: MichaelW on May 22, 2008, 05:36:50 PM
I corrected a problem with the random number generator not being re-seeded between the abs4 and abs5 tests, which was causing the abs5 test to use a different input sequence than the other tests. I also added Hutch's macros (note that the first one fails the 1 and -1 function tests), added the user names to the results, and reduced the repeat count to 20 to shorten the duration of each test loop (the idea being fewer opportunities for interruptions).

Typical results on my P3, Windows 2000 system:

74 cycles, abs0, herge
42 cycles, abs1, evlcrn8
54 cycles, abs2, jimg
47 cycles, abs3, rockoon
38 cycles, abs4, drizz
40 cycles, abs5, jj2007
60 cycles, abs6, hutch1
37 cycles, abs7, hutch2
291 cycles, crt_abs

74 cycles, abs0, herge
52 cycles, abs1, evlcrn8
42 cycles, abs2, jimg
47 cycles, abs3, rockoon
38 cycles, abs4, drizz
40 cycles, abs5, jj2007
60 cycles, abs6, hutch1
37 cycles, abs7, hutch2
291 cycles, crt_abs

74 cycles, abs0, herge
42 cycles, abs1, evlcrn8
42 cycles, abs2, jimg
47 cycles, abs3, rockoon
38 cycles, abs4, drizz
40 cycles, abs5, jj2007
60 cycles, abs6, hutch1
37 cycles, abs7, hutch2
291 cycles, crt_abs

86 cycles, abs0, herge
42 cycles, abs1, evlcrn8
42 cycles, abs2, jimg
47 cycles, abs3, rockoon
38 cycles, abs4, drizz
40 cycles, abs5, jj2007
60 cycles, abs6, hutch1
38 cycles, abs7, hutch2
297 cycles, crt_abs

74 cycles, abs0, herge
42 cycles, abs1, evlcrn8
43 cycles, abs2, jimg
47 cycles, abs3, rockoon
38 cycles, abs4, drizz
40 cycles, abs5, jj2007
60 cycles, abs6, hutch1
37 cycles, abs7, hutch2
291 cycles, crt_abs


Another possibility, if you were willing to accept some risk of buggy code crashing Windows, would be REALTIME_PRIORITY_CLASS. This did not significantly improve the consistency of my results.



[attachment deleted by admin]
Title: Re: Is Negative
Post by: jj2007 on May 22, 2008, 06:41:18 PM
Quote from: MichaelW on May 22, 2008, 10:48:56 AM
Quote: The high level macro (.if eax!>80000000h) scores not too bad, either - but I notice a certain volatility of timings.

How would .if eax!>80000000h be used, and what is in abs5? By volatility I assume you mean that the counts vary from run to run


I meant the simple
  .if eax>=80000000h
       neg eax
  .endif
...which translates, if I remember well, to
cmp eax, 80000000h
jb @f   ; unsigned compare, so the skip is "below", not "less"
neg eax
@@:
Title: Re: Is Negative
Post by: Jimg on May 22, 2008, 07:09:54 PM
Hi Michael-
When you set the one up with my name, you copied abs1 into it instead of the code I presented.  Here's the same one with the correct code-
0  0  0  0  0  0  0  0  0

1  1  1  1  1  1  1  -1  1

2147483647  2147483647  2147483647  2147483647  2147483647  2147483647  2147483647  -2147483647  2147483647

1  1  1  1  1  1  1  -1  1

2147483647  2147483647  2147483647  2147483647  2147483647  2147483647  2147483647  -2147483647  2147483647

-2147483648  -2147483648  -2147483648  -2147483648  -2147483648  -2147483648  -2147483648  -2147483648  -2147483648

64 cycles, abs0, herge
37 cycles, abs1, evlcrn8
34 cycles, abs2, jimg
44 cycles, abs3, rockoon
33 cycles, abs4, drizz
34 cycles, abs5, jj2007
40 cycles, abs6, hutch1
31 cycles, abs7, hutch2
185 cycles, crt_abs

64 cycles, abs0, herge
37 cycles, abs1, evlcrn8
48 cycles, abs2, jimg
37 cycles, abs3, rockoon
33 cycles, abs4, drizz
34 cycles, abs5, jj2007
40 cycles, abs6, hutch1
31 cycles, abs7, hutch2
195 cycles, crt_abs

64 cycles, abs0, herge
37 cycles, abs1, evlcrn8
33 cycles, abs2, jimg
37 cycles, abs3, rockoon
33 cycles, abs4, drizz
33 cycles, abs5, jj2007
40 cycles, abs6, hutch1
31 cycles, abs7, hutch2
185 cycles, crt_abs

65 cycles, abs0, herge
51 cycles, abs1, evlcrn8
34 cycles, abs2, jimg
37 cycles, abs3, rockoon
33 cycles, abs4, drizz
33 cycles, abs5, jj2007
49 cycles, abs6, hutch1
31 cycles, abs7, hutch2
185 cycles, crt_abs

64 cycles, abs0, herge
37 cycles, abs1, evlcrn8
33 cycles, abs2, jimg
37 cycles, abs3, rockoon
33 cycles, abs4, drizz
34 cycles, abs5, jj2007
40 cycles, abs6, hutch1
31 cycles, abs7, hutch2
195 cycles, crt_abs

Press any key to exit...
Title: Re: Is Negative
Post by: NightWare on May 22, 2008, 10:28:34 PM
hi,
2 more to test :
abs8 MACRO _Operand_:REQ
      test _Operand_,_Operand_
      jns @F
      neg _Operand_
    @@:
    ENDM

abs9 MACRO _Operand_:REQ
      bt _Operand_,31
      jnc @F
      neg _Operand_
    @@:
    ENDM


hutch, js @F (avoid neg if signed ?)

jimg, abs2 = abs5 in your test (but different result...)
Title: Re: Is Negative
Post by: Jimg on May 22, 2008, 11:20:55 PM
Drat.   Ok, I'll let Michael sort it out.  I just knew the one with my name on it was the wrong one.
Title: Re: Is Negative
Post by: hutch-- on May 23, 2008, 06:52:55 AM
I don't know why but I am getting very large variations in the timings in the last test piece.


80 cycles, abs0, herge
51 cycles, abs1, evlcrn8
69 cycles, abs2, jimg
38 cycles, abs3, rockoon
32 cycles, abs4, drizz
12 cycles, abs5, jj2007
50 cycles, abs6, hutch1
29 cycles, abs7, hutch2
390 cycles, crt_abs

71 cycles, abs0, herge
71 cycles, abs1, evlcrn8
41 cycles, abs2, jimg
38 cycles, abs3, rockoon
32 cycles, abs4, drizz
41 cycles, abs5, jj2007
38 cycles, abs6, hutch1
29 cycles, abs7, hutch2
378 cycles, crt_abs

68 cycles, abs0, herge
20 cycles, abs1, evlcrn8
37 cycles, abs2, jimg
28 cycles, abs3, rockoon
32 cycles, abs4, drizz
13 cycles, abs5, jj2007
38 cycles, abs6, hutch1
14 cycles, abs7, hutch2
377 cycles, crt_abs

103 cycles, abs0, herge
45 cycles, abs1, evlcrn8
37 cycles, abs2, jimg
28 cycles, abs3, rockoon
32 cycles, abs4, drizz
13 cycles, abs5, jj2007
43 cycles, abs6, hutch1
14 cycles, abs7, hutch2
390 cycles, crt_abs

69 cycles, abs0, herge
45 cycles, abs1, evlcrn8
12 cycles, abs2, jimg
40 cycles, abs3, rockoon
32 cycles, abs4, drizz
13 cycles, abs5, jj2007
48 cycles, abs6, hutch1
33 cycles, abs7, hutch2
390 cycles, crt_abs

Press any key to exit...
Title: Re: Is Negative
Post by: jj2007 on May 23, 2008, 09:18:41 AM
Quote from: hutch-- on May 23, 2008, 06:52:55 AM
I don't know why but I am getting very large variations in the timings in the last test piece.

First, the good news: On Hutch's average, my code beats all the others. The bad news is that it isn't my code, since I actually proposed the rather ordinary
    .if eax>=80000000h
       neg eax
    .endif

Never mind, I'll bear the false honour with dignity. But jokes apart: The variations are indeed very significant. I added the two by Nightware...


    abs4 MACRO ; Drizz 5 bytes
      cdq
      xor eax,edx
      sub eax,edx
    ENDM

    abs5 Macro ; jimg 6 bytes
    or eax, eax
    .if sign? ; jns @F
        neg eax
    .endif 
    endm

    abs5jj Macro ; jj 9 bytes
    .if eax>=80000000h
       neg eax
    .endif
    endm

    abs6 MACRO ; Hutch1 9 bytes
      mov ecx, eax
      neg ecx
      test eax, eax
      cmovs eax, ecx
    ENDM

    abs7 MACRO ; Hutch2 7 bytes
      add eax, 0
      js @F
      neg eax
    @@:
    ENDM

    abs8 Macro ; Nightware 6 bytes
    test eax, eax
    .if sign? ; jns @F
        neg eax
    .endif 
    endm

    abs9 Macro ; Nightware 6 bytes
    bt eax, 31
    jnc @F
    neg eax
  @@:
    endm



... and get these benchmarks:

71 cycles, abs0, herge
41 cycles, abs1, evlcrn8
41 cycles, abs2, jimg
41 cycles, abs3, rockoon
32 cycles, abs4, drizz
13 cycles, abs5
45 cycles, abs5, jj2007
39 cycles, abs6, hutch1
31 cycles, abs7, hutch2
13 cycles, abs8
51 cycles, abs9
390 cycles, crt_abs

69 cycles, abs0, herge
41 cycles, abs1, evlcrn8
41 cycles, abs2, jimg
29 cycles, abs3, rockoon
32 cycles, abs4, drizz
37 cycles, abs5
45 cycles, abs5, jj2007
38 cycles, abs6, hutch1
33 cycles, abs7, hutch2
37 cycles, abs8
37 cycles, abs9
380 cycles, crt_abs

69 cycles, abs0, herge
33 cycles, abs1, evlcrn8
13 cycles, abs2, jimg
38 cycles, abs3, rockoon
33 cycles, abs4, drizz
13 cycles, abs5
37 cycles, abs5, jj2007
38 cycles, abs6, hutch1
16 cycles, abs7, hutch2
41 cycles, abs8
47 cycles, abs9
378 cycles, crt_abs

75 cycles, abs0, herge
55 cycles, abs1, evlcrn8
41 cycles, abs2, jimg
28 cycles, abs3, rockoon
32 cycles, abs4, drizz
37 cycles, abs5
51 cycles, abs5, jj2007
39 cycles, abs6, hutch1
32 cycles, abs7, hutch2
38 cycles, abs8
37 cycles, abs9
380 cycles, crt_abs

71 cycles, abs0, herge
45 cycles, abs1, evlcrn8
13 cycles, abs2, jimg
38 cycles, abs3, rockoon
32 cycles, abs4, drizz
41 cycles, abs5
41 cycles, abs5, jj2007
38 cycles, abs6, hutch1
33 cycles, abs7, hutch2
51 cycles, abs8
47 cycles, abs9
384 cycles, crt_abs


jimg made it in only 13 cycles, but not for long ;-)

In the absence of a clear winner, let's go for a "diplomatic" solution: Take the product of size * speed... congrats, drizz  :cheekygreen:
Title: Re: Is Negative
Post by: MichaelW on May 23, 2008, 02:10:23 PM
Because I doubt that this could be any more confusing than it is now, I reworked the test to use the second set of macros, the ones that start a new time slice at the start of the loops and capture the lowest cycle count that occurs in any loop. I also went back over the thread and attempted to get the names straight. Running on my P3 with the repeat count set to 20 the repeatability for the macro code is near perfect.

abs0    0 1 2147483647 1 2147
abs1    0 1 2147483647 1 2147
abs2    0 1 2147483647 1 2147
abs3t   0 1 2147483647 1 2147
abs4    0 1 2147483647 1 2147
abs5    0 1 2147483647 1 2147
abs6    0 1 2147483647 1 2147
abs7    0 1 2147483647 1 2147
abs8    0 1 2147483647 1 2147
abs9    0 1 2147483647 1 2147
crtabs  0 1 2147483647 1 2147

80 cycles, abs0 (herge)
47 cycles, abs1 (evlncrn8)
45 cycles, abs2 (jimg)
55 cycles, abs3 (Rockoon)
45 cycles, abs4 (drizz)
47 cycles, abs5 (jj2007)
70 cycles, abs6 (hutch1)
45 cycles, abs7 (hutch2)
45 cycles, abs8 (NightWare1)
48 cycles, abs9 (NightWare2)
228 cycles, crt_abs

80 cycles, abs0 (herge)
47 cycles, abs1 (evlncrn8)
45 cycles, abs2 (jimg)
55 cycles, abs3 (Rockoon)
45 cycles, abs4 (drizz)
47 cycles, abs5 (jj2007)
70 cycles, abs6 (hutch1)
45 cycles, abs7 (hutch2)
45 cycles, abs8 (NightWare1)
48 cycles, abs9 (NightWare2)
282 cycles, crt_abs

80 cycles, abs0 (herge)
47 cycles, abs1 (evlncrn8)
45 cycles, abs2 (jimg)
55 cycles, abs3 (Rockoon)
45 cycles, abs4 (drizz)
47 cycles, abs5 (jj2007)
70 cycles, abs6 (hutch1)
45 cycles, abs7 (hutch2)
45 cycles, abs8 (NightWare1)
48 cycles, abs9 (NightWare2)
282 cycles, crt_abs

80 cycles, abs0 (herge)
47 cycles, abs1 (evlncrn8)
45 cycles, abs2 (jimg)
55 cycles, abs3 (Rockoon)
45 cycles, abs4 (drizz)
47 cycles, abs5 (jj2007)
70 cycles, abs6 (hutch1)
45 cycles, abs7 (hutch2)
45 cycles, abs8 (NightWare1)
48 cycles, abs9 (NightWare2)
282 cycles, crt_abs

80 cycles, abs0 (herge)
47 cycles, abs1 (evlncrn8)
45 cycles, abs2 (jimg)
55 cycles, abs3 (Rockoon)
45 cycles, abs4 (drizz)
47 cycles, abs5 (jj2007)
70 cycles, abs6 (hutch1)
45 cycles, abs7 (hutch2)
45 cycles, abs8 (NightWare1)
48 cycles, abs9 (NightWare2)
282 cycles, crt_abs


On a P4 I expect the cycle counts will always be a multiple of 4.


[attachment deleted by admin]
Title: Re: Is Negative
Post by: hutch-- on May 23, 2008, 02:53:40 PM
What have I done wrong here? This is the test piece I used, which returned 1 for both 1 and -1.


    mov eax, 1
    add eax, 0
    jns @F
    neg eax
  @@:
    print str$(eax),13,10

    mov eax, -1
    add eax, 0
    jns @F
    neg eax
  @@:
    print str$(eax),13,10


Result


1
1
Press any key to continue ...
Title: Re: Is Negative
Post by: Tedd on May 23, 2008, 03:18:25 PM
They both should return 1 ..?

abs(1) = 1
abs(-1) = 1
Title: Re: Is Negative
Post by: MichaelW on May 23, 2008, 04:21:00 PM
I found the problem. I copied the code as it was posted in reply #19, without analyzing it, even after it failed the function tests. I should have assumed that there must have been an error in transit, and fixed it.

abs0 MACRO
add eax, 0
js @F
neg eax
@@:


I have corrected the problem and posted new results and a new attachment.
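
The effect of the inverted jump can be modeled in C. This is a sketch with invented function names: the first function follows the macro as it was posted (js), the second follows the test piece that used jns.

```c
#include <stdint.h>

/* As posted: add eax,0 / js @F / neg eax / @@:
   The jump skips the neg when the value IS negative,
   so only non-negative inputs get negated. */
uint32_t abs7_as_posted(uint32_t eax)
{
    if (!(eax >> 31))
        eax = 0u - eax;
    return eax;
}

/* With jns: the neg runs only when the sign bit is set. */
uint32_t abs7_with_jns(uint32_t eax)
{
    if (eax >> 31)
        eax = 0u - eax;
    return eax;
}
```

The first version returns -|x| instead of |x|, which is consistent with the -1 and -2147483647 entries in the function test output earlier in the thread.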
Title: Re: Is Negative
Post by: Jimg on May 23, 2008, 05:20:44 PM
how about-
    abs10 Macro
        mov edx,eax
        neg edx
        cmovns eax,edx
    endm

abs0    0 1 2147483647 1 -2147483648  size=16
abs1    0 1 2147483647 1 -2147483648  size=9
abs2    0 1 2147483647 1 -2147483648  size=6
abs3t   0 1 2147483647 1 -2147483648  size=11
abs4    0 1 2147483647 1 -2147483648  size=5
abs5    0 1 2147483647 1 -2147483648  size=9
abs6    0 1 2147483647 1 -2147483648  size=9
abs7    0 1 2147483647 1 -2147483648  size=7
abs8    0 1 2147483647 1 -2147483648  size=6
abs9    0 1 2147483647 1 -2147483648  size=8
abs10   0 1 2147483647 1 -2147483648  size=7
crtabs  0 1 2147483647 1 -2147483648  size=10

74 cycles, abs0 (herge)
38 cycles, abs1 (evlncrn8)
35 cycles, abs2 (jimg)
40 cycles, abs3 (Rockoon)
36 cycles, abs4 (drizz)
38 cycles, abs5 (jj2007)
42 cycles, abs6 (hutch1)
34 cycles, abs7 (hutch2)
36 cycles, abs8 (NightWare1)
38 cycles, abs9 (NightWare2)
31 cycles, abs10 (jimg2)
190 cycles, crtabs (crt_abs)

74 cycles, abs0 (herge)
38 cycles, abs1 (evlncrn8)
35 cycles, abs2 (jimg)
40 cycles, abs3 (Rockoon)
36 cycles, abs4 (drizz)
38 cycles, abs5 (jj2007)
42 cycles, abs6 (hutch1)
34 cycles, abs7 (hutch2)
35 cycles, abs8 (NightWare1)
38 cycles, abs9 (NightWare2)
31 cycles, abs10 (jimg2)
190 cycles, crtabs (crt_abs)

74 cycles, abs0 (herge)
38 cycles, abs1 (evlncrn8)
35 cycles, abs2 (jimg)
40 cycles, abs3 (Rockoon)
36 cycles, abs4 (drizz)
38 cycles, abs5 (jj2007)
42 cycles, abs6 (hutch1)
34 cycles, abs7 (hutch2)
35 cycles, abs8 (NightWare1)
39 cycles, abs9 (NightWare2)
31 cycles, abs10 (jimg2)
190 cycles, crtabs (crt_abs)

74 cycles, abs0 (herge)
38 cycles, abs1 (evlncrn8)
36 cycles, abs2 (jimg)
40 cycles, abs3 (Rockoon)
36 cycles, abs4 (drizz)
38 cycles, abs5 (jj2007)
42 cycles, abs6 (hutch1)
34 cycles, abs7 (hutch2)
35 cycles, abs8 (NightWare1)
39 cycles, abs9 (NightWare2)
31 cycles, abs10 (jimg2)
190 cycles, crtabs (crt_abs)

74 cycles, abs0 (herge)
38 cycles, abs1 (evlncrn8)
35 cycles, abs2 (jimg)
40 cycles, abs3 (Rockoon)
36 cycles, abs4 (drizz)
38 cycles, abs5 (jj2007)
42 cycles, abs6 (hutch1)
34 cycles, abs7 (hutch2)
36 cycles, abs8 (NightWare1)
38 cycles, abs9 (NightWare2)
31 cycles, abs10 (jimg2)
190 cycles, crtabs (crt_abs)

Press any key to exit...


Michael-  The results seem very consistent, probably no need for 5 loops now.  Also, I used a new macro.
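
The abs10 macro negates a copy first and then keeps whichever value is non-negative. A C model of that logic, with an invented function name; whether a C compiler would actually emit cmov for this is not something the sketch relies on.

```c
#include <stdint.h>

/* Models: mov edx,eax / neg edx / cmovns eax,edx */
uint32_t abs_neg_cmovns(uint32_t eax)
{
    uint32_t edx = 0u - eax;        /* neg edx: the sign flag reflects the negated copy */
    return (edx >> 31) ? eax : edx; /* cmovns: take edx only if its sign bit is clear */
}
```

For 80000000h the negated copy still has its sign bit set, so the original value is kept, matching the other macros.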

[attachment deleted by admin]
Title: Re: Is Negative
Post by: hutch-- on May 24, 2008, 01:27:33 AM
I am not sure why I still get such a wide variation; it may just be the vagaries of a PIV with short test code of this type. Where Michael's PIII and Jim's AMD both seem to produce reasonably reliable timings, mine are all over the place.


80 cycles, abs0 (herge)
44 cycles, abs1 (evlncrn8)
44 cycles, abs2 (jimg)
36 cycles, abs3 (Rockoon)
44 cycles, abs4 (drizz)
48 cycles, abs5 (jj2007)
48 cycles, abs6 (hutch1)
28 cycles, abs7 (hutch2)
48 cycles, abs8 (NightWare1)
56 cycles, abs9 (NightWare2)
368 cycles, crt_abs

120 cycles, abs0 (herge)
48 cycles, abs1 (evlncrn8)
44 cycles, abs2 (jimg)
36 cycles, abs3 (Rockoon)
44 cycles, abs4 (drizz)
48 cycles, abs5 (jj2007)
48 cycles, abs6 (hutch1)
48 cycles, abs7 (hutch2)
44 cycles, abs8 (NightWare1)
56 cycles, abs9 (NightWare2)
372 cycles, crt_abs

76 cycles, abs0 (herge)
48 cycles, abs1 (evlncrn8)
44 cycles, abs2 (jimg)
36 cycles, abs3 (Rockoon)
32 cycles, abs4 (drizz)
48 cycles, abs5 (jj2007)
48 cycles, abs6 (hutch1)
48 cycles, abs7 (hutch2)
44 cycles, abs8 (NightWare1)
56 cycles, abs9 (NightWare2)
372 cycles, crt_abs

80 cycles, abs0 (herge)
48 cycles, abs1 (evlncrn8)
44 cycles, abs2 (jimg)
36 cycles, abs3 (Rockoon)
44 cycles, abs4 (drizz)
48 cycles, abs5 (jj2007)
48 cycles, abs6 (hutch1)
48 cycles, abs7 (hutch2)
44 cycles, abs8 (NightWare1)
56 cycles, abs9 (NightWare2)
368 cycles, crt_abs

80 cycles, abs0 (herge)
48 cycles, abs1 (evlncrn8)
44 cycles, abs2 (jimg)
36 cycles, abs3 (Rockoon)
44 cycles, abs4 (drizz)
48 cycles, abs5 (jj2007)
48 cycles, abs6 (hutch1)
48 cycles, abs7 (hutch2)
44 cycles, abs8 (NightWare1)
56 cycles, abs9 (NightWare2)
368 cycles, crt_abs

Press any key to exit...


Jim's version with the extra macro.


80 cycles, abs0 (herge)
32 cycles, abs1 (evlncrn8)
44 cycles, abs2 (jimg)
36 cycles, abs3 (Rockoon)
44 cycles, abs4 (drizz)
48 cycles, abs5 (jj2007)
48 cycles, abs6 (hutch1)
48 cycles, abs7 (hutch2)
20 cycles, abs8 (NightWare1)
56 cycles, abs9 (NightWare2)
40 cycles, abs10 (jimg2)
432 cycles, crtabs (crt_abs)

80 cycles, abs0 (herge)
48 cycles, abs1 (evlncrn8)
48 cycles, abs2 (jimg)
36 cycles, abs3 (Rockoon)
44 cycles, abs4 (drizz)
48 cycles, abs5 (jj2007)
48 cycles, abs6 (hutch1)
16 cycles, abs7 (hutch2)
44 cycles, abs8 (NightWare1)
56 cycles, abs9 (NightWare2)
40 cycles, abs10 (jimg2)
432 cycles, crtabs (crt_abs)

80 cycles, abs0 (herge)
48 cycles, abs1 (evlncrn8)
44 cycles, abs2 (jimg)
36 cycles, abs3 (Rockoon)
44 cycles, abs4 (drizz)
48 cycles, abs5 (jj2007)
48 cycles, abs6 (hutch1)
48 cycles, abs7 (hutch2)
44 cycles, abs8 (NightWare1)
56 cycles, abs9 (NightWare2)
40 cycles, abs10 (jimg2)
428 cycles, crtabs (crt_abs)

80 cycles, abs0 (herge)
48 cycles, abs1 (evlncrn8)
44 cycles, abs2 (jimg)
36 cycles, abs3 (Rockoon)
44 cycles, abs4 (drizz)
52 cycles, abs5 (jj2007)
48 cycles, abs6 (hutch1)
48 cycles, abs7 (hutch2)
44 cycles, abs8 (NightWare1)
56 cycles, abs9 (NightWare2)
40 cycles, abs10 (jimg2)
424 cycles, crtabs (crt_abs)

80 cycles, abs0 (herge)
48 cycles, abs1 (evlncrn8)
44 cycles, abs2 (jimg)
36 cycles, abs3 (Rockoon)
44 cycles, abs4 (drizz)
32 cycles, abs5 (jj2007)
48 cycles, abs6 (hutch1)
48 cycles, abs7 (hutch2)
44 cycles, abs8 (NightWare1)
56 cycles, abs9 (NightWare2)
40 cycles, abs10 (jimg2)
432 cycles, crtabs (crt_abs)

Press any key to exit...
Title: Re: Is Negative
Post by: MichaelW on May 24, 2008, 02:48:58 AM
Back when I still had a P4 I observed this problem and could find no way around it. The cycle counts being a multiple of 4, coupled with some Intel documents I have seen (and cannot find ATM), would seem to suggest that the TSC is updated in step with the external clock. I think this alone should account for an uncertainty in the counts of 8 or more cycles. Perhaps a reasonable solution for the P4 might be to use the second set of macros, and for each test, average the counts over 8-16 macro calls.
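
As an illustration of the two ways of condensing repeated counts (lowest count seen versus an average), here is a small C sketch. The helper names are hypothetical, and it only processes counts that have already been collected; it does not read the TSC itself.

```c
#include <stddef.h>

/* Lowest count in a set of runs (n must be at least 1). */
unsigned min_cycles(const unsigned *samples, size_t n)
{
    unsigned best = samples[0];
    for (size_t i = 1; i < n; ++i)
        if (samples[i] < best)
            best = samples[i];
    return best;
}

/* Arithmetic mean of the counts (n must be at least 1). */
double mean_cycles(const unsigned *samples, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += samples[i];
    return total / (double)n;
}
```

Fed the five abs7 counts from hutch's first P4 listing above (28, 48, 48, 48, 48), the minimum is 28 and the mean is 44, which shows how much a single outlier moves each summary.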
Title: Re: Is Negative
Post by: hutch-- on May 24, 2008, 04:41:40 AM
I coded up another benchmark with a style that I know runs on a PIV OK and got these results. This does 2 passes, one with "1" as the test number, the second with "-1".


-------------
positive pass
-------------
562 abs0 herge
579 abs1 evlncrn8
578 abs2 jimg 1
563 abs3 rockoon 1
563 abs4 rockoon 2
547 abs5 drizz
579 abs6 jj2007
562 abs7 hutch 1
578 abs8 hutch 2
578 abs9 Nightware 1
562 abs10 Nightware 2
547 abs11 jimg 2
-------------
negative pass
-------------
562 abs0 herge
563 abs1 evlncrn8
563 abs2 jimg 1
547 abs3 rockoon 1
563 abs4 rockoon 2
547 abs5 drizz
562 abs6 jj2007
547 abs7 hutch 1
547 abs8 hutch 2
578 abs9 Nightware 1
563 abs10 Nightware 2
563 abs11 jimg 2
Press any key to continue ...

[attachment deleted by admin]
Title: Re: Is Negative
Post by: herge on May 24, 2008, 07:19:20 AM

Hi All:

My results with hutch-
I have high numbers because I had a
game running.


-------------
positive pass
-------------
6870 abs0 herge
6890 abs1 evlncrn8
7922 abs2 jimg 1
11677 abs3 rockoon 1
6399 abs4 rockoon 2
6009 abs5 drizz
7441 abs6 jj2007
5898 abs7 hutch 1
7651 abs8 hutch 2
6659 abs9 Nightware 1
7671 abs10 Nightware 2
5157 abs11 jimg 2
-------------
negative pass
-------------
5548 abs0 herge
7591 abs1 evlncrn8
10986 abs2 jimg 1
9233 abs3 rockoon 1
5007 abs4 rockoon 2
6139 abs5 drizz
5799 abs6 jj2007
5828 abs7 hutch 1
6029 abs8 hutch 2
5117 abs9 Nightware 1
5978 abs10 Nightware 2
4607 abs11 jimg 2
Press any key to continue ...



Cheers.
Title: Re: Is Negative
Post by: sinsi on May 24, 2008, 07:40:17 AM
G'day herge

This line made me laugh
Quote: I have high numbers because I had a
game running.

Me too (plus media player 11), but here are my numbers

-------------
positive pass
-------------
500 abs0 herge
562 abs1 evlncrn8
500 abs2 jimg 1
375 abs3 rockoon 1
375 abs4 rockoon 2
438 abs5 drizz
563 abs6 jj2007
375 abs7 hutch 1
500 abs8 hutch 2
500 abs9 Nightware 1
563 abs10 Nightware 2
391 abs11 jimg 2
-------------
negative pass
-------------
437 abs0 herge
437 abs1 evlncrn8
375 abs2 jimg 1
375 abs3 rockoon 1
375 abs4 rockoon 2
438 abs5 drizz
437 abs6 jj2007
375 abs7 hutch 1
375 abs8 hutch 2
375 abs9 Nightware 1
438 abs10 Nightware 2
375 abs11 jimg 2


Do we count a quad-core q6600 as a p4?
I've noticed that quite a few of these timing posts seem to rely a lot on the processor speed - I'm getting a bit disheartened when a 3GHz CPU can beat my quad  :bdg
Title: Re: Is Negative
Post by: MichaelW on May 24, 2008, 08:33:30 AM
The attachment is a test piece that hopefully will minimize the variations for a P4. It does essentially what I described in reply #34.



[attachment deleted by admin]
Title: Re: Is Negative
Post by: jj2007 on May 24, 2008, 09:32:07 AM
Quote from: MichaelW on May 24, 2008, 08:33:30 AM
test piece that hopefully will minimize the variations for a P4.

It yields pretty stable results. Now we might ask what to choose as a Masm32 library candidate...
4 thoughts:
- function form, so that you can call it as mov MyMemLocation, Abs(esi)
- should not change any other registers (some of the candidates violate this condition)
- size should matter (5-9 bytes)
- speed should matter
Any other views?
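
A minimal sketch of what the function form could look like, assuming a MASM macro function that returns its result in eax. The name AbsOf is hypothetical and is not one of the candidates timed in this thread. It uses the test/jns/neg sequence so that no register other than eax is changed, although the flags are modified.

```asm
AbsOf MACRO src:REQ
    LOCAL done
    mov eax, src        ;; copy the operand into eax (src may be a register or memory)
    test eax, eax       ;; set the sign flag from bit 31 without changing eax
    jns done            ;; zero or positive: nothing to do
    neg eax             ;; negative: two's complement negate
done:
    EXITM <eax>         ;; the macro call is replaced by the text "eax"
ENDM

    ; usage: the macro body is emitted first, then the line becomes
    ; "mov MyMemLocation, eax"
    mov MyMemLocation, AbsOf(esi)
```

MyMemLocation is assumed to be a DWORD variable. The cost of this form is that eax is always overwritten, so it does not strictly meet the "no other registers" condition when the operand lives somewhere else.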
Title: Re: Is Negative
Post by: MichaelW on May 24, 2008, 10:52:11 PM
Considering that the code ideally would execute in only a few clock cycles, I think it should be a macro of a form that would effectively provide an ABS opcode. As a basis for this I considered the four macros that were fastest on a P3, abs2, abs4, abs7, and abs8 in my last test, all at 45 cycles per 20 executions, and this count included a mov reg,immed that was not part of the macros. After eliminating those not suitable for memory operands and those that would affect more than one operand, I ended up with abs7, the second macro that hutch posted. The attachment is a test app, and these are the results on my P3:

0       1       2147483647      1       2147483648
0       1       2147483647      1       2147483648
0       1       32767   1       32768
0       1       32767   1       32768
0       1       127     1       128
0       1       127     1       128

45 cycles, abs eax
89 cycles, abs m32
157 cycles, abs ax
173 cycles, abs m16
42 cycles, abs al
79 cycles, abs m8

45 cycles, abs eax
89 cycles, abs m32
157 cycles, abs ax
173 cycles, abs m16
42 cycles, abs al
79 cycles, abs m8

45 cycles, abs eax
89 cycles, abs m32
157 cycles, abs ax
173 cycles, abs m16
42 cycles, abs al
79 cycles, abs m8

45 cycles, abs eax
89 cycles, abs m32
157 cycles, abs ax
173 cycles, abs m16
42 cycles, abs al
79 cycles, abs m8

45 cycles, abs eax
89 cycles, abs m32
157 cycles, abs ax
173 cycles, abs m16
42 cycles, abs al
79 cycles, abs m8



[attachment deleted by admin]
Title: Re: Is Negative
Post by: rags on May 25, 2008, 01:23:40 AM
Here are my results on a 2.5 GHz P4:

0       1       2147483647      1       2147483648
0       1       2147483647      1       2147483648
0       1       32767   1       32768
0       1       32767   1       32768
0       1       127     1       128
0       1       127     1       128

40 cycles, abs eax
81 cycles, abs m32
43 cycles, abs ax
81 cycles, abs m16
53 cycles, abs al
85 cycles, abs m8

44 cycles, abs eax
85 cycles, abs m32
40 cycles, abs ax
85 cycles, abs m16
51 cycles, abs al
83 cycles, abs m8

42 cycles, abs eax
83 cycles, abs m32
41 cycles, abs ax
85 cycles, abs m16
50 cycles, abs al
83 cycles, abs m8

47 cycles, abs eax
86 cycles, abs m32
42 cycles, abs ax
82 cycles, abs m16
52 cycles, abs al
83 cycles, abs m8

51 cycles, abs eax
85 cycles, abs m32
44 cycles, abs ax
85 cycles, abs m16
50 cycles, abs al
84 cycles, abs m8
Title: Re: Is Negative
Post by: lingo on May 25, 2008, 02:35:30 PM
For me the fastest is the following code from A. Fog's book:

"22.4. Avoiding conditional jumps by using flags (all processors)

The most important jumps to eliminate are conditional jumps, especially if they are poorly predictable.
Sometimes it is possible to obtain the same effect as a branch by ingenious manipulation of bits and flags.
For example you may calculate the absolute value of a signed number without branching:
         CDQ
         XOR EAX,EDX
         SUB EAX,EDX
(On PPlain and PMMX, use MOV EDX,EAX / SAR EDX,31 instead of CDQ).
The carry flag is particularly useful for this kind of tricks:"


from "How to optimize for the Pentium family of microprocessors
Copyright © 1996, 2000 by Agner Fog"
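
A worked trace may help show why these three instructions give the absolute value. This is only an illustration: the input value -5 is an arbitrary choice, and the comments show the register contents after each instruction for that input.

```asm
    mov eax, -5         ; eax = 0FFFFFFFBh
    cdq                 ; edx = 0FFFFFFFFh, the sign bit of eax copied into every bit of edx
    xor eax, edx        ; eax = 00000004h, all bits inverted because edx is all ones
    sub eax, edx        ; eax = 00000005h, subtracting -1 adds the 1 that completes the negation
```

For a positive input CDQ leaves edx at zero, so the XOR and the SUB both leave eax unchanged. The one input that cannot be made positive is 80000000h, which has no positive counterpart in 32 bits and comes back as 80000000h.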
Title: Re: Is Negative
Post by: hutch-- on May 25, 2008, 02:51:15 PM
Lingo,

I agree with the view but even with that code that Drizz posted, the branchless versions were no faster than the conditional jump versions. The CMOVxx versions were no faster either.
Title: Re: Is Negative
Post by: jj2007 on May 25, 2008, 05:13:04 PM
Quote from: lingo on May 25, 2008, 02:35:30 PM
For me the fastest is ...
         CDQ
         XOR EAX,EDX
         SUB EAX,EDX

Hi Lingo,
Your code is somewhat incomplete. Check your timings with this one:

   MOV eax, -123h
   push edx
   CDQ
   XOR EAX,EDX
   SUB EAX,EDX
   pop edx
Title: Re: Is Negative
Post by: MichaelW on May 25, 2008, 05:27:32 PM
My tests were using data that alternated randomly between positive and negative, on the assumption that this would make the jumps unpredictable, and if there was a difference it was too small to measure. Since conditional jumps, including unpredictable ones, are so common in code, it seems plausible that the processors would have been designed to run such code as efficiently as possible.
Title: Re: Is Negative
Post by: herge on May 26, 2008, 12:27:30 AM
 Hi All:


FOR mac,<abs0,abs1,abs2,abs3,abs4,abs5,abs6,abs7,abs8,abs9,abs10,abs11,crtabs>


I am having trouble finding where mac is defined.
I must admit I can't find it.

thanks in advance.
Title: Re: Is Negative
Post by: hutch-- on May 26, 2008, 12:32:52 AM
MAC is defined in the statement starting with,


FOR mac
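
To illustrate, here is a small sketch of how FOR introduces its own parameter. The name reg and the register list are made up for this example and are not from the test piece.

```asm
    ; "reg" is declared by the FOR line itself; the body is assembled
    ; once per list item, with reg replaced by that item
    FOR reg, <eax, ebx, ecx>
        push reg
    ENDM

    ; the block above assembles as if it had been written:
    ;   push eax
    ;   push ebx
    ;   push ecx
```

In the same way, mac in the test piece takes each of the names abs0, abs1 and so on in turn, so there is no separate definition of mac to look for.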
Title: Re: Is Negative
Post by: MichaelW on May 26, 2008, 07:02:49 AM
See FOR Loops and Variable-Length Parameters, about halfway down the page here (http://webster.cs.ucr.edu/Page_TechDocs/MASMDoc/ProgrammersGuide/Chap_09.htm). I keep a copy of this page on my desktop.
Title: Re: Is Negative
Post by: herge on May 26, 2008, 07:16:25 AM

Hi MichaelW:

Thanks.
Title: Re: Is Negative
Post by: Don57 on June 09, 2008, 11:59:24 PM
How about using TEST and jumping on the flags?
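
A minimal sketch of that idea, assuming the value is in eax. It is the same shape as the or eax,eax version posted earlier in the thread, with TEST setting the flags instead.

```asm
    test eax, eax       ; AND eax with itself: sets the sign flag from bit 31, eax is unchanged
    jns @F              ; sign flag clear: zero or positive, skip the negate
    neg eax             ; sign flag set: negative, so flip it
@@:
```

Unlike the original push/and/cmp/pop sequence, this needs no stack access and no 32-bit immediate.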