Fastest Absolute Function

Started by Twister, August 23, 2010, 09:31:34 PM


Twister

Now we are going to get into some real competition. :bdg

abs proc uses edx dwNum:DWORD
mov eax, dwNum
mov edx, eax
and eax, 0F0000000h
cmp eax, edx
je @F
xor edx, 0F0000000h
xchg eax, edx
@@:
ret
abs endp

lingo

Why do you use "uses edx"? Hutch, it's contagious... :lol
        pop ecx
        pop eax 
        cdq
        xor eax,edx 
        sub eax,edx
        jmp ecx

dioxin

Quote: jmp ecx
You aren't really advocating that, for speed, you should pop a return address off the stack and jump to it, are you? That messes up the branch prediction mechanism which has short cuts for a paired CALL-RETURN.


clive

Quote from: GTX
abs proc uses edx dwNum:DWORD
mov eax, dwNum
mov edx, eax
and eax, 0F0000000h
cmp eax, edx
je @F
xor edx, 0F0000000h
xchg eax, edx
@@:
ret
abs endp


Whole boat load of fail there.

C:\MASM>test42 80000000
80000000 80000000

C:\MASM>test42 f0000000
F0000000 F0000000

C:\MASM>test42 FFFFFFFF
FFFFFFFF 0FFFFFFF

C:\MASM>test42 0
00000000 00000000

C:\MASM>test42 1
00000001 F0000001

C:\MASM>test42 2
00000002 F0000002
It could be a random act of randomness. Those happen a lot as well.

hutch--

Both objections are correct. EDX is a transient register, so it does not need to be preserved. Paul's comment is also correct: CALL and RET are paired in hardware, so jumping back to the return address in the calling proc breaks that pairing.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

there is so little code involved, i think it would be better implemented as a macro
the CALL/RET overhead isn't needed
assuming the value is already in EAX, the basic code is:

        cdq
        xor     eax,edx
        sub     eax,edx

i think that's 5 bytes - same length as a CALL   :bg
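To make the three-instruction sequence above easier to follow, here is a minimal traced sketch. It assumes the MASM32 SDK is installed at \masm32, as in the other listings in this thread, and the register values in the comments are worked by hand for the inputs -5 and 5.

```asm
; Sketch only: assumes the MASM32 SDK at \masm32.
    include \masm32\include\masm32rt.inc

    .code
start:
    ; negative input
    mov     eax, -5             ; eax = FFFFFFFBh
    cdq                         ; sign bit set, so edx = FFFFFFFFh
    xor     eax, edx            ; all bits flipped: eax = 00000004h
    sub     eax, edx            ; subtracting -1 adds 1: eax = 00000005h
    print   str$(eax),13,10     ; prints 5

    ; non-negative input
    mov     eax, 5              ; eax = 00000005h
    cdq                         ; sign bit clear, so edx = 0
    xor     eax, edx            ; unchanged
    sub     eax, edx            ; unchanged
    print   str$(eax),13,10     ; prints 5

    exit
end start
```

CDQ turns the sign bit into an all-ones or all-zeros mask, so the XOR/SUB pair performs a two's complement negate only when the input is negative, with no branch.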
if you want to preserve EDX, then it's 7 bytes (not really necessary, but i can see times when it would be nice)
abs     MACRO
        push    edx             ; preserve EDX for the caller
        cdq                     ; EDX = sign mask of EAX
        xor     eax,edx
        sub     eax,edx
        pop     edx
        ENDM


maybe you can have your cake and eat it, too
abs_edx MACRO                   ; trashes EDX
        cdq
        xor     eax,edx
        sub     eax,edx
        ENDM

abs     MACRO                   ; preserves EDX
        push    edx
        abs_edx
        pop     edx
        ENDM


note: before the code-nazi accuses me of "stealing" his code
i think that is a well-known snippet
i previously published essentially the same code in my ling ling kai fang routines
notice how i did not accuse him - lol

lingo

dioxin and Hutch,

"That messes up the branch prediction mechanism"
It is a little bit wrong but I agree with you in general...Why? Pls, read this:

"B.5.3.3 Mispredicted Returns
19. Mispredicted Return Instruction Rate: BR_RET_MISSP_EXEC/BR_RET_EXEC
The processor has a special mechanism that tracks CALL-RETURN pairs. The
processor assumes that every CALL instruction has a matching RETURN instruction.
If a RETURN instruction restores a return address, which is not the one stored during
the matching CALL, the code incurs a misprediction penalty.
"


But I have no RETURN instruction in my code, hence I have no misprediction penalty either;  :lol
and I'm not responsible for the testing program, which has the matching/non-matching RETURN instruction somewhere... :lol

hutch--

 :bg

Lingo,

Now you have me worried, even if you don't have a RET, the caller must CALL and with your jump to the return address you mess up the CALL RET pairing.

sinsi

>because, it doesn't
For 1 value it doesn't seem to work
        cdq
        xor     eax,edx
        sub     eax,edx

Try it for eax=80000000h
Light travels faster than sound, that's why some people seem bright until you hear them.

MichaelW

Perhaps I'm missing some key point here. How could branch prediction be an issue for code where there are no conditional branches?
eschew obfuscation

MichaelW

Well, I apparently am missing something.

;====================================================================
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm
;====================================================================
    .data
    .code
;====================================================================

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
abs proc dwNum:DWORD
    mov eax, [esp+4]
    cdq
    xor eax,edx
    sub eax,edx
    ret 4
abs endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

;====================================================================

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
absL proc dwNum:DWORD
    pop ecx
    pop eax
    cdq
    xor eax,edx
    sub eax,edx
    jmp ecx
absL endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

;====================================================================
start:
;====================================================================

    invoke abs, 0
    print ustr$(eax),13,10
    invoke abs, 1
    print ustr$(eax),13,10
    invoke abs, -1
    print ustr$(eax),13,10
    invoke abs, 2147483647
    print ustr$(eax),13,10
    invoke abs, -2147483648
    print ustr$(eax),13,10,13,10

    invoke absL, 0
    print ustr$(eax),13,10
    invoke absL, 1
    print ustr$(eax),13,10
    invoke absL, -1
    print ustr$(eax),13,10
    invoke absL, 2147483647
    print ustr$(eax),13,10
    invoke absL, -2147483648
    print ustr$(eax),13,10,13,10

    invoke Sleep, 3000

    REPEAT 3

      counter_begin 1000, HIGH_PRIORITY_CLASS
          REPEAT 4
              invoke abs, -1
          ENDM
      counter_end
      print str$(eax)," cycles",13,10

      counter_begin 1000, HIGH_PRIORITY_CLASS
          REPEAT 4
              invoke absL, -1
          ENDM
      counter_end
      print str$(eax)," cycles",13,10

    ENDM
    print chr$(13,10)

    inkey "Press any key to exit..."
    exit
   
;====================================================================
end start


22 cycles
60 cycles
21 cycles
60 cycles
21 cycles
60 cycles


sinsi

Is that your venerable PIII?
Q6600

15 cycles
13 cycles
15 cycles
13 cycles
15 cycles
13 cycles

I think agner has something about branch prediction, mostly 'ignore it on newer cpus'.

You cheat by using ustr$ since we are testing signed numbers.
80000000h gives a different result (hint: it starts with '-')
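A minimal sketch of the 80000000h case, assuming the MASM32 SDK at \masm32 and using only the str$ and ustr$ macros already seen in this thread. The variable name result is hypothetical. The register values in the comments are worked by hand.

```asm
; Sketch only: shows why 80000000h survives the cdq/xor/sub sequence unchanged.
    include \masm32\include\masm32rt.inc

    .data?
    result  dd ?                ; hypothetical scratch variable

    .code
start:
    mov     eax, 80000000h      ; -2147483648, the most negative DWORD
    cdq                         ; edx = FFFFFFFFh
    xor     eax, edx            ; eax = 7FFFFFFFh
    sub     eax, edx            ; eax = 80000000h again (7FFFFFFFh + 1)
    mov     result, eax

    print   ustr$(result),13,10 ; unsigned view: 2147483648
    print   str$(result),13,10  ; signed view: negative, per the hint above
    exit
end start
```

+2147483648 does not fit in a signed DWORD, so any 32-bit abs has to return either this bit pattern or an error for that one input. Read as unsigned, the result is the correct magnitude.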

MichaelW

Quote: Is that your venerable PIII?

Yes, so what I'm missing here is a modern computer.


NightWare

Quote from: lingo on August 24, 2010, 02:10:36 AM
"B.5.3.3 Mispredicted Returns
19. Mispredicted Return Instruction Rate: BR_RET_MISSP_EXEC/BR_RET_EXEC
The processor has a special mechanism that tracks CALL-RETURN pairs. The
processor assumes that every CALL instruction has a matching RETURN instruction.
If a RETURN instruction restores a return address, which is not the one stored during
the matching CALL, the code incurs a misprediction penalty.
"

::) i don't know where this stupidity comes from, but there is NO special mechanism... the call instruction simply stores the return address in the cache. so unless you need an enormous number of addresses in your function there is no possible misprediction (and if you have an enormous number of addresses then YOU WILL HAVE MISPREDICTIONS anyway). it has nothing to do with the use of the "ret" instruction or not (in itself). so there is no misprediction with "jmp ecx"... but "jmp ecx" IS ALSO A STUPIDITY, lingo, stop with that... i'm pretty sure that all beginners quit their functions like that now... :'(
guys, USE 1 BYTE instructions, if they're there, there is a reason !!!

dedndave

well, RET 4 is not a single byte
and JMP reg32 is probably the fastest of all near branches
but any speed advantage is probably more related to how the stack parms are loaded at the beginning of the routine
        pop     ecx
        pop     eax
.
.
        jmp     ecx

is logically faster than
        push    ebp
        mov     ebp,esp
        mov     eax,[ebp+8]
.
.
        pop     ebp
        ret     4

i doubt it saves all that much time
it makes the code, shall we say "non-standard", though
for this simple function (which i still hold is better as a macro), it seems ok
but for more complicated functions, there is a disadvantage in that the parm is gone
it may be accessed only once, and it doesn't necessarily work well for functions with a few parms
and in many cases, it is better to have the parm on the stack to access again

i do have a problem with calling the standard method "stupid" or "idiotic"
there is nothing wrong with learning to write functions in a straightforward manner
do we really care if a few bytes are saved ? - not with today's computers
do we really care if a few clock cycles are saved ? - same answer, i think

if you stick a function inside a loop and call it 10 million times, then it matters
but, i may be inclined to put that loop inside the proc and call it with an iteration parm
well - that is if you are really concerned about speed
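One way the "loop inside the proc" idea might look, as a rough sketch. It assumes the MASM32 SDK at \masm32. abs_array, vals, and the parameter names are hypothetical and are not part of the MASM32 library.

```asm
; Sketch only: one CALL/RET pair for the whole array instead of one per value.
    include \masm32\include\masm32rt.inc

    .data
    vals    dd 0, 1, -1, 2147483647, -5     ; sample inputs (hypothetical)

    .code

; abs_array (hypothetical): replaces each signed DWORD in the array
; with its absolute value, in place.
abs_array proc uses esi pArr:DWORD, cnt:DWORD
    mov     esi, pArr
    mov     ecx, cnt
    test    ecx, ecx
    jz      done                ; nothing to do for an empty array
@@:
    mov     eax, [esi]
    cdq                         ; edx = sign mask of eax
    xor     eax, edx
    sub     eax, edx
    mov     [esi], eax
    add     esi, 4
    dec     ecx
    jnz     @B
done:
    ret
abs_array endp

start:
    invoke  abs_array, ADDR vals, LENGTHOF vals

    ; print the converted values, one per line
    mov     esi, OFFSET vals
    mov     ebx, LENGTHOF vals
show:
    mov     eax, [esi]
    print   str$(eax),13,10
    add     esi, 4
    dec     ebx
    jnz     show

    exit
end start
```

The call overhead is paid once per array here, so the choice between RET and JMP ECX stops mattering for the inner work.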

this is a good place for a macro, guys
the code of the function is the same length as CALL - doh !

note - i realized my error and edited my previous posts
it was either too much pot in the previous century or not enough coffee in this one