The xchg-statement in high level languages, how does it work?

Started by Ani_Skywalker, July 28, 2010, 10:16:35 PM

Previous topic - Next topic

Ani_Skywalker

Hi guys,

First of all, thank you all for all the awesome answers to my previous post. I am still struggling to understand the sample-code that jj2007 and Hutch- posted. In jj2007's case it is pretty straight forward but Hutch-'s piece is harder to melt. To figure it out I went back to my books and they in turn headed me this way.

Now, this question is about something completely different and I'm not sure this is the right forum for it. Here it goes, this is from one of my books:

QuoteThe 80×86 has a very useful xchg instruction that exchanges data in one location with data in another location. It accomplishes in a single instruction the operation that often requires three high-level language instructions. Suppose Value1 and Value2 are being exchanged. In a design or a high-level language, this might be done using

     Temp := Value1;     { swap Value1 and Value2 }
     Value1 := Value2;
     Value2 := Temp;

Assuming that Value1 is stored in the EAX register and Value2 is stored in the EBX register, the above swap can be coded as

     xchg eax, ebx      ; swap Value1 and Value2

Instead of using the xchg instruction, one could code

     mov   ecx, eax      ; swap Value1 and Value2
     mov   eax, ebx
     mov   ebx, ecx

However, each of these mov instructions takes one clock cycle and two bytes for a total of three clock cycles and six bytes of code; the xchg instruction requires only two bytes and two clock cycles (on a Pentium). In addition, it is much easier to write one instruction than three, and the resulting code is easier to understand.


The example above is indeed how we would implement an exchange in most high-level languages. Now, why is it like that? Why doesn't the high level compilers inherit the xchg-command as a keyword since it is easier to code? The point of high-level languages are to make it easier and faster to code in expense of execution speed, right? However, by the way we implement an exchange in HL-languages it looks like they are using 3 mov-statements for an exchange. My guess is that there is a very logical reason why we implement an exchange the way we do in high-level languages and maybe you can explain it for me?

KeepingRealBusy

You can't do memory to memory exchanges. What C does since it is working with memory variables is to load a variable into a register, then the register into a temp location, then the same thing with the second variable, but to the first variable's location, then finally move the temp variable into the second variable's location. You can drop into asm and do it yourself:


    __asm
    {
    mov eax,var1
    mov ebx,var2
    mov eax,var2
    mov ebx,var1
    }


But note that you do not need an xchg since the variables are all memory variables and must remain so. You can declare register variables, but the spec says that there MAY be honored and you do not have a clue which register the variable may be in. Even if you get a .cod file, the registers may change if the surrounding code changes, so you really have no clue.

Dave.

jj2007

Hi Ani,
Some HLL do have a Swap command - GfaBasic for example (for integers and arrays alike).
However, one reason might be that there is no direct xchg for a mem to mem operation, and that the native assembler command xchg reg32, [mem32] is exceptionally slow (note that xchg reg32, reg32 is fast).
Since HLL use memory variables most of the time, a direct xchg command would be very slow...

redskull

The author of your book makes many assumptions about how a HLL compiler will put together instructions, and under what circumstances.  Besides the fact that outright swapping of two variables is a rare occurance anyway, most programming is done via functions, which necessitates input and output being done via the memory; moving them into registers from memory, swapping them, and then moving them back would actually be slower.  Books that talk about optimization in an isolated, set-peice snippit like this just perpetuate the myth that CPU's and compilers still operate like it's 1984; the fact the author is still cycle counting on a Pentium merely should be your first red flag.

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

Ani_Skywalker

#4
Confusion cleared out, thanks again guys! :bg

Maybe it is time to introduce myself a little, this way you might understand why I'm bothering you with my stupid questions:

I am a 27 years old programmer that started very late. I wrote my first piece when I were 21 years old (Matlab) and really loved programming. 24 years old I switched from a bachelor + master (in Sweden we read them as a 5 year degree, like a civil engineer) in physics  to a bachelor in computer science. Step by step I've moved closer to the core of the computer. Starting with HLLs like Matlab, Java, C# and Prolog I've worked my way to faster or lower level languages like C, C++, Ocaml and F#. I like to focus on three disciplines off programming more then others: Parallel programming, reverse engineering, embedded systems.

I started with assembly to learn reverse engineering and understand computer components better. But also to learn how different compilers "think" and how I can speed up my programs. This might sometimes lead me to ask questions that you feel are completely irrelevant to the assembly language but most of the time, their answers still does some sense to me. Not always thought, but that cases mostly occur when I miss-understood something previously.

Once more, thank you for your time. If you think you there is a programming task I might be useful for, just ask me. I would like to help you guys out with my time since you spend your time explaining stuff to me.

Ani_Skywalker

Quote from: redskull on July 28, 2010, 11:54:37 PM
The author of your book makes many assumptions about how a HLL compiler will put together instructions, and under what circumstances.  Besides the fact that outright swapping of two variables is a rare occurance anyway, most programming is done via functions, which necessitates input and output being done via the memory; moving them into registers from memory, swapping them, and then moving them back would actually be slower.  Books that talk about optimization in an isolated, set-peice snippit like this just perpetuate the myth that CPU's and compilers still operate like it's 1984; the fact the author is still cycle counting on a Pentium merely should be your first red flag.

-r

You're right. Problem is, I have not found any better suggestions then the books I am reading, of which this one being the last updated. Do you have any suggestions?

redskull

Quote from: Ani_Skywalker on July 29, 2010, 12:13:21 AM
You're right. Problem is, I have not found any better suggestions then the books I am reading, of which this one being the last updated. Do you have any suggestions?

Unfortunetly, I don't.  One of the biggest, if not *the* biggest problem in the ASM community is the rash of out-of-date material that still hangs out on the internet and in libraries.  I usually recommend Agner Fogs books for the inner-workings of the CPU, but they are both extremely dense, and not exactly the most readable things out there.  The best you can do is take everything you read with a hefty grain of salt, and verify it for yourself by studying listings and doing timings.  And, of course, if you ever have questions, everyone here will be more than happy to give you 50 different, conflicting answers!  :bg

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

oex

Quote from: redskull on July 29, 2010, 01:05:19 AM
everyone here will be more than happy to give you 50 different, conflicting answers!  :bg

I disagree.... :lol.... Serously though the 50 different conflicting answers you get here will be more beneficial to you that the wealth of misinformation you will find on the web :bg
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

hutch--

Ani,

The action is always understanding what is going on and with a value swap, the technology is very simple. Work out what for the data is stored in (registers, variables or both) and know the limitations of the instruction set which does not support memory to memory without an intermediate register. For historical reasons XCHG is very slow as Intel processors are RISC under the hood and XCHG is a legacy instruction that survives only in microcode.

REG to REG is the easiest and fastest.
MEM to REG or REG to MEM is next.
MEM to MEM takes an extra register and is usually slower as it has an extra memory operation to perform.

It can also be done with PUSH POP for non optimal requirements as it does not use registers at all but this technique is slower and should not be used in a critical algorithm.


PUSH mem1
PUSH mem2
POP mem1
POP mem2


I agree with red here, most of the published asm books are badly out of date and make obsolete assumptions in relation to the hardware. The only reliable technique is to write and time test pieces with the tests designed to closely emulate the conditions that the algo will be used under.

Here is a small test piece for you that shows how to swap memory operands with PUSH POP. It uses a simple macro.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    swapmem MACRO mem1,mem2
      push mem1
      push mem2
      pop mem1
      pop mem2
    ENDM

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL var1  :DWORD
    LOCAL var2  :DWORD

    mov var1, 1234
    mov var2, 5678

    print str$(var1)," var1",13,10
    print str$(var2)," var2",13,10

    swapmem var1,var2

    print "Memory operands swapped",13,10

    print str$(var1)," var1",13,10
    print str$(var2)," var2",13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

PBrennick

I  like using the stack for these type of situations, also.

Paul
The GeneSys Project is available from:
The Repository or My crappy website

dedndave


FORTRANS

Hi,

QuoteFor historical reasons XCHG is very slow as Intel processors are RISC under the hood and XCHG is a legacy instruction that survives only in microcode.

   Actually it is the guarantee of atomic behavior that is the
major reason that XCHG is slow.  IIRC.  The entire read-
modify-write operation must complete before any other
instruction can be executed.

Regards,

Steve N.

KeepingRealBusy

Doesn't the real slow down only apply to reg mem (xchg eax,loc1)?

Dave.

hutch--

 :bg

Test it and find out, its the most reliable method.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Ani_Skywalker

#14
Quote from: hutch-- on July 29, 2010, 02:02:30 AM
Ani,

The action is always understanding what is going on and with a value swap, the technology is very simple. Work out what for the data is stored in (registers, variables or both) and know the limitations of the instruction set which does not support memory to memory without an intermediate register. For historical reasons XCHG is very slow as Intel processors are RISC under the hood and XCHG is a legacy instruction that survives only in microcode.

REG to REG is the easiest and fastest.
MEM to REG or REG to MEM is next.
MEM to MEM takes an extra register and is usually slower as it has an extra memory operation to perform.

It can also be done with PUSH POP for non optimal requirements as it does not use registers at all but this technique is slower and should not be used in a critical algorithm.


PUSH mem1
PUSH mem2
POP mem1
POP mem2



Let's see if I understand this correctly: You're pushing the value of mem1 to the stack, then mem2. After this you pop out the most recent added, that means the mem2-value and store it to mem1, then you do the same thing with the next value in the stack, which is number one now, and store it in mem2. I got it right?

By the way Hutch--, I've read your sample about my other question. I'm understanding most of it, and the part I don't understand is nothing I can ask about at the moment. Need to read hard on loops, conditions and jumps and give it another try. So far the sample helped me understand a lot. However, most important is that I found a way of structure my code. I like your approach and will use it.