Little endian, mov dword ptr, add dword ptr

Started by bolzano_1989, November 05, 2011, 07:13:13 AM

bolzano_1989

In Tutor 5 - Opcodes of the Win32Asm Tutorial, I have some trouble understanding the "little endian" format.
Could you explain what actually happens in memory and registers in the following 2 examples, which I created to try to understand more about little endian :bg ?

mov dword ptr [0000003Ah], 725E7A25h
add dword ptr [0000003Ah], 1
mov eax, dword ptr [0000003Ah]


mov dword ptr [0000003Ah], 725E7A25h
inc dword ptr [0000003Ah]
mov eax, dword ptr [0000003Ah]


Actually, I don't know when I should care about little endian :( .

NoCforMe

I'm not sure those are very good examples if you want to learn how little-endian storage works. In those examples, you're just moving DWORD (doubleword) values into doubleword registers. Kinda boring.

You really do need to understand this concept, though, if you're going to write assembly for x86 processors. Little endian is confusing at first (especially if you ask "Why???". My advice: don't ask that. It's just that way. The god Intel made it so.)

So imagine you have four bytes of memory lined up like this:



0: aa bb cc dd



(the "0:" indicates the address of the first byte).  This is going from low to high, just as you'd scan it on the page, left to right.

OK, simple enough. Now, taking those letters as hexadecimal values, you'd think that what you have there is the hexadecimal DWORD value



0xAABBCCDD



Ah, but you'd be wrong. On the 68000 (a big-endian processor), that is what you would have. But in the x86 world, what you'd have (if you loaded these 4 locations into a doubleword register) would be



0xDDCCBBAA



Everything gets all swapped around. (The words are "byte-swapped", and the DWORDs are "word-swapped".) The way it works is this:

  • If you have two WORDs in memory (like AAAA followed by BBBB) and you load them into a DWORD register, the first word (in sequence left to right) becomes the LOW WORD, and the second becomes the HIGH WORD, so the result IN THE REGISTER will be BBBBAAAA. Just the opposite of the way you might think it would be.
  • Same thing operates on the byte/word level: if you have the two bytes AA BB in memory, and you load them into a WORD register, the first one is considered the low byte, so you get BBAA in the register.
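The two rules above can be checked with a small sketch. It is written in C rather than MASM so it can be compiled anywhere, and it builds the value with shifts instead of relying on the host CPU, so it behaves the same on any machine. The helper names `load_le16` and `load_le32` are made up for this illustration.

```c
#include <stdint.h>

/* Combine two consecutive bytes the way a little-endian CPU does:
   the byte at the lower address becomes the LOW byte of the word. */
uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Same idea for four bytes: p[0] lands in bits 7..0,
   p[3] lands in bits 31..24. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With the bytes AA BB CC DD in memory, `load_le32` gives 0xDDCCBBAA and `load_le16` on the first two bytes gives 0xBBAA, matching what the register would hold.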

In Windows programming, this wacky way of organizing memory often works to our advantage. For instance, sometimes a word value (16 bits) will be passed in the low word of lParam or wParam, two often-used parameters. It's really easy to get these values, say if they're on the stack, without doing any arithmetic:


MOV  AX, WORD PTR lParam


(where "lParam" is a DWORD value on the stack). Here, we're just extracting the first part (low word) of the DWORD into a WORD-sized register (AX). If the processor weren't little-endian, we'd have to do this instead:


MOV  AX, WORD PTR lParam + 2


since the low word would actually be "behind" the high word, not "in front of it".

We can also get the value into the whole 32 bits of EAX easily (back to little-endian here) by doing this:


MOVZX EAX, WORD PTR lParam


(assuming we want the value to be zero-extended; we can use MOVSX  to retain the sign of the word if we want)
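Here is a rough C sketch of the same idea, assuming lParam is viewed as the 4 bytes it occupies in memory. The helper names (`store_le32`, `low_word_zx`, `low_word_sx`) are invented for this example and are not Windows APIs. The low word sits at the lowest address, so no offset is needed to reach it.

```c
#include <stdint.h>

/* Store a 32-bit value low byte first, as an x86 MOV to memory does. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Like MOVZX EAX, WORD PTR lParam: take the two bytes at the
   lowest address and zero-extend them to 32 bits. */
uint32_t low_word_zx(const uint8_t *lparam_bytes)
{
    return (uint32_t)lparam_bytes[0] | ((uint32_t)lparam_bytes[1] << 8);
}

/* Like MOVSX EAX, WORD PTR lParam: same two bytes, but bit 15
   is copied into the upper 16 bits (sign extension). */
int32_t low_word_sx(const uint8_t *lparam_bytes)
{
    uint32_t w = low_word_zx(lparam_bytes);
    return (w & 0x8000u) ? (int32_t)w - 0x10000 : (int32_t)w;
}
```

For an lParam of 1234FFFEh, the zero-extended low word is 0000FFFEh and the sign-extended one is -2; the high word (1234h) lives in the two bytes at offset +2.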

Hopefully this didn't just confuse you more. It takes a little while to wrap your head around this aspect of '86 assembly.

bolzano_1989

Quote from: NoCforMe on November 05, 2011, 07:55:20 AM
I'm not sure those are very good examples if you want to learn how little-endian storage works. In those examples, you're just moving DWORD (doubleword) values into doubleword registers. Kinda boring.

NoCforMe, thank you for your help, but could you help me with the "boring" examples above  :bg ?
For example, could you tell me after the first instruction:
mov dword ptr [0000003Ah], 725E7A25h

Which one will the result in memory be:
Quote
offset   34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42
data     0D 0A 50 32 44 57 25 7A 5E 72 EF 7D FF AD C7

OR

Quote
offset   34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42
data     0D 0A 50 32 44 57 72 5E 7A 25 EF 7D FF AD C7
?

jj2007

Here is another perspective, hope it helps. Apologies that it looks like Basic :wink
include \masm32\MasmBasic\MasmBasic.inc   ; MasmBasic library (separate download)
TheMem   db "abcdefghijklmnopq", 0

   Init
   mov esi, offset TheMem   ; point esi to memory
   xor eax, eax   ; clear the complete register
   mov al, [esi]
   PrintLine Chr$(eax), Tb$, Tb$, "al as string"   ; you will see a
   xor eax, eax   ; clear the complete register
   mov al, [esi]
   PrintLine Hex$(eax), Tb$, "al as hex"
   xor eax, eax   ; clear the complete register
   mov ax, [esi]
   PrintLine Hex$(eax), Tb$, "ax as hex"
   mov eax, [esi]
   PrintLine Hex$(eax), Tb$, "eax as hex"
   Inkey "ok"
   Exit
end start
Output:
a               al as string
00000061        al as hex
00006261        ax as hex
64636261        eax as hex

ToutEnMasm

Little-endian and big-endian are just two different orders for storing the bytes of a dword or word.
Big-endian order is needed by some APIs.
There are two CPU instructions that do the conversion (I haven't found them again).
The Intel or AMD manuals explain more, but it is not very useful.
Finding the instructions and a sample is enough.
BSWAP
Quote
Reverses the byte order of the specified register. This action converts the contents of the register from
little endian to big endian or vice versa. In a doubleword, bits 7–0 are exchanged with bits 31–24, and
bits 15–8 are exchanged with bits 23–16. In a quadword, bits 7–0 are exchanged with bits 63–56, bits
15–8 with bits 55–48, bits 23–16 with bits 47–40, and bits 31–24 with bits 39–32. A subsequent use of
the BSWAP instruction with the same operand restores the original value of the operand.
The result of applying the BSWAP instruction to a 16-bit register is undefined. To swap the bytes of a
16-bit register, use the XCHG instruction and specify the respective byte halves of the 16-bit register
as the two operands. For example, to swap the bytes of AX, use XCHG AL, AH.


Quote
Figure 1-2. Little-Endian Byte-Order of Instruction Stored in Memory
The basic operation of an instruction is specified by an opcode. The opcode is one or two bytes long, as
described in "Opcode" on page 17. An opcode can be preceded by any number of legacy prefixes.
These prefixes can be classified as belonging to any of the five groups of prefixes described in
"Instruction Prefixes" on page 3. The legacy prefixes modify an instruction's default address size,
operand size, or segment, or they invoke a special function such as modification of the opcode, atomic
bus-locking, or repetition. The REX prefix can be used in 64-bit mode to access the register extensions
illustrated in "Application-Programming Register Set" in Volume 1. If a REX prefix is used, it must
immediately precede the first opcode byte.
An instruction's opcode consists of one or two bytes. In several 128-bit and 64-bit media instructions,
a legacy operand-size or repeat prefix byte is used in a special-purpose way to modify the opcode. The
opcode can be followed by a mode-register-memory (ModRM) byte, which further describes the
operation and/or operands. The opcode, or the opcode and ModRM byte, can also be followed by a
scale-index-base (SIB) byte, which describes the scale, index, and base forms of memory addressing.
The ModRM and SIB bytes are described in "ModRM and SIB Bytes" on page 17, but their legacy
functions can be modified by the REX prefix ("Instruction Prefixes" on page 3).
The 15-byte instruction-length limit can only be exceeded by using redundant prefixes. If the limit is
exceeded, a general-protection exception occurs.

ToutEnMasm

Wikipedia (in your language):
Little endian: the low byte comes before the high byte        ; this is the normal order on x86
Big endian: the reverse of this order.
If an API wants big-endian, use bswap and all is good.


bolzano_1989

Thank you everybody for your help  :bg.
In Tutor 5 - Opcodes of the Win32Asm Tutorial, Mad Wizard (Thomas Bleeker) clearly explains the instruction:
mov eax, dword ptr [0000003Ah]
so I understand what happens in memory and registers for that instruction. In that instruction the "dword ptr" is in the source operand of the mov.

What I need to understand is what happens in memory and registers if the "dword ptr" is in the destination operand of the mov  :toothy, so I modified it and made the 2 examples in the first post of this thread.
Could you help me understand what actually happens in memory and registers in those 2 examples?

For example, could you tell me after the first instruction:
mov dword ptr [0000003Ah], 725E7A25h

Which one will the result in memory be:
Quote
offset   34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42
data     0D 0A 50 32 44 57 25 7A 5E 72 EF 7D FF AD C7

OR

Quote
offset   34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42
data     0D 0A 50 32 44 57 72 5E 7A 25 EF 7D FF AD C7
?

ToutEnMasm


I took a simpler example: a dword stored in memory.
The comments show the result as seen in a debugger.
Quote
mov eax,12345678h       ;12345678 = eax
mov endianx,eax
lea edx,endianx              ;78563412 in memory    little-endian
bswap eax         ;78563412 = eax
mov endianx,eax      ;12345678 in memory    big-endian
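The same sequence can be mimicked in C as a rough sketch. `bswap32` and `store_le32` are hypothetical helpers written for this illustration (they are not compiler intrinsics), and the store is emulated with shifts so the byte order is the x86 one on any host.

```c
#include <stdint.h>

/* What BSWAP does to a 32-bit register: bits 7..0 swap with 31..24,
   bits 15..8 swap with 23..16. */
uint32_t bswap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Store a 32-bit value low byte first, as an x86 MOV to memory does. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

Storing 12345678h leaves the bytes 78 56 34 12 in memory. After `bswap32` the value is 78563412h, and storing that leaves 12 34 56 78, which is the big-endian layout of the original number. Applying `bswap32` a second time restores 12345678h.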

qWord

Quote from: bolzano_1989 on November 05, 2011, 03:18:29 PM
Which one will the result in memory be:
Quote
offset   34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42
data     0D 0A 50 32 44 57 25 7A 5E 72 EF 7D FF AD C7 <= this one

OR

Quote
offset   34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42
data     0D 0A 50 32 44 57 72 5E 7A 25 EF 7D FF AD C7
?
the first
FPU in a trice: SmplMath
It's that simple!

bolzano_1989

Thank you qWord and ToutEnMasm.
Now, I think for my first example:
mov dword ptr [0000003Ah], 725E7A25h
add dword ptr [0000003Ah], 1
mov eax, dword ptr [0000003Ah]


After
Quote
mov dword ptr [0000003Ah], 725E7A25h
the result in memory will be:
Quote
offset   34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42
data     0D 0A 50 32 44 57 25 7A 5E 72 EF 7D FF AD C7

After
Quote
add dword ptr [0000003Ah], 1
the result in memory will be:
Quote
offset   34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42
data     0D 0A 50 32 44 57 26 7A 5E 72 EF 7D FF AD C7

And after
Quote
mov eax, dword ptr [0000003Ah]
eax = 725E7A26h

Could you tell me whether my view is right now  :bg ?
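For anyone who wants to check this walk-through without a debugger, here is a small C sketch that models memory as a byte array. The names `store_le32`, `load_le32` and `add_dword` are made up for this illustration; `add_dword` treats `add dword ptr [addr], n` as load, add, store on the whole 32-bit value, which is an emulation and not how the CPU is implemented internally.

```c
#include <stdint.h>

/* mov dword ptr [p], v : bytes go to memory low byte first. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* mov eax, dword ptr [p] : the byte at the lowest address
   becomes the low byte of the register. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* add dword ptr [p], n : the addition works on the whole dword,
   so a carry out of the first byte moves into the next byte. */
void add_dword(uint8_t *p, uint32_t n)
{
    store_le32(p, load_le32(p) + n);
}
```

Storing 725E7A25h at offset 3Ah gives the bytes 25 7A 5E 72, adding 1 changes only the first byte to 26, and loading the dword back yields 725E7A26h. If the dword were 000000FFh instead, adding 1 would give the bytes 00 01 00 00, because the carry moves to the byte at the next higher address.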

qWord

You've got it!  :U

bolzano_1989

Quote from: qWord on November 05, 2011, 04:32:55 PM
Quote from: bolzano_1989 on November 05, 2011, 04:16:28 PM
Could you tell me whether my view is right now  :bg ?
You've got it!  :U

Thank you qWord :bg !

Better safe than sorry, so about my second example:
Now I think my second example behaves the same as my first example.
I mean
inc dword ptr [0000003Ah]
is the same as
add dword ptr [0000003Ah], 1.
Is that right :bg ?


In detail:
mov dword ptr [0000003Ah], 725E7A25h
inc dword ptr [0000003Ah]
mov eax, dword ptr [0000003Ah]


After
Quote
mov dword ptr [0000003Ah], 725E7A25h
the result in memory will be:
Quote
offset   34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42
data     0D 0A 50 32 44 57 25 7A 5E 72 EF 7D FF AD C7

After
Quote
inc dword ptr [0000003Ah]
the result in memory will be:
Quote
offset   34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42
data     0D 0A 50 32 44 57 26 7A 5E 72 EF 7D FF AD C7

And after
Quote
mov eax, dword ptr [0000003Ah]
eax = 725E7A26h

Is it right?

mineiro

Little endian means the "little end" (the least significant byte) comes first, at the lowest address.
If you have the word "exit" in memory and want to compare against it, you do:
cmp dword ptr [some_location],"tixe"  ;little endian of "exit".
This is where you need to take care.
.386
.model flat, stdcall
option casemap:none

.data
quit db "exit",00h

.code
main proc
lea eax,quit
cmp dword ptr [eax],"tixe"
je done
nop
done:
ret
main endp
end main
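To see why the constant has to be spelled "tixe", here is a small C sketch of what the dword read at the address of "exit" contains. `load_le32` is a hypothetical helper that emulates a little-endian dword load with shifts; it is not part of any library.

```c
#include <stdint.h>

/* Emulate an x86 dword load: the byte at the lowest address
   ends up in the low 8 bits of the result. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Build the dword you would compare against, writing the
   characters in the order they appear in the register
   (most significant byte first). */
uint32_t dword_from_chars(char hi, char b2, char b1, char lo)
{
    return ((uint32_t)(uint8_t)hi << 24)
         | ((uint32_t)(uint8_t)b2 << 16)
         | ((uint32_t)(uint8_t)b1 << 8)
         |  (uint32_t)(uint8_t)lo;
}
```

Loading the bytes 'e' 'x' 'i' 't' (65 78 69 74) gives 74697865h. Read from the most significant byte down, that is 't' 'i' 'x' 'e', so `dword_from_chars('t', 'i', 'x', 'e')` is the value that matches.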

Today's PCs don't address individual bits; they work with bytes. If you need to change one bit, you have to read the full byte, change that bit, and then store the byte back. This is why only the order of the bytes is reversed, not the nibbles or bits inside each byte.
When you read a multi-byte value from memory, its bytes appear in reverse order in the register, and they are reversed again when you store it back to memory.
Why? Well, because we write text from left to right, but numbers grow from right to left (the least significant digit is on the right).

qWord

Quote from: bolzano_1989 on November 05, 2011, 04:46:35 PM
inc dword ptr [0000003Ah]
is the same as
add dword ptr [0000003Ah], 1.
Is that right :bg ?

[...]

eax = 725E7A26h

Is it right?
yes and yes :thumbu