"Reverse Storage" is confusing terminology

Tapejara · February 05, 2010, 12:00:16 AM

I am starting to read the GoAsm manual in earnest. With all due respect, Jeremy, your terminology of "Reverse Storage" is confusing. I concur that the Big-Endian vs. Little-Endian issue is a difficult one to grapple with. But I have programmed extensively in both systems and, after 25 years of being a C programmer (30+ for assembly in many different architectures), I have come to terms with the differences. To make my point clear, I assert to you that our "Big-Endian" way of writing numbers is in reverse order of what it should be. So it is our writing that is backwards, not the Intel method of storing multi-byte values. With Little-Endian ordering, the bit number of every bit is the same in both register and memory: it is the bit's offset from a base bit address. With Big-Endian, the bit number starts at 0 in a register (and counts up) but begins with (N-1) in memory (and counts down), so the two are backwards from each other. I for one think that the Little-Endian method is more mathematically natural and we humans just can't get over the fact that we write our numbers backwards. But people are quirky and they think the way they learn something when they are young is always the "right" way. Looks like we were wrong on this one. Anyway, I think that sticking with industry standard terminology here is the best way to go. (Alas, Motorola's syntax is far superior to Intel's, but they didn't win the architecture wars for our desktops.)
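The bit-numbering argument can be checked with a small sketch. It is written in Python rather than assembly so it runs anywhere. The helper name `bit_from_memory` is made up for illustration, and the sketch assumes bits inside each byte are numbered from the least significant end.

```python
def bit_from_memory(mem: bytes, n: int, byteorder: str) -> int:
    """Return bit n (0 = least significant) of the integer stored in mem."""
    if byteorder == "little":
        byte_index = n // 8                  # counts up from the base address
    else:
        byte_index = len(mem) - 1 - n // 8   # counts down from the last byte
    return (mem[byte_index] >> (n % 8)) & 1


value = 0x12345678                 # arbitrary 32-bit "register" value
little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")

for n in range(32):
    register_bit = (value >> n) & 1
    # Little-endian: the byte holding bit n is simply n // 8.
    assert bit_from_memory(little, n, "little") == register_bit
    # Big-endian: the byte index has to be mirrored to find the same bit.
    assert bit_from_memory(big, n, "big") == register_bit
```

Only the big-endian branch needs the `len(mem) - 1 - ...` mirroring, which is the asymmetry described above.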

Tapejara · February 05, 2010, 09:39:35 PM

Reading in the manual further, it is unclear what Jeremy is trying to say. In the following excerpt:

MOV AL,'1'
MOV AX,'12' ;regarded as bytes - 1 first then 2
MOV EAX,'ABCD' ;regarded as bytes - A first, then B then C then D

It would clarify what is happening if he showed what the contents of EAX were after execution of the last statement. My guess is that it is 0x44434241. This is most logical because, after it is stored to memory, it will be in the order "ABCD". If it is loaded into the register as 0x41424344 then it will store to memory as "DCBA", which is not the way it should be. I think Jeremy made the right choice but it is not totally clear from reading the manual.
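A quick way to see what each candidate register value would look like once stored, sketched in Python with the standard `struct` module. The function name `store_dword` is hypothetical; it only models a 32-bit little-endian store as performed by x86 and says nothing about which value a given assembler actually emits.

```python
import struct


def store_dword(value: int) -> bytes:
    """Bytes a little-endian CPU writes to memory for a 32-bit store."""
    return struct.pack("<I", value)


print(store_dword(0x44434241))  # b'ABCD'
print(store_dword(0x41424344))  # b'DCBA'
```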

jj2007 · February 06, 2010, 07:28:47 AM

Quote from: Tapejara on February 05, 2010, 09:39:35 PM
My guess is that it is 0x44434241. This is most logical

Do not guess. Use Olly. It is 41424344, in eax and in memory.

donkey · February 06, 2010, 08:24:37 AM

Hi Tapejara,

GoAsm follows Intel syntax except in the case of mov'ing quoted strings, in that case the string is reversed. In GoAsm you would write:

mov eax, '1234'

In MASM this is equivalent to

mov eax, '4321'

Because it is a quoted string GoAsm will reverse it during assembly for you, it is intended to make source code more easily readable.

Edgar
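The two conventions described above can be modelled in a few lines of Python. This is a sketch of the behaviour as stated in this thread, not of either assembler's real implementation, and the names `goasm_style_imm` and `masm_style_imm` are invented for the example.

```python
def goasm_style_imm(text: str) -> int:
    """First character goes in the lowest byte, so memory order matches source order."""
    return int.from_bytes(text.encode("ascii"), "little")


def masm_style_imm(text: str) -> int:
    """First character goes in the highest byte, like a number written left to right."""
    return int.from_bytes(text.encode("ascii"), "big")


# GoAsm '1234' and MASM '4321' yield the same 32-bit immediate.
print(hex(goasm_style_imm("1234")))  # 0x34333231
print(hex(masm_style_imm("4321")))   # 0x34333231
```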

jj2007 · February 06, 2010, 10:12:29 AM

Quote from: donkey on February 06, 2010, 08:24:37 AM
Because it is a quoted string GoAsm will reverse it during assembly for you, it is intended to make source code more easily readable.

Sorry Tapejara, I had not realised you were talking GoAsm. Edgar is right (I had used Masm syntax).

dedndave · February 06, 2010, 10:57:32 AM

that is probably why Jeremy uses the "reverse storage" terminology - it is backwards from masm
he wanted to highlight the differences

Tapejara · February 06, 2010, 02:55:26 PM

Quote from: jj2007 on February 06, 2010, 07:28:47 AM
Quote from: Tapejara on February 05, 2010, 09:39:35 PM
My guess is that it is 0x44434241. This is most logical
Do not guess. Use Olly. It is 41424344, in eax and in memory.

This is not even true in MASM. It is the hardware that makes registers look backwards from memory, not the assembler (assuming we express our numbers high order first). Imagine that the source code has been assembled and you are executing it with a debugger. You are looking at memory and register values, not your original source code. Here, source code no longer exists. Just an executable. Once this is understood, you can go back and look at the source code that produced it. What the guys doing MASM failed to realize is that the image in memory should match the source code order for strings. The funky way things look in registers is secondary, as this is only an intermediate (and temporary) form for processing. So Jeremy did it right. The characters in a string you quote in source code should end up in the same order in memory as they appear in the source code.

However, the language feature of loading a string fragment into a scalar register should probably only be used as an optimization. The eax is a scalar register and a single character is also a scalar value. These are a good match. Strings are more like vectors (i.e. arrays) and should be loaded into vector registers. The extra room in each scalar register is for storing larger character codes (such as Unicode). The syntax of loading arrays into scalar registers clashes with the true parallel programming syntax of HLLs, which is almost nonexistent in the industry right now. But it will be the up and coming thing of the future...

jj2007 · February 06, 2010, 05:44:38 PM

Quote from: Tapejara on February 06, 2010, 02:55:26 PM
Quote from: jj2007 on February 06, 2010, 07:28:47 AM
Quote from: Tapejara on February 05, 2010, 09:39:35 PM
My guess is that it is 0x44434241. This is most logical
Do not guess. Use Olly. It is 41424344, in eax and in memory.

This is not even true in MASM.

Masm code:	mov eax,'abcd'
Olly, code:	mov eax, 61626364 aka abcd
Olly, regs:	eax 61626364 aka abcd
Olly, memory:	dcba aka 64636261

It is indeed confusing...

dedndave · February 06, 2010, 05:54:17 PM

for masm...

mov eax,'ABCD'
mov String,eax

results in a String of 'DCBA'

donkey · February 06, 2010, 08:51:18 PM

I think you are looking for an explanation of a hardware issue in a software manual. The exact way the x86 stores numbers is an issue for Intel and has little place in the documentation for the assembler. For example, the C++ manual does not go into details of what addressing modes are used to dereference an object, nor does the MASM manual explain how it uses EAX to address local data. I don't find the terminology or Jeremy's explanation particularly confusing when read in the light of its effect on source code as compared to other assemblers. GoAsm only acts differently than other assemblers in one respect: when dealing with quoted immediates. I use this feature a lot. For example, when checking the extension of a file it is a quick way to compare them:

invoke PathFindExtension, offset path
mov eax,[eax]
cmp eax, '.lnk'

Works well in GoAsm, in MASM the string needs to be reversed so you end up with

cmp eax, 'knl.'

Yes, from a hardware standpoint it is actually GoAsm that is reversing the byte order of a DWORD, but that is not a source code issue; from the standpoint of the source alone it is not reversed. I find the GoAsm code a bit more readable. In other words, if you're looking for the bytes . L N K you do not need to reverse their order in the quoted immediate. I think Jeremy did a fairly good job of documenting a rather difficult subject for newcomers to grasp. However, it is explained as it affects source code, not at the hardware level. What happens to it when it is loaded into the register or stored in memory is a matter for the x86 architecture manual to address. I also find not reversing quoted immediates in the source code an elegant addition to the language.
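The extension check can be traced byte by byte with a short Python sketch. It assumes an ANSI (one byte per character) path, uses a made-up path, and substitutes `bytes.rfind` for the call to PathFindExtension. The helper `load_dword` is hypothetical and only models the little-endian load done by `mov eax,[eax]`.

```python
def load_dword(mem: bytes, offset: int) -> int:
    """Model of `mov eax,[eax]`: a little-endian 32-bit load from memory."""
    return int.from_bytes(mem[offset:offset + 4], "little")


path = b"C:\\Users\\me\\shortcut.lnk\x00"  # made-up, zero-terminated ANSI path
ext_offset = path.rfind(b".")              # stand-in for PathFindExtension
eax = load_dword(path, ext_offset)

goasm_imm = int.from_bytes(b".lnk", "little")  # GoAsm source: cmp eax, '.lnk'
masm_imm = int.from_bytes(b"knl.", "big")      # MASM source:  cmp eax, 'knl.'

print(hex(eax))          # 0x6b6e6c2e
print(eax == goasm_imm)  # True
print(eax == masm_imm)   # True
```

Both source spellings produce the same immediate. The difference is only whether the quoted text reads in memory order (GoAsm) or in register order (MASM).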

Now, if you want to be really confused try synchsafe, that one's just nuts :)

Edgar
