Memory, reversed?

Started by n00b!, August 06, 2008, 10:58:32 PM

n00b!

.

dsouza123

This may shed some light.


.586                      ; create 32 bit code
.model flat, stdcall      ; 32 bit memory model
option casemap :none      ; case sensitive

include    \masm32\include\windows.inc
include    \masm32\include\user32.inc
include    \masm32\include\kernel32.inc
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib

.data
    pp        db 16 dup (0)
    tt        db 16 dup (0)
    szTitle   db "Memory order",0

.code
start:
  mov [pp + 0], '0'
  mov [pp + 1], '1'
  mov [pp + 2], '2'
  mov [pp + 3], '3'

  invoke MessageBox,NULL,ADDR pp,ADDR szTitle,MB_OK

  mov [tt + 0], 'T'
  mov [tt + 1], 'e'
  mov [tt + 2], 's'
  mov [tt + 3], 't'

  invoke MessageBox,NULL,ADDR tt,ADDR szTitle,MB_OK

  invoke  ExitProcess, 0
end start


Also remember that a dword load takes the lowest byte first (little endian),
and the byte lowest in memory contains a T.


  mov [tt + 0], 'T'
  mov [tt + 1], 'e'
  mov [tt + 2], 's'
  mov [tt + 3], 't'

  mov eax, dword ptr [tt]

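  ; eax = 74736554h
  al              = T   (byte 0, bits 0-7)
  ah              = e   (byte 1, bits 8-15)
  byte 2 of eax   = s   (bits 16-23, no register name of its own)
  byte 3 of eax   = t   (bits 24-31, no register name of its own)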
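
The load above can be checked outside the assembler. A minimal sketch in Python, used here only as a calculator; the names `memory`, `eax` and `parts` are illustrative and not part of the program above. It reads the four bytes the way a little-endian dword load would, then splits the result back into byte positions 0 to 3:

```python
# Four bytes as they sit in memory, lowest address first.
memory = b"Test"

# A little-endian dword load: byte 0 becomes the least significant byte.
eax = int.from_bytes(memory, "little")

# Split the 32-bit value into byte positions 0..3 (0 = al, 1 = ah).
parts = [(eax >> (8 * i)) & 0xFF for i in range(4)]

print(hex(eax))                  # 0x74736554
print([chr(b) for b in parts])   # ['T', 'e', 's', 't']
```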

n00b!

#2
.

Mark Jones

#3
Quote from: n00b! on August 07, 2008, 12:12:44 PM
cmp DWORD ptr [txt1], "tseT"
And because of the DWORD ptr it's "tseT" and not "Test"?

The "dword ptr" operator is a size override: it tells the assembler to treat the memory operand at [txt1] as a 4-byte dword, whatever size txt1 was declared with. It does not change whether the operand is an address or a value. For understanding the differences between how these are handled in MASM, check out the \masm32\help\asmintro.chm file.

MASM creates some rather confusing ambiguity because square brackets around an operand ([txt1]) traditionally mean "value at", yet for a data label MASM treats txt1 and [txt1] the same way, as the value in memory. To get the address instead, "ADDR txt1" (inside invoke) and "OFFSET txt1" are nearly identical. Yet another way to get the address of an identifier is the LEA (load effective address) instruction. These last three are just different ways of getting the same thing: a memory address.

For a little more perspective, in the GoASM assembler this ambiguity has been eliminated, so that anything in brackets is always a VALUE, and anything not in brackets must be preceded by "addr" or "offset" to indicate that it is the address of that identifier (and not the value present at that identifier).

EDIT: Apparently I'm still confused about the bold above, lol. Corrected.
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

dsouza123

ah2 = "t"    al2 = "s"     ah = "e"     al = "T"?
Yes

mov BYTE ptr [pp], 10000101b           ; -> low word of pp = 10000101
mov BYTE ptr [pp + 1], 1101000b       ; -> high word of pp = 1101000
mov BYTE ptr [pp + 2], 100b              ; -> 2. low word of pp = 100
No

should be
mov BYTE ptr [pp], 10000101b           ; -> zeroth byte of pp = 10000101
mov BYTE ptr [pp + 1], 1101000b       ; -> first byte of pp = 1101000
mov BYTE ptr [pp + 2], 100b              ; -> second byte of pp = 100

also BYTE ptr is not needed if pp is an array of bytes


cmp DWORD ptr [txt1], "tseT"
No. If txt1 is an array of bytes, DWORD ptr lets the bytes be accessed in dword (4 byte) chunks.
Little-endian (Intel x86) dwords have the lowest byte (byte 0) first in memory,
so if the byte string Test is in memory locations 0,1,2,3 and is then loaded into eax,
and eax is shown using the number display convention of HIGH to LOW in a left-to-right direction,
register bytes 3,2,1,0 list as tseT.
Unfortunately for analysing text stored in dwords, the written-word display convention in English (and many other languages)
is LOW to HIGH in a left-to-right direction, so the number display effectively shows the text in reverse order.

Summary: numbers are shown with higher positions to the left, written words are shown with further positions to the right,
a clash of conventions.
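
The clash can be made visible by spelling out the same 32-bit value under both conventions. This is a small Python sketch, not MASM; the helper names `as_number_text` and `as_memory_text` are made up for illustration:

```python
def as_number_text(value: int, size: int = 4) -> str:
    """Spell out a value's bytes in number order: most significant byte first."""
    return value.to_bytes(size, "big").decode("ascii")


def as_memory_text(value: int, size: int = 4) -> str:
    """Spell out a value's bytes in memory order on a little-endian CPU."""
    return value.to_bytes(size, "little").decode("ascii")


eax = 0x74736554  # what a dword load of "Test" leaves in eax on x86

print(as_number_text(eax))  # tseT
print(as_memory_text(eax))  # Test
```

The value is the same in both calls; only the direction in which its bytes are read out differs.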

Mark_Larson

  my brain has the same problem!  That is why I get so confused  :bg
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
http://www.website.masmforum.com/mark/index.htm

n00b!

#6
.

AeroASM

Quote from: n00b! on August 07, 2008, 10:11:51 PM
And why do I have to write
cmp DWORD ptr [txt1], "tseT"
when it's
txt1 db "Test", 0
?

txt1 db "Test",0
is equivalent to
txt1 db "T", "e", "s", "t", 0
which is equivalent to
txt1 db 054h,065h,073h,074h,0

so in memory (suppose txt1 is at address 0403000h)

0403000 54 ("T")
0403001 65 ("e")
0403002 73 ("s")
0403003 74 ("t")
0403004 0

So,
byte ptr [txt1] equals 054h
word ptr [txt1] equals 06554h
dword ptr [txt1] equals 074736554h
because of little-endianness.

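The single dword 074736554h, written out most significant byte first, is the byte sequence {074h,073h,065h,054h}, which is the same as {"t","s","e","T"}. That is exactly the value MASM builds from the string "tseT" when it is used as an immediate operand, which is why the comparison has to be written with "tseT".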
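
The three views can be reproduced with Python's standard `struct` module, where `<` selects little-endian. This is only a sketch for checking the arithmetic; the variable names are illustrative:

```python
import struct

# txt1 db "Test", 0 as it sits in memory, lowest address first.
txt1 = b"Test\x00"

# "<" = little-endian; B, H and I are 1-, 2- and 4-byte unsigned integers.
(byte_view,) = struct.unpack_from("<B", txt1, 0)
(word_view,) = struct.unpack_from("<H", txt1, 0)
(dword_view,) = struct.unpack_from("<I", txt1, 0)

print(f"{byte_view:02X}h")    # 54h
print(f"{word_view:04X}h")    # 6554h
print(f"{dword_view:08X}h")   # 74736554h
```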

dsouza123

#8
  Source of the confusion: in the .data section, strings are parsed and stored low to high,
so what you see is what you get. When used as an immediate operand, strings are parsed high to low
but stored low to high, so what you see is the opposite order of what you get.

  An immediate operand follows the convention of specifying numbers high to low, left to right,
which is fine when the immediate is a number, but causes a problem when using a string,
which uses the opposite convention (low to high, left to right) of written text or .data section strings.

  The issue arises when specifying a string such as "3210" or "Test"
as the immediate operand to an instruction like  mov eax, "Test"
MASM parses it as four bytes, high to low (3 down to 0), from left to right.


00401000                    start:
...
0040101F B830313233             mov     eax,33323130h   ; mov eax, "3210"
...
0040102F B874736554             mov     eax,54657374h   ; mov eax, "Test"
...
0040103F B830313233             mov     eax,33323130h   ; mov eax, 33323130h   a hex number
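
The listing can be reproduced by hand. A minimal Python sketch, assuming the rule the listing shows (leftmost character becomes the most significant byte); the helper names `masm_immediate` and `stored_bytes` are hypothetical:

```python
def masm_immediate(text: str) -> int:
    """Value of a quoted string used as an immediate, per the listing above:
    the leftmost character becomes the most significant byte."""
    return int.from_bytes(text.encode("ascii"), "big")


def stored_bytes(value: int, size: int = 4) -> bytes:
    """Bytes an x86 store of that value leaves in memory, lowest address first."""
    return value.to_bytes(size, "little")


imm = masm_immediate("Test")
encoding = b"\xB8" + stored_bytes(imm)   # B8 is the opcode for mov eax, imm32

print(f"{imm:08X}h")             # 54657374h
print(stored_bytes(imm))         # b'tseT'
print(encoding.hex().upper())    # B874736554
```

The last line matches the instruction bytes shown at 0040102F.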


  The immediate is stored low to high (0 to 3), little endian. Perhaps someone will write a macro that takes
a string of four bytes and reverses the order, mov eax, txt("Test"), effectively allowing MASM to treat
the immediate operand as being specified in low to high order (left to right), using the convention
of text declared in the .data section.


.data 
  szTitle db "Test of string storage conventions",0

00403000 54 65 73 74 20 6F 66 20 - 73 74 72 69 6E 67 20 73  Test of string s
00403010 74 6F 72 61 67 65 20 63 - 6F 6E 76 65 6E 74 69 6F  torage conventio
00403020 6E 73 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00  ns..............


Perhaps the macro could be flexible enough to work with other size immediate operand strings,
2 byte for words, 8 bytes for qwords (MMX) and 16 bytes for double qwords (SSE2).
Or maybe separate macros such as txt2, txt4, txt8 and txt16 would be needed.

dsouza123

txt4: a fixed-size (4 bytes), hard-coded macro, using KSS's uni$ macro as the starting source.
It will work for either " " or ' ' quoted 4 byte strings.


    txt4 MACRO text:VARARG

        nustr equ <>
        slen SIZESTR <text>

     ;; ------------------------------------------------
     ;; test for errors in length (4 characters + 2 quotes = 6)
     ;; ------------------------------------------------
        if slen ne 6
          echo -----------------------
          echo *** STRING TOO SHORT or TOO LONG***
          echo -----------------------
        .ERR
        EXITM <>
        endif

      ;; ------------------------------------------------
      ;; create a new 4 byte string in reverse order
      ;; ------------------------------------------------
        nustr1 SUBSTR <text>,1,1
        nustr2 SUBSTR <text>,2,1
        nustr3 SUBSTR <text>,3,1
        nustr4 SUBSTR <text>,4,1
        nustr5 SUBSTR <text>,5,1
        nustr6 SUBSTR <text>,6,1

        nustr  CATSTR nustr6,nustr5,nustr4,nustr3,nustr2,nustr1
        EXITM nustr
    ENDM



   mov eax, txt4("Test")
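
What the macro does at assembly time can be mimicked in a few lines of Python. This is a sketch only: the Python `txt4` mirrors the macro's text manipulation, and `masm_immediate` is a hypothetical helper assuming the parsing rule seen in the disassembly earlier in the thread (leftmost character is the most significant byte):

```python
def txt4(quoted: str) -> str:
    """Mimic the txt4 macro: reverse a quoted 4-character string, quotes included."""
    if len(quoted) != 6:  # 4 characters plus the 2 quote characters
        raise ValueError("string too short or too long")
    return quoted[::-1]


def masm_immediate(quoted: str) -> int:
    """Assumed MASM rule: leftmost character becomes the most significant byte."""
    return int.from_bytes(quoted[1:-1].encode("ascii"), "big")


reversed_text = txt4('"Test"')
value = masm_immediate(reversed_text)

print(reversed_text)                 # "tseT"
print(f"{value:08X}h")               # 74736554h
print(value.to_bytes(4, "little"))   # b'Test'
```

Because both quote characters are the same, reversing all six characters leaves a validly quoted string, which is why the macro can reverse the quotes along with the text.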