
Constants, addresses and their contents

Started by mercifier, March 31, 2007, 10:39:52 AM


mercifier

Having programmed in Motorola assembler, I find it a bit confusing that the distinction between immediate mode and absolute (direct, displacement-only) mode depends not on operators but on the context, as far as I understand.

So,
mov ax,10                             ;is always immediate
but
mov ax,<symbol>                   ;depends on how <symbol> is declared
Not so?

Now, I found the following code in masm32\tutorial\console\demo7\complex.asm:

    .data
    ; --------------------------
    ; initialise 10 DWORD values
    ; --------------------------
      itm0  dd 0

(8 lines removed)
      itm9  dd 9
    ; ---------------------------------
    ; put their addresses into an array
    ; ---------------------------------
      array dd itm0,itm1,itm2,itm3,itm4
            dd itm5,itm6,itm7,itm8,itm9

(several lines removed)
    mov ebx, array              ; put BASE ADDRESS of array in EBX

My question is, WHAT does ebx contain at this point: the address of "array" (the value of that label), as the comment suggests, or the contents at that address, which would be the address of "itm0"? Does it depend on how and where the label was declared, or is it possible to distinguish by means of an operator or keyword, such as "offset"? Would the "offset" keyword make any difference in this case?

OK, I suppose you will have to be patient with me, because I'm a very theoretical person, not the "trial and error" kind of programmer, so I didn't even bother running the program to work it out from its output.  :snooty: (Unless I can expect that a program will behave according to my intentions, I see no point in assembling it at all... But hey, that's just me!  :toothy)

Tedd

"mov ebx,array" is treated as "mov ebx,DWORD PTR [array]" in every case (that I can think of at this very moment :P)
"mov ebx,OFFSET array" then it will give you address of the symbol 'array'

I would generally choose to be explicit where it's not obvious, but obviously you need to know the difference for reading others' code.
No snowflake in an avalanche feels responsible.

sinsi

The code
    mov ebx, array              ; put BASE ADDRESS of array in EBX
actually loads the first member of array, i.e. the address of itm0. It is just a fluke that itm0...itm9 are DWORDs, so ESI*4 works in this case. I think the correct syntax should be
    mov ebx, OFFSET array              ; put BASE ADDRESS of array
Light travels faster than sound, that's why some people seem bright until you hear them.

lingo

Examples:  :lol 
array's members         ;    0          1          2          3
members' addresses      ; 400020h    400024h    400028h    40002Ch
members' contents       ;    5          8          2          7

A.  Retrieving Addresses  of the array's members     

    mov ebx, OFFSET array    ; ebx->address of the 1st array's member ;ebx=400020h
or  lea ebx, array            ; ebx->address of the 1st array's member ;ebx=400020h
                               

    mov eax, offset array + 0*4     ; eax->address of the 1st array's member ;eax=400020h
    mov eax, offset array + 1*4     ; eax->address of the 2nd array's member ;eax=400024h

Note:
    4 means each array's member is DWORD->4 bytes
   If each array's member is WORD->2 bytes we have 2 (not 4):

    mov eax, offset array + 0*2     ; eax->address of the 1st array's member ;eax=400020h
    mov eax, offset array + 1*2     ; eax->address of the 2nd array's member ;eax=400022h

   If  each array's member is BYTE we have 1 (not 2 or 4):

    mov eax, offset array + 0*1     ; eax->address of the 1st array's member ;eax=400020h
    mov eax, offset array + 1*1     ; eax->address of the 2nd array's member ;eax=400021h


B.  Retrieving Contents of the array's members     

    mov ebx, array +0*4     ; ebx->value of the 1st array's member ;ebx=5
    mov ebx, array +1*4     ; ebx->value of the 2nd array's member ;ebx=8 

or
    mov ebx, dword ptr ds:[400020h] ; ebx->value of the 1st array's member ;ebx=5
    mov ebx, dword ptr ds:[400024h] ; ebx->value of the 2nd array's member ;ebx=8

or
   mov eax, offset array + 0*4     ; eax->address of the 1st array's member ;eax=400020h
   mov ecx, 1*4                     ; ecx = 2nd member X 4 bytes    
   mov ebx, dword ptr [ecx+eax]    ; ebx->value of the 2nd array's member ;ebx=8     

or
   mov eax, offset array + 0*4     ; eax->address of the 1st array's member ;eax=400020h
   mov ecx, 1                        ; ecx = 2nd member    
   mov ebx, dword ptr [ecx*4+eax]  ; ebx->value of the 2nd array's member ;ebx=8   

or
   mov ecx, 1                       ; ecx = 2nd member  
   lea eax, [array + ecx*4]        ; eax->address of the 2nd array's member ;eax=400024h
   mov ebx, dword ptr [eax]        ; ebx->value of the 2nd array's member   ;ebx=8

 
Regards,
Lingo  


mercifier

Quote from: lingo on March 31, 2007, 01:37:21 PM

Regards,
Lingo   

Oh! That was exhaustive!

But things are clear now, I think: a label which points at a specific data or code statement always represents the address of that statement if there is no "offset" keyword.

(The example code that I quoted was rather carelessly written, I suppose.)

PBrennick

mercifier,

About using OFFSET, it is good to remember that ADDR also performs this function. It is usually used in the INVOKE statement. So when you run into it in some example at some point, you will know what it is doing.
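As a rough illustration of the ADDR point, here is a minimal sketch. It assumes the MASM32 SDK is installed in its default \masm32 location; the labels szText and szCaption are made up for this example.

```asm
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\user32.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib

.data
szCaption db "ADDR vs OFFSET", 0            ; hypothetical label
szText    db "Same address either way", 0   ; hypothetical label

.code
start:
    ; ADDR is only accepted inside INVOKE; it passes the address of each label
    invoke MessageBox, NULL, ADDR szText, ADDR szCaption, MB_OK

    ; OFFSET passes the same addresses here, because both labels are global data
    invoke MessageBox, NULL, OFFSET szText, OFFSET szCaption, MB_OK

    invoke ExitProcess, 0
end start
```

The two calls should behave identically. The practical difference shows up with LOCAL variables, where ADDR still works (the assembler emits a LEA) but OFFSET does not, since a stack address is not known at assembly time.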

Lingo's example may have been a bit long but it is very well written. It is probably a good idea for you to make a printout of his post so you will always have it to hand as a reference. Eventually, you will know them by heart but for right now, a cribsheet makes a lot of sense to me.

I have a background in 6502, 6800, 6809 and 68000, also. It is my belief that once you get the hang of the differing conventions, programming the x86 is a lot easier and a lot more fun. Don't get confused by the brackets around a label. MASM allows them by ignoring them: mov eax, [array] assembles the same as mov eax, array. (Around a register, as in [eax], they are not ignored.)

Paul
The GeneSys Project is available from:
The Repository or My crappy website

mercifier

Yeah, the syntax takes a bit of getting used to, but I seem to be adopting the "destination first, source second" format faster than I ever thought I would... (Actually it makes some sense compared to the 6502 Assembly language: LDA #x -> MOV A,x) I guess I will cope with context-based addressing modes sooner or later too...

lingo

#7
"always represents the address of that statement if there is no "offset" keyword."

Depends on instruction: MOV or LEA   :lol

My last example:
   mov ecx, 1                          ; ecx = 2nd member    
   lea   eax, [array + ecx*4]      ; eax->address of the 2nd array's member ;eax=400024h
   mov ebx, dword ptr [eax]     ; ebx->value of the 2nd array's member   ;ebx=8

With LEA we just compute the expression inside the brackets (the offset of the array plus 1*4 bytes)
and save the result in eax. Hence, [400020h + 1*4] = 400024h

Here (inside the brackets) array is equal to OFFSET array == 400020h

        lea eax, [array + ecx*4]          ; eax->address of the 2nd array's member ;eax=400024h
and   lea eax, [offset array + ecx*4] ; eax->address of the 2nd array's member ;eax=400024h
are equal

But

With MOV we get the contents of the memory cell
whose address is computed inside the brackets.

Here, inside the brackets, we compute the address of the cell:
[array + ecx*4] == [400020h +1*4] == 400024h  ; where offset of array=400020h and ecx=1


  mov eax, [array + ecx*4]        ; eax->contents of the 2nd array's member ;eax=8
or
  mov eax, [OFFSET array + ecx*4] ; eax->contents of the 2nd array's member ;eax=8
or
  mov eax, ds:[400024h]           ; eax->contents of the 2nd array's member ;eax=8

and
  mov eax, [array]        ; eax->contents of the 1st array's member ;eax=5
is equal to
  mov eax, array          ; eax->contents of the 1st array's member ;eax=5
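
To tie this back to the complex.asm question, where the array holds addresses rather than values, here is a minimal sketch. It assumes a default MASM32 SDK install (masm32rt.inc with its print and str$ macros); the two-element array and the values 5 and 8 are made up for illustration and are not the tutorial's data.

```asm
include \masm32\include\masm32rt.inc   ; assumes a default MASM32 SDK install

.data
itm0  dd 5                  ; hypothetical values, not those in complex.asm
itm1  dd 8
array dd itm0, itm1         ; array holds the ADDRESSES of itm0 and itm1

.code
start:
    mov ebx, array          ; memory operand: ebx = array[0] = address of itm0
    mov esi, OFFSET array   ; immediate operand: esi = address of array itself

    mov eax, [ebx]          ; follow ebx: eax = contents of itm0
    print str$(eax),13,10   ; should print 5

    mov eax, [esi+4]        ; eax = array[1] = address of itm1
    mov eax, [eax]          ; follow it: eax = contents of itm1
    print str$(eax),13,10   ; should print 8

    invoke ExitProcess, 0
end start
```

ebx and esi are used for the two pointers because the print macro calls into the library and may overwrite eax, ecx and edx.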

Regards,
Lingo

mercifier

Damn! Why not just use the '#'-operator for immediate mode, like in Motorola assembly?
:bg

PBrennick

mercifier,

It would be too easy, then.  :bg By the way, though, is not the # only used for immediate values with the $ to distinguish hex from decimal? It has been years but that is what just bubbled up in this thing I use for a brain.

Paul

Ratch

mercifier ,

     You are running into confusion because INTEL did not understand the Queen's English when they selected their mnemonics.  A constant or address into a destination should be called a LOAD.  A register contents or memory contents into a destination should be called a COPY.  X86 does not really have a MOVE instruction, because that would mean the source gets copied into the destination, and then the source gets deleted.   Obviously that does not happen, so MOV is a misnomer as far as its functionality is concerned.  Ratch

mercifier

In Motorola (and 6502) assembly, the '#' operator is used to differentiate absolute addressing (i.e. direct/displacement only) from immediate mode. The '$' operator means the following value is hexadecimal. These can be used independently of each other:


move.l 12345678,d0 ;load d0 with contents at decimal address 12345678
move.l #12345678,d0 ;load d0 with decimal (immediate) value of 12,345,678
move.l $12345678,d0 ;load d0 with contents at 12345678h
move.l #$12345678,d0 ;load d0 with hexadecimal (immediate) value of 12345678h

mercifier

Ratch,

You've got a point there. Data can never move, only be copied. Even the immediate constant remains unchanged, but in Motorola assembly the source of "move" comes first, which fits the syntax better: move d0,d1 means "move (or copy) contents of d0 to d1". If the Intel copy instruction were named load instead, it would make more sense: mov ax,bx = "LOAD ax FROM bx" !!

(Maybe I'll just make a macro synonym of the MOV instruction...)

hutch--

mercifier,

> (Maybe I'll just make a macro synonym of the MOV instruction...)

Do yourself a favour and DON'T as the MOV family of mnemonics is easy enough to get the swing of with practice. There is a guy on Usenet who converted x86 to pseudo 68k notation with macros but he is forever suspended in out of date hardware where x86 is reasonably dynamic in terms of adaption.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Ratch

mercifier ,

    I include an EQU statement like @ EQU OFFSET in my code.  Then I can code like INVOKE EAX,@ LABELX,EBX,etc.  You can do a # EQU OFFSET if you prefer.  By the way, how do you like the way the Motorola 68000 series can do a direct memory to memory copy instead of having to run the contents through a register or stack first?  Ratch