Title: Pointers, etc
Post by: redskull on September 22, 2005, 03:09:50 PM

I think I've got a pretty good grasp of this stuff, but I was hoping one of the mavens around here could clarify it all to make certain I'm not confused: What's the difference between using a pointer (DWORD PTR) to a variable, using OFFSET before the variable, using ADDR before the variable, and using brackets around a variable? They all seem somewhat alike, and at the same time very different.
Thanks

Title: Re: Pointers, etc
Post by: QvasiModo on September 22, 2005, 03:29:51 PM

OFFSET gives you a constant memory address that is known at assembly time.

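ADDR can do the same as OFFSET in some cases, and additionally generates code to calculate the address of local variables, which can only be known at run time. ADDR is only valid inside an INVOKE statement.

A pointer is a variable which in turn contains the memory address of another variable. Being a variable, its value can be changed at run time. Square brackets around a register use the register's contents as a memory address; the DWORD PTR operator in front of them only tells the assembler that the operand at that address is 32 bits wide. Around a plain variable name MASM ignores the brackets: MOV EAX, [SomeVariable] and MOV EAX, SomeVariable both load the contents.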

Some code examples:

Code Select



; this loads the contents of SomeVariable into ECX
MOV ECX, SomeVariable

; loads the memory address of SomeVariable into EAX
MOV EAX, OFFSET SomeVariable

; reads into EDX the contents of SomeVariable, using the pointer in EAX
; so now ECX and EDX have the same value
MOV EDX, DWORD PTR [EAX]

; calls a procedure, passing as a parameter the memory address of a local variable
INVOKE SomeProcedure, ADDR SomeLocalVariable

; another way to do the same as above
LEA EAX, SomeLocalVariable
INVOKE SomeProcedure, EAX

; this one is wrong! it's passing the *value* of SomeLocalVariable instead of its *address*
MOV EAX, SomeLocalVariable
INVOKE SomeProcedure, EAX

Hope this makes things a bit clearer! :U

Title: Re: Pointers, etc
Post by: OceanJeff32 on September 23, 2005, 11:47:28 PM

Question:

so if you've got a variable (array) declared like this:

blocks dd 40, 50, 20, 20, 90, 100, 20, 20

then later in code you access the first element by pointing an index register or general register to the beginning like this: (?)

lea eax, blocks

or

mov edi, blocks

or does it have to be lea...

THEN once you've pointed it there, you access the variable by doing this:

mov firstelement, [eax]

or

mov firstelement, [edi]

AND you can reference further elements within the array by adding offsets in the []s

mov otherelements, [eax + sizeofelement * (2,4,8...)]

optionally using the DWORD or WORD PTR or BYTE PTR...

am I right? I'm trying this now, so I'll know very shortly, but I have a feeling I may be confused very shortly...

later,

jeff c
:bdg

Title: Re: Pointers, etc
Post by: hutch-- on September 24, 2005, 12:02:48 AM

Hi redskull,

A "pointer" is actually a higher-level idea, but it's easy to do in assembler. What you have is data in memory of whatever type, and "where" it is located is its "address". When you want to store that "address" somewhere, you place it in a variable that is allocated somewhere else, and that variable "points" to the data.

Now you can get the address in a number of ways depending on where the data is located. If it is in the .DATA or .DATA? section, you use its OFFSET, which is determined at assembly time. With data allocated on the stack at run time, such as LOCAL variables, you get the "address" of that LOCAL using LEA, or in an "invoke" statement you use ADDR to do the same. You also work with an address when you allocate memory dynamically: you copy the return value to a variable, which is then the pointer to that memory.
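
A minimal sketch of the cases described above, assuming a 32-bit flat-model MASM build. The names gValue, pValue, lValue and Demo are made up for illustration and do not come from the thread.

```asm
; Sketch only: gValue, pValue, lValue and Demo are hypothetical names.
.386
.model flat, stdcall
option casemap:none

.data
gValue  dd 1234h                ; initialised data, address fixed at assembly/link time

.data?
pValue  dd ?                    ; a "pointer": a dword that will hold an address

.code
Demo proc
    LOCAL lValue:DWORD          ; lives on the stack of this procedure

    mov  eax, OFFSET gValue     ; the address of a global is a constant
    mov  pValue, eax            ; store it: pValue now points at gValue

    lea  eax, lValue            ; the address of a LOCAL is computed at run time
    mov  DWORD PTR [eax], 5     ; write to lValue through that address

    mov  edx, pValue            ; load the pointer ...
    mov  eax, [edx]             ; ... and dereference it: eax = 1234h
    ret
Demo endp

start:
    call Demo
    ret
end start
```

The same pattern applies to dynamically allocated memory: whatever the allocation call returns in EAX would be stored in a variable such as pValue and dereferenced the same way.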

Title: This Thread Should Be an FAQ
Post by: Trope on September 24, 2005, 01:07:48 AM

As a newbie myself, I struggled with this also.

This thread is absolutely PERFECT. The replies from QvasiModo, OceanJeff32, and Hutch are, as far as I am concerned, textbook.

If I had seen this thread back in the day, it would have saved me many Google hours of reading up on C pointers (which I now know are related, but at the time did not).

Kick ass explanations guys.

Trope

Title: Re: Pointers, etc
Post by: redskull on September 24, 2005, 03:25:34 AM

thanks to everybody for the responses; just one more question; isn't all this sort of syntax tra-jickery? I mean, in the long run, keeping a pointer variable does pretty much the same thing as LEAing the variable into a register, which is pretty much what OFFSET does when the code gets generated, right? Some might be faster than others, but are there specific situations where one is better (or required) to have?

Title: Re: Pointers, etc
Post by: hutch-- on September 24, 2005, 05:12:58 AM

redskull,

You can learn the hard way like most of us did or learn the right way and save yourself the problems. OFFSET literally means a location in a binary file and this is determined at assembly time. A LOCAL variable is set at runtime on the stack of each procedure and it is NOT the same thing. It just depends on how much grief you want finding out. :bg
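
To make the distinction concrete, here is a hedged sketch of a case where one form is required, assuming a 32-bit flat-model MASM build. TakesPtr, Caller, gBuf and lBuf are hypothetical names; the expansions in the comments describe what MASM's INVOKE is generally known to emit and can be checked in a listing file.

```asm
; Sketch only: TakesPtr, Caller, gBuf and lBuf are hypothetical names.
.386
.model flat, stdcall
option casemap:none

TakesPtr PROTO :DWORD           ; a procedure that expects an address

.data
gBuf    dd 0

.code
TakesPtr proc p:DWORD
    mov  eax, p                 ; eax = the address that was passed
    mov  DWORD PTR [eax], 1     ; store through it
    ret
TakesPtr endp

Caller proc
    LOCAL lBuf:DWORD

    invoke TakesPtr, ADDR gBuf  ; global: a constant address is pushed
    invoke TakesPtr, ADDR lBuf  ; local: expands to lea eax, lBuf / push eax / call
    ; invoke TakesPtr, OFFSET lBuf  ; rejected: a LOCAL has no assembly-time address
    ret
Caller endp

start:
    call Caller
    ret
end start
```

Because the local form goes through EAX, EAX should not be holding another argument of the same INVOKE.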

Title: Re: Pointers, etc
Post by: gabor on September 26, 2005, 08:18:27 AM

Hello!

Just 2 remarks:

Quote from: OceanJeff32 on September 23, 2005, 11:47:28 PM
lea eax, blocks

or

mov edi, blocks

or does it have to be lea...

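Yes, it has to be lea (or mov edi, OFFSET blocks). As written, mov edi,blocks will load the first dword at label blocks into edi, that is edi=40 (decimal, because the array values were declared without an h suffix).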
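
A short sketch of the array access asked about, assuming a 32-bit flat-model MASM build. The blocks array and firstelement come from the question; thirdelement is a made-up name. Note that x86 has no memory-to-memory MOV, so each element travels through a register.

```asm
; Sketch only: thirdelement is a hypothetical name.
.386
.model flat, stdcall
option casemap:none

.data
blocks       dd 40, 50, 20, 20, 90, 100, 20, 20
firstelement dd ?
thirdelement dd ?

.code
start:
    mov  eax, blocks            ; contents: eax = 40, the first element itself
    mov  edx, OFFSET blocks     ; address: edx points at the array
    lea  ecx, blocks            ; same address, computed by LEA

    mov  eax, [edx]             ; first element: eax = 40
    mov  firstelement, eax      ; store via a register

    mov  ecx, 2                 ; index of the third element
    mov  eax, [edx+ecx*4]       ; address = blocks + 2*4, eax = 20
    mov  thirdelement, eax
    ret
end start
```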

Quote from: OceanJeff32 on September 23, 2005, 11:47:28 PM

mov otherelements, [eax + sizeofelement * (2,4,8...)]

Scaling, I mean adding a reg*2,4,8 to the address calculation is valid only for the lea instruction. Am I right?
This is for sure:
lea can be used this way:
lea reg1,[label+offset+reg2*1/2/4/8]
This loads the offset of label+offset+reg2*1/2/4/8 into reg1.
Example:

Code Select


blocks  dd  10203040h, 50607080h, 90A0B0C0h, 0D0E0F011h

mov  ecx,2                ; index of 3rd element (0 indexes the first element!)
lea  edi,[blocks+3+ecx*4]
mov al,BYTE PTR [edi]     ; get the 4th byte of the DWORD at index from table blocks

As a result al will contain the most significant byte of the 3rd dword in the table, that is 90h.

Maybe I am not right, but I think two registers can be on the right side of the instruction at one time. And one of them can have scaling.

I guess this thread is totally about something difficult. This is proven by the fact that every 'modern' and fashionable high-level language banishes pointers (Java, C#). The thing gets really complicated when using arrays of pointers and the array itself is accessed by a pointer too. That is a real brain challenge when debugging! :))

Greets, Gábor

Title: Re: Pointers, etc
Post by: QvasiModo on September 26, 2005, 03:43:52 PM

Quote from: gabor on September 26, 2005, 08:18:27 AM
Scaling, I mean adding a reg*2,4,8 to the address calculation is valid only for the lea instruction. Am I right?

No, the address calculation syntax is the same for all instructions that can use registers for memory addresses. So scaling and all the other goodies can be used with LEA, MOV, PUSH, POP, etc... they can even be used with CALL and JMP. :)
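
As an illustration of that point, here is a sketch that uses the same scaled addressing form with MOV, PUSH, POP and JMP, assuming a 32-bit flat-model MASM build. The names values, targets, case0 and case1 are invented for the example.

```asm
; Sketch only: values, targets, case0 and case1 are hypothetical names.
.386
.model flat, stdcall
option casemap:none

.data
values  dd 10, 20, 30, 40

.code
case0:
    mov  eax, 100
    ret
case1:
    mov  eax, 200
    ret

.data
targets dd OFFSET case0, OFFSET case1   ; table of code addresses

.code
start:
    mov  ecx, 3
    mov  eax, [values+ecx*4]            ; MOV with scaling: eax = 40
    push DWORD PTR [values+ecx*4]       ; PUSH with scaling: pushes 40
    pop  DWORD PTR [values+ecx*4-4]     ; POP with scaling: values[2] = 40

    mov  ecx, 1
    jmp  DWORD PTR [targets+ecx*4]      ; JMP with scaling: continues at case1
end start
```

The jump table is declared after the labels it refers to only to avoid forward references in the sketch; the RET in case1 returns from start because the table entry was reached with JMP, not CALL.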

Quote from: gabor on September 26, 2005, 08:18:27 AM
Maybe I am not right, but I think two registers can be on the right side of the instruction at one time. And one of them can have scaling.

Quote from: gabor on September 26, 2005, 08:18:27 AM
I guess this thread is totally about something difficult. This is proven by the fact that every 'modern' and fashionable high-level language banishes pointers (Java, C#). The thing gets really complicated when using arrays of pointers and the array itself is accessed by a pointer too. That is a real brain challenge when debugging! :))

You're right, pointers are difficult enough and the many Intel addressing modes make this topic very complex, but worthwhile. And while Java may be hiding pointers for the programmer's sake, things have to be done with them "under the hood"... :wink

Title: Re: Pointers, etc
Post by: hutch-- on September 27, 2005, 01:36:07 AM

The trick with the Intel complex addressing modes is simply to learn how they work. The variations are built into the processor as different opcodes, and once they are understood they are very clear and precise to use.

An array of DWORD-sized members is nothing more than a location in memory (dynamic, stack or data section) with a member count that you can store somewhere if you wish. The complex addressing modes give you a fast and efficient way to access array members. With the info that QvasiModo has posted for you, just learn how it works. With an instruction something like,

Code Select


    mov eax, [ebx+ecx*4+16]

It breaks up into its component parts easily.

ebx is the base address.
ecx is the array index.
The *4 is the scale applied to the index to match the array element size, and the trailing +16 is what is called a displacement, which adjusts the address by that many bytes.

When the operands do not determine the data size, you need a size specifier, so code like,
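
A worked instance of that breakdown, as a sketch assuming a 32-bit flat-model MASM build. The table and the procedure name Pick are made up so that the arithmetic can be followed by eye.

```asm
; Sketch only: table and Pick are hypothetical names.
.386
.model flat, stdcall
option casemap:none

.data
table   dd 0, 1, 2, 3, 4, 5, 6, 7   ; each element holds its own index

.code
Pick proc uses ebx
    mov  ebx, OFFSET table      ; base address
    mov  ecx, 2                 ; array index
    mov  eax, [ebx+ecx*4+16]    ; address = table + 2*4 + 16 = table + 24
                                ; 24 bytes / 4 bytes per dword = element 6, eax = 6
    ret
Pick endp

start:
    call Pick
    ret
end start
```

A displacement of 16 on a dword array is the same as skipping four elements, so this reads element index + 4.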

Code Select


push [ebx+ecx*4]

properly needs a data size specifier:

Code Select


push DWORD PTR [ebx+ecx*4]

as this determines which opcode the assembler will choose for the output.
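
A few more cases where the size either is or is not implied, sketched for a 32-bit flat-model MASM build. The buffer name buf is made up. The rule of thumb is that a register operand carries its own size, while an immediate or a lone memory operand does not.

```asm
; Sketch only: buf is a hypothetical name.
.386
.model flat, stdcall
option casemap:none

.data
buf     dd 0, 0

.code
start:
    mov  edx, OFFSET buf
    mov  eax, [edx]             ; size implied by eax: 32 bits
    mov  al, [edx]              ; size implied by al: 8 bits
    ; mov [edx], 1              ; rejected: neither operand gives a size
    mov  DWORD PTR [edx], 1     ; writes 4 bytes
    mov  BYTE PTR [edx+4], 1    ; writes 1 byte
    inc  WORD PTR [edx+4]       ; increments a 16-bit value, now 2
    ret
end start
```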

Title: Re: Pointers, etc
Post by: OceanJeff32 on September 27, 2005, 05:35:08 AM

You're right. It can be done like this: [base_register + index_register * scale_constant + displacement_constant]

where:
base_register can be EBX or EBP
index_register can be ESI, EDI, EAX, ECX or EDX
scale_constant can be 1, 2 or 4
displacement_constant can be any constant

Ok, QUESTION:

What happens if you use something other than ebx or ebp for base register? This may be my problem... :dazzled:

and I take it, the index register can be set to any valid amount, as long as it's not greater than the size of the array you are accessing ??

and...the displacement constant should be defined at compile time ??

Very cool!

AND ... (one last AND, for the moment anyways...) How helpful is that XLAT instruction for accessing array elements, and is the DS register still used by the system to keep track of your data?? I assume that all the data is globbed together into one big data section by the assembler when you finally compile your program, and all the code is grouped together, unless you tell the compiler to do otherwise...

Just curious, because the XLAT instruction gives you an offset (?) AS COMPUTED from the value in the ds register, AND the value in the ebx register.

::)

Well, enough questions for now, I'll have more later, hopefully I'll be answering some questions around here too!

later,

jeff c
:U

Title: Re: Pointers, etc
Post by: gabor on September 27, 2005, 07:28:05 AM

Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
and...the displacement constant should be defined at compile time ??

I guess yes, it must be set at compile time, that's why it is called a constant.

Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
AND ... (one last AND, for the moment anyways...) How helpful is that XLAT instruction for accessing array elements, and is the DS register still used by the system to keep track of your data??

I personally never used XLAT, not even in real mode. Now, in 32-bit mode, I would rather use these cool addressing methods; I simply enjoy the freedom to index with whichever register I prefer (not only esi or edi). So if I need to access an array, I write the index into a free register (beforehand I try to make the element size of the array a power of 2, using padding if necessary) and use scaling to address the indexed element.

Code Select


XLAT

    mov    ebx,offset Table
    mov    al,12           ; index
    xlat
; al contains byte at offset Table+12

is equivalent to
    mov    al,BYTE PTR [Table+12]

Is this right? Or is it lame not to use XLAT? And BTW, what do you say about the other more complex instructions, like the string instructions? I've heard that they are rather slow and that it is better to use a simple MOV/SUB/JNZ or JNC combination, like this:

Code Select


    lea    esi,Src
    lea    edi,Dst
    mov    ecx,1000h
    rep movsd

or

    lea    esi,Src
    lea    edi,Dst
    mov    ecx,0FFFh
@@:
    mov    eax,[esi+ecx*4]
    mov    [edi+ecx*4],eax
    sub    ecx,1
    jnc @B

I hope I didn't mess up anything! One important thing: in the second way the count is set to 0FFFh and not to 1000h. This is because the loop ends when ecx underflows (jnc @B), so there are 0..0FFFh=1000h cycles!

And of course: Guys! Tons of thanks for the important and useful info!

Greets, Gábor

Title: Re: Pointers, etc
Post by: hutch-- on September 27, 2005, 10:16:14 AM

Jeff,

In most circumstances you can use any register as a base register in complex addressing mode. There is also an opcode that lets you use a named address, which is an assembly-time data section address, usually something like a table. You can use XLATB, but it's slow in comparison to incremented pointers with a table.
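
For comparison, a sketch of the same byte lookup done with XLATB and with an ordinary indexed MOV, assuming a 32-bit flat-model MASM build. The names hexdigits, ViaXlat and ViaMov are made up; no timing claim is intended, only that both forms fetch the same byte.

```asm
; Sketch only: hexdigits, ViaXlat and ViaMov are hypothetical names.
.386
.model flat, stdcall
option casemap:none

.data
hexdigits db "0123456789ABCDEF"

.code
ViaXlat proc uses ebx
    mov  ebx, OFFSET hexdigits  ; XLATB always takes its table address from EBX
    mov  al, 12                 ; and its index from AL
    xlatb                       ; al = byte at [ebx+al] = 'C'
    ret
ViaXlat endp

ViaMov proc
    mov  eax, 12                ; any register can hold the index
    mov  al, [hexdigits+eax]    ; named table address plus index: al = 'C'
    ret
ViaMov endp

start:
    call ViaXlat
    call ViaMov
    ret
end start
```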

Title: Re: Pointers, etc
Post by: MazeGen on September 27, 2005, 12:57:29 PM

Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
scale_constant can be 1, 2 or 4

Scale can be 1, 2, 4 or 8.

Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
base_register can be EBX or EBP
index_register can be ESI, EDI, EAX, ECX or EDX

You can use any combination of any general-purpose registers. The only exception is you can't use ESP as an index (e.g. [EAX+ESP*2] - not encodable), but you can still use ESP as a base (e.g. [ESP+EAX*8])
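
A few register combinations of the kind described, as a sketch for a 32-bit flat-model MASM build. The names arr and Combos are made up, and the one line that cannot be encoded is left commented out.

```asm
; Sketch only: arr and Combos are hypothetical names.
.386
.model flat, stdcall
option casemap:none

.data
arr     dd 11, 22, 33, 44

.code
Combos proc uses esi edi
    mov  esi, OFFSET arr
    mov  edi, 2
    mov  eax, [esi+edi*4]       ; ESI base, EDI index: eax = 33

    mov  edx, OFFSET arr
    mov  ecx, 1
    mov  eax, [edx+ecx*8]       ; EDX base, ECX index, scale 8: arr+8, eax = 33

    xor  ecx, ecx
    mov  eax, [esp+ecx*4]       ; ESP as base is fine: reads the dword on top of the stack
    ; mov eax, [eax+esp*2]      ; ESP as index: no such encoding exists
    ret
Combos endp

start:
    call Combos
    ret
end start
```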

Title: Re: Pointers, etc
Post by: QvasiModo on September 27, 2005, 03:30:31 PM

Quote from: MazeGen on September 27, 2005, 12:57:29 PM
Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
scale_constant can be 1, 2 or 4

Scale can be 1, 2, 4 or 8.

Didn't know that! :thumbu

Quote from: MazeGen on September 27, 2005, 12:57:29 PM
Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
base_register can be EBX or EBP
index_register can be ESI, EDI, EAX, ECX or EDX

You can use any combination of any general-purpose registers. The only exception is you can't use ESP as an index (e.g. [EAX+ESP*2] - not encodable), but you can still use ESP as a base (e.g. [ESP+EAX*8])

My bad, I must have been confused with real mode addressing :red

Title: Re: Pointers, etc
Post by: MazeGen on September 27, 2005, 03:38:37 PM

Quote from: QvasiModo on September 27, 2005, 03:30:31 PM
Quote from: MazeGen on September 27, 2005, 12:57:29 PM
Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
scale_constant can be 1, 2 or 4

Scale can be 1, 2, 4 or 8.

Didn't know that! :thumbu

Yeah, that's because Scale is encoded in two bits:

2^0 = 1
2^1 = 2
2^2 = 4
2^3 = 8

Quote from: QvasiModo on September 27, 2005, 03:30:31 PM
Quote from: MazeGen on September 27, 2005, 12:57:29 PM
Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
base_register can be EBX or EBP
index_register can be ESI, EDI, EAX, ECX or EDX

You can use any combination of any general-purpose registers. The only exception is you can't use ESP as an index (e.g. [EAX+ESP*2] - not encodable), but you can still use ESP as a base (e.g. [ESP+EAX*8])

My bad, I must have been confused with real mode addressing :red

:wink It seems so. In 16-bit addressing, we can use only BP or BX as a base and SI or DI as an index. No scale here.

Title: Re: Pointers, etc
Post by: Ratch on September 27, 2005, 04:12:18 PM

To the Ineffable All,
If you use EBP as an index register, it will cost you an extra byte. MASM is not smart enough to reverse the register roles, as is shown in the code below. Ratch

00000000 8B 14 29       MOV EDX,[EBP+ECX]
00000003 8B 54 0D 00    MOV EDX,[ECX+EBP] ;functionally equivalent to previous instruction
00000007 FF 34 28       PUSH [EBP+EAX]
0000000A FF 74 05 00    PUSH [EAX+EBP] ;functionally equivalent to previous instruction

Title: Re: Pointers, etc
Post by: MazeGen on September 27, 2005, 04:33:36 PM

I think it has nothing to do with smartness. MASM just assembles what you code. I personally like such behaviour in an assembler.

Title: Re: Pointers, etc
Post by: Ratch on September 27, 2005, 05:08:04 PM

Quote
I think it has nothing to do with smartness. MASM just assembles what you code. I personally like such behaviour in an assembler.

MazeGen,

Actually, it appears that coding EBP as a BASE register causes an extra byte to be generated. MASM appears to ASSUME that the first register is the index register, unless one specifies otherwise. Is that what you mean when you say that it assembles what you code? Who says that the first register is an index register? I think that it should select the shortest instruction if there is an ambiguity about which is which. Perhaps it has something to do with dumbness. Ratch

00000000 8B 14 29       MOV EDX,[EBP+ECX] ;assumes ECX is base
00000003 8B 54 0D 00    MOV EDX,[ECX+EBP] ;assumes EBP is base
00000007 8B 14 29       MOV EDX,[1*EBP+ECX] ;explicit ECX is base
0000000A 8B 14 29       MOV EDX,[ECX+1*EBP] ;explicit ECX is base
0000000D 8B 54 0D 00    MOV EDX,[1*ECX+EBP] ;explicit EBP is base
00000011 8B 54 0D 00    MOV EDX,[EBP+1*ECX] ;explicit EBP is base

Title: Re: Pointers, etc
Post by: MazeGen on September 27, 2005, 05:27:08 PM

Sorry, Ratch, I haven't read your post carefully.

The answer is that you should use strict MASM syntax when you expect such encoding:

Quote from: chap_03.doc
... If scaling is not used, the first register is the base. ...

...
mov eax, [edx][ebp] ; EDX base (first - seg DS)
mov eax, [ebp][edx] ; EBP base (first - seg SS)
...

Code Select


 00000000  8B 14 29		MOV EDX,[EBP][ECX]   ; EBP is base
 00000003  8B 54 0D 00		MOV EDX,[ECX][EBP]   ; ECX is base

Title: Re: Pointers, etc (Base address mode)
Post by: tenkey on September 28, 2005, 12:35:11 AM

Code Select

8B 4D 00   mov ecx,[ebp]
8B 4D 01   mov ecx,[ebp+1]
8B 0E      mov ecx,[esi]
8B 4E 01   mov ecx,[esi+1]

If you try to encode ecx,[ebp] as 0D, the processor will use the next four bytes as an address.

Code Select

8B 0D 12345678 mov ecx,[ds:12345678H]

Title: Re: Pointers, etc
Post by: Ratch on September 28, 2005, 01:32:32 AM

MazeGen,
The documentation is WRONG! The first register is truly ASSUMED to be the index, not the base register. That is easily determined by comparing your two example instructions with the explicit instructions in my example. Ratch

00000000 8B 14 29       MOV EDX,[EBP+ECX]
00000003 8B 54 0D 00    MOV EDX,[ECX+EBP]
00000007 8B 14 29       MOV EDX,[1*EBP+ECX]
0000000A 8B 14 29       MOV EDX,[ECX+1*EBP]
0000000D 8B 54 0D 00    MOV EDX,[1*ECX+EBP]
00000011 8B 54 0D 00    MOV EDX,[EBP+1*ECX]
00000015 8B 54 0D 00    MOV EDX,[ECX][EBP] ; ECX index, EBP base
00000019 8B 14 29       MOV EDX,[EBP][ECX] ; EBP index, ECX base
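
The two encodings in the listing can be decoded by hand from the ModRM and SIB bytes. The sketch below emits them with db so the bit fields can be annotated, assuming a 32-bit flat-model MASM build. The procedure name Decode is made up, and the frame setup is there only so that both loads read valid stack memory.

```asm
; Sketch only: Decode is a hypothetical name.
.386
.model flat, stdcall
option casemap:none

.code
Decode proc
    push ebp
    mov  ebp, esp               ; ebp now points at valid stack memory
    xor  ecx, ecx               ; ecx = 0, so both loads below read [ebp]

    ; 8B /r is MOV r32, r/m32
    ; ModRM 14h = 00 010 100b : mod=00 (no displacement), reg=010 (EDX), r/m=100 (SIB follows)
    ; SIB   29h = 00 101 001b : scale=00 (*1), index=101 (EBP), base=001 (ECX)
    db 8Bh, 14h, 29h            ; mov edx, [ecx+ebp*1]

    ; ModRM 54h = 01 010 100b : mod=01 (disp8 follows), reg=010 (EDX), r/m=100 (SIB follows)
    ; SIB   0Dh = 00 001 101b : scale=00 (*1), index=001 (ECX), base=101 (EBP)
    db 8Bh, 54h, 0Dh, 00h       ; mov edx, [ebp+ecx*1+0]

    pop  ebp
    ret
Decode endp

start:
    call Decode
    ret
end start
```

The extra byte comes from the encoding rules: with mod=00, a base field of 101 does not mean EBP but "no base, 32-bit displacement follows", so EBP as a base needs mod=01 and an explicit zero displacement.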

Title: Re: Pointers, etc
Post by: MazeGen on September 28, 2005, 05:35:06 AM

My apologies, Ratch. I must be distracted these days or something :(
I use these two (1 (http://www.sandpile.org/ia32/opc_rm32.htm), 2 (http://www.sandpile.org/ia32/opc_sib.htm)) tables where it is clear that you are right.

I'll stop posting to this topic so as not to add to the confusion :red

Title: Re: Pointers, etc
Post by: QvasiModo on September 28, 2005, 03:21:56 PM

Quote from: MazeGen on September 28, 2005, 05:35:06 AM
I'll stop posting to this topic so as not to add to the confusion :red

Been there too ;)

The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: redskull on September 22, 2005, 03:09:50 PM