Pointers, etc

redskull · September 22, 2005, 03:09:50 PM

I think I've got a pretty good grasp of this stuff, but I was hoping one of the mavens around here could clarify it all to make certain I'm not confused: what's the difference between using a pointer (DWORD PTR) to a variable, using OFFSET before the variable, using ADDR before the variable, and using brackets around a variable? They all seem somewhat alike, and at the same time very different.
Thanks

QvasiModo · September 22, 2005, 03:29:51 PM

OFFSET gives you a constant memory address that is known at assembly time.

ADDR can do the same as OFFSET in some cases, and additionally generates code to calculate the address of local variables, which can only be known at runtime. It is used inside an INVOKE statement.

A pointer is a variable which in turn contains the memory address of another variable. Being a variable, its value can be changed at runtime. Putting brackets around a register uses the pointer held in that register, and the DWORD PTR operator tells the assembler the size of the data being accessed through it.

Some code examples:

Code Select



; this loads the contents of SomeVariable into ECX
MOV ECX, SomeVariable

; loads the memory address of SomeVariable into EAX
MOV EAX, OFFSET SomeVariable

; reads into EDX the contents of SomeVariable, using the pointer in EAX
; so now ECX and EDX have the same value
MOV EDX, DWORD PTR [EAX]

; calls a procedure, passing as a parameter the memory address of a local variable
INVOKE SomeProcedure, ADDR SomeLocalVariable

; another way to do the same as above
LEA EAX, SomeLocalVariable
INVOKE SomeProcedure, EAX

; this one is wrong! it's passing the *value* of SomeLocalVariable instead of its *address*
MOV EAX, SomeLocalVariable
INVOKE SomeProcedure, EAX

Hope this makes things a bit clearer! :U
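
For anyone who wants to assemble the fragments above, here is a minimal sketch of a complete program. It assumes 32-bit MASM (ml.exe) with kernel32.lib on the linker's library path. StoreSeven, Demo, GlobalVar and LocalVar are hypothetical names made up for the illustration, not anything from the posts.

```masm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
StoreSeven  PROTO :DWORD            ; hypothetical helper: takes a pointer to a DWORD
includelib kernel32.lib

.data
GlobalVar dd 0

.code
StoreSeven PROC pValue:DWORD
    mov eax, pValue                 ; EAX = the address that was passed in
    mov DWORD PTR [eax], 7          ; write 7 through the pointer
    ret
StoreSeven ENDP

Demo PROC
    LOCAL LocalVar:DWORD
    invoke StoreSeven, OFFSET GlobalVar   ; address known at assembly/link time
    invoke StoreSeven, ADDR LocalVar      ; address of a stack local, computed at runtime
    lea eax, LocalVar                     ; the same thing done by hand
    invoke StoreSeven, eax
    mov eax, LocalVar                     ; EAX = 7, the value and not the address
    ret
Demo ENDP

start:
    call Demo
    invoke ExitProcess, 0
end start
```

Something like `ml /c /coff demo.asm` followed by `link /subsystem:console demo.obj` should build it, where demo.asm is whatever you named the file.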

OceanJeff32 · September 23, 2005, 11:47:28 PM

Question:

so if you've got a variable (array) declared like this:

blocks dd 40, 50, 20, 20, 90, 100, 20, 20

then later in code you access the first element by pointing an index register or general register to the beginning like this: (?)

lea eax, blocks

or

mov edi, blocks

or does it have to be lea...

THEN once you've pointed it there, you access the variable by doing this:

mov firstelement, [eax]

or

mov firstelement, [edi]

AND you can reference further elements within the array by adding offsets in the []s

mov otherelements, [eax + sizeofelement * (2,4,8...)]

optionally using the DWORD or WORD PTR or BYTE PTR...

am I right? I'm trying this now, so I'll know very shortly, but I have a feeling I may be confused very shortly...

later,

jeff c
:bdg

hutch-- · September 24, 2005, 12:02:48 AM

Hi redskull,

A "pointer" is actually a higher level idea but it's easy to do in assembler. What you have is data in memory of whatever type, and "where" it is located is its "address". When you want to store that "address" somewhere, you place the address in a variable that is allocated somewhere else, and that variable "points" to the address.

Now you can get the address in a number of ways depending on where the data is located. If it is in the .DATA or .DATA? section, you use its OFFSET, which is determined at assembly time. With data allocated on the stack at runtime, such as LOCAL variables, you get the "address" of that LOCAL using LEA, or in an "invoke" statement you use ADDR to do the same. You also work with an address when you allocate memory dynamically: you copy the return value to a variable, which is then the pointer to that memory.
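
A small sketch of the "variable that points" idea, assuming 32-bit MASM with kernel32.lib available. SomeValue and pSomeValue are hypothetical names chosen for the example.

```masm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

.data
SomeValue  dd 1234h                 ; the data itself
pSomeValue dd OFFSET SomeValue      ; a pointer: a variable holding SomeValue's address

.code
start:
    mov eax, pSomeValue             ; EAX = the address stored in the pointer
    mov ecx, DWORD PTR [eax]        ; ECX = 1234h, read through the pointer
    mov DWORD PTR [eax], 5678h      ; SomeValue is now 5678h
    invoke ExitProcess, 0
end start
```

The pointer here is filled in at assembly time with OFFSET. A pointer to a LOCAL or to dynamically allocated memory would be stored into the variable at runtime instead, but it is used the same way afterwards.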

Trope · September 24, 2005, 01:07:48 AM

As a newbie myself, I struggled with this also.

This thread is absolutely PERFECT. The replies from QvasiModo, OceanJeff32, and Hutch are, as far as I am concerned, textbook.

If I had seen this thread back in the day, it would have saved me many Google hours of reading up on C pointers (which I now know are related, but at the time I did not).

Kick ass explanations guys.

Trope

redskull · September 24, 2005, 03:25:34 AM

Thanks to everybody for the responses. Just one more question: isn't all this sort of syntax tra-jickery? I mean, in the long run, keeping a pointer variable does pretty much the same thing as LEAing the variable into a register, which is pretty much what OFFSET does when the code gets generated, right? Some might be faster than others, but are there specific situations where one is better (or required)?

hutch-- · September 24, 2005, 05:12:58 AM

redskull,

You can learn the hard way like most of us did or learn the right way and save yourself the problems. OFFSET literally means a location in a binary file and this is determined at assembly time. A LOCAL variable is set at runtime on the stack of each procedure and it is NOT the same thing. It just depends on how much grief you want finding out. :bg

gabor · September 26, 2005, 08:18:27 AM

Hello!

Just 2 remarks:

Quote from: OceanJeff32 on September 23, 2005, 11:47:28 PM
lea eax, blocks

or

mov edi, blocks

or does it have to be lea...

Yes, it has to be lea (or mov edi, OFFSET blocks). As written, mov edi,blocks will load the first dword at label blocks into edi, that is edi=40 (decimal, since the table was declared without an h suffix).
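
A minimal sketch of the whole sequence from the question, assuming 32-bit MASM with kernel32.lib available. The array is the one quoted above; firstelement and thirdelement are hypothetical DWORD variables added for the example.

```masm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

.data
blocks       dd 40, 50, 20, 20, 90, 100, 20, 20
firstelement dd 0
thirdelement dd 0

.code
start:
    mov edi, OFFSET blocks      ; EDI = address of the array (lea edi, blocks also works)
    mov eax, [edi]              ; EAX = 40, the first element
    mov firstelement, eax       ; x86 has no memory-to-memory MOV, so go through a register
    mov eax, [edi + 2*4]        ; EAX = 20, the element at index 2 (each dd is 4 bytes)
    mov thirdelement, eax
    invoke ExitProcess, 0
end start
```

Note that a line such as mov firstelement, [eax] from the question would not assemble, because both operands are memory. Copying through a register as above is the usual workaround.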

Quote from: OceanJeff32 on September 23, 2005, 11:47:28 PM

mov otherelements, [eax + sizeofelement * (2,4,8...)]

Scaling, I mean adding a reg*2,4,8 to the address calculation is valid only for the lea instruction. Am I right?
This is for sure:
lea can be used this way:
lea reg1,[label+offset+reg2*1/2/4/8]
This loads the offset of label+offset+reg2*1/2/4/8 into reg1.
Example:

Code Select


blocks  dd  10203040h, 50607080h, 90A0B0C0h, 0D0E0F011h   ; a hex literal must start with a digit

mov  ecx,2                ; index of 3rd element (0 indexes the first element!)
lea  edi,[blocks+3+ecx*4]
mov al,BYTE PTR [edi]     ; get the 4th byte of the DWORD at index from table blocks

As a result al will contain the most significant byte of the 3rd dword in the table, that is 90h.

Maybe I am not right, but I think two registers can be on the right side of the instruction at one time. And one of them can have scaling.

I guess this thread is totally about something difficult. This is proven by the fact that all the 'modern' and fashionable high level languages banish pointers (Java, C#). The thing gets really complicated when using arrays of pointers where the array itself is accessed through a pointer too. That is a real brain challenge when debugging! :))

Greets, Gábor

QvasiModo · September 26, 2005, 03:43:52 PM

Quote from: gabor on September 26, 2005, 08:18:27 AM
Scaling, I mean adding a reg*2,4,8 to the address calculation is valid only for the lea instruction. Am I right?

No, the address calculation syntax is the same for all instructions that can use registers for memory addresses. So scaling and all the other goodies can be used with LEA, MOV, PUSH, POP, etc... they can even be used with CALL and JMP. :)

Quote from: gabor on September 26, 2005, 08:18:27 AM
Maybe I am not right, but I think two registers can be on the right side of the instruction at one time. And one of them can have scaling.

You're right. It can be done like this: [base_register + index_register * scale_constant + displacement_constant]

where:
base_register can be EBX or EBP
index_register can be ESI, EDI, EAX, ECX or EDX
scale_constant can be 1, 2 or 4
displacement_constant can be any constant

and of course they're all optional. The examples from my previous post only used the displacement_constant part.

Quote from: gabor on September 26, 2005, 08:18:27 AM
I guess this thread is totally about something difficult. This is proven by the fact that all the 'modern' and fashionable high level languages banish pointers (Java, C#). The thing gets really complicated when using arrays of pointers where the array itself is accessed through a pointer too. That is a real brain challenge when debugging! :))

You're right, pointers are difficult enough, and the many Intel addressing modes make this topic very complex, but worthwhile. And while Java may be hiding pointers for the programmer's sake, things have to be done with them "under the hood"... :wink

hutch-- · September 27, 2005, 01:36:07 AM

The trick with the Intel complex addressing modes is to simply learn how they work. The variations are built into the processor as different opcodes, and once they are understood, they are very clear and precise to use.

An array of DWORD member size is nothing more than a location in memory (dynamic, stack or data section) with a member count that you can store somewhere if you wish. The complex addressing modes give you a fast and efficient way to access array members. With the info that QvasiModo has posted for you, just learn how it works. With an instruction something like,

Code Select


    mov eax, [ebx+ecx*4+16]

It breaks up into its component parts easily.

ebx is the base address.
ecx is the array index.
the *4 is the scale of the index, which sets the array element size, and the trailing +16 is what is called a displacement, which adjusts the address by that many bytes.
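
To make the breakdown concrete, here is a sketch that performs the same load twice: once with the complex addressing mode and once with the address worked out by hand. It assumes 32-bit MASM with kernel32.lib available. The packet layout (16 bytes of header followed by a DWORD array) is a hypothetical example.

```masm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

.data
; four header DWORDs (16 bytes) followed by the array itself
packet dd 0, 0, 0, 0, 100, 200, 300, 400

.code
start:
    mov ebx, OFFSET packet      ; base address
    mov ecx, 2                  ; array index
    mov eax, [ebx+ecx*4+16]     ; EAX = 300, all in one instruction

    ; the same address worked out by hand
    mov edx, ecx
    shl edx, 2                  ; index * 4
    add edx, ebx                ; + base
    add edx, 16                 ; + displacement
    mov eax, [edx]              ; EAX = 300 again
    invoke ExitProcess, 0
end start
```
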

When the operands do not imply the data size, the instruction needs a size specifier, so code like,

Code Select


push [ebx+ecx*4]

properly needs a data size specifier:

Code Select


push DWORD PTR [ebx+ecx*4]

as this determines which opcode the assembler will choose for the output.
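
A short sketch of when the specifier matters, assuming 32-bit MASM with kernel32.lib available. The buffer name and its contents are hypothetical.

```masm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

.data
buffer db 0FFh, 0FFh, 0FFh, 0FFh, 0FFh, 0FFh, 0FFh, 0FFh

.code
start:
    mov ebx, OFFSET buffer
    ; mov [ebx], 0              ; rejected: nothing says how many bytes to write
    mov BYTE PTR  [ebx], 0      ; writes 1 byte
    mov WORD PTR  [ebx+2], 0    ; writes 2 bytes
    mov DWORD PTR [ebx+4], 0    ; writes 4 bytes
    mov eax, [ebx+4]            ; no specifier needed, EAX implies a DWORD: EAX = 0
    inc BYTE PTR [ebx+1]        ; 0FFh wraps to 00h, and only that one byte changes
    invoke ExitProcess, 0
end start
```

The rule of thumb is that a register operand settles the size on its own, while a bare memory operand paired with an immediate (or used alone, as with INC or PUSH) does not.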

OceanJeff32 · September 27, 2005, 05:35:08 AM

Quote from: QvasiModo on September 26, 2005, 03:43:52 PM
You're right. It can be done like this: [base_register + index_register * scale_constant + displacement_constant]

where:
base_register can be EBX or EBP
index_register can be ESI, EDI, EAX, ECX or EDX
scale_constant can be 1, 2 or 4
displacement_constant can be any constant

Ok, QUESTION:

What happens if you use something other than ebx or ebp for base register? This may be my problem... :dazzled:

and I take it, the index register can be set to any valid amount, as long as it's not greater than the size of the array you are accessing ??

and...the displacement constant should be defined at compile time ??

Very cool!

AND ... (one last AND, for the moment anyways...) How helpful is that XLAT instruction for accessing array elements, and is the DS register still used by the system to keep track of your data?? I assume that all the data is globbed together into one big data section by the assembler when you finally compile your program, and all the code is grouped together, unless you tell the compiler to do otherwise...

Just curious, because the XLAT instruction gives you an offset (?) AS COMPUTED from the value in the ds register, AND the value in the ebx register.

::)

Well, enough questions for now, I'll have more later, hopefully I'll be answering some questions around here too!

later,

jeff c
:U

gabor · September 27, 2005, 07:28:05 AM

Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
and...the displacement constant should be defined at compile time ??

I guess yes, it must be set at compile time, that's why it is called a constant.

Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
AND ... (one last AND, for the moment anyways...) How helpful is that XLAT instruction for accessing array elements, and is the DS register still used by the system to keep track of your data??

I personally never used XLAT, not even in real mode. Now, in 32-bit mode, I'd rather use these cool addressing methods; I simply enjoy the freedom to index with whichever register I prefer (not only esi or edi). So if I need to access an array, I write the index into a free register (beforehand I try to make the element size of the array a power of 2, using padding if necessary) and use scaling to address the indexed element.

Code Select


XLAT

    mov    ebx,offset Table
    mov    al,12           ; index
    xlat
; al contains byte at offset Table+12

is equivalent to
    mov    al,BYTE PTR [Table+12]   ; load the byte itself, not its address

Is that right? Or is it lame not to use XLAT? And BTW, what do you say about the other more complex instructions, like the string instructions? I've heard that they are rather slow and that it is better to use a simple MOV/SUB/JNZ or JNC combination, like this:

Code Select


    lea    esi,Src
    lea    edi,Dst
    mov    ecx,1000h
    rep movsd

or

    lea    esi,Src
    lea    edi,Dst
    mov    ecx,0FFFh
@@:
    mov    eax,[esi+ecx*4]
    mov    [edi+ecx*4],eax
    sub    ecx,1
    jnc @B

I hope I didn't mess up anything! One important thing: in the second way the count is set to 0FFFh and not to 1000h. This is because the loop ends when ecx underflows (jnc @B), so there are 0..0FFFh=1000h cycles!

And of course: guys, tons of thanks for the important and useful info!

Greets, Gábor

hutch-- · September 27, 2005, 10:16:14 AM

Jeff,

In most circumstances you can use any register as a base register in complex addressing mode. There is also an opcode that lets you use a named address, which is an assembly time data section address, usually something like a table. You can use XLATB but it's slow in comparison to incremented pointers with a table.
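
A sketch of the two lookup styles side by side, assuming 32-bit MASM with kernel32.lib available. HexTable is a hypothetical 16-entry byte table. No timing claim is made here; the code only shows that both forms fetch the same byte.

```masm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

.data
HexTable db "0123456789ABCDEF"          ; 16-entry byte table

.code
start:
    ; XLAT form: AL = byte at [EBX + AL]
    mov ebx, OFFSET HexTable
    mov al, 12
    xlat                                ; AL = 'C'

    ; the same lookup using the named table address and any index register
    mov ecx, 12
    movzx eax, BYTE PTR [HexTable+ecx]  ; EAX = 43h, the character 'C'
    invoke ExitProcess, 0
end start
```
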

MazeGen · September 27, 2005, 12:57:29 PM

Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
scale_constant can be 1, 2 or 4

Scale can be 1, 2, 4 or 8.

Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
base_register can be EBX or EBP
index_register can be ESI, EDI, EAX, ECX or EDX

You can use any combination of any general-purpose registers. The only exception is you can't use ESP as an index (e.g. [EAX+ESP*2] - not encodable), but you can still use ESP as a base (e.g. [ESP+EAX*8])
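
A sketch that exercises these rules, assuming 32-bit MASM with kernel32.lib available. The pairs table (three 8-byte records of two DWORDs each) is a hypothetical example.

```masm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

.data
pairs dd 1, 2,  3, 4,  5, 6     ; three 8-byte records

.code
start:
    mov eax, OFFSET pairs
    mov edx, 1
    mov ecx, [eax+edx*4]        ; EAX as base, EDX as index: ECX = 2

    mov esi, 2
    mov ecx, [eax+esi*8+4]      ; scale 8: second field of record 2, ECX = 6

    push 99
    mov ecx, [esp]              ; ESP is fine as a base: ECX = 99
    add esp, 4                  ; discard the pushed value

    ; mov ecx, [eax+esp*2]      ; not encodable: ESP cannot be the index
    invoke ExitProcess, 0
end start
```
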

QvasiModo · September 27, 2005, 03:30:31 PM

Quote from: MazeGen on September 27, 2005, 12:57:29 PM
Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
scale_constant can be 1, 2 or 4

Scale can be 1, 2, 4 or 8.

Didn't know that! :thumbu

Quote from: MazeGen on September 27, 2005, 12:57:29 PM
Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
base_register can be EBX or EBP
index_register can be ESI, EDI, EAX, ECX or EDX

You can use any combination of any general-purpose registers. The only exception is you can't use ESP as an index (e.g. [EAX+ESP*2] - not encodable), but you can still use ESP as a base (e.g. [ESP+EAX*8])

My bad, I must have been confused with real mode addressing :red
