I think I've got a pretty good grasp of this stuff, but I was hoping one of the mavens around here could clarify it all to make certain I'm not confused: What's the difference between using a pointer (DWORD PTR) to a variable, using OFFSET before the variable, using ADDR before the variable, and using brackets around a variable? They all seem somewhat alike, and at the same time very different.
thanks
OFFSET gives you a constant memory address that is known at assembly time.
ADDR, which is only valid inside INVOKE, can do the same as OFFSET in some cases, and additionally generates code to calculate the address of local variables, which can only be known at runtime.
A pointer is a variable which in turn contains the memory address of another variable. Being a variable, its value can be changed at runtime. Brackets around a register use that register as a pointer, and the DWORD PTR operator tells the assembler that the memory being pointed at is 32 bits wide.
Some code examples:
; this loads the contents of SomeVariable into ECX
MOV ECX, SomeVariable
; loads the memory address of SomeVariable into EAX
MOV EAX, OFFSET SomeVariable
; reads into EDX the contents of SomeVariable, using the pointer in EAX
; so now ECX and EDX have the same value
MOV EDX, DWORD PTR [EAX]
; calls a procedure, passing as a parameter the memory address of a local variable
INVOKE SomeProcedure, ADDR SomeLocalVariable
; another way to do the same as above
LEA EAX, SomeLocalVariable
INVOKE SomeProcedure, EAX
; this one is wrong! it's passing the *value* of SomeLocalVariable instead of its *address*
MOV EAX, SomeLocalVariable
INVOKE SomeProcedure, EAX
Hope this makes things a bit clearer! :U
Question:
so if you've got a variable (array) declared like this:
blocks dd 40, 50, 20, 20, 90, 100, 20, 20
then later in code you access the first element by pointing an index register or general register to the beginning like this: (?)
lea eax, blocks
or
mov edi, blocks
or does it have to be lea...
THEN once you've pointed it there, you access the variable by doing this:
mov firstelement, [eax]
or
mov firstelement, [edi]
AND you can reference further elements within the array by adding offsets in the []s
mov otherelements, [eax + sizeofelement * (2,4,8...)]
optionally using the DWORD or WORD PTR or BYTE PTR...
am I right? I'm trying this now, so I'll know very shortly, but I have a feeling I may be confused very shortly...
later,
jeff c
:bdg
Hi redskull,
A "pointer" is actually a higher level idea but it's easy to do in assembler. What you have is data in memory of whatever type, and "where" it is located is its "address". When you want to store that "address" somewhere, you place the address in a variable that is allocated somewhere else, and that variable "points" to the address.
Now you can get the address in a number of ways depending on where the data is located. If it is in the .DATA or .DATA? section, you use its OFFSET, which is determined at assembly time. With data allocated on the stack at runtime, such as LOCAL variables, you get the "address" of that LOCAL using LEA, or in an "invoke" statement you use ADDR to do the same. You also work with an address when you allocate memory dynamically: you copy the return value to a variable, which is then the pointer to that memory.
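To make the OFFSET / LEA / pointer-variable distinction concrete, here is a minimal sketch written as a complete flat-model MASM module. The names Value, pValue, Temp and Demo are made up for illustration, and ending the program with a plain ret from the entry point is an assumption made to keep the example short.

```asm
.386
.model flat, stdcall
option casemap:none

.data
Value   dd 1234h                ; global data: its address is fixed at assembly/link time
pValue  dd OFFSET Value         ; a pointer variable: it holds the address of Value

.code
Demo proc
    LOCAL Temp:DWORD            ; lives on the stack, address only known at runtime

    mov  eax, OFFSET Value      ; eax = address of Value (an immediate constant)
    mov  ecx, [eax]             ; ecx = 1234h, read through the register
    mov  edx, pValue            ; edx = contents of pValue = the same address
    mov  ecx, [edx]             ; ecx = 1234h again, read through the pointer variable

    mov  Temp, ecx              ; store into the local
    lea  eax, Temp              ; eax = address of Temp, computed from ebp at runtime
    mov  edx, [eax]             ; edx = 1234h, read back through that address
    ret
Demo endp

start:                          ; hypothetical entry point
    call Demo
    ret
end start
```

OFFSET Temp would not assemble here, because a LOCAL has no fixed address; that is the case LEA and ADDR exist for.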
As a newbie myself, I struggled with this also.
This thread is absolutely PERFECT. The replies from QvasiModo, OceanJeff32, and Hutch are, as far as I am concerned, textbook.
If I had seen this thread back in the day, it would have saved me many Google hours of reading up on C pointers (which I now know are related, but at the time I did not).
Kick ass explanations guys.
Trope
thanks to everybody for the responses; just one more question; isn't all this sort of syntax tra-jickery? I mean, in the long run, keeping a pointer variable does pretty much the same thing as LEAing the variable into a register, which is pretty much what OFFSET does when the code gets generated, right? Some might be faster than others, but are there specific situations where one is better (or required) to have?
redskull,
You can learn the hard way like most of us did or learn the right way and save yourself the problems. OFFSET literally means a location in a binary file and this is determined at assembly time. A LOCAL variable is set at runtime on the stack of each procedure and it is NOT the same thing. It just depends on how much grief you want finding out. :bg
Hello!
Just 2 remarks:
Quote from: OceanJeff32 on September 23, 2005, 11:47:28 PM
lea eax, blocks
or
mov edi, blocks
or does it have to be lea...
Yes, it has to be lea (or mov edi, OFFSET blocks, since blocks is in the data section). Written as mov edi,blocks, the instruction loads the first dword at label blocks into edi, that is, edi=40.
Quote from: OceanJeff32 on September 23, 2005, 11:47:28 PM
mov otherelements, [eax + sizeofelement * (2,4,8...)]
Scaling, I mean adding a reg*2,4,8 to the address calculation is valid only for the lea instruction. Am I right?
This is for sure: lea can be used this way:
lea reg1,[label+offset+reg2*1/2/4/8]
This loads the address label+offset+reg2*1/2/4/8 into reg1.
Example:
blocks dd 10203040h, 50607080h, 90A0B0C0h, 0D0E0F011h
mov ecx,2 ; index of 3rd element (0 indexes the first element!)
lea edi,[blocks+3+ecx*4]
mov al,BYTE PTR [edi] ; get the 4th byte of the DWORD at index from table blocks
As a result al will contain the most significant byte of the 3rd dword in the table, that is 90h (x86 is little-endian, so the byte at offset 3 of a dword is its highest byte).
Maybe I am not right, but I think two registers can be on the right side of the instruction at one time. And one of them can have scaling.
I guess this thread is about something really difficult. This is shown by the fact that the 'modern' and fashionable high level languages (Java, C#) banish pointers. Things get really complicated when using arrays of pointers where the array itself is accessed through a pointer too. That is a real brain challenge when debugging! :))
Greets, Gábor
Quote from: gabor on September 26, 2005, 08:18:27 AM
Scaling, I mean adding a reg*2,4,8 to the address calculation is valid only for the lea instruction. Am I right?
No, the address calculation syntax is the same for all instructions that can use registers for memory addresses. So scaling and all the other goodies can be used with LEA, MOV, PUSH, POP, etc... they can even be used with CALL and JMP. :)
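As a sketch of what that means for the blocks array from Jeff's question: the variables first and third are hypothetical, and the module boilerplate assumes a flat 32-bit MASM build. Note that mov cannot copy memory to memory, so each element goes through a register.

```asm
.386
.model flat, stdcall
option casemap:none

.data
blocks  dd 40, 50, 20, 20, 90, 100, 20, 20
first   dd ?                        ; hypothetical destination variables
third   dd ?

.code
start:
    mov  edi, OFFSET blocks         ; edi = address of the array (lea edi, blocks also works)
    mov  eax, [edi]                 ; eax = 40, the first element
    mov  first, eax                 ; mov first, [edi] would not assemble

    mov  ecx, 2                     ; zero-based index of the third element
    mov  eax, [edi+ecx*4]           ; scaled index used with MOV: eax = 20
    mov  third, eax

    push DWORD PTR [blocks+ecx*4]   ; the same addressing syntax used with PUSH
    pop  edx                        ; edx = 20
    ret                             ; assumption: returning from the entry point ends the program
end start
```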
Quote from: gabor on September 26, 2005, 08:18:27 AM
Maybe I am not right, but I think two registers can be on the right side of the instruction at one time. And one of them can have scaling.
You're right. It can be done like this:
[base_register + index_register * scale_constant + displacement_constant]
where:
base_register can be EBX or EBP
index_register can be ESI, EDI, EAX, ECX or EDX
scale_constant can be 1, 2 or 4
displacement_constant can be any constant
and of course they're all optional. The examples from my previous post only used the displacement_constant part.
Quote from: gabor on September 26, 2005, 08:18:27 AM
I guess this thread is about something really difficult. This is shown by the fact that the 'modern' and fashionable high level languages (Java, C#) banish pointers. Things get really complicated when using arrays of pointers where the array itself is accessed through a pointer too. That is a real brain challenge when debugging! :))
You're right, pointers are difficult enough and the many Intel addressing modes make this topic very complex, but worthwhile. And while Java may be hiding pointers for the programmer's sake, things have to be done with them "under the hood"... :wink
The trick with the Intel complex addressing modes is simply to learn how they work. The variations are built into the processor as different opcodes and once they are understood, they are very clear and precise to use.
An array with DWORD-sized members is nothing more than a location in memory (dynamic, stack or data section) with a member count that you can store somewhere if you wish. The complex addressing modes give you a fast and efficient way to access array members. With the info that QvasiModo has posted for you, just learn how it works. With an instruction something like,
mov eax, [ebx+ecx*4+16]
It breaks up into its component parts easily.
ebx is the base address.
ecx is the array index.
The *4 is the scale applied to the index to match the array element size, and the trailing +16 is what is called a displacement, which adjusts the address by that many bytes.
When the operands do not determine the data size, the instruction needs a size specifier, so code like,
push [ebx+ecx*4]
properly needs a data size specifier:
push DWORD PTR [ebx+ecx*4]
as this determines which opcode the assembler will choose for the output.
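Here is a small sketch of when the size specifier is and is not needed. Buffer is a hypothetical scratch variable, and the module boilerplate assumes a flat 32-bit MASM build.

```asm
.386
.model flat, stdcall
option casemap:none

.data
Buffer  dd 4 dup(0)                 ; hypothetical 16-byte scratch area

.code
start:
    mov  ebx, OFFSET Buffer
    mov  ecx, 1

    mov  eax, [ebx+ecx*4]           ; size implied by eax: reads 4 bytes
    mov  al, [ebx]                  ; size implied by al: reads 1 byte

    mov  BYTE PTR  [ebx], 1         ; no register operand, so the size must be stated
    mov  WORD PTR  [ebx], 1         ; same mnemonic, 2 bytes written
    mov  DWORD PTR [ebx], 1         ; 4 bytes written
    inc  DWORD PTR [ebx+ecx*4]      ; increments the second dword of Buffer
    ret                             ; assumption: returning from the entry point ends the program
end start
```

Each of the three stores assembles to a different encoding, which is why the assembler refuses to guess when neither operand is a register.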
You're right. It can be done like this: [base_register + index_register * scale_constant + displacement_constant]
where:
base_register can be EBX or EBP
index_register can be ESI, EDI, EAX, ECX or EDX
scale_constant can be 1, 2 or 4
displacement_constant can be any constant
Ok, QUESTION:
What happens if you use something other than ebx or ebp for base register? This may be my problem... :dazzled:
and I take it, the index register can be set to any valid amount, as long as it's not greater than the size of the array you are accessing ??
and...the displacement constant should be defined at compile time ??
Very cool!
AND ... (one last AND, for the moment anyways...) How helpful is that XLAT instruction for accessing array elements, and is the DS register still used by the system to keep track of your data?? I assume that all the data is globbed together into one big data section by the assembler when you finally compile your program, and all the code is grouped together, unless you tell the compiler to do otherwise...
Just curious, because the XLAT instruction gives you an offset (?) AS COMPUTED from the value in the ds register, AND the value in the ebx register.
::)
Well, enough questions for now, I'll have more later, hopefully I'll be answering some questions around here too!
later,
jeff c
:U
Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
and...the displacement constant should be defined at compile time ??
I guess yes, it must be set at compile time, that's why it is called a constant.
Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
AND ... (one last AND, for the moment anyways...) How helpful is that XLAT instruction for accessing array elements, and is the DS register still used by the system to keep track of your data??
Personally, I never used XLAT, not even in real mode. Now, in 32-bit mode, I prefer these cool addressing methods; I simply enjoy the freedom to index with whichever register I like (not only esi or edi). So if I need to access an array, I write the index into a free register (beforehand, I try to make the element size of the array a power of 2, using padding if necessary) and use scaling to address the indexed element.
XLAT
mov ebx,offset Table
mov al,12 ; index
xlat
; al contains byte at offset Table+12
is equivalent to
mov al,BYTE PTR [Table+12]
Is that right? Or is it lame not to use XLAT? And BTW, what do you say about the other, more complex instructions, like the string instructions? I've heard that they are rather slow and that it is better to use a simple MOV/SUB/JNZ (or JNC) loop. For example, here are two ways to copy 1000h dwords:
lea esi,Src
lea edi,Dst
mov ecx,1000h
rep movsd
or
lea esi,Src
lea edi,Dst
mov ecx,0FFFh
@@:
mov eax,[esi+ecx*4]
mov [edi+ecx*4],eax
sub ecx,1
jnc @B
I hope I didn't mess anything up! One important thing: in the second version the count is set to 0FFFh and not to 1000h. This is because the loop ends when ecx underflows (jnc @B), so the index runs over 0..0FFFh, which is 1000h iterations.
And of course: Guys! Tons of thanks for all the important and useful info!
Greets, Gábor
Jeff,
In most circumstances you can use any register as a base register in complex addressing mode. There is also an opcode that lets you use a named address, which is an assembly-time data section address, usually something like a table. You can use XLATB, but it's slow in comparison to incremented pointers with a table.
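For comparison, a sketch of the same table lookup done with XLATB and with ordinary addressing. HexDigits is a made-up table, and no timing claim beyond what is said above is intended.

```asm
.386
.model flat, stdcall
option casemap:none

.data
HexDigits db "0123456789ABCDEF"     ; hypothetical 16-entry lookup table

.code
start:
    ; XLATB: the table address must be in ebx and the index in al
    mov   ebx, OFFSET HexDigits
    mov   al, 12
    xlatb                           ; al = byte at [ebx+al] = 'C'

    ; Same lookup with a named address plus an index register of your choice
    mov   ecx, 12
    mov   al, [HexDigits+ecx]       ; al = 'C'
    movzx eax, BYTE PTR [HexDigits+ecx] ; eax = 'C', zero-extended to 32 bits
    ret                             ; assumption: returning from the entry point ends the program
end start
```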
Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
scale_constant can be 1, 2 or 4
Scale can be 1, 2, 4 or 8.
Quote from: OceanJeff32 on September 27, 2005, 05:35:08 AM
base_register can be EBX or EBP
index_register can be ESI, EDI, EAX, ECX or EDX
You can use any combination of any general-purpose registers. The only exception is you can't use ESP as an index (e.g. [EAX+ESP*2] - not encodable), but you can still use ESP as a base (e.g. [ESP+EAX*8])
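A few register combinations as a sketch, using LEA so that nothing is actually read from the made-up addresses. The register values are arbitrary, and the module boilerplate assumes a flat 32-bit MASM build.

```asm
.386
.model flat, stdcall
option casemap:none

.code
start:
    mov  eax, 3
    mov  esi, 100h
    mov  edi, 2

    lea  edx, [esi+edi*8]           ; edx = 100h + 2*8 = 110h
    lea  eax, [eax+eax*4]           ; same register as base and index: eax = 3*5 = 15
    lea  ecx, [edx+eax*2+7]         ; ecx = 110h + 30 + 7 = 135h

    mov  ecx, 0
    mov  edx, [esp+ecx*4]           ; ESP as base is fine: reads the dword on top of the stack
    ; mov edx, [eax+esp*2]          ; ESP as index has no encoding, so this line would not assemble
    ret                             ; assumption: returning from the entry point ends the program
end start
```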
Quote from: MazeGen on September 27, 2005, 12:57:29 PM
Scale can be 1, 2, 4 or 8.
Didn't know that! :thumbu
Quote from: MazeGen on September 27, 2005, 12:57:29 PM
You can use any combination of any general-purpose registers. The only exception is you can't use ESP as an index (e.g. [EAX+ESP*2] - not encodable), but you can still use ESP as a base (e.g. [ESP+EAX*8])
My bad, I must have been confused with real mode addressing :red
Quote from: QvasiModo on September 27, 2005, 03:30:31 PM
Didn't know that! :thumbu
Yeah, that's because Scale is encoded in two bits:
2^0 = 1
2^1 = 2
2^2 = 4
2^3 = 8
Quote from: QvasiModo on September 27, 2005, 03:30:31 PM
My bad, I must have been confused with real mode addressing :red
:wink It seems so. In 16-bit addressing, we can use only BP or BX as a base and SI or DI as an index. No scale here.
To the Ineffable All,
If you use EBP as an index register, it will cost you an extra byte. MASM is not smart enough to reverse the register roles, as is shown in the code below. Ratch
00000004 8B 14 29 MOV EDX,[EBP+ECX]
00000000 8B 54 0D 00 MOV EDX,[ECX+EBP] ;functionally equivalent to previous instruction
00000007 FF 34 28 PUSH [EBP+EAX]
0000000A FF 74 05 00 PUSH [EAX+EBP] ;functionally equivalent to previous instruction
I think it has nothing to do with smartness. MASM just assembles what you code. Personally, I like that behaviour in an assembler.
Quote
I think it has nothing to do with smartness. MASM just assembles what you code. Personally, I like that behaviour in an assembler.
MazeGen,
Actually, it appears that coding EBP as a BASE register causes an extra byte to be generated. MASM appears to ASSUME that the first register is the index register, unless one specifies otherwise. Is that what you mean when you say that it assembles what you code? Who says that the first register is an index register? I think that it should select the shortest instruction if there is an ambiguity about which is what. Perhaps it has something to do with dumbness. Ratch
00000000 8B 14 29 MOV EDX,[EBP+ECX] ;assumes ECX is base
00000003 8B 54 0D 00 MOV EDX,[ECX+EBP] ;assumes EBP is base
00000007 8B 14 29 MOV EDX,[1*EBP+ECX] ;explicit ECX is base
0000000A 8B 14 29 MOV EDX,[ECX+1*EBP] ;explicit ECX is base
0000000D 8B 54 0D 00 MOV EDX,[1*ECX+EBP] ;explicit EBP is base
00000011 8B 54 0D 00 MOV EDX,[EBP+1*ECX] ;explicit EBP is base
Sorry, Ratch, I haven't read your post carefully.
The answer is that you should use strict MASM syntax when you expect such encoding:
Quote from: chap_03.doc
... If scaling is not used, the first register is the base. ...
...
mov eax, [edx][ebp] ; EDX base (first - seg DS)
mov eax, [ebp][edx] ; EBP base (first - seg SS)
...
00000000 8B 14 29 MOV EDX,[EBP][ECX] ; EBP is base
00000003 8B 54 0D 00 MOV EDX,[ECX][EBP] ; ECX is base
8B 4D 00 mov ecx,[ebp]
8B 4D 01 mov ecx,[ebp+1]
8B 0E mov ecx,[esi]
8B 4E 01 mov ecx,[esi+1]
If you try to encode ecx,[ebp] as 0D, the processor will use the next four bytes as an address.
8B 0D 12345678 mov ecx,[ds:12345678H]
MazeGen,
The documentation is WRONG! The first register is truly ASSUMED to be the index, not the base register. That is easily determined by comparing your two example instructions with the explicit instructions in my example. Ratch
00000000 8B 14 29 MOV EDX,[EBP+ECX]
00000003 8B 54 0D 00 MOV EDX,[ECX+EBP]
00000007 8B 14 29 MOV EDX,[1*EBP+ECX]
0000000A 8B 14 29 MOV EDX,[ECX+1*EBP]
0000000D 8B 54 0D 00 MOV EDX,[1*ECX+EBP]
00000011 8B 54 0D 00 MOV EDX,[EBP+1*ECX]
00000015 8B 54 0D 00 mov edx, [ecx][ebp] ; ECX index, EBP base
00000019 8B 14 29 mov edx, [ebp][ecx] ; EBP index, ECX base
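For anyone who wants to check these listings by hand, here is a sketch that emits the two encodings as raw bytes, with the ModRM and SIB fields spelled out in the comments. The procedure name Encodings is made up and the bytes are never executed; the field layout follows the Intel ModRM/SIB tables.

```asm
.386
.model flat, stdcall
option casemap:none

.code
; Hypothetical holder for the raw bytes; it is never called.
Encodings proc
    ; Opcode 8Bh = MOV r32, r/m32
    ; ModRM 14h = 00 010 100b : mod=00 (no displacement), reg=010 (EDX), r/m=100 (SIB follows)
    ; SIB   29h = 00 101 001b : scale=00 (*1), index=101 (EBP), base=001 (ECX)
    db 8Bh, 14h, 29h                ; mov edx, [ecx+ebp*1]   -> 3 bytes

    ; ModRM 54h = 01 010 100b : mod=01 (disp8 follows), reg=010 (EDX), r/m=100 (SIB follows)
    ; SIB   0Dh = 00 001 101b : scale=00 (*1), index=001 (ECX), base=101 (EBP)
    ; disp8 00h
    db 8Bh, 54h, 0Dh, 00h           ; mov edx, [ebp+ecx*1+0] -> 4 bytes
Encodings endp

start:
    ret                             ; assumption: returning from the entry point ends the program
end start
```

The extra byte comes from the base field: with mod=00, a base of 101 does not mean EBP but "no base register, 32-bit displacement follows", so EBP as a base always needs at least a zero disp8.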
My apologies, Ratch. I must be distracted these days or something :(
I use these two tables (1 (http://www.sandpile.org/ia32/opc_rm32.htm), 2 (http://www.sandpile.org/ia32/opc_sib.htm)), where it is clear that you are right.
I'll stop posting in this topic to avoid adding to the confusion :red
Quote from: MazeGen on September 28, 2005, 05:35:06 AM
I'll stop posting in this topic to avoid adding to the confusion :red
Been there too ;)