Hi,
I have read somewhere that OFFSET is used to access the memory of global variables, while ADDR is used to access the memory of either global or local variables. So why, in INVOKE statements, do we use ADDR to pass the address of an argument when a pointer is required (and the data is global)? Also, since ADDR can handle both global and local variables, why don't we just use the ADDR directive everywhere?
OFFSET uses the MOV instruction
ADDR uses the LEA instruction (Load Effective Address)
probably one clock cycle difference :P
mov edx,offset SomeLabel
;or
lea edx,[ebp-4] ;stack address of a local
however, the LEA is a little more versatile in what it can do
notice that, in an INVOKE directive, we intend to PUSH the operand...
push offset SomeLabel
;or
lea eax,[ebp-4] ;stack address of a local
push eax
so code size is also a factor
it is good to know which kind of variable you are working with :bg
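To put rough numbers on the code-size remark, here is a small C sketch that hand-encodes the instruction forms above. It assumes 32-bit code and a local within 128 bytes of EBP, so that the 8-bit displacement form applies. The helper names are hypothetical; the opcodes are the standard x86 encodings. With these forms, `push offset SomeLabel` is 5 bytes, while `lea eax,[ebp-4]` + `push eax` is 4. A local further from EBP needs a 32-bit displacement, and the LEA pair grows to 7 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 register numbers as used in opcode and ModRM encodings. */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3 };

/* Store a 32-bit value little-endian, as x86 immediates are encoded. */
static void put_le32(uint8_t *out, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* push imm32 (e.g. push offset SomeLabel): opcode 68 + imm32 = 5 bytes. */
static size_t enc_push_imm32(uint8_t *out, uint32_t imm)
{
    out[0] = 0x68;
    put_le32(out + 1, imm);
    return 5;
}

/* mov r32, imm32 (e.g. mov edx, offset SomeLabel): opcode B8+reg + imm32 = 5 bytes. */
static size_t enc_mov_r32_imm32(uint8_t *out, unsigned reg, uint32_t imm)
{
    assert(reg < 8);
    out[0] = (uint8_t)(0xB8 + reg);
    put_le32(out + 1, imm);
    return 5;
}

/* lea r32, [ebp+disp8]: opcode 8D, ModRM (mod=01, reg, rm=101 = EBP), disp8 = 3 bytes. */
static size_t enc_lea_ebp_disp8(uint8_t *out, unsigned reg, int8_t disp)
{
    assert(reg < 8);
    out[0] = 0x8D;
    out[1] = (uint8_t)(0x40 | (reg << 3) | 0x05);
    out[2] = (uint8_t)disp;
    return 3;
}

/* push r32: opcode 50+reg = 1 byte. */
static size_t enc_push_r32(uint8_t *out, unsigned reg)
{
    assert(reg < 8);
    out[0] = (uint8_t)(0x50 + reg);
    return 1;
}
```

The PUSH OFFSET form also carries a relocatable 32-bit immediate. The LEA form is position-independent but costs a register.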
Hi guro,
The operator OFFSET literally means what it says: it's a location in the executable file measured from its start address. Almost exclusively this means the address of data stored in either the initialised or uninitialised data sections of the file. Data stored this way has GLOBAL scope.
The MASM notation ADDR is an abstraction of an address which can be either an OFFSET or a dynamically created memory location on the stack, normally called a LOCAL variable.
When the variable is a LOCAL created on the stack, the underlying code uses LEA to get the effective address of that variable.
LOCAL myvar :DWORD
.....
lea eax, myvar ; load the address of myvar into the EAX register
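A rough model of what LEA does here follows. Suppose `LOCAL myvar :DWORD` sits at [ebp-4] (the first DWORD local, as described later in the thread). LEA only performs the address arithmetic EBP-4, while MOV would compute the same address and then read the DWORD stored there. In this C sketch, a small byte array at a hypothetical base address stands in for the stack. It illustrates the semantics only, not MASM's code generation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_BASE 0x0012FF00u   /* hypothetical address of mem[0] */
#define MEM_SIZE 256u

typedef struct {
    uint32_t ebp;
    uint8_t  mem[MEM_SIZE];    /* toy stand-in for the stack area */
} Machine;

/* lea reg, [ebp+disp]: pure address arithmetic, memory is never touched. */
static uint32_t lea_ebp(const Machine *m, int32_t disp)
{
    return m->ebp + (uint32_t)disp;
}

/* mov reg, [ebp+disp]: computes the same address, then loads the DWORD stored there. */
static uint32_t mov_ebp(const Machine *m, int32_t disp)
{
    uint32_t addr = lea_ebp(m, disp);
    uint32_t value;
    assert(addr >= MEM_BASE && addr + 4 <= MEM_BASE + MEM_SIZE);
    memcpy(&value, &m->mem[addr - MEM_BASE], sizeof value); /* host byte order (little-endian on x86) */
    return value;
}
```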
Please verify whether the following is correct: when we use the INVOKE directive, it is actually expanded into a series of PUSH instructions followed by a CALL instruction to the selected API routine. If this is correct, and the API routine has an argument that is a pointer, for example to a string declared in the .data section, then it would be logical to get the memory address of the string with the OFFSET directive, as in PUSH OFFSET var, right? But we write INVOKE API_Name, ADDR var! Why? (I know you may have already explained it, but it's not clear to me yet!)
Also, can you tell me what the OFFSET and ADDR directives are equivalent to? I mean, when the assembler encounters OFFSET var or ADDR var, what code does it generate? Does it depend on the context?
@hutch
In the code you wrote, don't we have to add the ADDR directive, as in lea eax, ADDR myvar?
@dedndave
OFFSET can also be used in PUSH instructions, not only in MOV. So what do you mean when you say that ADDR uses the LEA instruction? That it is actually expanded into an LEA instruction internally by the assembler?
The ADDR operator allows you to address local variables, which are relative to EBP.
LOCAL myvar:DWORD
...
invoke function,ADDR myvar
->
lea eax,myvar ; where myvar is something like [ebp-12]
push eax
call function
Quote from: guro on April 06, 2011, 01:13:29 PM
it would be logical to access the memory address of the string by using OFFSET directive, like in PUSH OFFSET var , right? But we write INVOKE API_Name, ADDR var !
The invoke macro doesn't care if you type addr or offset, except it chokes if you try to use offset on a local var.
It is a matter of taste. Personally, I use offset for global vars and addr for local vars, it reminds me of the variable's scope.
If you want to pass eax, you may get this message:
error A2133:register value overwritten by INVOKE
include \masm32\include\masm32rt.inc
MyTest PROTO :DWORD, :DWORD
.code
AppName db "Masm32 is great!", 0
Hello db "A message:", 0
start:
invoke MyTest, offset AppName, addr Hello
exit
MyTest proc arg1:DWORD, arg2:DWORD
LOCAL buffer[1000]:BYTE
invoke lstrcpy, addr buffer, arg2
mov eax, arg1
MsgBox 0, eax, addr buffer, MB_OK
ret
MyTest endp
Try replacing eax with edx, and it works. The reason is that arguments are pushed right to left: the invoke macro wants to use lea eax, buffer for addr buffer, but then detects that you want to push eax afterwards, when it has already been overwritten.
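Here is a toy C model of the behaviour described above. It is not MASM's actual implementation, and the type and function names are hypothetical. Arguments are pushed right to left. ADDR of a global becomes a PUSH OFFSET. ADDR of a local costs an LEA through EAX. Pushing EAX after that LEA is rejected, which mirrors error A2133.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Argument kinds distinguished by this toy model. */
typedef enum { ARG_VALUE, ARG_GLOBAL_ADDR, ARG_LOCAL_ADDR } ArgKind;

typedef struct {
    ArgKind     kind;
    const char *text;   /* register, constant or variable name as written in the source */
} Arg;

/* Append one formatted line to buf; returns 0, or -1 if it does not fit. */
static int emit(char *buf, size_t cap, size_t *len, const char *fmt, const char *operand)
{
    int n = snprintf(buf + *len, cap - *len, fmt, operand);
    if (n < 0 || (size_t)n >= cap - *len)
        return -1;
    *len += (size_t)n;
    return 0;
}

/*
 * Expand "invoke proc, args..." into PUSH/CALL text. Arguments are pushed right to left;
 * ADDR of a global becomes "push offset X"; ADDR of a local becomes "lea eax, X" + "push eax".
 * Returns 0 on success, -1 if buf is too small, and -2 if EAX would be pushed after an
 * earlier LEA overwrote it (the situation MASM reports as error A2133).
 * Register names are compared in lower case only.
 */
static int expand_invoke(const char *proc, const Arg *args, int nargs, char *buf, size_t cap)
{
    size_t len = 0;
    int eax_clobbered = 0;

    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    for (int i = nargs - 1; i >= 0; i--) {
        const Arg *a = &args[i];
        int rc;
        if (a->kind == ARG_LOCAL_ADDR) {
            rc = emit(buf, cap, &len, "lea eax, %s\n", a->text);
            if (rc == 0)
                rc = emit(buf, cap, &len, "%s\n", "push eax");
            eax_clobbered = 1;
        } else if (a->kind == ARG_GLOBAL_ADDR) {
            rc = emit(buf, cap, &len, "push offset %s\n", a->text);
        } else {
            if (eax_clobbered && strcmp(a->text, "eax") == 0)
                return -2;
            rc = emit(buf, cap, &len, "push %s\n", a->text);
        }
        if (rc != 0)
            return -1;
    }
    return emit(buf, cap, &len, "call %s\n", proc);
}
```

With edx in place of eax, the model expands the MsgBox line to `push MB_OK / lea eax, buffer / push eax / push edx / push 0 / call MessageBox`. An eax that appears to the right of the ADDR argument is fine, because it is pushed before the LEA runs.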
OFFSET can only be used with static labels known at assemble time, i.e. .data section labels, not LOCALs. ADDR is smart enough to be used on either, but only as part of an INVOKE statement. To my knowledge, there is no MASM-language construct to get the address of a LOCAL outside of an INVOKE; you must manually use the LEA instruction to calculate it (which is what ADDR does 'behind the scenes' during INVOKE).
-r
Hey, thanks all for the replies :U
Now I understand that the OFFSET directive is used to access global variables, while ADDR is used to access both local and global variables. The INVOKE directive works fine with either OFFSET (for globals) or ADDR (for globals and locals), but we prefer to keep their usage distinct for readability. Finally, for a local variable var, ADDR is translated by the assembler into the pair of instructions LEA eax, var; PUSH eax in order to determine its address. Otherwise it is translated like the OFFSET directive (that is, the address is known at assembly time and is patched into the code).
One last question, prompted by jj2007's example: does the assembler always use EAX to push the address of a local variable (when INVOKE is used), or can we choose the register?
Quote from: redskull on April 06, 2011, 01:51:48 PM
ADDR is smart enough to be used on either, but only as part of an INVOKE statement.
Well, I wouldn't call ADDR smart. LEA is definitely not the way it should be done, since it destroys a register for no good reason, occasionally resulting in the error "Register value overwritten by INVOKE". We had this discussion in the compiler section quite a long time ago. A better way to accomplish the same thing is to use offsets from the contents of EBP or ESP. For example, we know at assembly time that the first DWORD local (without the USES directive) is at offset -4 bytes from the value in EBP, the second at -8 bytes, etc. So a more transparent way is to use the following:
; First DWORD local
push ebp
add DWORD PTR [esp],-4
call function
The LEA approach is only used in MASM (and perhaps JWASM) as far as I know.
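Both sequences push the same value; they differ only in what they leave behind. Below is a small C simulation with hypothetical stack addresses. It assumes, as stated above, that the first DWORD local sits at [ebp-4]. The LEA version leaves the address in EAX. The PUSH EBP / ADD version adjusts the pushed copy in place and leaves EAX untouched.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BASE  0x0012FE00u   /* hypothetical address of stack[0] */
#define STACK_DWORDS 64u

/* Toy 32-bit CPU state: only what the two sequences below touch. */
typedef struct {
    uint32_t eax, ebp, esp;
    uint32_t stack[STACK_DWORDS];  /* stack[i] is the DWORD at STACK_BASE + 4*i */
} Cpu;

static uint32_t *slot(Cpu *c, uint32_t addr)
{
    assert(addr >= STACK_BASE && (addr - STACK_BASE) % 4 == 0);
    assert((addr - STACK_BASE) / 4 < STACK_DWORDS);
    return &c->stack[(addr - STACK_BASE) / 4];
}

static void push(Cpu *c, uint32_t v)
{
    c->esp -= 4;
    *slot(c, c->esp) = v;
}

/* lea eax,[ebp+disp] / push eax : EAX is overwritten as a side effect. */
static void push_local_addr_lea(Cpu *c, int32_t disp)
{
    c->eax = c->ebp + (uint32_t)disp;
    push(c, c->eax);
}

/* push ebp / add dword ptr [esp],disp : only the pushed copy changes. */
static void push_local_addr_ebp(Cpu *c, int32_t disp)
{
    push(c, c->ebp);
    *slot(c, c->esp) += (uint32_t)disp;
}
```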
guro,
I did not answer the question for practice; I bothered to explain the SEMANTICS of the reserved word OFFSET for a reason.
> The OFFSET directive is used to access global variables
MASM has data section entries which are global in scope. OFFSET does not address GLOBAL variables; it literally gives you an OFFSET from a known location, the executable file's start address. OFFSET means "HOW FAR" a data section entry is from the executable start address.
OK, I understand now that it is wrong to say that "the OFFSET directive is used to access global variables". What actually happens is that distances from a base address (the executable load address) are used, and the correct address of the global variable is found by adding its 'offset' to that base address. Is that what you mean?
Does INVOKE always use the EAX register when one of its arguments is a local and its address has to be resolved?
Quote from: hutch-- on April 06, 2011, 02:22:26 PM
MASM has data section entries which are global in scope, OFFSET does not address GLOBAL variables, it literally gives you an OFFSET from a known location, the executable file's start address. OFFSET means "HOW FAR" a data section entry is from the executable start address.
Offset gives a segment-relative offset; it has nothing to do with the start address of the executable. In the FLAT programming model it is the offset from 0. For example:
push offset Here
Here:
pop eax
Assembles to:
01351021 . 68 26103501 PUSH TestOffs.01351026
01351026 . 58 POP EAX
Here: is certainly not 0x01351026 bytes from the start address of the executable.
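Reading the dump above: the bytes `68 26103501` are opcode 68 (push imm32) followed by the little-endian immediate 01351026h. That is the run-time address of Here:, which sits 5 bytes after the push at 01351021h. A small C decoder (function name hypothetical) checks the arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Decode "push imm32" (68 xx xx xx xx); the immediate is stored little-endian. */
static uint32_t push_imm32_operand(const uint8_t insn[5])
{
    assert(insn[0] == 0x68);
    return (uint32_t)insn[1]
         | (uint32_t)insn[2] << 8
         | (uint32_t)insn[3] << 16
         | (uint32_t)insn[4] << 24;
}
```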
Is it +0 from EIP's value?
@donkey
You say that in the FLAT programming model it is the offset from 0, and not relative to a segment value?
To use the EIP-relative encoding <G>
00000000 E8 00000000 Start: call Here
00000005 58 Here: pop eax
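In clive's listing, `E8 00000000` is a call with a zero displacement. The target is the address of the next instruction plus that displacement, so the call lands on Here:, and pop eax receives the pushed return address, which is the same value. Because the displacement is relative, the same bytes work wherever the code ends up loaded. Here is a C sketch of that arithmetic; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Length of "call rel32": opcode E8 + 4-byte displacement. */
#define CALL_REL32_LEN 5u

/* Return address pushed by the call: the address of the next instruction. */
static uint32_t call_return_address(uint32_t call_addr)
{
    return call_addr + CALL_REL32_LEN;
}

/* Branch target: next instruction + signed displacement (mod 2^32). */
static uint32_t call_target(uint32_t call_addr, int32_t rel32)
{
    return call_return_address(call_addr) + (uint32_t)rel32;
}
```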
Quote from: guro on April 06, 2011, 02:45:52 PM
Is it +0 from EIP's value?
@donkey
You say that in the FLAT programming model it is the offset from 0, and not relative to a segment value?
Yes and no. The OFFSET is still relative, but since the segment base is set to zero it is equivalent to linear addressing. For example, DS:040000 is the same as CS:040000, since the descriptors for both CS and DS have a base of 0. In the Windows FLAT model, only FS has a base other than zero (it points to the current thread's information block), which is why a segment override is needed for that particular one.
EIP-relative values are used in CALL and short/near jumps.
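To make the "yes and no" concrete: in protected mode the CPU adds the base address from the segment's descriptor (not the selector value visible in the register) to the offset. In this C sketch, the selectors 1Bh, 23h and 3Bh are the values 32-bit Windows typically uses for CS, DS and FS. The FS base value is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A segment register as the CPU sees it: the visible selector plus the
   base address loaded from its descriptor. */
typedef struct {
    uint16_t selector;
    uint32_t base;
} SegReg;

/* Protected mode: linear address = descriptor base + offset (mod 2^32). */
static uint32_t linear_address(SegReg seg, uint32_t offset)
{
    return seg.base + offset;
}
```

With both CS and DS based at 0, any offset maps to the same linear address through either register. FS, with a non-zero base, does not.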
So, is it possible to avoid using the ADDR directive explicitly in MASM, and instead use the other (manual) mechanism that donkey proposed with the EBP register?
Is it possible to select the register that INVOKE uses to resolve the address of an ADDR var argument?
No, you cannot select a register; EAX is always used. In MASM it is far too much work, for the advantage gained, to use the PUSH EBP method instead of the standard INVOKE implementation. The example just demonstrated a "better" way of getting the offset of a local variable; however, using it would make your code unreadable and difficult to debug. It is meant to be implemented at the compiler level. It was mainly in response to Redskull's comment (shamelessly taken out of context):
Quote
and you must manually use the LEA instruction to calculate it
Quote from: donkey on April 06, 2011, 02:21:07 PM
So a more transparent way is to use the following:
; First DWORD local
push ebp
add DWORD PTR [esp],-4
call function
Feasible, but one byte longer, Edgar. And adding to or subtracting from a memory location is slow.
As long as it throws that overwrite error, I can live with trashing eax.
The other thing to remember with the standard ABI is that EAX can be destroyed by the called subroutine and is assumed to contain the return value afterwards. Using it as a scratch register should not be an issue.
If you want better (finer/explicit) register level management, you have to do it manually, which is what we all did before INVOKE existed.
Quote from: jj2007 on April 06, 2011, 03:58:03 PM
Feasible but one byte longer, Edgar. And adding/subtracting to a memory location is slow.
As long as it throws that overwrite error, I can live with trashing eax.
It's pretty much up to the individual. I don't agree with trashing registers for no good reason, and the extra overhead of a single byte still falls within a cache line, so there is no second fetch.
Jochen, you must be joking: any extra execution time is a non-issue. If you're worried about an extra cycle or two of execution time, the function should have been inlined rather than made a separate procedure, which adds quite a bit of overhead.
Quote from: clive on April 06, 2011, 04:08:44 PM
The other thing to remember with the standard ABI is that EAX can be destroyed by the called subroutine and is assumed to contain the return value afterwards. Using it as a scratch register should not be an issue.
If you want better (finer/explicit) register level management, you have to do it manually, which is what we all did before INVOKE existed.
I have said that this was meant for compiler level implementation and was just to point out another way aside from LEA.
Quote from: donkey on April 06, 2011, 04:14:39 PM
Jochen you must be joking, any extra execution time is a non-issue, if you're worried about an extra cycle or two...
I have said that this was meant for compiler level implementation and was just to point out another way aside from LEA.
It is one extra cycle (I timed it :bg) and one extra byte, for no compelling reason except if you need eax as input to the subroutine.
By the way, for a long time we haven't had a proper war on register destruction (http://www.masm32.com/board/index.php?topic=9650.0) and the like. Ready to go?
:wink :thumbu
Quote from: jj2007 on April 06, 2011, 07:15:22 PM
By the way, for a long time we haven't had a proper war on register destruction (http://www.masm32.com/board/index.php?topic=9650.0) and the like. Ready to go?
:wink :thumbu
If you look through that thread, you'll see I didn't get involved then, and I'm not likely to get involved now. But it would be a boring war.
Quote from: donkey on April 06, 2011, 02:40:46 PM
Offset gives a segment relative offset, it has nothing to do with the start address of the executable. In the FLAT programming model it is the offset from 0.
Offset returns the value of the location counter for the segment. By default the location counter is set to 0 at the start of the segment, but within limits you can use the ORG directive to change it, at the start of the segment or elsewhere in the segment.
Quote from: MichaelW on April 06, 2011, 09:10:13 PM
Quote from: donkey on April 06, 2011, 02:40:46 PM
Offset gives a segment relative offset, it has nothing to do with the start address of the executable. In the FLAT programming model it is the offset from 0.
Offset returns the value of the location counter for the segment. By default the location counter is set to 0 at the start of the segment, but within limits you can use the ORG directive to change it, at the start of the segment or elsewhere in the segment.
Well, there are any number of hacks you can do but most don't
Let's not confuse things here. An executable file is loaded by the OS at 400000h (its hInstance), while a DLL also loaded into the running process has its load address (the DLL's hInstance) relocated by the OS loader unless it is set to load at a specific address. The important factor here is that in a protected mode operating system each application is loaded into its own memory space, with each running application (and its DLLs) using the same range of addresses. The hInstance returned by GetModuleHandle() is the same 400000h as the next running application's hInstance; the difference is that the OS provides each application with its own memory space covering the same address range.
Now, you can use the terminology of calling section addresses RVAs, but in terms of a running application (including its DLLs [not system DLLs]) you get an instant GP fault by addressing below 400000h, so in application terms the load address is the effective reference for an OFFSET within the application's addressable space.
Quote from: hutch-- on April 07, 2011, 02:23:48 AM
Now you can use the terminology of calling section addresses RVAs but in terms of a running application (including its DLLs [not system DLLs]) you have an instant GP fault by addressing below 400000h so in application terms the load address is the effective reference for an OFFSET within the applications addressable space.
No, it isn't. The value in the segment register is the reference for OFFSET; the load address is nothing more than the load address and has nothing at all to do with the OFFSET operator. By that logic you might as well say that short jumps, which are relative to EIP, are referenced from the load address.
You are confusing absolute addressing, which does not exist in ring 3 user mode, with addressing relative to the memory space provided by the operating system. The only point about segment registers under the FLAT memory model is that they are set to the same value.
With MASM, the operator OFFSET is specifically a "distance" from a reference point, not an absolute address. You truly would have enjoyed the bad old days of Win3.?, when you did not have hardware-provided protected mode and had to rely on everyone else not making a mess of memory allocation. You are also confusing the current location in CODE with an ADDRESS that is an OFFSET: EIP is dynamic, OFFSET is STATIC, and they refer to different targets.
Hi Hutch,
No, I'm not confusing anything, but any way you need to envision it so it works for you is fine by me.
Here's a good doc from Intel that describes the FLAT model; it might help to explain it to you:
http://www.intel.com/design/intarch/papers/esc_ia_p.htm
I confess I never had a problem with PECOFF.DOC from Microsoft or the original definition of the FLAT memory model in MASM 6.1. Ain't like anything has changed in the base concept since then.
RE the notion of OFFSET used in MASM and where it is referenced from,
Virtual Address (VA)
Same as RVA (see above), except that the base address of the image file is not subtracted. The address is called a "Virtual Address" because Windows NT creates a distinct virtual address space for each process, independent of physical memory. For almost all purposes, a virtual address should be considered just an address. A virtual address is not as predictable as an RVA, because the loader might not load the image at its preferred location.
::)
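Per the definition quoted above, a VA is the image base plus the RVA. When the loader cannot use the preferred base, absolute addresses baked into the image are shifted by the difference. The C sketch below assumes the preferred base 400000h mentioned above and a .data RVA of 3000h, which matches the run-time addresses MichaelW prints later in the thread. The alternative load address 01340000h is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* VA = image base + RVA. */
static uint32_t va_from_rva(uint32_t image_base, uint32_t rva)
{
    return image_base + rva;
}

/* When the image is loaded away from its preferred base, an absolute address stored
   in it (e.g. the immediate of "mov eax, OFFSET x") moves by the same delta.
   Unsigned arithmetic makes a negative delta wrap correctly. */
static uint32_t rebase(uint32_t va, uint32_t preferred_base, uint32_t actual_base)
{
    return va + (actual_base - preferred_base);
}
```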
I give up. I'm out of this thread; it's taken a turn that puts it beyond the limits of sanity.
I'd like to point out that it's the linker that does the heavy lifting of keeping track of the 'real' locations, so OFFSET can only ever be relative to some amorphous point in the source file (i.e. the start of the segment as zero). For instance, in this listing both offsets are still zero:
.386
.MODEL FLAT, stdcall
option casemap:none
00000000 .data
00000000 00000000 foo1 DWORD 0
00000000 .code
00000000 start:
00000000 B8 00000000 R mov eax, OFFSET foo1
00000005 B9 00000000 R mov ecx, OFFSET start
0000000A C3 ret
end start
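The "R" after each zero immediate in the listing marks a relocatable field: the assembler emits the section-relative value (0 here) and leaves it to the linker to add the final address of the target section. Here is a C sketch of that patch step. The section addresses 403000h (.data) and 401000h (.text) are assumptions chosen to match the run-time output later in the thread, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void write_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Resolve one relocatable 32-bit field: add the final address of the target
   section to the section-relative value the assembler left there. */
static void apply_fixup32(uint8_t *code, size_t at, uint32_t section_va)
{
    write_le32(code + at, read_le32(code + at) + section_va);
}
```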
You know, this thingy about the OFFSET operator all comes down to "Introduction to Computer Science":
http://www.c-jump.com/CIS77/ASM/Instructions/lecture.html#I77_0180_offset_operator
Quote
The OFFSET operator returns the offset of a memory location relative to the beginning of the segment to which the location belongs
The load address of the executable thing is just a pig, you can try to put lipstick on the pig but its still a pig.
Quote from: donkey on April 07, 2011, 02:41:34 PM
You know this thingy about the OFFSET operator all comes down to "Introduction to Computer Science"
http://www.c-jump.com/CIS77/ASM/Instructions/lecture.html#I77_0180_offset_operator
Quote
The OFFSET operator returns the offset of a memory location relative to the beginning of the segment to which the location belongs
Not to add fuel to the fire here, but without qualification that statement is not correct. The OFFSET operator returns the value of the location counter.
;===================================================================================
include \masm32\include\masm32rt.inc
;===================================================================================
printf MACRO format:REQ, args:VARARG
IFNB <args>
invoke crt_printf, cfm$(format), args
ELSE
invoke crt_printf, cfm$(format)
ENDIF
EXITM <>
ENDM
;===================================================================================
.data
data0 dd 0
data1 dd 0
org 10h
data2 dd 0
org 0
data3 dd 0
.code
;===================================================================================
start:
;===================================================================================
code0:
mov eax, OFFSET data0
mov eax, OFFSET data1
mov eax, OFFSET data2
mov eax, OFFSET data3
printf( "OFFSET data0 : %Xh\n", OFFSET data0 )
printf( "OFFSET data1 : %Xh\n", OFFSET data1 )
printf( "OFFSET data2 : %Xh\n", OFFSET data2 )
printf( "OFFSET data3 : %Xh\n\n", OFFSET data3 )
code1:
code2:
org 10h
code3:
org 0
code4:
org code1
code5:
mov eax, OFFSET code0
mov eax, OFFSET code1
mov eax, OFFSET code2
mov eax, OFFSET code3
mov eax, OFFSET code4
mov eax, OFFSET code5
printf( "OFFSET code0 : %Xh\n", OFFSET code0 )
printf( "OFFSET code1 : %Xh\n", OFFSET code1 )
printf( "OFFSET code2 : %Xh\n", OFFSET code2 )
printf( "OFFSET code3 : %Xh\n", OFFSET code3 )
printf( "OFFSET code4 : %Xh\n", OFFSET code4 )
printf( "OFFSET code5 : %Xh\n\n", OFFSET code5 )
inkey "Press any key to exit..."
exit
;===================================================================================
end start
From the listing:
00000000 B8 00000000 R mov eax, OFFSET data0
00000005 B8 00000004 R mov eax, OFFSET data1
0000000A B8 00000010 R mov eax, OFFSET data2
0000000F B8 00000000 R mov eax, OFFSET data3
00000060 B8 00000000 R mov eax, OFFSET code0
00000065 B8 00000060 R mov eax, OFFSET code1
0000006A B8 00000060 R mov eax, OFFSET code2
0000006F B8 00000010 R mov eax, OFFSET code3
00000074 B8 00000000 R mov eax, OFFSET code4
00000079 B8 00000060 R mov eax, OFFSET code5
At run time:
OFFSET data0 : 403000h
OFFSET data1 : 403004h
OFFSET data2 : 403010h
OFFSET data3 : 403000h
OFFSET code0 : 401000h
OFFSET code1 : 401060h
OFFSET code2 : 401060h
OFFSET code3 : 401010h
OFFSET code4 : 401000h
OFFSET code5 : 401060h
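The listing can be reproduced with nothing more than a location counter. Each label takes the current counter value, and ORG simply overwrites the counter. Below is a toy C model of the .data half of the program (names hypothetical). The section address 403000h is taken from the run-time output above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy location counter for one segment. */
typedef struct {
    uint32_t lc;
} Segment;

/* Define a label at the current location counter, then reserve `size` bytes. */
static uint32_t define(Segment *s, uint32_t size)
{
    uint32_t offset = s->lc;
    s->lc += size;
    return offset;
}

/* ORG just overwrites the location counter. */
static void org(Segment *s, uint32_t value)
{
    s->lc = value;
}
```

Running `data0 dd 0`, `data1 dd 0`, `org 10h`, `data2 dd 0`, `org 0`, `data3 dd 0` through this model yields the offsets 0, 4, 10h and 0 shown in the listing. Adding 403000h gives the run-time values.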
Michael,
You are absolutely correct, and I agree that the ORG statement will "screw" with the FLAT model at source level. However, it only adjusts how the assembler/linker calculates the OFFSETs; it does not change the fact that in a FLAT model the OFFSET is relative to the segment. This is like saying that the number 400 is different from 400 because the formula used to get it was 2*200 instead of 4*100.
it is relative to the segment register at runtime
however, at assemble time, it is relative to the segment or group that contains the reference
i remember, in the old days, i sometimes had to specify it as OFFSET DGROUP:SomeLabel
because the segment base was not the same as the segment register - it pointed to the base of DGROUP
to save some typing, i did this...
ODG EQU OFFSET DGROUP
OFS EQU OFFSET
;
;
;
mov ax,DGROUP
mov ds,ax
;
;
mov si,ODG:SomeLabel
;or
mov bx,OFS SomeOtherLabel ;when applicable
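Why OFFSET DGROUP:SomeLabel mattered in the 16-bit days: with DS loaded from DGROUP, the offset has to be measured from the group's base, not from the start of the label's own segment. Here is a real-mode sketch in C with hypothetical paragraph numbers (DGROUP at 1230h, the data segment at 1234h, the label 10h into it). A20 wraparound is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Real mode: physical address = segment * 16 + offset. */
static uint32_t phys(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Offset of a label (label_seg:label_off) measured from another paragraph base,
   e.g. the start of DGROUP. */
static uint16_t offset_from(uint16_t base_seg, uint16_t label_seg, uint16_t label_off)
{
    assert(label_seg >= base_seg);
    return (uint16_t)(((uint32_t)(label_seg - base_seg) << 4) + label_off);
}
```

Both forms reach the same byte, 12350h: 1230h:0050h through DGROUP, or 1234h:0010h through the segment itself. Only the first is correct when DS holds DGROUP.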
correction:
Quote
it is relative to the segment register at runtime
hopefully, the segment referenced in the assembly source is in the segment selector at runtime :bg
I was considering this some time ago but thought I'd look foolish if I didn't know how this worked :lol....
invoke Callback, oexPROC
invoke Callback, OFFSET oexPROC
invoke Callback, ADDR oexPROC
All give the correct proc address in 32-bit code, right?
ADDR destroys eax
EDIT: Nobody's saying anything.... Now I'm starting to feel foolish :lol
Vector dd oexPROC
;or
mov eax,oexPROC
;or
invoke Callback,oexPROC
the assembler knows it is a code address because it is a ":" or PROC label
so, OFFSET is implied, or more accurately, OFFSET .text (or whatever the code segment is named)
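C has a similar rule, offered here only as an analogy to the MASM behaviour described above: a function name used as a value converts to the function's address, so writing `&` is optional. In this C sketch, oexPROC and Callback mirror the names in the example and are hypothetical.

```c
#include <assert.h>

static int callback_calls = 0;

/* Hypothetical stand-in for the PROC passed in the example. */
static void oexPROC(void)
{
    callback_calls++;
}

/* Hypothetical stand-in for Callback: takes a code address and calls it. */
static void Callback(void (*fn)(void))
{
    fn();
}
```

Both `Callback(oexPROC)` and `Callback(&oexPROC)` pass the same address, much as `invoke Callback, oexPROC` and `invoke Callback, OFFSET oexPROC` do.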
Ah OK, so long as the assembler handles it and I haven't missed something important, that's what matters :bg.... It just seemed odd to have 3 possible reserved-word methods for effectively 1 thing....
:green
Quote from: oex on April 07, 2011, 07:00:06 PM
ADDR destroys eax
Only when addressing a LOCAL variable - which is not the case for a call.
Sorry jj, can you try that explanation again.... Which one(s) of the 3 were wrong.... What is the LOCAL scope of a PROC label? Aren't all PROCs of GLOBAL scope?....
None is wrong:
invoke Callback, oexPROC
invoke Callback, OFFSET oexPROC
invoke Callback, ADDR oexPROC
... but "ADDR destroys eax" happens only when ADDR refers to a local variable. Internally, the invoke macro uses OFFSET GlobalVar if you write ADDR GlobalVar. For a local var, in contrast, it uses
lea eax, [ebp+n]
push eax
Tip: Write a little proggie, and look at it through Olly.
Quote from: jj2007 on April 07, 2011, 09:40:15 PM
Tip: Write a little proggie, and look at it through Olly.
:lol yep I'm just being lazy.... I don't use Olly, but I had tested partially with my own code; I just wondered if I'd missed something :lol
Thanks for the info, it clarifies it for me :bg
Trust a group of assembler programmers to take a basic question and turn it into a multi-page argument ::)
I think most of this belongs in The Colosseum? And let's hope you haven't scared guro off from ever asking a question again :bdg
Next question: what does mov REALLY do? :bg
Hey Tedd, I wasn't scared at all... it was a pleasure to read all those comments and learn from everyone! If we wanted to add another Murphy's law (if it doesn't already exist), it would be that simple questions have complicated answers... but anyway!
Now I'm studying the various documents our friends linked in their comments in this thread. But to make a premature statement about the ongoing argument: I think we are talking about the same thing from different perspectives (linker abstraction, loader abstraction, in-file offsets, in-memory offsets, assembler abstraction, etc.), which is very instructive!
Well, I always believed that OFFSET is a pointer to some place, but things get really strange when you change your way of programming. To me, OFFSET should be able to deal with forward-referenced labels, but today I can't compile a simple hello world. :red
Quote
Syntax: OFFSET expression
The <expression> is any label, variable, segment, or other direct memory operand.
include \masm32\include\masm32rt.inc
.code
start:
lea edx,TheText
invoke MessageBox, 0, edx, edx, MB_OK ;works ok
push MB_OK ;works ok
push offset TheText
push offset TheText
push 0
call MessageBox
invoke MessageBox, 0, offset TheText, offset TheTitle, MB_OK ;doesn't work
invoke ExitProcess, 0
TheText:
db "The text", 0
TheTitle:
db "The title", 0
end start
If you put those variables between .code and start:, it works fine.
The absence of light makes me blind, but too much light makes me blind too.
Strangely enough, this works with JWasm, but ml (6.14...9.0) does indeed throw error A2006:undefined symbol : TheTitle
Quote from: jj2007 on May 18, 2011, 02:38:13 AM
Strangely enough, this works with JWasm, but ml (6.14...9.0) does indeed throw error A2006:undefined symbol : TheTitle
Think of MASM's INVOKE as an assembler-level macro. Macros are expanded in the first pass, so any symbol must be defined before the macro (in this case, INVOKE) uses it.
Thanks for answering, now this make sense to me.