

The OFFSET operator

Started by bf2, May 10, 2011, 04:30:28 PM



I have been reading Kip Irvine's book. On page 112, Kip says the OFFSET operator returns the offset of a data label from the beginning of the data segment.

Is this correct, or does OFFSET return the physical address of the data label?

For example, I ran the code below:

include \masm32\include\

.data
msgBoxCaption DB "Caption", 0
msgBoxText DB ?                   ; only ONE byte reserved: too small for dwtoa's output
var1 DD 130

.code
start:
MOV EAX, OFFSET msgBoxCaption
INVOKE dwtoa, EAX, ADDR msgBoxText
INVOKE MessageBox, NULL, ADDR msgBoxText, ADDR msgBoxCaption, MB_OK

MOV EAX, OFFSET msgBoxText        ; reload EAX: MessageBox returns its result in EAX
INVOKE dwtoa, EAX, ADDR msgBoxText
INVOKE MessageBox, NULL, ADDR msgBoxText, ADDR msgBoxCaption, MB_OK

MOV EAX, OFFSET var1
INVOKE dwtoa, EAX, ADDR msgBoxText
INVOKE MessageBox, NULL, ADDR msgBoxText, ADDR msgBoxCaption, MB_OK

invoke ExitProcess, NULL

END start

Every time the program is run, the three message boxes show the following values on my machine: 4206592, 4206600 and 4206601.

These values don't look like offsets from the beginning of the data segment (which would be 0, 8 and 9). They look more like physical addresses.

On the other hand, how can the values be exactly the same every time I run the program? Surely the .exe is not loaded into the exact same physical memory location every time?

PS (added later): Just realised 4206592, 4206600 and 4206601 stand for 403000h, 403008h and 403009h respectively.


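If you want to use dwtoa, msgBoxText cannot be a single byte: dwtoa writes the complete decimal string, plus a terminating zero, into that buffer.

msgBoxCaption   DB "Caption", 0
msgBoxText      DB 16 dup (?)     ; room for the decimal text and its terminating zero
var1            DD 130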


There was an excellent, but contentious, thread on this subject just recently: INVOKE and ADDR directives.


MOV EAX, OFFSET msgBoxCaption is equivalent to lea eax, msgBoxCaption.

The LEA instruction needs fewer processor clock cycles.


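MOV EAX, OFFSET is the right choice in this case. The address of a global label is a constant, so MOV can encode it directly as a 32-bit immediate (5 bytes). LEA EAX, msgBoxCaption encodes the same constant as a memory displacement (6 bytes).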

The reason the address appears to be higher than expected is that Win32 programs use the flat memory model.
Simply stated, that means the code, data, and stack segments all cover the same 4 GB address space starting at 0, so an offset within a segment is also the full (virtual) address.
The operating system also has code and data in that 4 GB of addressable space.


Quote from: bf2 on May 10, 2011, 04:30:28 PM
Every time the program is run, the three message boxes show the following values on my machine: 4206592, 4206600 and 4206601.

These values don't look like offsets from the beginning of the data segment (which would be 0, 8 and 9). They look more like physical addresses.

On the other hand, how can the values be exactly the same every time I run the program? Surely the .exe is not loaded into the exact same physical memory location every time?

The data segment is not limited to the data you define. Under Windows, each process has its own virtual address space, and Windows has complete control of that address space. From the point of view of your application, all addresses are offsets within that address space. Unlike DOS applications, Windows applications, with few exceptions, have nothing to do with physical addresses.


Here is a minimalistic snippet that assembles and links to a 1,536-byte executable with ml.exe and polink.exe:
include \masm32\include\

.data
AppName db "Masm32:", 0
TheText db "Hello World", 0

.code
start:
invoke MessageBox, 0, offset TheText, addr AppName, MB_OK
invoke ExitProcess, 0
end start

The same through the eyes of OllyDbg:

Address             Hex dump            Command                            Comments
<ModuleEntryPoint>  /.  6A 00           push 0                             ; /Type = MB_OK|MB_DEFBUTTON1|MB_APPLMODAL
00401002            |.  68 00204000     push offset 00402000               ; |Caption = "Masm32:"
00401007            |.  68 08204000     push offset 00402008               ; |Text = "Hello World"
0040100C            |.  6A 00           push 0                             ; |hOwner = NULL
0040100E            |.  E8 07000000     call <jmp.&user32.MessageBoxA>     ; \USER32.MessageBoxA
00401013            |.  6A 00           push 0                             ; /ExitCode = 0
00401015            \.  E8 06000000     call <jmp.&kernel32.ExitProcess>   ; \KERNEL32.ExitProcess
0040101A             $- FF25 60204000   jmp [<&user32.MessageBoxA>]
00401020             $- FF25 68204000   jmp [<&kernel32.ExitProcess>]
00401026                00              db 00
00401027                00              db 00
Same for data:
Address   Hex dump                                         ASCII
00402000  4D 61 73 6D|33 32 3A 00|48 65 6C 6C|6F 20 57 6F| Masm32:.Hello Wo
00402010  72 6C 64 00|50 20 00 00|00 00 00 00|00 00 00 00| rld.P ..........

You can see that ModuleEntryPoint is at 00401000h, while the data segment starts 1000h bytes later at 00402000h. Under Windows, all executables believe that they start at 00401000h - the OS translates a physical location on your memory chip into a virtual address that the CPU can interpret as if it was one contiguous space starting always at 00401000h.

Only DLLs are loaded to different addresses, but that's another story.


Quote from: bf2 on May 10, 2011, 04:30:28 PM
I have been reading Kip Irvine's book. On page 112, Kip says the OFFSET operator returns the offset of a data label from the beginning of the data segment.
The code below is not a good example to follow (it has some bad habits), but it works.

include \masm32\include\

.data
msgBoxText       DB 8+1+1 dup (?)   ; 8 hex digits + "h" + terminating zero

.code
var1 DD 130    ;a data label in code section
code_sec DB "this code section starts at:", 0
data_sec DB "this data section starts at:", 0

start:
mov eax, offset msgBoxText          ; first label in the data section
INVOKE dw2hex, EAX, addr msgBoxText
mov byte ptr [msgBoxText+8],"h"
INVOKE MessageBox, NULL, ADDR msgBoxText, ADDR data_sec, MB_OK
mov eax, offset var1                ; first label in the code section
INVOKE dw2hex, EAX, addr msgBoxText
mov byte ptr [msgBoxText+8],"h"
INVOKE MessageBox, NULL, ADDR msgBoxText, ADDR code_sec, MB_OK
invoke ExitProcess, NULL
END start


Many thanks for all your replies. That makes a lot of sense.

Out of curiosity, how would someone normally know things like where Windows loads executables (e.g. the address 00401000h)? Are they described in that sort of detail in any book, or do people normally learn them the way I just have, i.e. from other programmers?


Quote from: jj2007 on May 10, 2011, 08:24:59 PM
You can see that ModuleEntryPoint is at 00401000h, while the data segment starts 1000h bytes later at 00402000h. Under Windows, all executables believe that they start at 00401000h - the OS translates a physical location on your memory chip into a virtual address that the CPU can interpret as if it was one contiguous space starting always at 00401000h.

That used to be true, but it isn't anymore as of Windows Vista. It all depends on whether the application is ASLR-enabled (i.e. linked with /DYNAMICBASE). Here are the entry points from four consecutive runs of a MASM 10.x program:

Run #1: Entry point = 0x00A01030, Data section = 0x00A09470
Run #2: Entry point = 0x01121030, Data section = 0x01129470
Run #3: Entry point = 0x01131030, Data section = 0x01139470
Run #4: Entry point = 0x00DB1030, Data section = 0x00DB9470

Using any assumed hard coded value is no longer viable in Windows.


If you want an accurate picture of the virtual and physical memory of any process, go over to the Sysinternals site (TechNet) and download VMMap.
...The main address for the Sysinternals site is here: Windows Sysinternals,...


Also, I quite often place code before the "main" module;
in such cases, the code may be loaded at 00401000h, but it is certainly not the entry point. :P


Quote from: donkey on May 11, 2011, 02:53:35 PM
Using any assumed hard coded value is no longer viable in Windows.

That's correct, although it has never been a useful approach, except for virus writers. And they will search the randomised address space for known patterns, such as a push MB_OK. I love Microsoft security :bg

So, since we are in the Campus: Never assume your data starts at 402000h. And stay away from badly documented arbitrary pointers, too :naughty:


Quote from: jj2007
So, since we are in the Campus: Never assume your data starts at 402000h. And stay away from badly documented arbitrary pointers, too :naughty:

Correct. In my original code at the beginning of this thread, the data section starts at 00403000h.

So what's the significance of the entry point address 00401000h (ignoring Vista for the time being)? Does this mean the lower 4 MB of the 4 GB address space is reserved by the OS for some reason?


Quote from: jj2007 on May 11, 2011, 04:54:46 PM
And they will search the randomised address space for known patterns, such as a push MB_OK.
A very unique pattern: push 0