Hi,
I have been reading Kip Irvine's book. On page 112, Kip says the OFFSET operator returns the offset of a data label from the beginning of the data segment.
Is this correct, or does OFFSET return the physical address of the data label?
For example, I ran the code below:
.386
OPTION CASEMAP:NONE
include \masm32\include\masm32rt.inc
.DATA
msgBoxCaption DB "Caption", 0
msgBoxText DB ?
var1 DD 130
.CODE
start:
MOV EAX, OFFSET msgBoxCaption
INVOKE dwtoa, EAX, ADDR msgBoxText
INVOKE MessageBox, NULL, ADDR msgBoxText, ADDR msgBoxCaption, MB_OK
MOV EAX, OFFSET msgBoxText
INVOKE dwtoa, EAX, ADDR msgBoxText
INVOKE MessageBox, NULL, ADDR msgBoxText, ADDR msgBoxCaption, MB_OK
MOV EAX, OFFSET var1
INVOKE dwtoa, EAX, ADDR msgBoxText
INVOKE MessageBox, NULL, ADDR msgBoxText, ADDR msgBoxCaption, MB_OK
invoke ExitProcess, NULL
END start
Every time the program is run, the three message boxes show the following values on my machine: 4206592, 4206600 and 4206601.
These values don't look like offsets from the beginning of the data segment (which would be 0, 8 and 9). They look more like physical addresses.
On the other hand, how can the values be exactly the same every time I run the program? Surely the .exe is not being loaded to the exact same physical memory location every time?
PS (added later): I just realised that 4206592, 4206600 and 4206601 are 403000h, 403008h and 403009h respectively.
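If you want to use dwtoa, msgBoxText cannot be a single byte. It must be a buffer large enough for the whole converted string, including the terminating zero. Otherwise dwtoa overwrites whatever follows it (here, var1):
.DATA
msgBoxCaption DB "Caption", 0
msgBoxText DB 16 dup (?) ; room for the longest decimal DWORD string plus its zero
var1 DD 130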
There was an excellent, if contentious, thread on this subject just recently: INVOKE and ADDR directives (http://www.masm32.com/board/index.php?topic=16388.0).
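MOV EAX, OFFSET msgBoxCaption is equivalent to LEA EAX, msgBoxCaption.
The LEA instruction needs fewer processor clock cycles.
MOV ... OFFSET is the right choice in this case. The address of a static label is a constant fixed at link time, so it can be loaded as an immediate: MOV EAX, imm32 is a 5-byte instruction, while LEA EAX, [disp32] is 6 bytes. LEA offers no speed advantage here.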
The reason the address appears higher than expected is that Win32 programs use the flat memory model.
Simply stated, the code, data and stack segments all start at address 0 and span the same 4 GB address space. An offset within the data segment is therefore the same number as its virtual address.
The operating system also has code and data in that 4 GB of addressable space.
Quote from: bf2 on May 10, 2011, 04:30:28 PM
Every time the program is run, the three message boxes show the following values on my machine: 4206592, 4206600 and 4206601.
These values don't look like offsets from the beginning of the data segment (which would be 0, 8 and 9). They look more like physical addresses.
On the other hand how can the values be exactly the same regardless of when I run the program? Surely the .exe is not getting loaded to the exact same physical memory location every time?
The data segment is not limited to the data you define. Under Windows each process has its own virtual address space, and Windows has complete control of that address space. From the POV of your application, all addresses are offsets within that address space. Unlike DOS applications, with few exceptions Windows applications have nothing to do with physical addresses.
Here is a minimalistic snippet that assembles to 1,536 bytes with ml.exe and polink.exe:
include \masm32\include\masm32rt.inc
.data
AppName db "Masm32:", 0
TheText db "Hello World", 0
.code
start:
invoke MessageBox, 0, offset TheText, addr AppName, MB_OK
invoke ExitProcess, 0
end start
The same through the eyes of OllyDbg:
Address Hex dump Command Comments
<ModuleEntryPoint> /. 6A 00 push 0 ; /Type = MB_OK|MB_DEFBUTTON1|MB_APPLMODAL
00401002 |. 68 00204000 push offset 00402000 ; |Caption = "Masm32:"
00401007 |. 68 08204000 push offset 00402008 ; |Text = "Hello World"
0040100C |. 6A 00 push 0 ; |hOwner = NULL
0040100E |. E8 07000000 call <jmp.&user32.MessageBoxA> ; \USER32.MessageBoxA
00401013 |. 6A 00 push 0 ; /ExitCode = 0
00401015 \. E8 06000000 call <jmp.&kernel32.ExitProcess> ; \KERNEL32.ExitProcess
0040101A $- FF25 60204000 jmp [<&user32.MessageBoxA>]
00401020 $- FF25 68204000 jmp [<&kernel32.ExitProcess>]
00401026 00 db 00
00401027 00 db 00
Same for data:
Address Hex dump ASCII
00402000 4D 61 73 6D|33 32 3A 00|48 65 6C 6C|6F 20 57 6F| Masm32:.Hello Wo
00402010 72 6C 64 00|50 20 00 00|00 00 00 00|00 00 00 00| rld.P ..........
You can see that ModuleEntryPoint is at 00401000h, while the data segment starts 1000h bytes later at 00402000h. Under Windows, all executables believe that they start at 00401000h - the OS translates a physical location on your memory chip into a virtual address that the CPU can interpret as if it was one contiguous space starting always at 00401000h.
Only DLLs are loaded to different addresses, but that's another story.
Quote from: bf2 on May 10, 2011, 04:30:28 PM
I have been reading Kip Irvine's book. On page 112, Kip says the OFFSET operator returns the offset of a data label from the beginning of the data segment.
The code below is not a good model to follow (it relies on bad habits), but it works.
.386
OPTION CASEMAP:NONE
include \masm32\include\masm32rt.inc
.DATA
msgBoxText DB 8+1+1 dup (?) ; 8 hex digits + "h" + terminating zero
.CODE
var1 DD 130 ;a data label in code section
code_sec DB "this code section starts at:", 0
data_sec DB "this data section starts at:", 0
start:
MOV EAX, OFFSET msgBoxText ; address of a label in .DATA
INVOKE dw2hex, EAX, addr msgBoxText
mov byte ptr [msgBoxText+8],"h" ; replace the terminator; byte 9 is still zero (uninitialised data is zero-filled)
INVOKE MessageBox, NULL, ADDR msgBoxText, ADDR data_sec, MB_OK
MOV EAX, OFFSET var1 ; address of a label in .CODE
INVOKE dw2hex, EAX, addr msgBoxText
mov byte ptr [msgBoxText+8],"h"
INVOKE MessageBox, NULL, ADDR msgBoxText, ADDR code_sec, MB_OK
invoke ExitProcess, NULL
END start
Many thanks for all your replies. It makes a lot of sense.
Out of curiosity, how would someone normally know things like where Windows loads executables (i.e. the address 00401000h). Are they described in any book in that sort of detail, or do people normally learn these just as I now have - i.e. from other programmers?
Quote from: jj2007 on May 10, 2011, 08:24:59 PM
You can see that ModuleEntryPoint is at 00401000h, while the data segment starts 1000h bytes later at 00402000h. Under Windows, all executables believe that they start at 00401000h - the OS translates a physical location on your memory chip into a virtual address that the CPU can interpret as if it was one contiguous space starting always at 00401000h.
That used to be true, but it isn't anymore as of Windows Vista. It all depends on whether the application is ASLR-enabled. Here are the entry points from 4 consecutive runs of a MASM 10.x program:
Run #1: Entry point = 0x00A01030, Data section = 0x00A09470
Run #2: Entry point = 0x01121030, Data section = 0x01129470
Run #3: Entry point = 0x01131030, Data section = 0x01139470
Run #4: Entry point = 0x00DB1030, Data section = 0x00DB9470
Using any assumed hard coded value is no longer viable in Windows.
bf2,
If you want an accurate picture of the virtual and physical memory of any process, go to the SysInternals site (TechNet) and download VMMap (http://technet.microsoft.com/en-us/sysinternals/dd535533).
The main SysInternals site is here: Windows SysInternals (http://technet.microsoft.com/en-us/sysinternals/default).
Also, I quite often place code before the "main" module.
In such cases, the code may be loaded at 00401000h, but it is certainly not the entry point :P
Quote from: donkey on May 11, 2011, 02:53:35 PM
Using any assumed hard coded value is no longer viable in Windows.
That's correct, although it has never been a useful approach except for virus writers. And they will simply search the randomised address space for known patterns, such as a push MB_OK. I love Microsoft security.
So, since we are in the Campus: never assume your data starts at 402000h. And stay away from badly documented arbitrary pointers, too.
Quote from: jj2007
So, since we are in the Campus: never assume your data starts at 402000h. And stay away from badly documented arbitrary pointers, too.
Correct. In my original code at the beginning of this thread the data section starts at 00403000h.
So what's the significance of the entry point address 00401000h (ignoring Vista for the time being)? Does this mean the lower 4 MB of the 4 GB address space is reserved by the OS for some reason?
Quote from: jj2007 on May 11, 2011, 04:54:46 PM
And they will search the randomised address space for known patterns, such as a push MB_OK.
a very unique pattern
push 0
Quote from: bf2 on May 11, 2011, 05:01:26 PM
So what's the significance of the entry point address 00401000h (ignoring Vista for the time being)? Does this mean the lower 4 MB of the 4 GB address space is reserved by the OS for some reason?
Windows loads your non-ASLR PE at 00400000h, which by the way is also your module handle. At that address you will find your image headers and various linking and loading information. The next page, 00401000h, is generally used for the code section, and the first page after the code is used for data. If your code section is 4096 bytes (the system page size) or smaller, the data will be at 00402000h. If not, the data will be at the next page boundary above the last code address. The lower memory is generally reserved for things like the process heap, the stack and other essential items.
I have reread this thread about OFFSET, and one thing that gives me a real headache is the question: what is data, and what is code? I ask because of the "data label" quoted earlier. Years ago, before Windows existed, I tried to write a text in my own language that made sense as prose and at the same time was really a program, a double meaning. After going a bit mad I managed it, but it was very difficult. To me, an offset is only a pointer to some place.
.386
OPTION CASEMAP:NONE
include \masm32\include\masm32rt.inc
.DATA
msgBoxText DB 8+1+1 dup (?) ; 8 hex digits + "h" + terminating zero
.CODE
var1 DD 130 ; data placed in the code section
code_sec DB "this code section starts at:", 0
data_sec DB "this data section starts at:", 0
start1 DB "this start at offset:",0
nopping db 4096 dup (90h) ; 4096 NOPs push the entry point past the first code page
start:
MOV EAX, OFFSET start ; address of the entry point itself
INVOKE dw2hex, EAX, addr msgBoxText
mov byte ptr [msgBoxText+8],"h"
INVOKE MessageBox, NULL, ADDR msgBoxText, ADDR start1, MB_OK
invoke ExitProcess, NULL
END start
Code and data are just numbers in memory. The difference is how the memory is protected: code is generally PAGE_EXECUTE_READ, data is generally PAGE_READWRITE, and constants are generally PAGE_READONLY. Each section is aligned to a page boundary, and all pages of a section get the same protection, determined by the kind of content the section holds. You can change (within limits) the protection of those pages using VirtualProtect.
Memory Protection Constants (http://msdn.microsoft.com/en-us/library/aa366786%28v=vs.85%29.aspx)