Hi Guys
I have a question that is bugging me and may be useful for others learning MASM.
After scouring the net for more detail on the .data? directive, I am still a bit puzzled about how it works internally.
After reading numerous posts (including http://www.masm32.com/board/index.php?topic=6973.0), I still can't find details of what happens internally.
Background
From Kip Irvine's book:
The .DATA? directive declares uninitialized data. When defining a large block of uninitialized data, the .DATA? directive reduces the size of a compiled program. For example, the following code is declared efficiently:
.data?
smallArray DWORD 10 DUP(?) ; 40 bytes .data?
bigArray DWORD 5000 DUP(?) ; 20,000 bytes, not initialized
The following code, on the other hand, produces a compiled program 20,000 bytes larger:
.data
smallArray DWORD 10 DUP(0) ; 40 bytes
bigArray DWORD 5000 DUP(?) ; 20,000 bytes
What exactly is the .data? directive doing under the hood in the above example to make the program 20k smaller?
The issue
Being a sucker for details ( :dazzled:), I am very curious about how this actually works.
As I understand it, all the data in the .data section is stored within the .exe image itself. However, the data in the .data? section is not in the image itself but magically gets allocated when the application starts up.
So the question is what happens during that magical allocation when the application starts up. After reading around I am still puzzled, but this is what I have deduced (correctly or incorrectly):
1. The user runs the exe by double-clicking it.
2. The OS tells its image/exe loader to load the exe into a virtual address space.
3. The data in the .data section is loaded automatically into the image's virtual address space.
4. What happens next? What about the data in the .data? section?
So the next thing I did was to create a small sample exe from the following code and examine the output of dumpbin.exe /headers:
INCLUDE Irvine32.inc
.data?
smallArray DWORD 10 DUP(?) ; 40 bytes
bigArray DWORD 5000 DUP(?) ; 20,000 bytes
.code
main PROC
exit ; quit
main ENDP
END main
and do the same thing again, but this time change .data? to just .data and compare the outputs. Please forgive me if I am a bit ignorant, as I am not a pro like you guys, but the only difference I could see was in the optional header values, which look like this:
when using .data
OPTIONAL HEADER VALUES
.....
1200 size of code
6800 size of initialized data
0 size of uninitialized data
.....
when using .data?
OPTIONAL HEADER VALUES
.....
1200 size of code
800 size of initialized data
0 size of uninitialized data
And sure enough, if you do 6800h - 800h you get 6000h = 24576 bytes (have I understood this correctly?).
The question
The questions I have are:
1. In Win32 or Win64, how does the 20 KB of memory get allocated for the bigArray DWORD array when using the .data? directive?
2. Where is the memory for the bigArray DWORD array allocated (on the stack, on the heap, or within the loaded exe image itself)?
3. In the dumpbin output above, what does the "size of uninitialized data" line refer to?
Your answers are very much appreciated.
P.S. This is my first post on the MASM32 forum and I wish to be respectful of your community, so if I have not given enough details or have asked an inappropriate question for this forum, please let me know and I will correct my question/behaviour.
in the EXE header, there is a value that tells the EXE loader how much extra memory to allocate for the .DATA? section at load time
from what i understand, this memory is always filled with 0's
while allocating memory in the uninitialized data section does make the EXE file smaller, it still uses memory at run-time :P
that is ok for data that you want to use over the entire lifespan of program execution
if you have a large temporary allocation requirement, it is better to use one of the functions like HeapAlloc
GlobalAlloc, VirtualAlloc, LocalAlloc are similar alternatives
for smaller temporary allocations, you may use stack space
Quote from: Tedros on August 22, 2011, 02:34:02 PM
From Kip Irvine's book:
The .DATA? directive declares uninitialized data. When defining a large block of uninitialized data, the .DATA? directive reduces the size of a compiled program. For example, the following code is declared efficiently:
.data
bigArray DWORD 5000 DUP(?) ; 20,000 bytes, not initialized
Have you verified that statement...?
:bg
he has his cases reversed
it should read...
.data?
smallArray DWORD 10 DUP(?) ; 40 bytes .data?
bigArray DWORD 5000 DUP(?) ; 20,000 bytes, not initialized
The following code, on the other hand, produces a compiled program 20,000 bytes larger:
.data
smallArray DWORD 10 DUP(0) ; 40 bytes
bigArray DWORD 5000 DUP(0) ; 20,000 bytes
notice the ?'s and 0's
you cannot assign initial values in the .DATA? section
That was not Kip, that was actually me.
Apologies, I have corrected the typo!
How embarrassing :red
Quote from: jj2007 on August 22, 2011, 03:07:35 PM
Quote from: Tedros on August 22, 2011, 02:34:02 PM
From Kip Irvine's book:
The .DATA? directive declares uninitialized data. When defining a large block of uninitialized data, the .DATA? directive reduces the size of a compiled program. For example, the following code is declared efficiently:
.data
bigArray DWORD 5000 DUP(?) ; 20,000 bytes, not initialized
Have you verified that statement...?
Don't worry :bg
But it made me discover a cute new function...:
include \masm32\include\masm32rt.inc
.data
smallArray DWORD 50 DUP(0) ; 200 bytes .data?
.data ; now add a question mark and try again...
bigArray DWORD 5000 DUP(?) ; 20,000 bytes, not initialized
.code
start: invoke GetModuleFileName, 0, offset smallArray, 200
invoke GetCompressedFileSize, offset smallArray, 0
MsgBox 0, str$(eax), "The executable size:", MB_OK
exit
end start
Thanks guys.
Two questions remain unanswered:
1. Is the memory for the uninitialised data (.data?) still within the exe image itself when the program is loaded by the OS loader (as opposed to being on the stack or heap)?
2. In the dumpbin output, I take it the line that says "0 size of uninitialized data" does not refer to the data in the .data? section?
To your first question: the space allocated for uninitialised data is within the running process's address space. An executable is loaded by the OS into its own address space, and both the initialised and uninitialised data sections are within that range. The difference is that the uninitialised data section does not need to take up space in the disk image of the executable; it is provided by the OS at run time.
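To put rough numbers on that, here is a minimal Python sketch (a toy model, not the real loader) comparing what the two arrays from the first post would cost on disk and in memory. The 200h file alignment and 1000h section alignment are assumptions based on common linker defaults, and the function names are made up for illustration.

```python
# Toy model (not the real Windows loader): compare the on-disk and
# in-memory footprint of a data section. All names are hypothetical.

FILE_ALIGNMENT = 0x200      # assumed: common linker default
SECTION_ALIGNMENT = 0x1000  # assumed: one 4 KB page


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def footprint(initialized_bytes, uninitialized_bytes):
    """Return (bytes_on_disk, bytes_in_memory) for one data section.

    Only the initialized bytes have to be stored in the file; the
    uninitialized part only adds to the size the section has in memory.
    """
    on_disk = align_up(initialized_bytes, FILE_ALIGNMENT)
    in_memory = align_up(initialized_bytes + uninitialized_bytes,
                         SECTION_ALIGNMENT)
    return on_disk, in_memory


# Both arrays in .data with explicit zeros: 40 + 20000 initialized bytes.
as_data = footprint(40 + 20000, 0)
# Both arrays in .data?: nothing has to be stored in the file.
as_bss = footprint(0, 40 + 20000)

print("in .data :", as_data)
print("in .data?:", as_bss)
```

Under those assumptions the memory cost is the same either way; only the file on disk shrinks.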
here are a couple items that you will enjoy playing with :bg
first, the PE/COFF specification, v8
it tells you how the EXE header is put together
http://www.masm32.com/board/index.php?topic=13135.0
second, Wayne's PeView program is available, here...
http://www.magma.ca/~wjr/
there are a number of programs that can do this - some allow editing
but Wayne's is small, easy to use, and trusted :P
here's another fun toy
not related to the DATA sections, but it is related to how EXE's are put together
it allows you to view resources that are included in EXE files
http://angusj.com/resourcehacker/
and, of course, Mark Russinovich's SysInternals Suite has a variety of tools that are excellent
http://technet.microsoft.com/en-us/sysinternals/bb842062
Hi,
You could generate a listing with the ML /Fl command line
switch, and a map file with the /Fm switch. So:
ml /Fl /Fm example.asm
That should show you where things end up.
Regards
Steve N.
you guys are awesome :clap: :clap:
Just to wrap up this question for others learning MASM:
I have read the Microsoft Portable Executable and Common Object File Format Specification v9 and extracted the following:
HEADERS
SizeOfInitializedData
The size of the initialized data section,
or the sum of all such sections if there
are multiple data sections.
SizeOfUninitializedData
The size of the uninitialized data section (BSS),
or the sum of all such sections if there are multiple BSS sections.
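These two fields sit near the start of the optional header, right after SizeOfCode. A minimal Python sketch of reading them follows. The bytes are synthetic, built from the values in the dumpbin output earlier in the thread rather than read from a real exe, the linker version bytes are placeholders, and read_sizes is a hypothetical helper name.

```python
import struct

# Start of the PE optional header, little-endian:
# Magic (2 bytes), linker major/minor version (1 byte each),
# then SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData (4 bytes each).
OPT_HEADER_START = struct.Struct("<HBBIII")

PE32_MAGIC = 0x10B  # identifies a 32-bit (PE32) optional header


def read_sizes(optional_header):
    """Return the three size fields found at the start of an optional header."""
    magic, _major, _minor, code, init_data, uninit_data = \
        OPT_HEADER_START.unpack_from(optional_header)
    return {
        "Magic": magic,
        "SizeOfCode": code,
        "SizeOfInitializedData": init_data,
        "SizeOfUninitializedData": uninit_data,
    }


# Synthetic header bytes using the numbers from the .data? dump above.
# The two zero bytes are placeholder linker version numbers.
sample = OPT_HEADER_START.pack(PE32_MAGIC, 0, 0, 0x1200, 0x800, 0)
sizes = read_sizes(sample)

for field, value in sizes.items():
    print(f"{field:24} {value:X}h")
```

Run against the first bytes of a real optional header, the same function should report the numbers that dumpbin prints as "size of code", "size of initialized data" and "size of uninitialized data".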
For those who don't know what BSS means, it stands for Block Started by Symbol. It's a historical name that has stuck around; in many compilers and linkers, bss is the name of the data segment containing uninitialized variables. It is often referred to as the "bss section" or "bss segment".
And from Matt Pietrek's PE/COFF article (http://msdn.microsoft.com/en-us/library/ms809762.aspx) I got this:
The .bss section is where any uninitialized static and global variables are stored.
The linker combines all the .bss sections in the OBJ and LIB files into one .bss section in the EXE.
In the section table, the RawDataOffset field for the .bss section is set to 0, indicating that this section doesn't take up any space in the file.
TLINK doesn't emit this section. Instead it extends the virtual size of the DATA section.
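The section table is where the split between file size and memory size shows up for each section. Below is a hedged Python sketch that decodes one 40-byte section header. The header bytes are made up for illustration (a .bss section holding the 20,040 bytes of the two arrays; the 3000h virtual address is arbitrary), and parse_section is a hypothetical helper. The PE specification calls the file-offset field PointerToRawData, which appears to be the field the article calls RawDataOffset.

```python
import struct

# IMAGE_SECTION_HEADER layout (40 bytes, little-endian):
# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers (4 bytes each),
# NumberOfRelocations, NumberOfLinenumbers (2 bytes each), Characteristics (4).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

IMAGE_SCN_CNT_UNINITIALIZED_DATA = 0x00000080
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def parse_section(raw):
    """Decode the fields of one section header that matter for this thread."""
    (name, virtual_size, virtual_address, size_of_raw_data,
     pointer_to_raw_data, _reloc_ptr, _line_ptr, _n_reloc, _n_line,
     characteristics) = SECTION_HEADER.unpack_from(raw)
    return {
        "Name": name.rstrip(b"\0").decode("ascii"),
        "VirtualSize": virtual_size,            # bytes occupied in memory
        "VirtualAddress": virtual_address,
        "SizeOfRawData": size_of_raw_data,      # bytes stored in the file
        "PointerToRawData": pointer_to_raw_data,
        "Uninitialized": bool(characteristics
                              & IMAGE_SCN_CNT_UNINITIALIZED_DATA),
    }


# Hypothetical .bss header: 4E48h (20,040) bytes in memory, none in the file.
flags = (IMAGE_SCN_CNT_UNINITIALIZED_DATA
         | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE)
header = SECTION_HEADER.pack(b".bss", 0x4E48, 0x3000, 0, 0, 0, 0, 0, 0, flags)

section = parse_section(header)
print(section)
```

A section whose VirtualSize is larger than its SizeOfRawData occupies more memory than file space, which is the effect discussed in this thread.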
In my example above I was looking at the headers and not the section table, as Matt indicates.
For some reason I thought the header and the section table were the same thing, but I guess not.
Thank you all for the great pointers and patience :bg
Is the .DATA? section guaranteed to be initialized to 0?
[Going to read PE, MASM etc docs and Google more now....]
--
HerbM
for 32-bit code, yes - although, i can't point to a document that says so - JJ found it once
for 16-bit code, no - whatever was left in memory
Quote from: dedndave on January 07, 2012, 03:55:34 AM
for 32-bit code, yes - although, i can't point to a document that says so - JJ found it once
Found this at http://www.tenouk.com/Bufferoverflowc/Bufferoverflow1c.html:
BSS stands for 'Block Started by Symbol'. Global and statically allocated data that initialized to zero by default are kept in what is called the BSS area of the process.
Unfortunately it's for Linux :bg
Microsoft seems very shy about revealing its secrets, but I also remember that I once saw an MSDN reference or similar. Probably it has to do with the fact that VirtualAlloc has no option for non-zeroed memory.
:bg
Do yourself a favour with uninitialised memory: DON'T assume it will be zero filled if that's not in the spec for the OS, Compiler, Assembler or Linker, or you could be unpleasantly surprised. If you need zero filled uninitialised data, zero fill it yourself. If it's enough to worry about, use dynamic memory; it's easier to deal with and you can deallocate it when you are finished with it.
Quote from: hutch-- on January 07, 2012, 09:28:00 AM
Do yourself a favour with uninitialised memory, DON'T assume it will be zero filled if its not in the spec for the OS, Compiler, Assembler or Linker
None of them is involved; it's the OS loader that takes care of the bss segment. It is really difficult to find documentation, but for Win32 you can assume that .data? gets filled with VirtualAlloc. Zeros, not garbage.
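For what it's worth, the PE/COFF specification's description of the SizeOfRawData field in the section table says that if it is less than VirtualSize, the remainder of the section is zero-filled. Here is a toy Python model of that rule. It is not the actual Windows loader, and map_section is a hypothetical name.

```python
def map_section(raw_data, virtual_size):
    """Toy model of how one section ends up in memory.

    raw_data is what the file stores for the section; virtual_size is
    how many bytes the section occupies once loaded. Anything past the
    stored bytes stays zero.
    """
    memory = bytearray(virtual_size)        # a bytearray starts out all zeros
    stored = raw_data[:virtual_size]        # raw data may be padded past VirtualSize
    memory[:len(stored)] = stored           # copy the initialized part
    return bytes(memory)


# A section with 4 initialized bytes and a 16-byte virtual size.
mapped = map_section(b"\x01\x02\x03\x04", 16)
print(mapped.hex(" "))

# A pure .data? style section: nothing stored, 20,040 bytes in memory.
bss = map_section(b"", 40 + 20000)
print(len(bss), "bytes, all zero:", not any(bss))
```

In this model the 20,040 bytes of smallArray and bigArray come out as zeros without a single byte being stored in the file.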