Title: How does .data? directive work under the hood
Post by: Tedros on August 22, 2011, 02:34:02 PM

Hi Guys

I have a question that is bugging me and may be useful for others learning MASM.

After scouring the net for more detail on how the .data? directive works, I am still a bit puzzled about what it does internally.
Even after reading numerous posts (including http://www.masm32.com/board/index.php?topic=6973.0), I can't find details of what happens under the hood.

Background

From Kip Irvine's book:
The .DATA? directive declares uninitialized data. When defining a large block of uninitialized data, the .DATA? directive reduces the size of a compiled program. For example, the following code is declared efficiently:

Code Select

.data?
    smallArray DWORD 10 DUP(?) ; 40 bytes .data?
    bigArray DWORD 5000 DUP(?) ; 20,000 bytes, not initialized

The following code, on the other hand, produces a compiled program 20,000 bytes larger:

Code Select

.data
    smallArray DWORD 10 DUP(0) ; 40 bytes
    bigArray DWORD 5000 DUP(0) ; 20,000 bytes

What exactly is the .data? directive doing under the hood in the above example to make the program 20k smaller?

The issue

Being a sucker for details ( :dazzled:), I am very curious about how this actually works.

As I understand it, all the data in the .data section is stored within the .exe image itself. The data in the .data? section, however, is not in the image file but magically gets allocated when the application starts up.

So the question is what happens during that magical allocation when the application starts up. After reading around I am still puzzled, but this is what I have deduced (correctly or incorrectly):

1. The user runs the exe by double-clicking it.
2. The OS tells its image/exe loader to load the exe into a virtual address space.
3. The data in the .data section is loaded automatically into the image's virtual address space.
4. What happens next? What becomes of the data in the .data? section ??????????

So the next thing I did was to create a small sample exe from the following code and examine the output of dumpbin.exe /headers:

Code Select

INCLUDE Irvine32.inc

.data?
smallArray DWORD 10 DUP(?) ; 40 bytes
bigArray DWORD 5000 DUP(?) ; 20,000 bytes

.code
main PROC
	
	exit ; quit
main ENDP

END main

Then do the same thing again, but this time change .data? to just .data, and compare the outputs. Please forgive me if I am a bit ignorant, as I am not a pro like you guys, but the only difference I could see was in the optional header values, which look like this:

when using .data

OPTIONAL HEADER VALUES
.....
1200 size of code
6800 size of initialized data
0 size of uninitialized data
.....

when using .data?

OPTIONAL HEADER VALUES
.....
1200 size of code
800 size of initialized data
0 size of uninitialized data

And sure enough, 6800h - 800h = 6000h = 24,576 bytes (have I understood this correctly?)
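
The subtraction can be double-checked with a few lines of Python, used here purely as a calculator. The variable names are illustrative and do not come from dumpbin; the two header values are copied from the listings above, and the declared size is simply the 10 + 5000 DWORDs from the sample program.

```python
# Values copied from the two dumpbin /headers listings (hexadecimal).
init_data_with_data = 0x6800     # "size of initialized data" when using .data
init_data_with_data_q = 0x800    # "size of initialized data" when using .data?

# How much smaller the initialized data became after switching to .data?
difference = init_data_with_data - init_data_with_data_q

# Bytes actually declared in the source: 10 + 5000 DWORDs of 4 bytes each.
declared = (10 + 5000) * 4

print(hex(difference), difference)
print(hex(declared), declared)
```

The header difference (24,576 bytes) is larger than the 20,040 bytes declared. Sizes recorded in the PE headers are rounded up to alignment boundaries, which probably accounts for at least part of the gap, so an exact match is not expected.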

The question

The questions I have are:

1. In Win32 or Win64, how does the 20 KB of memory get allocated for the bigArray DWORD array when using the .data? directive?
2. WHERE is this memory for the bigArray DWORD array allocated (on the stack, on the heap, or within the loaded exe image itself)?
3. In the dumpbin output above, what does the "size of uninitialized data" line refer to?

Your answers are very much appreciated.

P.S. This is my first post on the MASM32 forum and I wish to be respectful of your community, so if I have not given enough detail or have asked an inappropriate question for this forum, please let me know and I will correct my question/behaviour.

Title: Re: How does .data? directive work under the hood
Post by: dedndave on August 22, 2011, 03:02:42 PM

in the EXE header, there is a value that tells the EXE loader how much extra memory to allocate for the .DATA? section at load time
from what i understand, this memory is always filled with 0's

while allocating memory in the uninitialized data section does make the EXE file smaller, it still uses memory at run-time :P
that is ok for data that you want to use over the entire lifespan of program execution

if you have a large temporary allocation requirement, it is better to use one of the functions like HeapAlloc
GlobalAlloc, VirtualAlloc, LocalAlloc are similar alternatives

for smaller temporary allocations, you may use stack space

Title: Re: How does .data? directive work under the hood
Post by: jj2007 on August 22, 2011, 03:07:35 PM

Quote from: Tedros on August 22, 2011, 02:34:02 PM

From Kip Irvine's book:
The .DATA? directive declares uninitialized data. When defining a large block of uninitialized data, the .DATA? directive reduces the size of a compiled program. For example, the following code is declared efficiently:

Code Select
.data
    bigArray DWORD 5000 DUP(?) ; 20,000 bytes, not initialized

Have you verified that statement...?

Title: Re: How does .data? directive work under the hood
Post by: dedndave on August 22, 2011, 03:10:56 PM

:bg
he has his cases reversed
it should read...

Code Select

.data?

    smallArray DWORD 10 DUP(?) ; 40 bytes .data?
    bigArray DWORD 5000 DUP(?) ; 20,000 bytes, not initialized

The following code, on the other hand, produces a compiled program 20,000 bytes larger:

Code Select

.data

    smallArray DWORD 10 DUP(0) ; 40 bytes
    bigArray DWORD 5000 DUP(0) ; 20,000 bytes

notice the ?'s and 0's
you cannot assign initial values in the .DATA? section

Title: Re: How does .data? directive work under the hood
Post by: Tedros on August 22, 2011, 03:12:16 PM

That was not Kip, that was actually me.
Apologies, I have corrected the typo!

How embarrassing :red

Title: Re: How does .data? directive work under the hood
Post by: jj2007 on August 22, 2011, 03:19:43 PM

Don't worry :bg
But it made me discover a cute new function...:

Code Select

include \masm32\include\masm32rt.inc

.data
    smallArray DWORD 50 DUP(0) 	; 200 bytes .data?

.data		; now add a question mark and try again...
    bigArray DWORD 5000 DUP(?) 	; 20,000 bytes, not initialized

.code
start:	invoke GetModuleFileName, 0, offset smallArray, 200
	invoke GetCompressedFileSize, offset smallArray, 0
	MsgBox 0, str$(eax), "The executable size:", MB_OK
	exit

end start

Title: Re: How does .data? directive work under the hood
Post by: Tedros on August 22, 2011, 03:51:06 PM

Thanks guys.

Two questions remain unanswered:

1. Is the memory for the uninitialised data (.data?) still within the exe image itself once the program has been loaded by the OS loader (as opposed to being on the stack or the heap)?

2. In the dumpbin output, I take it the line that says "0 size of uninitialized data" does not refer to the data in the .data? section?

Title: Re: How does .data? directive work under the hood
Post by: hutch-- on August 22, 2011, 04:41:09 PM

To your first question: the space allocated for uninitialised data is within the running process's address space. An executable is loaded by the OS into its own address space, and both the initialised and uninitialised data sections are within that range. The difference is that the uninitialised data section does not need to take up space in the disk image of the executable; it is provided by the OS at run time.
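
As a rough mental model of the description above, here is a small Python simulation of mapping one section into memory. It is only a sketch: `map_section` and its parameters are made-up names, not a real Windows API, and the real loader works with page-granular memory mappings rather than byte copies. It assumes the usual PE convention that a section's in-memory size (VirtualSize) may exceed the number of bytes stored in the file (SizeOfRawData), with the remainder supplied as zeros.

```python
def map_section(file_bytes: bytes, pointer_to_raw_data: int,
                size_of_raw_data: int, virtual_size: int) -> bytearray:
    """Return the in-memory contents of one section (toy model only)."""
    # The OS hands out zeroed memory; bytearray(n) is n zero bytes.
    memory = bytearray(virtual_size)
    # Only the bytes actually stored in the file get copied in.
    n = min(size_of_raw_data, virtual_size)
    data = file_bytes[pointer_to_raw_data:pointer_to_raw_data + n]
    memory[:len(data)] = data
    return memory


# A fake "file": 16 bytes standing in for headers, then 8 bytes of
# initialized data. The .data? space is not stored in the file at all.
fake_file = bytes(16) + b"\x01\x02\x03\x04\x05\x06\x07\x08"

# One section: 8 initialized bytes on disk plus 20,040 bytes of .data? space.
section = map_section(fake_file, pointer_to_raw_data=16,
                      size_of_raw_data=8, virtual_size=8 + 20040)

print("bytes on disk:  ", len(fake_file))
print("bytes in memory:", len(section))
print("initialized part:", section[:8].hex())
print("non-zero bytes in the rest:", any(section[8:]))
```

The point of the toy model is that the file stays small while the in-memory section is large, and that the extra space lives in the same address range as the initialized data rather than on the stack or the heap.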

Title: Re: How does .data? directive work under the hood
Post by: dedndave on August 22, 2011, 04:52:28 PM

here are a couple items that you will enjoy playing with :bg

first, the PE/COFF specification, v8
it tells you how the EXE header is put together

http://www.masm32.com/board/index.php?topic=13135.0

second, Wayne's PeView program is available, here...

http://www.magma.ca/~wjr/

there are a number of programs that can do this - some allow editing
but Wayne's is small, easy to use, and trusted :P

Title: Re: How does .data? directive work under the hood
Post by: dedndave on August 22, 2011, 05:01:21 PM

here's another fun toy
not related to the DATA sections, but it is related to how EXE's are put together
it allows you to view resources that are included in EXE files

http://angusj.com/resourcehacker/

and, of course, Mark Russinovich's SysInternals Suite has a variety of tools that are excellent

http://technet.microsoft.com/en-us/sysinternals/bb842062

Title: Re: How does .data? directive work under the hood
Post by: FORTRANS on August 22, 2011, 08:03:12 PM

Hi,

You could generate a listing with the ML /Fl command line
switch, and a map file with the /Fm switch. So:

ml /Fl /Fm example.asm

That should show you where things end up.

Regards

Steve N.

Title: Re: How does .data? directive work under the hood
Post by: Tedros on August 24, 2011, 05:27:03 PM

you guys are awesome :clap: :clap:

Just to wrap up this question for others learning MASM:

I have read the Microsoft Portable Executable and Common Object File Format Specification v9 and extracted the following:

HEADERS

SizeOfInitializedData
The size of the initialized data section, or the sum of all such sections if there are multiple data sections.

SizeOfUninitializedData
The size of the uninitialized data section (BSS), or the sum of all such sections if there are multiple BSS sections.

For those who don't know what BSS means, it stands for Block Started by Symbol. It's a historical name that has stuck around; in many compilers and linkers, bss is the name of the data segment containing uninitialized variables. It is often referred to as the "bss section" or "bss segment".

And from Matt Pietrek's PE/COFF article (http://msdn.microsoft.com/en-us/library/ms809762.aspx) I got this:
The .bss section is where any uninitialized static and global variables are stored.
The linker combines all the .bss sections in the OBJ and LIB files into one .bss section in the EXE.
In the section table, the RawDataOffset field for the .bss section is set to 0, indicating that this section doesn't take up any space in the file.
TLINK doesn't emit this section. Instead it extends the virtual size of the DATA section.

In my example above I was looking at the headers and not at the section table that Matt describes.
For some reason I thought the headers and the section table were the same thing, but I guess not.
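
To make the header-versus-section-table distinction concrete, the Python sketch below packs and then unpacks one 40-byte section-table entry, using the field layout and flag values from the PE/COFF specification. The entry itself is synthetic: the sizes and addresses are invented for illustration and are not taken from the sample program, and `parse_section_header` is a hypothetical helper name.

```python
import struct
from collections import namedtuple

# Section header layout per the PE/COFF spec: 40 bytes, little-endian.
SECTION_FMT = "<8sIIIIIIHHI"
SectionHeader = namedtuple("SectionHeader", [
    "name", "virtual_size", "virtual_address", "size_of_raw_data",
    "pointer_to_raw_data", "pointer_to_relocations",
    "pointer_to_linenumbers", "number_of_relocations",
    "number_of_linenumbers", "characteristics",
])

# Characteristics flags from the PE/COFF spec.
IMAGE_SCN_CNT_UNINITIALIZED_DATA = 0x00000080
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def parse_section_header(raw: bytes) -> SectionHeader:
    """Decode one 40-byte section-table entry (hypothetical helper)."""
    fields = struct.unpack(SECTION_FMT, raw)
    name = fields[0].rstrip(b"\x00").decode("ascii")
    return SectionHeader(name, *fields[1:])


# Synthetic .bss entry: 20,040 bytes in memory, nothing stored in the file.
# The virtual address 0x4000 is an arbitrary made-up value.
raw_entry = struct.pack(
    SECTION_FMT, b".bss", 20040, 0x4000, 0, 0, 0, 0, 0, 0,
    IMAGE_SCN_CNT_UNINITIALIZED_DATA | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE,
)

hdr = parse_section_header(raw_entry)
file_bytes_used = hdr.size_of_raw_data
zero_filled_at_load = max(hdr.virtual_size - hdr.size_of_raw_data, 0)
print(hdr.name, file_bytes_used, zero_filled_at_load)
```

Some linkers fold the uninitialized data into the DATA section instead of emitting a separate .bss section (the TLINK remark above is one example). In that case the same arithmetic applies to that section: a virtual size larger than the raw data size marks the part that exists only in memory.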

Thank you all for the great pointers and patience :bg

Title: Re: How does .data? directive work under the hood -- DATA? init'd to 0?
Post by: HerbM on January 07, 2012, 03:34:56 AM

Is the .DATA? section guaranteed to be initialized to 0?

[Going to read PE, MASM etc docs and Google more now....]

--
HerbM

Title: Re: How does .data? directive work under the hood
Post by: dedndave on January 07, 2012, 03:55:34 AM

for 32-bit code, yes - although, i can't point to a document that says so - JJ found it once
for 16-bit code, no - whatever was left in memory

Title: Re: How does .data? directive work under the hood
Post by: jj2007 on January 07, 2012, 08:57:25 AM

Quote from: dedndave on January 07, 2012, 03:55:34 AM
for 32-bit code, yes - although, i can't point to a document that says so - JJ found it once

Found this at http://www.tenouk.com/Bufferoverflowc/Bufferoverflow1c.html:
BSS stands for 'Block Started by Symbol'. Global and statically allocated data that initialized to zero by default are kept in what is called the BSS area of the process.

Unfortunately it's for Linux :bg

Microsoft seems very shy about revealing its secrets, but I also remember once seeing an MSDN reference or similar. It probably has to do with the fact that VirtualAlloc has no option for non-zeroed memory.

Title: Re: How does .data? directive work under the hood
Post by: hutch-- on January 07, 2012, 09:28:00 AM

:bg

Do yourself a favour with uninitialised memory: DON'T assume it will be zero filled if that is not in the spec for the OS, Compiler, Assembler or Linker, or you could be unpleasantly surprised. If you need zero-filled uninitialised data, zero fill it yourself. If it's enough to worry about, use dynamic memory; it's easier to deal with and you can deallocate it when you are finished with it.

Title: Re: How does .data? directive work under the hood
Post by: jj2007 on January 07, 2012, 11:29:16 AM

Quote from: hutch-- on January 07, 2012, 09:28:00 AM
Do yourself a favour with uninitialised memory: DON'T assume it will be zero filled if that is not in the spec for the OS, Compiler, Assembler or Linker

None of them is involved; it's the OS loader that takes care of the bss segment. It is really difficult to find documentation, but for Win32 you can assume that the memory for .data? is handed out VirtualAlloc-style. Zeros, not garbage.

The MASM Forum Archive 2004 to 2012
