How does .data? directive work under the hood

Started by Tedros, August 22, 2011, 02:34:02 PM

Tedros

Hi Guys

I have a question that is bugging me and may be useful for others learning MASM.

After scouring the net for more detail on how the .data? directive works, I am still a bit puzzled about what it does internally.
After reading numerous posts (including http://www.masm32.com/board/index.php?topic=6973.0), I still can't find details of what happens internally.

Background

From Kip Irvine's book:
The .DATA? directive declares uninitialized data. When defining a large block of uninitialized data, the .DATA? directive reduces the size of a compiled program. For example, the following code is declared efficiently:

    .data?

    smallArray DWORD 10 DUP(?) ; 40 bytes .data?

    bigArray DWORD 5000 DUP(?) ; 20,000 bytes, not initialized


The following code, on the other hand, produces a compiled program 20,000 bytes larger:

    .data

    smallArray DWORD 10 DUP(0) ; 40 bytes

    bigArray DWORD 5000 DUP(?) ; 20,000 bytes

What exactly is the .data? directive doing under the hood in the above example to make the program 20k smaller?


The issue

Being a sucker for details ( :dazzled:), I am very curious about how this actually works.

As I understand it, all the data in the .data section is stored within the .exe image itself. The data in the .data? section, however, is not in the image on disk but magically gets allocated when the application starts up.

So the question is what happens during that magical allocation when the application starts up. After reading around I am still puzzled, but this is what I have deduced (correctly or incorrectly):

1. The user runs the exe by double-clicking it.
2. The OS tells its image/exe loader to load the exe into a virtual address space.
3. The data in the .data section is loaded automatically into the image's virtual address space.
4. What happens next? What about the data in the .data? section?

So the next thing I did was to create a small sample exe from the following code and examine the output of dumpbin.exe /headers:


INCLUDE Irvine32.inc

.data?
smallArray DWORD 10 DUP(?) ; 40 bytes
bigArray DWORD 5000 DUP(?) ; 20,000 bytes

.code
main PROC

exit ; quit
main ENDP

END main



and then did the same thing again, but this time changed .data? to just .data, and compared the outputs. Please forgive me if I am a bit ignorant, as I am not a pro like you guys, but the only difference I could see was in the optional header values, which look like this:

when using .data

OPTIONAL HEADER VALUES
            .....
            1200 size of code
            6800 size of initialized data
               0 size of uninitialized data
            .....



when using .data?

OPTIONAL HEADER VALUES
            .....
            1200 size of code
             800 size of initialized data
               0 size of uninitialized data

And sure enough, if you do 6800h - 800h you get 6000h = 24576 bytes (have I understood this correctly?)
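
The subtraction can be double-checked with a few lines of Python. This is only a sketch of the arithmetic: the two header values are the ones from the dumpbin output above, and the variable names are made up for illustration. Note that the difference comes out larger than the 20,040 bytes the two arrays declare; the extra is presumably padding added by the linker, which this sketch does not try to reproduce.

```python
# dumpbin prints these sizes in hexadecimal.
init_size_with_data = 0x6800    # "size of initialized data" with the arrays in .data
init_size_with_data_q = 0x800   # same field with the arrays in .data?

difference = init_size_with_data - init_size_with_data_q
print(hex(difference), difference)

# Bytes actually declared in the source: (10 + 5000) DWORDs of 4 bytes each.
declared = 4 * (10 + 5000)
print(declared)

# Whatever is left over is not accounted for by the arrays themselves.
print(difference - declared)
```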



The question

The questions I have are:

1. In Win32 or Win64, how does the 20KB of memory get allocated for the bigArray DWORD array when using the .data? directive?
2. WHERE is this memory for the bigArray DWORD array allocated (on the stack, on the heap, or within the loaded exe image itself)?
3. In the dumpbin output above, what does the "size of uninitialized data" line refer to?


Your answers are very much appreciated.

P.S. This is my first post on the MASM32 forum and I wish to be respectful of your community, so if I have not given enough details or have asked an inappropriate question for this forum, please let me know and I will correct my question/behaviour.

dedndave

in the EXE header, there is a value that tells the EXE loader how much extra memory to allocate for the .DATA? section at load time
from what i understand, this memory is always filled with 0's

while allocating memory in the uninitialized data section does make the EXE file smaller, it still uses memory at run-time   :P
that is ok for data that you want to use over the entire lifespan of program execution

if you have a large temporary allocation requirement, it is better to use one of the functions like HeapAlloc
GlobalAlloc, VirtualAlloc, LocalAlloc are similar alternatives

for smaller temporary allocations, you may use stack space

jj2007

Quote from: Tedros on August 22, 2011, 02:34:02 PM

From Kip Irvine's book:
The .DATA? directive declares uninitialized data. When defining a large block of uninitialized data, the .DATA? directive reduces the size of a compiled program. For example, the following code is declared efficiently:

    .data
    bigArray DWORD 5000 DUP(?) ; 20,000 bytes, not initialized



Have you verified that statement...?

dedndave

 :bg
he has his cases reversed
it should read...

.data?

    smallArray DWORD 10 DUP(?) ; 40 bytes .data?
    bigArray DWORD 5000 DUP(?) ; 20,000 bytes, not initialized


The following code, on the other hand, produces a compiled program 20,000 bytes larger:

.data

    smallArray DWORD 10 DUP(0) ; 40 bytes
    bigArray DWORD 5000 DUP(0) ; 20,000 bytes


notice the ?'s and 0's
you cannot assign initial values in the .DATA? section

Tedros

That was not Kip, that was actually me.
Apologies, I have corrected the typo!

How embarrassing  :red


jj2007

Don't worry :bg
But it made me discover a cute new function...:
include \masm32\include\masm32rt.inc

.data
    smallArray DWORD 50 DUP(0)   ; 200 bytes, reused below as the path buffer

.data   ; now add a question mark (.data?) and try again...
    bigArray DWORD 5000 DUP(?)   ; 20,000 bytes, not initialized

.code
start:
    invoke GetModuleFileName, 0, offset smallArray, 200   ; path of this exe
    invoke GetCompressedFileSize, offset smallArray, 0    ; eax = low DWORD of its size on disk
    MsgBox 0, str$(eax), "The executable size:", MB_OK
    exit

end start

Tedros

Thanks guys.

Two questions remain unanswered:

1. Is the memory for the uninitialised data (.data?) still within the exe image itself once the program is loaded by the OS loader (as opposed to being on the stack or heap)?

2. In the dumpbin output, I take it the line that says "0 size of uninitialized data" does not refer to the data in the .data? section?

hutch--

To your first question: the space allocated for uninitialised data is within the running process's address range. An executable is loaded by the OS into its own address space, and both the initialised and uninitialised data sections are within that range. The difference is that the uninitialised data section does not need to take up space in the disk image of the executable; it is provided by the OS at run time.
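
The difference between the disk image and the loaded image can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the actual Windows loader: the function name map_section and the sample byte values are hypothetical, and the zero fill reflects what is reported in this thread for 32-bit code rather than a documented guarantee.

```python
def map_section(raw_data: bytes, virtual_size: int) -> bytearray:
    """Toy model of mapping one section: file bytes first, zeros after."""
    memory = bytearray(virtual_size)            # bytearray(n) starts as n zero bytes
    copy_len = min(len(raw_data), virtual_size)
    memory[:copy_len] = raw_data[:copy_len]     # only what is stored in the file gets copied
    return memory

# .data: the initial values are stored in the exe and copied into memory.
data_section = map_section(b"\x01\x02\x03\x04", 4)

# .data?: nothing is stored in the exe, only a size; memory is provided at load time.
bss_section = map_section(b"", 4 * (10 + 5000))

print(len(data_section), len(bss_section))
print(any(bss_section))   # any() is False when every byte is zero
```

In this model the file carries 4 bytes for the first section and none for the second, while both occupy their full size in the address space once loaded.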

dedndave

here are a couple items that you will enjoy playing with   :bg

first, the PE/COFF specification, v8
it tells you how the EXE header is put together

http://www.masm32.com/board/index.php?topic=13135.0

second, Wayne's PeView program is available, here...

http://www.magma.ca/~wjr/

there are a number of programs that can do this - some allow editing
but Wayne's is small, easy to use, and trusted   :P

dedndave

here's another fun toy
not related to the DATA sections, but it is related to how EXE's are put together
it allows you to view resources that are included in EXE files

http://angusj.com/resourcehacker/

and, of course, Mark Russinovich's SysInternals Suite has a variety of tools that are excellent

http://technet.microsoft.com/en-us/sysinternals/bb842062

FORTRANS

Hi,

   You could generate a listing with the ML /Fl command line
switch, and a map file with the /Fm switch.  So:

ml /Fl /Fm example.asm

That should show you where things end up.

Regards

Steve N.

Tedros

you guys are awesome  :clap: :clap:

Just to wrap up this question for others learning MASM:

I have read the Microsoft Portable Executable and Common Object File Format Specification v9 and extracted the following:

HEADERS
SizeOfInitializedData   
The size of the initialized data section,
or the sum of all such sections if there
are multiple data sections.

SizeOfUninitializedData
The size of the uninitialized data section (BSS),
or the sum of all such sections if there are multiple BSS sections.
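
For anyone who wants to read these two fields without dumpbin, here is a rough Python sketch. It assumes the PE32 layout, where the optional header starts with Magic, two linker version bytes, SizeOfCode, SizeOfInitializedData and SizeOfUninitializedData in that order. The sample bytes are synthetic: they are packed from the dumpbin values for the .data? build shown earlier, with placeholder linker version bytes, not read from a real file. The helper name read_sizes is made up.

```python
import struct

# First 16 bytes of a PE32 optional header, little-endian:
# Magic (2), MajorLinkerVersion (1), MinorLinkerVersion (1),
# SizeOfCode (4), SizeOfInitializedData (4), SizeOfUninitializedData (4)
STANDARD_FIELDS = struct.Struct("<HBBIII")

def read_sizes(optional_header: bytes) -> dict:
    """Pull the three size fields out of the start of an optional header."""
    magic, _major, _minor, code, init, uninit = STANDARD_FIELDS.unpack_from(optional_header)
    return {
        "Magic": magic,
        "SizeOfCode": code,
        "SizeOfInitializedData": init,
        "SizeOfUninitializedData": uninit,
    }

# Synthetic header: 0x10B is the PE32 magic; linker version bytes are placeholders.
sample = STANDARD_FIELDS.pack(0x10B, 0, 0, 0x1200, 0x800, 0)
sizes = read_sizes(sample)
for field, value in sizes.items():
    print(f"{value:8X} {field}")
```

To use it on a real exe you would first have to locate the optional header (via the e_lfanew field at offset 3Ch of the DOS header, then past the PE signature and the COFF file header), which this sketch leaves out.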



For those who don't know what BSS means, it is Block Started by Symbol. It's a historical name that has stuck around; in many compilers and linkers, bss is used as the name of the data segment containing uninitialized variables. It is often referred to as the "bss section" or "bss segment".


And from Matt Pietrek's PE/COFF article (http://msdn.microsoft.com/en-us/library/ms809762.aspx) I got this:
The .bss section is where any uninitialized static and global variables are stored.
The linker combines all the .bss sections in the OBJ and LIB files into one .bss section in the EXE.
In the section table, the RawDataOffset field for the .bss section is set to 0, indicating that this section doesn't take up any space in the file.
TLINK doesn't emit this section. Instead it extends the virtual size of the DATA section.

In my example above I was looking at the headers and not the section table that Matt describes.
For some reason I thought the header and the section table were the same thing, but I guess not.
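
To make the section table part concrete, here is a small Python sketch that decodes one 40-byte section table entry and compares its size in memory with its size in the file. The entry is synthetic: a hypothetical .bss section sized for the two arrays, with a made-up virtual address. The field the article calls RawDataOffset appears as PointerToRawData in the specification. Linkers differ in how they lay this out (the article itself notes that TLINK extends the virtual size of the DATA section instead), so treat this as an illustration of the idea only.

```python
import struct

# One section table entry (40 bytes, little-endian):
# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers,
# NumberOfRelocations (2), NumberOfLinenumbers (2), Characteristics
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")
IMAGE_SCN_CNT_UNINITIALIZED_DATA = 0x00000080

def parse_section(entry: bytes) -> dict:
    """Decode one section table entry and work out how much of it has no file backing."""
    (name, virtual_size, _virtual_address, raw_size, raw_pointer,
     _reloc_ptr, _line_ptr, _n_reloc, _n_line, characteristics) = SECTION_HEADER.unpack(entry)
    return {
        "name": name.rstrip(b"\0").decode("ascii"),
        "virtual_size": virtual_size,     # bytes occupied in memory
        "raw_size": raw_size,             # bytes occupied in the file
        "raw_pointer": raw_pointer,       # file offset of those bytes, 0 if none
        "uninitialized": bool(characteristics & IMAGE_SCN_CNT_UNINITIALIZED_DATA),
        "bytes_not_in_file": max(virtual_size - raw_size, 0),
    }

# Synthetic entry: 20,040 bytes in memory, nothing in the file.
# 0x3000 is a made-up virtual address; 0xC0000080 = readable | writable | uninitialized data.
entry = SECTION_HEADER.pack(b".bss", 4 * (10 + 5000), 0x3000, 0, 0, 0, 0, 0, 0, 0xC0000080)
info = parse_section(entry)
print(info)
```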

Thank you all for the great pointers and patience  :bg




HerbM

Is the .DATA? section guaranteed to be initialized to 0?


[Going to read PE, MASM etc docs and Google more now....]

--
HerbM

dedndave

for 32-bit code, yes - although, i can't point to a document that says so - JJ found it once
for 16-bit code, no - whatever was left in memory

jj2007

Quote from: dedndave on January 07, 2012, 03:55:34 AM
for 32-bit code, yes - although, i can't point to a document that says so - JJ found it once

Found this at http://www.tenouk.com/Bufferoverflowc/Bufferoverflow1c.html:
BSS stands for 'Block Started by Symbol'.  Global and statically allocated data that initialized to zero by default are kept in what is called the BSS area of the process.

Unfortunately it's for Linux :bg

Microsoft seems very shy about revealing its secrets, but I also remember once seeing an MSDN reference or similar. It probably has to do with the fact that VirtualAlloc has no option for non-zeroed memory.