Reading raw debug symbol data

Started by donkey, March 28, 2010, 07:34:12 PM

donkey

Hi clive,

The prologue/epilogue is probably the way I am going to go, but since GoAsm does not store line number data I needed a way to let the user select specific parts of the code, and also a way to identify the parts of the code that were profiled. Without line numbers, source-level profiling is pretty much out of the question, so symbols are the only option left that I can think of, short of a long list of meaningless addresses. As far as I know GoAsm does not include any aux symbols, but if I decide to do a real (non-experimental) raw symbol data extractor I will have to make allowance for them.

Edgar
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

donkey

For the bad section entries, a bounds check can be added to the loop:

:
mov edx,ebx
mov ecx,[ebx+IMAGE_SYMBOL.Name.Long]
add ecx,edi
cmp B[ebx],0
cmovz edx,ecx
movsx eax,W[ebx+IMAGE_SYMBOL.SectionNumber]
cmp eax,1
jl >.BADSECTION  // signed or 0

// So how do I get the symbol's address ?????
// symbol name is in EDX
invoke AddSymbol,[hSymbolListview],edx,NULL

.BADSECTION
add ebx,SIZEOF IMAGE_SYMBOL
dec esi
jnz <

donkey

Thanks for the article in the earlier post, Dave. It solved the biggest problem I had: the address of the symbol. The field IMAGE_SYMBOL.Value holds the RVA for the symbol; just add this value to the image base (usually 0x00400000) and you have an address in the loaded executable! The value is found in [ebx+IMAGE_SYMBOL.Value], and the normal load address can be found in IMAGE_NT_HEADERS.OptionalHeader.ImageBase. So the code with virtual addresses would look like this:

GetDebugSymbolsFromFile FRAME pMapFile
uses edi,esi,ebx
LOCAL ImageBase:%PTR

// Get a pointer to IMAGE_NT_HEADERS
mov edi,[pMapFile]
add edi,[edi+IMAGE_DOS_HEADER.e_lfanew]

mov eax,[edi+IMAGE_NT_HEADERS.OptionalHeader.ImageBase]
mov [ImageBase],eax

// Get a pointer to the COFF symbol table
mov ebx,[edi+IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable]
// EBX will hold the memory address of the symbol table
add ebx,[pMapFile]

// Get the number of symbols
mov eax,[edi+IMAGE_NT_HEADERS.FileHeader.NumberOfSymbols]
// ESI will hold the symbol count
mov esi,eax

// Calculate the size of the IMAGE_SYMBOL array
mov ecx,SIZEOF IMAGE_SYMBOL
mul ecx

// The long names are stored right after the IMAGE_SYMBOL array
// EDI will hold the memory address of the long names array
mov edi,eax
add edi,ebx

:
mov edx,ebx
mov ecx,[ebx+IMAGE_SYMBOL.Name.Long]
add ecx,edi
cmp B[ebx],0
cmovz edx,ecx
movsx eax,W[ebx+IMAGE_SYMBOL.SectionNumber]
cmp eax,1
jl >.BADSECTION  // signed or 0

// symbol name is in EDX

// Calculate the VA of the symbol
mov eax,[ebx+IMAGE_SYMBOL.Value]
add eax,[ImageBase]

invoke AddSymbol,[hSymbolListview],edx,eax

.BADSECTION
// AUX records are counted in FileHeader.NumberOfSymbols, so they must be
// skipped for every symbol, including those rejected by the section check
movzx ecx,B[ebx+IMAGE_SYMBOL.NumberOfAuxSymbols]
test ecx,ecx
jz >.NEXT // No AUX symbols

// remove the AUX records from the count held in ESI
sub esi,ecx

// skip past the AUX records
mov eax,SIZEOF IMAGE_SYMBOL
mul ecx
add ebx,eax

.NEXT
add ebx,SIZEOF IMAGE_SYMBOL
dec esi
jnz <

RET
ENDF


This actually gets more information about the symbol than the DbgHelp API does. Since I can also use the section number to look up the characteristics of the section the symbol lives in, it will more reliably tell me whether the section is code or data, something I have found lacking in the DbgHelp API.
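For readers who prefer to see the same walk outside GoAsm, here is a minimal C sketch of the loop above, working on a plain byte buffer. It is only an illustration under stated assumptions: the names walk_symbols and sym_info are hypothetical (sym_info stands in for whatever AddSymbol stores), the field offsets follow the 18-byte COFF symbol record layout, and there is no bounds checking of string table offsets. It also copies short names into a terminated buffer, because a short name that fills all eight bytes is not NUL-terminated in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYM_SIZE 18  /* size of one COFF symbol record (IMAGE_SYMBOL) */

/* Little-endian reads from an unaligned byte buffer. */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static int rd16s(const uint8_t *p) {
    return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical result record, standing in for what AddSymbol receives. */
typedef struct {
    char     name[64];  /* terminated (possibly truncated) copy of the name */
    uint32_t va;        /* IMAGE_SYMBOL.Value + image base */
    int      section;   /* 1-based section number */
} sym_info;

/* Walk `count` records starting at `table`. The string table is assumed to
   start right after the last record. Returns the number of entries stored. */
static unsigned walk_symbols(const uint8_t *table, uint32_t count,
                             uint32_t image_base,
                             sym_info *out, unsigned max_out)
{
    assert(table != NULL && out != NULL);
    const uint8_t *strings = table + (size_t)count * SYM_SIZE;
    unsigned found = 0;

    for (uint32_t i = 0; i < count; i++) {
        const uint8_t *sym = table + (size_t)i * SYM_SIZE;
        int section = rd16s(sym + 12);          /* SectionNumber */

        if (section >= 1 && found < max_out) {
            sym_info *s = &out[found++];
            memset(s->name, 0, sizeof s->name);
            if (rd32(sym) == 0)                 /* long name: string table offset */
                strncpy(s->name, (const char *)strings + rd32(sym + 4),
                        sizeof s->name - 1);
            else                                /* short name: up to 8 bytes */
                memcpy(s->name, sym, 8);
            s->va = image_base + rd32(sym + 8); /* Value */
            s->section = section;
        }
        i += sym[17];  /* NumberOfAuxSymbols: aux records are part of the count */
    }
    return found;
}
```

The aux skip is done for every record, not only for those that pass the section check, so an aux record is never mistaken for a symbol.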

Edgar

donkey

Nice !!!



I used the following to build an array holding the section types. The section number was passed to AddSymbol and the strings were chosen based on the section characteristics flags. Section numbers are 1-based, so element (section number - 1) of the array holds the characteristics for that section.

ReadSectionTable FRAME pMapFile
uses edi,esi,ebx

// All this function does is create an array of DWORDs to
// hold the characteristics of each section

// Get a pointer to IMAGE_NT_HEADERS
mov edi,[pMapFile]
add edi,[edi+IMAGE_DOS_HEADER.e_lfanew]

movzx esi,W[edi+IMAGE_NT_HEADERS.FileHeader.NumberOfSections]

// Get the size of the optional header and add it to EDI
movzx eax,W[edi+IMAGE_NT_HEADERS.FileHeader.SizeOfOptionalHeader]
add edi,eax
add edi,4 // Here's that stupid +4 again (the DWORD PE signature)
add edi, SIZEOF IMAGE_FILE_HEADER

// EDI holds a memory pointer to the section table
// ESI holds the number of IMAGE_SECTION_HEADER entries in the table

// Create the array of Characteristics values
mov eax,esi
shl eax,2
invoke GlobalAlloc,GMEM_FIXED,eax
mov [paSectionTypes],eax
mov edx,eax
xor ecx,ecx
:
mov eax,[edi+IMAGE_SECTION_HEADER.Characteristics]
mov [edx+ecx*4], eax
add edi,SIZEOF IMAGE_SECTION_HEADER
inc ecx
dec esi
jnz <
:

RET
ENDF
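
The lookup that turns a section number into a type string is not shown in the post, so the following C sketch is only a guess at how it might work. The flag values are the IMAGE_SCN_* constants from winnt.h; the function name section_type and the strings it returns are hypothetical, as is treating an executable section as code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Section characteristics flags, values as defined in winnt.h. */
#define IMAGE_SCN_CNT_CODE               0x00000020u
#define IMAGE_SCN_CNT_INITIALIZED_DATA   0x00000040u
#define IMAGE_SCN_CNT_UNINITIALIZED_DATA 0x00000080u
#define IMAGE_SCN_MEM_EXECUTE            0x20000000u

/* `types` is the array of Characteristics DWORDs built by ReadSectionTable,
   `nsections` its length, `section_number` the 1-based value taken from
   IMAGE_SYMBOL.SectionNumber. The returned strings are illustrative only. */
static const char *section_type(const uint32_t *types, unsigned nsections,
                                int section_number)
{
    assert(types != NULL);

    /* 0 and negative numbers are special values, not real sections. */
    if (section_number < 1 || (unsigned)section_number > nsections)
        return "Unknown";

    uint32_t c = types[section_number - 1];
    if (c & (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE))
        return "Code";
    if (c & IMAGE_SCN_CNT_INITIALIZED_DATA)
        return "Initialized data";
    if (c & IMAGE_SCN_CNT_UNINITIALIZED_DATA)
        return "Uninitialized data";
    return "Other";
}
```

Checking the range before indexing matters here because the symbol table can contain section numbers of 0, -1 and -2, which would otherwise read outside the array.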


Edgar

jj2007

Quote from: donkey on March 30, 2010, 06:03:44 AM
Nice !!!

Very nice indeed. However, I am worried about the window title. Until now, we had occasional rows on coding issues but at least politically we seemed to be very close. Now you make propaganda for GOP...?
:wink

donkey

Looking at Microsoft Link.exe, it seems there is no way to have it include debug symbols in the executable, so I have to go the PDB route for any MASM programs. That's a shame because I don't know how to extract section information (i.e. code, initialized data, etc.) from the symbol data. The link.exe command line documentation states:

Quote from: MSDN
It is not possible to create an .exe or .dll that contains debug information. Debug information is always placed in a .pdb file.

Since both GoAsm and MASM debug builds will work with the DbgHelp API I have chosen to use that route exclusively. The other option is to RE the pdb file format and since a look at that indicates there are quite a few different versions I think it would be far too much work and leave my program vulnerable to changes that might come later or some version I didn't have. Unfortunately this means that I have to rethink my method to determine which section a symbol resides in and that might be a bit of extra detective work.

EDIT: Well, the tag of the MASM symbol tells me what kind of symbol it is, but that is not available in the GoAsm enumeration, so I will have to use both methods and choose between them based on the contents of IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable: if there is a pointer I get the info from the file, if not I use the DbgHelp API.
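
The PointerToSymbolTable test described in the EDIT above could be sketched in C roughly as follows. This is an assumption about how the check might be written, not the code used in the profiler: has_coff_symbols is a hypothetical name, the file is assumed to be mapped or read into memory as a byte buffer, and the offsets are those of IMAGE_DOS_HEADER.e_lfanew (0x3C) and of the fields in IMAGE_FILE_HEADER, which follows the 4-byte PE signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian read from an unaligned byte buffer. */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns  1 if the image carries an embedded COFF symbol table
            (read the symbols from the file),
            0 if it does not (fall back to the DbgHelp API),
           -1 if the buffer does not look like a PE file. */
static int has_coff_symbols(const uint8_t *file, size_t size)
{
    assert(file != NULL);

    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return -1;

    uint32_t nt = rd32(file + 0x3C);            /* e_lfanew */
    if (nt > size || size - nt < 24)            /* signature + file header */
        return -1;
    if (rd32(file + nt) != 0x00004550u)         /* "PE\0\0" */
        return -1;

    uint32_t pointer_to_symbols = rd32(file + nt + 12);
    uint32_t number_of_symbols  = rd32(file + nt + 16);
    return pointer_to_symbols != 0 && number_of_symbols != 0;
}
```

Testing NumberOfSymbols as well as the pointer also protects a dec/jnz style loop from being entered with a count of zero.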

clive

Depends on the version of LINK; normally I would use /DEBUG /DEBUGTYPE:COFF

-Clive
It could be a random act of randomness. Those happen a lot as well.

jj2007

Quote from: donkey on March 31, 2010, 02:18:27 AM
Looking at Microsoft Link.exe, it seems there is no way to have it include debug symbols in the executable, so I have to go the PDB route for any MASM programs. That's a shame because I don't know how to extract section information (i.e. code, initialized data, etc.) from the symbol data. The link.exe command line documentation states:

Quote from: MSDN
It is not possible to create an .exe or .dll that contains debug information. Debug information is always placed in a .pdb file.


That is a surprising statement. Attached an example that seems to prove the contrary - unless I completely misunderstand the meaning of "debug information". I have renamed pdb to xpdb and ilk to xilk, but Olly still sees the symbolic names.
Built with /Zi for the assembler and /debug for the linker (version 5.12).

clive

Quote from: donkey
Since both GoAsm and MASM debug builds will work with the DbgHelp API I have chosen to use that route exclusively. The other option is to RE the pdb file format and since a look at that indicates there are quite a few different versions I think it would be far too much work and leave my program vulnerable to changes that might come later or some version I didn't have. Unfortunately this means that I have to rethink my method to determine which section a symbol resides in and that might be a bit of extra detective work.

Yes, the PDB file format is a bit of a bugger, there are around 5 variants, and any given version of Microsoft's helper tools can't read some of them because they are not clearly identified. It's not well documented. The file is a self contained file system that permits incremental changes as the source code is updated and recompiled/linked. Some of the symbol and type information has roots in the CodeView format(s) so a familiarity with those is helpful.

Depending on how they were built you can break the file down at the library/object/source/function level, and source line number.

-Clive

donkey

Quote from: jj2007 on March 31, 2010, 01:31:17 PM
Quote from: donkey on March 31, 2010, 02:18:27 AM
Looking at Microsoft Link.exe, it seems there is no way to have it include debug symbols in the executable, so I have to go the PDB route for any MASM programs. That's a shame because I don't know how to extract section information (i.e. code, initialized data, etc.) from the symbol data. The link.exe command line documentation states:

Quote from: MSDN
It is not possible to create an .exe or .dll that contains debug information. Debug information is always placed in a .pdb file.


That is a surprising statement. Attached an example that seems to prove the contrary - unless I completely misunderstand the meaning of "debug information". I have renamed pdb to xpdb and ilk to xilk, but Olly still sees the symbolic names.
Built with /Zi for the assembler and /debug for the linker (version 5.12).

Hi JJ2007,

I tried to build with /Zi and /Debug and the information is not present in the build, only the absolute path to the PDB file (use a hex editor and look at the very end of the file). I was using link 5.12.8078.0, which is the only version I have; since I don't use MASM much at all, I have never upgraded it. You can change the extension of the PDB file without losing your symbols: the search starts by trying to open that file, and if it is not found the DbgHelp API (which Olly uses) will attempt to find any file in that path that matches the specs for a PDB and contains debug information for the executable. Try deleting the file completely or, better still, check the IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable entry; you will find it is NULL.

Hi Clive,

Yes, I have 2 examples of MS PDB files that vary widely. The file is structured, as you said, like a file system with a number of streams that contain different information. There appears to be a CodeView stream, but I am not sure if the information matches the CodeView spec, and the sheer complexity of RE'ing the file format is more than I am willing to take on when the DbgHelp API can extract the information I need. I have pretty much moved past the symbol extraction, since the dual method appears to work, and have landed squarely in the procedure mapping part of the profiler. That looks like it will be mostly searching for patterns and taking some best guesses, but they should be pretty accurate. Distinguishing data symbols in the code section will be much tougher, though I am not sure I will need to do that for the profiler, as the person running it will know the source of the symbol. I am also having problems finding information about single stepping a program: everything I can find deals with checking an option in some MS tool, not with how it is implemented.

clive

With LINK 5.12.8078 (in MASM32) you need both the /DEBUG and /DEBUGTYPE:COFF to get symbols within the file.

It's been decades since I wrote code to single step an 8086, there you generated an interrupt (INT x), and in the handler you would set the trace/single step bit (TF) in the stacked flags, the exiting IRET would then start stepping instructions via INT 1 (interrupt would be called after each instruction, IRET to run the next one).

How to achieve this on a Windows box is beyond my level of tolerance, I suspect the mechanism is much the same and can probably be done in a driver with perhaps some assistance from the OS hooks put in for WinDbg and SoftICE.

Things like VTune had static analysis modes with really good internal models of the processor; the dynamic modes used the internal performance counters. I'd have to check if my MSR dumping application still works in XP/Vista/Win7.

Not sure single stepping is a good way to profile code; it is going to cause havoc with the pipelining. I would probably attack it with RDTSC in the prologue/epilogue, do some coarse analysis to pinpoint the hot spots, and then drill down further. I rarely use single stepping to debug things (ARM/MIPS w/JTAG); it's a method of last resort, as I can usually read the code and understand the flow or register usage. If I want to tune some code I'll pull it out, put it in a test harness, and use either hardware-based cycle counters or timers.

-Clive

dedndave

here is an MSR tool, Clive

http://www.fileden.com/files/2008/3/3/1794507/MSR.zip

Attention: I am not responsible if someone ph_x up their machine with this program

donkey

Hi Clive,

I believe I can set single step through the eflags. In the profiled process I should be able to do it with the context structure passed to CREATE_PROCESS_DEBUG_EVENT, but I have to wait until I get home to test it. As far as I can tell you just set the trace flag in the eflags register (0x100 - bit 8).

The single stepping is a way to track usage of data buffers, which cannot be breakpointed; it will not be used for performance testing. Performance testing will be achieved by injecting a breakpoint and, when it is reached, writing the original data back before execution continues. A second breakpoint will signal the end of the profiled segment of code; when it is reached the breakpoint is restored.
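
The breakpoint bookkeeping and the trace flag described above can be sketched in C on a local buffer. This is a simplified model, not the profiler's code: the breakpoint struct and the functions bp_arm, bp_disarm and set_trace_flag are hypothetical names, and the buffer stands in for the debuggee's code. In a real debugger the byte would presumably be read and written with ReadProcessMemory and WriteProcessMemory, the flags changed through GetThreadContext and SetThreadContext, and EIP moved back by one byte after the int 3 is hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCCu    /* one-byte x86 breakpoint instruction */
#define EFLAGS_TF   0x100u   /* trace (single step) flag, bit 8 */

/* Hypothetical record of one injected breakpoint. */
typedef struct {
    size_t  offset;  /* where the breakpoint was written */
    uint8_t saved;   /* original byte at that offset */
    int     armed;   /* nonzero while the int 3 is in place */
} breakpoint;

/* Save the original byte and replace it with int 3. */
static void bp_arm(uint8_t *code, breakpoint *bp, size_t offset)
{
    assert(code != NULL && bp != NULL);
    bp->offset = offset;
    bp->saved  = code[offset];
    bp->armed  = 1;
    code[offset] = INT3_OPCODE;
}

/* Write the original byte back so the instruction can execute. */
static void bp_disarm(uint8_t *code, breakpoint *bp)
{
    assert(code != NULL && bp != NULL);
    if (bp->armed) {
        code[bp->offset] = bp->saved;
        bp->armed = 0;
    }
}

/* Return the flags value with the trace flag set or cleared. */
static uint32_t set_trace_flag(uint32_t eflags, int on)
{
    return on ? (eflags | EFLAGS_TF) : (eflags & ~EFLAGS_TF);
}
```

Keeping the saved byte next to the offset means the same record can be used to re-arm the start breakpoint once the end breakpoint has been reached.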

Edgar

clive

Quote from: dedndave
here is an MSR tool, Clive

Had to blow the dust off this one. Well, this still works in 32-bit XP; will try in Vista.

Edit: Well, Vista doesn't like the dynamic injection of a driver.

-Clive