The MASM Forum Archive 2004 to 2012

Project Support Forums => GoAsm Assembler and Tools => Topic started by: donkey on March 28, 2010, 07:34:12 PM

Title: Reading raw debug symbol data
Post by: donkey on March 28, 2010, 07:34:12 PM
I need to obtain the symbols from a GoAsm executable file and have been trying to figure out how to decode the raw symbol table data. Finding the data is simple enough, but I can't figure out the header information. As far as I know (and I could be wrong) the symbols are stored in IMAGE_SYMBOL structures; if the symbol name is longer than 8 bytes, the union holds an offset into the symbol name table. That would mean that these offsets or names would be found every 18 bytes (SIZEOF IMAGE_SYMBOL), but the first recognizable symbol is not at an even multiple of 18, so there has to be some header data at the beginning of the raw data. Below is the definition from winnt.h for the IMAGE_SYMBOL structure and a dump of the raw symbol data from my test program.

In order to obtain a pointer to the symbol data I used the following:

// Get a pointer to IMAGE_NT_HEADERS

mov eax,[pMapFile]
mov edi,[eax+IMAGE_DOS_HEADER.e_lfanew]
add edi,[pMapFile]

// Store the offset of the DataDirectory in EBX
lea ebx,[edi+IMAGE_NT_HEADERS.OptionalHeader.DataDirectory]
// Get the debug symbols entry (the 7th entry, index 6)
// SIZEOF IMAGE_DATA_DIRECTORY is always 8 BYTEs
mov eax,6
shl eax,3

add ebx,eax

// EBX now contains the IMAGE_DATA_DIRECTORY structure for the symbol table
// Convert the RVA to a file position
invoke RVAToFilePos,[pMapFile],[ebx+IMAGE_DATA_DIRECTORY.VirtualAddress]
mov esi,[pMapFile]
add esi,eax

mov eax,[ebx+IMAGE_DATA_DIRECTORY.Size]
mov ecx,SIZEOF IMAGE_DEBUG_DIRECTORY
xor edx,edx
div ecx

// EAX contains the number of IMAGE_DEBUG_DIRECTORY entries
// ESI now points to the IMAGE_DEBUG_DIRECTORY entry for the PE
// get a pointer to the raw data in the file

mov ebx,[esi+IMAGE_DEBUG_DIRECTORY.SizeOfData]
lea eax,[esi+IMAGE_DEBUG_DIRECTORY.PointerToRawData]
mov esi,[eax]
add esi,[pMapFile]

// ESI contains a memory pointer to the base of the raw data
// EBX contains the size of the data
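
For anyone who wants to check the IMAGE_DEBUG_DIRECTORY layout outside of assembler, here is a minimal Python sketch of the same decoding step. It is not part of the GoAsm code above; the function name parse_debug_directory and the sample entry values are made up for illustration, while the field names and the 28 byte layout follow winnt.h.

```python
import struct

IMAGE_DIRECTORY_ENTRY_DEBUG = 6       # index into OptionalHeader.DataDirectory
IMAGE_DEBUG_TYPE_COFF = 1
DEBUG_DIRECTORY_FORMAT = "<IIHHIIII"  # 28 bytes, little-endian
DEBUG_DIRECTORY_FIELDS = (
    "Characteristics", "TimeDateStamp", "MajorVersion", "MinorVersion",
    "Type", "SizeOfData", "AddressOfRawData", "PointerToRawData",
)

def parse_debug_directory(data, directory_size):
    """Decode every IMAGE_DEBUG_DIRECTORY entry found in data.

    directory_size is IMAGE_DATA_DIRECTORY.Size for the debug entry."""
    entry_size = struct.calcsize(DEBUG_DIRECTORY_FORMAT)
    entries = []
    for offset in range(0, directory_size - entry_size + 1, entry_size):
        values = struct.unpack_from(DEBUG_DIRECTORY_FORMAT, data, offset)
        entries.append(dict(zip(DEBUG_DIRECTORY_FIELDS, values)))
    return entries

# Made-up entry for illustration: one COFF record of 0x16C bytes at file offset 0xE00.
sample = struct.pack(DEBUG_DIRECTORY_FORMAT,
                     0, 0, 0, 0, IMAGE_DEBUG_TYPE_COFF, 0x16C, 0, 0xE00)
for entry in parse_debug_directory(sample, len(sample)):
    if entry["Type"] == IMAGE_DEBUG_TYPE_COFF:
        # PointerToRawData is a file offset, so it is added to the mapping base.
        print(hex(entry["PointerToRawData"]), hex(entry["SizeOfData"]))
```

PointerToRawData is a file offset rather than an RVA, which is why the assembler version can add it straight to pMapFile.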


IMAGE_SYMBOL STRUCT
UNION
ShortName DB 8 DUP
Name STRUCT
Short DD
Long DD
ENDS
LongName DD 2 DUP
ENDUNION
Value DD
SectionNumber DW
Type DW
StorageClass DB
NumberOfAuxSymbols DB
ENDS


00380E00:  0B 00 00 00-20 00 00 00-00 00 00 00-00 00 00 00   .... ...........
00380E10:  00 10 00 00-00 12 00 00-00 20 00 00-00 28 00 00   ......... ...(..
00380E20:  00 00 00 00-04 00 00 00-50 20 00 00-02 00 00 00   ........P ......
00380E30:  02 00 73 7A-54 65 73 74-00 00 54 20-00 00 02 00   ..szTest..T ....
00380E40:  00 00 02 00-53 54 41 52-54 00 00 00-00 10 00 00   ....START.......
00380E50:  01 00 00 00-02 00 00 00-00 00 0E 00-00 00 00 20   ...............
00380E60:  00 00 02 00-00 00 02 00-00 00 00 00-1D 00 00 00   ................
00380E70:  00 20 00 00-02 00 00 00-02 00 00 00-00 00 31 00   . ............1.
00380E80:  00 00 1C 20-00 00 02 00-00 00 02 00-44 6C 67 50   ... ........DlgP
00380E90:  72 6F 63 00-E2 10 00 00-01 00 00 00-02 00 00 00   roc.â...........
00380EA0:  00 00 45 00-00 00 F0 10-00 00 01 00-00 00 02 00   ..E...ð.........
00380EB0:  00 00 00 00-58 00 00 00-14 11 00 00-01 00 00 00   ....X...........
00380EC0:  02 00 00 00-00 00 65 00-00 00 FB 10-00 00 01 00   ......e...û.....
00380ED0:  00 00 02 00-00 00 00 00-76 00 00 00-0B 11 00 00   ........v.......
00380EE0:  01 00 00 00-02 00 86 00-00 00 68 49-6E 73 74 61   ......†...hInsta
00380EF0:  6E 63 65 00-45 78 63 65-70 74 69 6F-6E 41 72 67   nce.ExceptionArg
00380F00:  73 31 00 45-78 63 65 70-74 69 6F 6E-41 72 67 73   s1.ExceptionArgs
00380F10:  31 2E 4E 61-6D 65 00 45-78 63 65 70-74 69 6F 6E   1.Name.Exception
00380F20:  41 72 67 73-31 2E 74 79-70 65 00 44-6C 67 50 72   Args1.type.DlgPr
00380F30:  6F 63 2E 57-4D 5F 43 4F-4D 4D 41 4E-44 00 44 6C   oc.WM_COMMAND.Dl
00380F40:  67 50 72 6F-63 2E 45 58-49 54 00 44-6C 67 50 72   gProc.EXIT.DlgPr
00380F50:  6F 63 2E 57-4D 5F 43 4C-4F 53 45 00-44 6C 67 50   oc.WM_CLOSE.DlgP
00380F60:  72 6F 63 2E-44 45 46 50-52 4F 43 00-              roc.DEFPROC.


As you can see, the first easily recognized symbol is szTest. Using that as a jumping-off point, it appears that the 18 byte rule holds, but since szTest is at offset 50 (0x32) in the data there must be a 32 byte header, which might indicate that the 2nd DWORD in the data (0x20) points to the first symbol. To verify that the structure seems correct, I look at the START symbol and count ahead 18 bytes, where I find the value 00 00-00 00 0E 00-00 00. If I take the start of the names section to be the 0x86 (†), which seems to be right as it is the number of bytes in the names section, I arrive at ExceptionArgs1 exactly 0E (IMAGE_SYMBOL.Name.Long) into the names, which would appear to be correct. Counting backward 36 bytes from START (18 bytes before szTest) I find the data 00 00 00 00-04 00 00 00, the IMAGE_SYMBOL.Name.Long offset of hInstance, the first symbol in the table. With multiple verifications I can be pretty confident that I have chosen the right struct to decode the entries.

So, does anyone know the structure of the data in the 32 byte header? Mainly I am looking to obtain the offset to the names section (0xE6 in this case), or maybe an RVA.

Edgar
Title: Re: Reading raw debug symbol data
Post by: donkey on March 28, 2010, 07:53:51 PM
Mmmmm, IMAGE_COFF_SYMBOLS_HEADER might be it.

EDIT:
Yup, that's the answer to my question. IMAGE_COFF_SYMBOLS_HEADER is 32 bytes long and its second member is LvaToFirstSymbol, which is what I expected it to be. The LVA of the names section is not given but is easily calculated using the following:

mov eax,[esi+IMAGE_COFF_SYMBOLS_HEADER.NumberOfSymbols]
mov ecx, SIZEOF IMAGE_SYMBOL
mul ecx


EAX holds the offset to the names section from the first symbol entry. Now just to find their addresses and values :)
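
As a cross-check on the header layout, here is a minimal Python sketch (not part of the GoAsm code) that unpacks the first 32 bytes of the dump in the first post as an IMAGE_COFF_SYMBOLS_HEADER and computes the offset of the names section. The function names are made up for illustration; the field names follow winnt.h.

```python
import struct

# IMAGE_COFF_SYMBOLS_HEADER from winnt.h: eight little-endian DWORDs (32 bytes).
COFF_HEADER_FIELDS = (
    "NumberOfSymbols", "LvaToFirstSymbol",
    "NumberOfLinenumbers", "LvaToFirstLinenumber",
    "RvaToFirstByteOfCode", "RvaToLastByteOfCode",
    "RvaToFirstByteOfData", "RvaToLastByteOfData",
)
SIZEOF_IMAGE_SYMBOL = 18

def parse_coff_symbols_header(raw):
    """Unpack the 32 byte header at the start of the raw COFF debug data."""
    values = struct.unpack_from("<8I", raw, 0)
    return dict(zip(COFF_HEADER_FIELDS, values))

def names_offset(header):
    """Offset of the names section from the start of the raw debug data."""
    return (header["LvaToFirstSymbol"]
            + header["NumberOfSymbols"] * SIZEOF_IMAGE_SYMBOL)

# First two rows of the dump in the first post (00380E00 to 00380E1F).
raw_header = bytes.fromhex(
    "0B000000 20000000 00000000 00000000"
    "00100000 00120000 00200000 00280000"
)
header = parse_coff_symbols_header(raw_header)
print(header["NumberOfSymbols"], hex(names_offset(header)))
```

For the test program this gives 11 symbols and a names section at 0x20 + 11 * 18 = 0xE6, the offset asked about in the first post.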
Title: Re: Reading raw debug symbol data
Post by: dedndave on March 28, 2010, 08:01:25 PM
this should help, Edgar   :U

http://www.masm32.com/board/index.php?topic=13135.0
Title: Re: Reading raw debug symbol data
Post by: donkey on March 28, 2010, 08:05:45 PM
Thanks Dave,

It looks like it will help a lot, I have spent a few hours reverse engineering the PE file so I could get the raw symbol data. I can already read the data and now understand much of it but it should help (I hope) to assign relocations to the symbols so I can find them in the executable by address. All a part of my grand idea for a profiling tool :) An exceedingly interesting project.

Edgar
Title: Re: Reading raw debug symbol data
Post by: dedndave on March 29, 2010, 12:02:36 AM
it does sound interesting
i have thought some about the idea myself
although, i am a total n00b at windows gui apps - lol
but, i have been trying to do some of the low-level stuff like ID'ing processors (incl frequency) and OS's, etc
Hutch has detailed available memory and disk space for us
Michael has given us some insight to timing
but, to get into ring 0 and use the performance counters is the next big step - a little over my head   :bg

someplace, i found a nice link that might be helpful - let me see if i can find it...

...well - this is one - i thought i had a more interesting one - maybe i saved the whole url page - i will keep my eyes open for it

http://perfinsp.sourceforge.net/
Title: Re: Reading raw debug symbol data
Post by: donkey on March 29, 2010, 12:22:09 AM
Thanks Dave, not too worried about accurate timing yet, just have to get the address of the code symbols figured out. BTW the above code was the long way to do it; once you figure everything out and are sure, it boils down to this:

invoke Dbghelp.dll:ImageNtHeader,[pMapFile]
mov edi,eax

mov ebx,[edi+IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable]
add ebx,[pMapFile]

mov eax,[edi+IMAGE_NT_HEADERS.FileHeader.NumberOfSymbols]
mov esi,eax
mov ecx,SIZEOF IMAGE_SYMBOL
mul ecx
mov edi,eax
add edi,ebx


EDI contains the memory address of the names
EBX contains the base address of the array of IMAGE_SYMBOL structures
ESI contains the number of entries

Pretty easy to decode after that.
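
The same shortcut can be sketched in Python for checking offsets by hand. This is only an illustration under stated assumptions: locate_symbol_table is a made-up name, and the demo buffer is a synthetic stand-in for a mapped file with only the fields that get read filled in.

```python
import struct

SIZEOF_IMAGE_SYMBOL = 18

def locate_symbol_table(image):
    """Return (symbol_table_offset, symbol_count, names_offset) for a PE file
    held in memory as bytes, or None if no COFF symbols are embedded.
    All offsets are file offsets."""
    # IMAGE_DOS_HEADER.e_lfanew lives at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # IMAGE_FILE_HEADER follows the 4 byte signature: Machine,
    # NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    (_machine, _nsections, _timestamp, ptr_symtab, nsymbols,
     _opt_size, _characteristics) = struct.unpack_from(
        "<HHIIIHH", image, e_lfanew + 4)
    if ptr_symtab == 0:
        return None
    # The names follow directly after the IMAGE_SYMBOL array.
    return ptr_symtab, nsymbols, ptr_symtab + nsymbols * SIZEOF_IMAGE_SYMBOL

# Synthetic stand-in for a mapped file (values chosen for the demo only).
image = bytearray(0x200)
struct.pack_into("<I", image, 0x3C, 0x80)   # e_lfanew
image[0x80:0x84] = b"PE\0\0"
struct.pack_into("<HHIIIHH", image, 0x84, 0x14C, 2, 0, 0x100, 11, 0xE0, 0)
print(locate_symbol_table(bytes(image)))
```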

Edgar
Title: Re: Reading raw debug symbol data
Post by: dedndave on March 29, 2010, 01:17:44 AM
well - the link i remember (or don't remember) had a lot of great info about programming the performance registers
it was over my head, of course, but i found it interesting
Title: Re: Reading raw debug symbol data
Post by: clive on March 29, 2010, 02:51:05 AM
The COFF debug data isn't particularly rich with detail, you'll have to pull any section and relocation information out of the PE structure itself.

The CodeView and PDB files have significantly more information, but again you'd have to leverage data from the PE file as a whole. There is some pre-link segmentation data that comes from the object files, and this gets condensed into the PE sections, via a segment table. Another area of complication with Microsoft system files is the use of post-link optimization mapping (OMAP) where stuff is moved around, and the debug information doesn't get updated.

DumpPE should give a rough translation of the debug data within the file.

-Clive
Title: Re: Reading raw debug symbol data
Post by: donkey on March 29, 2010, 03:47:52 AM
CodeView and PDB are not an option, GoAsm embeds debug symbols in the file, I am currently trying to get the DbgHelp API functions working but not having much luck enumerating the symbols with SymEnumSymbols

The call from the CREATE_PROCESS_DEBUG_EVENT handler:

invoke EnumerateSymbols,[dbe.u.CreateProcessInfo.hProcess],[dbe.u.CreateProcessInfo.lpBaseOfImage]

My symbol enumerator:
EnumerateSymbols FRAME hProcess, ImageBase
LOCAL SymInfo:SYMBOL_INFO
LOCAL ProcessPath[MAX_PATH]:%CHAR

mov D[ProcessPath],0

invoke SetLastError,0
invoke GetProcessImageFileName ,[hProcess],offset ProcessPath,MAX_PATH

invoke SymInitialize,[hProcess], offset ProcessPath, FALSE

// BaseOfDll is a 64 bit number passed as 2 DWORDS ([ImageBase],0)
invoke SymEnumSymbols,[hProcess],[ImageBase],0,0,offset SymEnumSymbolsProc, NULL

invoke SymCleanup,[hProcess]
RET
ENDF

SymEnumSymbolsProc FRAME pSymInfo, SymbolSize, UserContext

mov eax,[pSymInfo]
add eax,SYMBOL_INFO.Name

PrintStringByAddr(eax)

mov eax,TRUE
RET
ENDF


SymInitialize, SymEnumSymbols and SymCleanup all return TRUE (successful); however, the enumeration function is not called, which leads me to believe that the DbgHelp API does not recognize the symbol data. I am not sure what is required but I am trying a few different things.

Edgar
Title: Re: Reading raw debug symbol data
Post by: donkey on March 29, 2010, 05:31:36 AM
Well, I have it working, sort of anyway:

The call from the CREATE_PROCESS_DEBUG_EVENT handler:

invoke EnumerateSymbols,[dbe.u.CreateProcessInfo.hFile],[dbe.u.CreateProcessInfo.hProcess],[dbe.u.CreateProcessInfo.lpBaseOfImage]

EnumerateSymbols FRAME hFile, hProcess, ImageBase
LOCAL fsh:%DWORD32
LOCAL fsl:%DWORD32
LOCAL ProcessPath[MAX_PATH]:%CHAR

mov D[ProcessPath],0

invoke SetLastError,0
invoke GetProcessImageFileName ,[hProcess],offset ProcessPath,MAX_PATH

invoke GetFileSize,[hFile],offset fsh
mov [fsl],eax

invoke SymInitialize,[hProcess], offset ProcessPath, FALSE

invoke SymLoadModuleEx,[hProcess],[hFile],offset ProcessPath,NULL,[ImageBase],0,[fsl],0,0

invoke SymEnumSymbols,[hProcess],[ImageBase],0,"*",offset SymEnumSymbolsProc, NULL

invoke SymUnloadModule64,[hProcess],[ImageBase],0

invoke SymCleanup,[hProcess]

RET
ENDF

SymEnumSymbolsProc FRAME pSymInfo, SymbolSize, UserContext
mov eax,[pSymInfo]
add eax,4

mov edx,[eax+SYMBOL_INFO.Address]
add eax,SYMBOL_INFO.Name
invoke AddSymbol,[hSymbolListview],eax,edx
mov eax,TRUE
RET
ENDF


This will enumerate the symbols and everything is perfect as long as I add 4 to the pSymInfo address in the callback. I can't figure that one out at all, but the structure's address is actually 4 bytes above the address in pSymInfo. Once I figured that out, which was no small puzzle, everything fell into place: I can get the address of the symbol, its name and a lot of other information about the symbol. This is definitely the way to go but I am worried that I'll be bitten in the ass because of the 4 byte offset thing.

Edgar
Title: Re: Reading raw debug symbol data
Post by: jj2007 on March 29, 2010, 06:40:49 AM
Great stuff. I see getting bored is a good incentive for becoming really productive :green
So we see a variable & labels dump function at the horizon, right?
:bg
Title: Re: Reading raw debug symbol data
Post by: donkey on March 29, 2010, 06:48:14 AM
Quote from: jj2007 on March 29, 2010, 06:40:49 AM
Great stuff. I see getting bored is a good incentive for becoming really productive :green
So we see a variable & labels dump function at the horizon, right?
:bg

The tool will do that but it is going to be a profiling tool (time profiler) when it grows up, pick 2 labels and measure the time to execute the code between them. Frequency of calls to a particular procedure or references to a particular memory location etc...
Title: Re: Reading raw debug symbol data
Post by: clive on March 29, 2010, 03:29:46 PM
Quote from: donkey
CodeView and PDB are not an option, GoAsm embeds debug symbols in the file, I am currently trying to get the DbgHelp API functions working but not having much luck enumerating the symbols with SymEnumSymbols

Fair enough, the COFF stuff is pretty straightforward. Most of the relocation is done, just need to add in the base address. The symbol records are nominally 0x12 (IMAGE_SYMBOL) bytes long, but can spill into multiple records (NumberOfAuxSymbols). Short symbols are stored within the symbol (8 chars), with long ones indexed into the appended symbol table (at sizeof(IMAGE_SYMBOL) * NumberOfSymbols + LvaToFirstSymbol). With GoAsm you probably don't have to worry about OMAPing.

From Testbug3.dll


Debug Entry

Chars    TimeDate Maj  Min  Type                   Size     AddrRaw  PtrRaw
-------- -------- ---- ---- ---------------------- -------- -------- --------
00000000 41587495 0000 0000 00000001 COFF          000000CA 00000000 00001000

COFF Debug Info Header

  NumberOfSymbols:      00000008
  LvaToFirstSymbol:     00000020
  NumberOfLinenumbers:  00000000
  LvaToFirstLinenumber: 00000000
  RvaToFirstByteOfCode: 00001000
  RvaToLastByteOfCode:  00001200
  RvaToFirstByteOfData: 00002000
  RvaToLastByteOfData:  00002A00

Val 00002000, Sec 0002, Typ 0000, Sto 02, Aux 00, DLLMESS1

Val 00002078, Sec 0002, Typ 0000, Sto 02, Aux 00, M1

Val 0000207F, Sec 0002, Typ 0000, Sto 02, Aux 00, DLLMESS2

Val 000020C0, Sec 0002, Typ 0000, Sto 02, Aux 00, DLLMESS3

Val 00002128, Sec 0002, Typ 0000, Sto 02, Aux 00, M2

Val 00001000, Sec 0001, Typ 0000, Sto 02, Aux 00, HEXROTATE4

Val 0000101F, Sec 0001, Typ 0000, Sto 02, Aux 00, START

Val 00001065, Sec 0001, Typ 0000, Sto 02, Aux 00, DLL_TEST3B
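
To illustrate how one 18 byte record turns into a line like those above, here is a small Python sketch of the short/long name rule. The function names are made up for illustration; the two sample records are the hInstance and szTest entries from the dump in the first post, and the names section is truncated to its first two strings.

```python
import struct

SIZEOF_IMAGE_SYMBOL = 18

def symbol_name(record, string_table):
    """Return the name for one 18 byte IMAGE_SYMBOL record.

    string_table is the names section, starting at its 4 byte size field."""
    short, long_offset = struct.unpack_from("<II", record, 0)
    if short == 0:
        # Name.Short is zero, so Name.Long is an offset into the names section.
        end = string_table.index(b"\0", long_offset)
        return string_table[long_offset:end].decode("ascii")
    # Otherwise the name is stored inline, NUL padded to 8 bytes.
    return record[:8].rstrip(b"\0").decode("ascii")

def decode_symbol(record, string_table):
    """Decode Value, SectionNumber, Type, StorageClass and the aux count."""
    value, section, sym_type, storage, aux = struct.unpack_from(
        "<IhHBB", record, 8)
    return {
        "Name": symbol_name(record, string_table),
        "Value": value,
        "SectionNumber": section,
        "Type": sym_type,
        "StorageClass": storage,
        "NumberOfAuxSymbols": aux,
    }

# Records copied from the dump in the first post.
hinstance = bytes.fromhex("00000000 04000000 50200000 0200 0000 02 00")
sztest = bytes.fromhex("737A546573740000 54200000 0200 0000 02 00")
# Names section, truncated to the size field and the first two strings.
names = b"\x86\x00\x00\x00" + b"hInstance\x00" + b"ExceptionArgs1\x00"

for record in (hinstance, sztest):
    print(decode_symbol(record, names))
```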


-Clive
Title: Re: Reading raw debug symbol data
Post by: donkey on March 29, 2010, 04:27:03 PM
Hi clive,

Thanks, with a bit of RE work I got the original one working though in the end I decided to use DbgHelp to do the symbol extraction for me, that way if I decide to support other formats it will be a minor addition. Also the amount of information returned from SymEnumSymbols would have been a lot of work to duplicate. The actual code I used to read the raw debug symbol table is this:

GetDebugSymbolsFromFile FRAME pMapFile
uses edi,esi,ebx

// Get a pointer to IMAGE_NT_HEADERS
mov edi,[pMapFile]
add edi,[edi+IMAGE_DOS_HEADER.e_lfanew]

// Get a pointer to the COFF symbol table
mov ebx,[edi+IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable]
// EBX will hold the memory address of the symbol table
add ebx,[pMapFile]

// Get the number of symbols
mov eax,[edi+IMAGE_NT_HEADERS.FileHeader.NumberOfSymbols]
// ESI will hold the symbol count
mov esi,eax

// Calculate the size of the IMAGE_SYMBOL array
mov ecx,SIZEOF IMAGE_SYMBOL
mul ecx

// The long names are stored right after the IMAGE_SYMBOL array
// EDI will hold the memory address of the long names array
mov edi,eax
add edi,ebx

:
mov edx,ebx
mov ecx,[ebx+IMAGE_SYMBOL.Name.Long]
add ecx,edi
cmp B[ebx],0
cmovz edx,ecx
movzx eax,W[ebx+IMAGE_SYMBOL.SectionNumber]

// So how do I get the symbols address ?????
invoke AddSymbol,[hSymbolListview],edx,NULL

add ebx,SIZEOF IMAGE_SYMBOL
dec esi
jnz <

RET
ENDF


However, getting the VA for a symbol would have been quite a lot of detective work added on to a day of RE'ing the PE symbol format so DbgHelp was the only solution that fit into the schedule I had (have to do this around work and other responsibilities). I may still tackle the raw way to do it but that is for another Sunday, in this project the symbols and VA though critical are only a small part of the overall code. The 4 byte offset thing still has me worried though, it seems to always contain DWORD 0x58
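
A possible explanation for the 4 byte offset, offered as an assumption rather than something confirmed in this thread: in dbghelp.h the SYMBOL_INFO structure begins with a ULONG SizeOfStruct member, and with natural alignment the structure is 88 (0x58) bytes, which would match the DWORD 0x58 seen at pSymInfo. If the SYMBOL_INFO definition used by the assembler is missing either that first member or the 4 bytes of alignment padding before the 64 bit Value member, Address and Name would both appear exactly 4 bytes late. The Python sketch below only does the layout arithmetic; the helper name member_offsets is made up.

```python
import struct

# SYMBOL_INFO members up to (not including) Name, as declared in dbghelp.h.
# "<" disables automatic padding, so the 4 pad bytes that natural alignment
# inserts before the 64 bit Value member are written out explicitly as "4x".
MEMBERS = [
    ("SizeOfStruct", "I"), ("TypeIndex", "I"), ("Reserved", "2Q"),
    ("Index", "I"), ("Size", "I"), ("ModBase", "Q"), ("Flags", "I"),
    ("_pad", "4x"), ("Value", "Q"), ("Address", "Q"), ("Register", "I"),
    ("Scope", "I"), ("Tag", "I"), ("NameLen", "I"), ("MaxNameLen", "I"),
]

def member_offsets():
    """Return a dict mapping each member name to its byte offset."""
    offsets, fmt = {}, "<"
    for name, code in MEMBERS:
        offsets[name] = struct.calcsize(fmt)
        fmt += code
    offsets["Name"] = struct.calcsize(fmt)
    return offsets

offsets = member_offsets()
# Round the end of Name[1] up to the structure's 8 byte alignment.
sizeof_symbol_info = (offsets["Name"] + 1 + 7) // 8 * 8
print(hex(sizeof_symbol_info), offsets["Address"], offsets["Name"])
```

If the assembler's SYMBOL_INFO.Address and SYMBOL_INFO.Name equates come out as 52 and 80 instead of the 56 and 84 computed here, the structure definition would be the thing to fix instead of adding 4 in the callback.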

Edgar

EDIT: tidied up the code a bit.
Title: Re: Reading raw debug symbol data
Post by: clive on March 29, 2010, 05:15:53 PM
I'd probably look at the NumberOfAuxSymbols being zero, and the SectionNumber not being 0x0000 or 0xFFFF, as a quick junk filter.

Also when NumberOfAuxSymbols is non zero you have to skip Aux * IMAGE_SYMBOL additional records. Not sure if GoAsm would generate them, but regular COFF debug records from LINK do.
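
A rough Python sketch of that walk, for illustration only: iter_primary_symbols and make_record are made-up names, and the demo table is synthetic. It relies on the PE/COFF rule that auxiliary records occupy slots in the symbol table and are included in NumberOfSymbols, and it keeps only records with a positive section number.

```python
import struct

SIZEOF_IMAGE_SYMBOL = 18

def iter_primary_symbols(table, count):
    """Yield (index, section, value, aux_count) for each primary record
    whose SectionNumber is a real section (greater than zero)."""
    index = 0
    while index < count:
        offset = index * SIZEOF_IMAGE_SYMBOL
        value, section, _type, _storage, aux = struct.unpack_from(
            "<IhHBB", table, offset + 8)
        # Section 0 is undefined; negative values are absolute/debug markers.
        if section > 0:
            yield index, section, value, aux
        # Step over this record plus any auxiliary records that follow it.
        index += 1 + aux

def make_record(name, value, section, aux=0):
    """Build one 18 byte record with an inline name (demo helper only)."""
    return struct.pack("<8sIhHBB", name, value, section, 0, 2, aux)

# Synthetic table: a code symbol with one aux record, an undefined symbol,
# and a data symbol.
table = (make_record(b"START", 0x1000, 1, aux=1) + bytes(SIZEOF_IMAGE_SYMBOL)
         + make_record(b"extern", 0, 0)
         + make_record(b"szTest", 0x2054, 2))
print(list(iter_primary_symbols(table, 4)))
```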

Most profiling type applications tend to instrument the prolog/epilog code, or use FPO records.

-Clive
Title: Re: Reading raw debug symbol data
Post by: donkey on March 29, 2010, 05:37:30 PM
Hi clive,

The prologue/epilogue is probably the way I am going to go but since GoAsm does not store the line number data I needed a way to allow the user to select specific parts of the code and also a way to identify parts of code that were profiled. Without line numbers source level profiling is pretty much out of the question so symbols are the only option left that I can think of short of a long list of meaningless addresses. As far as I know GoAsm does not include any aux symbols but if I decide to do a real (non-experimental) raw symbol data extractor I will have to make allowance for them.

Edgar
Title: Re: Reading raw debug symbol data
Post by: donkey on March 29, 2010, 06:03:52 PM
For the bad section entries, a bounds check can be added to the loop:

:
mov edx,ebx
mov ecx,[ebx+IMAGE_SYMBOL.Name.Long]
add ecx,edi
cmp B[ebx],0
cmovz edx,ecx
movsx eax,W[ebx+IMAGE_SYMBOL.SectionNumber]
cmp eax,1
jl >.BADSECTION  // section number is negative or 0

// So how do I get the symbols address ?????
// symbol name is in EDX
invoke AddSymbol,[hSymbolListview],edx,NULL

.BADSECTION
add ebx,SIZEOF IMAGE_SYMBOL
dec esi
jnz <
Title: Re: Reading raw debug symbol data
Post by: donkey on March 30, 2010, 04:48:47 AM
Thanks for the article in the earlier post Dave, it solved the biggest problem I had, the address of the symbol. The field IMAGE_SYMBOL.Value holds the RVA for the symbol, just add this value to the image base (usually 0x00400000) and you have an address in the loaded executable ! The value is found in [ebx+IMAGE_SYMBOL.Value], the normal load address can be found in IMAGE_NT_HEADERS.OptionalHeader.ImageBase. So the code with virtual addresses would look like this:

GetDebugSymbolsFromFile FRAME pMapFile
uses edi,esi,ebx
LOCAL ImageBase:%PTR

// Get a pointer to IMAGE_NT_HEADERS
mov edi,[pMapFile]
add edi,[edi+IMAGE_DOS_HEADER.e_lfanew]

mov eax,[edi+IMAGE_NT_HEADERS.OptionalHeader.ImageBase]
mov [ImageBase],eax

// Get a pointer to the COFF symbol table
mov ebx,[edi+IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable]
// EBX will hold the memory address of the symbol table
add ebx,[pMapFile]

// Get the number of symbols
mov eax,[edi+IMAGE_NT_HEADERS.FileHeader.NumberOfSymbols]
// ESI will hold the symbol count
mov esi,eax

// Calculate the size of the IMAGE_SYMBOL array
mov ecx,SIZEOF IMAGE_SYMBOL
mul ecx

// The long names are stored right after the IMAGE_SYMBOL array
// EDI will hold the memory address of the long names array
mov edi,eax
add edi,ebx

:
mov edx,ebx
mov ecx,[ebx+IMAGE_SYMBOL.Name.Long]
add ecx,edi
cmp B[ebx],0
cmovz edx,ecx
movsx eax,W[ebx+IMAGE_SYMBOL.SectionNumber]
cmp eax,1
jl >.BADSECTION  // section number is negative or 0

// symbol name is in EDX

// Calculate the VA of the symbol
mov eax,[ebx+IMAGE_SYMBOL.Value]
add eax,[ImageBase]

invoke AddSymbol,[hSymbolListview],edx,eax

movzx ecx,B[ebx+IMAGE_SYMBOL.NumberOfAuxSymbols]
test ecx,ecx
jz >.BADSECTION // No AUX symbols

// Not sure if AUX symbols are counted in FileHeader.NumberOfSymbols but if they are
// adjust ESI to remove them from the count - I am assuming they are
sub esi,ecx

// skip past the AUX symbols
mov eax,SIZEOF IMAGE_SYMBOL
mul ecx
add ebx,eax

.BADSECTION
add ebx,SIZEOF IMAGE_SYMBOL
dec esi
jnz <

RET
ENDF


This actually gets more information from the symbol than the DbgHelp API does: since I can also use the section number and read the type of that section, it will more reliably tell me whether the section is code or data, something I have found lacking in the DbgHelp API.

Edgar
Title: Re: Reading raw debug symbol data
Post by: donkey on March 30, 2010, 06:03:44 AM
Nice !!!

(http://img706.imageshack.us/img706/386/symbolsh.jpg)

Used the following to make an array holding the section types, the section number was passed to AddSymbol and the strings were chosen based on the section characteristics flags. The section number - 1 was the element of the array holding the characteristics for that section.

ReadSectionTable FRAME pMapFile
uses edi,esi,ebx

// All this function does is create an array of DWORDs to
// hold the characteristics of each section

// Get a pointer to IMAGE_NT_HEADERS
mov edi,[pMapFile]
add edi,[edi+IMAGE_DOS_HEADER.e_lfanew]

movzx esi,W[edi+IMAGE_NT_HEADERS.FileHeader.NumberOfSections]

// Get the size of the optional header and add it to EDI
movzx eax,W[edi+IMAGE_NT_HEADERS.FileHeader.SizeOfOptionalHeader]
add edi,eax
add edi,4 // Here's that stupid +4 again
add edi, SIZEOF IMAGE_FILE_HEADER

// EDI holds a memory pointer to the section table
// ESI holds the number of IMAGE_SECTION_HEADER entries in the table

// Create the array of Characteristics values
mov eax,esi
shl eax,2
invoke GlobalAlloc,GMEM_FIXED,eax
mov [paSectionTypes],eax
mov edx,eax
xor ecx,ecx
:
mov eax,[edi+IMAGE_SECTION_HEADER.Characteristics]
mov [edx+ecx*4], eax
add edi,SIZEOF IMAGE_SECTION_HEADER
inc ecx
dec esi
jnz <
:

RET
ENDF
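
For comparison, here is a Python sketch of the same section table walk, plus one way the characteristics flags could be mapped to a label. It is an illustration under assumptions: the function names, the label strings and the synthetic demo buffer are made up, while the flag values and structure sizes are the ones from winnt.h. In this layout the extra 4 bytes are the PE signature DWORD at the start of IMAGE_NT_HEADERS.

```python
import struct

IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_CNT_UNINITIALIZED_DATA = 0x00000080
SIZEOF_IMAGE_FILE_HEADER = 20
SIZEOF_IMAGE_SECTION_HEADER = 40

def read_section_characteristics(image):
    """Return the Characteristics DWORD of every section, in table order."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    file_header = e_lfanew + 4  # skip the 4 byte "PE\0\0" signature
    (nsections,) = struct.unpack_from("<H", image, file_header + 2)
    (opt_size,) = struct.unpack_from("<H", image, file_header + 16)
    table = file_header + SIZEOF_IMAGE_FILE_HEADER + opt_size
    # Characteristics is the last DWORD (offset 36) of IMAGE_SECTION_HEADER.
    return [
        struct.unpack_from(
            "<I", image, table + i * SIZEOF_IMAGE_SECTION_HEADER + 36)[0]
        for i in range(nsections)
    ]

def describe(characteristics):
    """Map section flags to a short label (label text is arbitrary)."""
    if characteristics & IMAGE_SCN_CNT_CODE:
        return "code"
    if characteristics & IMAGE_SCN_CNT_INITIALIZED_DATA:
        return "initialized data"
    if characteristics & IMAGE_SCN_CNT_UNINITIALIZED_DATA:
        return "uninitialized data"
    return "other"

# Synthetic stand-in for a mapped file with two sections (demo values only).
image = bytearray(0x200)
struct.pack_into("<I", image, 0x3C, 0x80)        # e_lfanew
image[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", image, 0x84 + 2, 2)       # NumberOfSections
struct.pack_into("<H", image, 0x84 + 16, 0xE0)   # SizeOfOptionalHeader
table = 0x84 + SIZEOF_IMAGE_FILE_HEADER + 0xE0
struct.pack_into("<I", image, table + 36, 0x60000020)
struct.pack_into("<I", image, table + 40 + 36, 0xC0000040)

section_types = read_section_characteristics(bytes(image))
# A symbol's SectionNumber is 1-based, so section N uses element N - 1.
print([describe(flags) for flags in section_types])
```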


Edgar
Title: Re: Reading raw debug symbol data
Post by: jj2007 on March 30, 2010, 06:33:48 AM
Quote from: donkey on March 30, 2010, 06:03:44 AM
Nice !!!

Very nice indeed. However, I am worried about the window title. Until now, we had occasional rows on coding issues but at least politically we seemed to be very close. Now you make propaganda for GOP...?
:wink
Title: Re: Reading raw debug symbol data
Post by: donkey on March 30, 2010, 07:31:59 AM
GoP = GoAsm Profiler
Title: Re: Reading raw debug symbol data
Post by: donkey on March 31, 2010, 02:18:27 AM
Looking at Microsoft Link.exe, it seems that there is no way possible to have it include debug symbols in the executable so I have to go the PDB route for any MASM programs. That's a shame because I don't know how to extract section information from the symbol data (ie code, initialized data etc) . The link.exe command line option states here: (http://msdn.microsoft.com/en-us/library/xe4t6fc1%28v=VS.80%29.aspx)

Quote from: MSDN
It is not possible to create an .exe or .dll that contains debug information. Debug information is always placed in a .pdb file.

Since both GoAsm and MASM debug builds will work with the DbgHelp API I have chosen to use that route exclusively. The other option is to RE the pdb file format and since a look at that indicates there are quite a few different versions I think it would be far too much work and leave my program vulnerable to changes that might come later or some version I didn't have. Unfortunately this means that I have to rethink my method to determine which section a symbol resides in and that might be a bit of extra detective work.

EDIT: Well, the tag of the MASM symbol tells me what kind of symbol it is but that is not available in the GoAsm enumeration so I will have to use both and determine it based on the IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable contents, if there is a pointer I get the info from the file, if not then I use the DbgHelp API.
Title: Re: Reading raw debug symbol data
Post by: clive on March 31, 2010, 01:22:35 PM
Depends on the version of LINK, normally I would use /DEBUG  /DEBUGTYPE:COFF

-Clive
Title: Re: Reading raw debug symbol data
Post by: jj2007 on March 31, 2010, 01:31:17 PM
Quote from: donkey on March 31, 2010, 02:18:27 AM
Looking at Microsoft Link.exe, it seems that there is no way possible to have it include debug symbols in the executable so I have to go the PDB route for any MASM programs. That's a shame because I don't know how to extract section information from the symbol data (ie code, initialized data etc) . The link.exe command line option states here: (http://msdn.microsoft.com/en-us/library/xe4t6fc1%28v=VS.80%29.aspx)

Quote from: MSDN
It is not possible to create an .exe or .dll that contains debug information. Debug information is always placed in a .pdb file.


That is a surprising statement. Attached an example that seems to prove the contrary - unless I completely misunderstand the meaning of "debug information". I have renamed pdb to xpdb and ilk to xilk, but Olly still sees the symbolic names.
Built with /Zi for the assembler and /debug for the linker (version 5.12).
Title: Re: Reading raw debug symbol data
Post by: clive on March 31, 2010, 03:57:21 PM
Quote from: donkey
Since both GoAsm and MASM debug builds will work with the DbgHelp API I have chosen to use that route exclusively. The other option is to RE the pdb file format and since a look at that indicates there are quite a few different versions I think it would be far too much work and leave my program vulnerable to changes that might come later or some version I didn't have. Unfortunately this means that I have to rethink my method to determine which section a symbol resides in and that might be a bit of extra detective work.

Yes, the PDB file format is a bit of a bugger, there are around 5 variants, and any given version of Microsoft's helper tools can't read some of them because they are not clearly identified. It's not well documented. The file is a self contained file system that permits incremental changes as the source code is updated and recompiled/linked. Some of the symbol and type information has roots in the CodeView format(s) so a familiarity with those is helpful.

Depending on how they were built you can break the file down at the library/object/source/function level, and source line number.

-Clive
Title: Re: Reading raw debug symbol data
Post by: donkey on April 01, 2010, 05:55:56 AM
Quote from: jj2007 on March 31, 2010, 01:31:17 PM
Quote from: donkey on March 31, 2010, 02:18:27 AM
Looking at Microsoft Link.exe, it seems that there is no way possible to have it include debug symbols in the executable so I have to go the PDB route for any MASM programs. That's a shame because I don't know how to extract section information from the symbol data (ie code, initialized data etc) . The link.exe command line option states here: (http://msdn.microsoft.com/en-us/library/xe4t6fc1%28v=VS.80%29.aspx)

Quote from: MSDN
It is not possible to create an .exe or .dll that contains debug information. Debug information is always placed in a .pdb file.


That is a surprising statement. Attached an example that seems to prove the contrary - unless I completely misunderstand the meaning of "debug information". I have renamed pdb to xpdb and ilk to xilk, but Olly still sees the symbolic names.
Built with /Zi for the assembler and /debug for the linker (version 5.12).
Hi JJ2007,

I tried to build with /Zi and /Debug and the information is not present in the build, only the absolute path to the PDB file (use a hex editor and look at the very end of the file). I was using link 5.12.8078.0 which is the only version I have since I don't use MASM much at all I have never upgraded it. You can change the extension of the PDB file without losing your symbols as the search path is set by trying to open that file, if it is not found then the DbgHelp API (which Olly uses) will attempt to find any file in that path that matches the specs for a PDB and contains debug information for the executable. Try deleting the file completely or better still check the IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable entry, you will find it is NULL.

Hi Clive,

Yes, I have 2 examples of MS PDB files that vary widely. The file is structured as you said like a file system with a number of streams that contain different information, there appears to be a codeview stream but I am not sure if the information matches the codeview spec and the sheer complexity of RE'ing the file format is more than I am willing to do when the DbgHelp API can extract the information I need. I have pretty much moved past the symbol extraction since the dual method appears to work and have landed squarely in the procedure mapping part of the profiler, that looks like it will be mostly searching for patterns and taking some best guesses but they should be pretty accurate. Distinguishing data symbols in the code section will be much tougher though I am not sure I will need to do that for the profiler as the person running it will know the source of the symbol. I am also having problems finding information about single stepping a program, all of the information that I can find deals with checking an option in some MS tool, not how it is implemented.
Title: Re: Reading raw debug symbol data
Post by: clive on April 01, 2010, 02:51:17 PM
With LINK 5.12.8078 (in MASM32) you need both the /DEBUG and /DEBUGTYPE:COFF to get symbols within the file.

It's been decades since I wrote code to single step an 8086, there you generated an interrupt (INT x), and in the handler you would set the trace/single step bit (TF) in the stacked flags, the exiting IRET would then start stepping instructions via INT 1 (interrupt would be called after each instruction, IRET to run the next one).

How to achieve this on a Windows box is beyond my level of tolerance, I suspect the mechanism is much the same and can probably be done in a driver with perhaps some assistance from the OS hooks put in for WinDbg and SoftICE.

Things like VTune had static analysis modes with really good internal models of the processor, the dynamic modes used the internal performance counters. I'd have to check if my MSR dumping application still works in XP/Vista/Win7.

Not sure single stepping is a good way to profile code, it is going to cause havoc with the pipelining. I would probably attack it with RDTSC in the prologue/epilog, do some coarse analysis to pinpoint the hot spots and then drill down further. I rarely use single stepping to debug things (ARM/MIPS w/JTAG), it's a method of last resort as I can usually read the code and understand the flow, or register usage. If I want to tune some code I'll pull it out and put it in a test harness and either use hardware based cycle counters, or timers.

-Clive
Title: Re: Reading raw debug symbol data
Post by: dedndave on April 01, 2010, 02:56:22 PM
here is an MSR tool, Clive

http://www.fileden.com/files/2008/3/3/1794507/MSR.zip

Attention: I am not responsible if someone ph_x up their machine with this program
Title: Re: Reading raw debug symbol data
Post by: donkey on April 01, 2010, 03:01:33 PM
Hi Clive,

I believe I can set single step through the eflags, in the profiled process I should be able to do it with the context structure passed to CREATE_PROCESS_DEBUG_EVENT but I have to wait til I get home to test it. As far as I can tell you just set the trace flag in the eflags register (0x100 - bit 8).

The single stepping is a way to track usage of data buffers which cannot be breakpointed, it will not be used for performance testing. Performance testing will be achieved by injecting a breakpoint and when it is reached writing the original data back before execution continues. A second breakpoint will signal the end of the profiled segment of code, when it is reached the breakpoint is restored.

Edgar
Title: Re: Reading raw debug symbol data
Post by: clive on April 01, 2010, 03:13:31 PM
Quote from: dedndave
here is an MSR tool, Clive

Had to blow the dust off this one. Well this still works in 32-bit XP, will try in Vista.

Edit : Well Vista doesn't like the dynamic injection of a driver.

-Clive
Title: Re: Reading raw debug symbol data
Post by: donkey on April 01, 2010, 03:41:03 PM
Hi clive,

Thanks very much. The program will be single stepped if you add the following to CREATE_PROCESS_DEBUG_EVENT:

// Only the control registers (EIP, ESP, EBP, EFlags, CS, SS) are needed
mov D[context.ContextFlags],CONTEXT_CONTROL
invoke SuspendThread, [dbe.u.CreateProcessInfo.hThread]
invoke GetThreadContext,[dbe.u.CreateProcessInfo.hThread],offset context
// Set the trap flag (bit 8) so the next instruction raises EXCEPTION_SINGLE_STEP
or D[context.EFlags],0x100
invoke SetThreadContext,[dbe.u.CreateProcessInfo.hThread],offset context
invoke ResumeThread, [dbe.u.CreateProcessInfo.hThread]


The single step is handled in the EXCEPTION_SINGLE_STEP handler. A quick test here shows that the program goes into single step though it appears that I may have to reset the EFlags for each instruction. Not a terribly difficult thing to do but very time consuming, I will have to experiment a bit more when I get back to my dev box. I ran it on a Vista box without the OS objecting at all.

Edgar
Title: Re: Reading raw debug symbol data
Post by: jj2007 on April 01, 2010, 04:02:34 PM
Quote from: donkey on April 01, 2010, 05:55:56 AM
Hi JJ2007,

I tried to build with /Zi and /Debug and the information is not present in the build, only the absolute path to the PDB file (use a hex editor and look at the very end of the file). I was using link 5.12.8078.0 which is the only version I have since I don't use MASM much at all I have never upgraded it. You can change the extension of the PDB file without losing your symbols as the search path is set by trying to open that file, if it is not found then the DbgHelp API (which Olly uses) will attempt to find any file in that path that matches the specs for a PDB and contains debug information for the executable.

Actually, I was tricked by Olly:
When FileName.pdb is present, Olly loads it and creates a file named \masm32\OllyDbg\FileName.udd
Afterwards, you can delete FileName.pdb without losing the symbolic names.
Title: Re: Reading raw debug symbol data
Post by: brixton on June 27, 2010, 10:47:23 PM
Hi donkey,

Did you get any further with this?  I have completed the exact same task (parsing the Symbol Table in a COFF header) in order to find locations of variables once loaded into memory.  I am also single stepping through instructions.  This is to a completely different end to you - and I resorted to C, although it's pretty API and structure intensive, so not much different in the code!

Tom
Title: Re: Reading raw debug symbol data
Post by: donkey on June 28, 2010, 03:41:30 AM
Quote from: brixton on June 27, 2010, 10:47:23 PM
Hi donkey,

Did you get any further with this?  I have completed the exact same task (parsing the Symbol Table in a COFF header) in order to find locations of variables once loaded into memory.  I am also single stepping through instructions.  This is to a completely different end to you - and I resorted to C, although it's pretty API and structure intensive, so not much different in the code!

Tom

Hi Tom,

Yes I have completed the parsing of the symbol table and have also sorted through the sections etc.. I am currently working towards either using a prepackaged library for disassembly or writing my own, this is in order to help make a guess at procedure start and end. As for the single stepping, I haven't implemented that completely yet but it appears to be viable, there is another solution using page guard that I am hoping to try soon.

Edgar
Title: Re: Reading raw debug symbol data
Post by: brixton on June 28, 2010, 09:38:42 AM
Hi donkey,

Sounds promising!  The single stepping is not a problem - For my application, I set an int3 breakpoint on the first instruction of the target binary, and when I process this (not the initial Windows EXCEPTION_BREAKPOINT) I do a GetContext, do a bitwise inclusive-or on the trap flag in the EFLAGS register and then SetContext.  Unfortunately yes, after each EXCEPTION_SINGLE_STEP you need to GetContext, set the trap flag and then SetContext again.
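
The get / set-flag / set cycle described above can be modelled in a few lines of C. This is only a sketch: `FakeContext` is a hypothetical stand-in for the EFlags member of the Win32 CONTEXT structure, and `simulate_step_delivered` merely models the behaviour reported in this thread (the flag is found clear again when EXCEPTION_SINGLE_STEP arrives). A real handler would call GetThreadContext and SetThreadContext around `arm_single_step`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFLAGS_TRAP_FLAG 0x100u  /* bit 8 of EFLAGS */

/* Hypothetical stand-in for the EFlags member of CONTEXT. */
typedef struct {
    uint32_t EFlags;
} FakeContext;

/* Set TF so the next instruction raises a single-step exception. */
void arm_single_step(FakeContext *ctx)
{
    assert(ctx != NULL);
    ctx->EFlags |= EFLAGS_TRAP_FLAG;
}

/* Model of the behaviour reported in the thread: by the time the
   debugger sees EXCEPTION_SINGLE_STEP, TF is clear again. */
void simulate_step_delivered(FakeContext *ctx)
{
    assert(ctx != NULL);
    ctx->EFlags &= ~EFLAGS_TRAP_FLAG;
}

/* Handler body: re-arm TF for the following instruction and
   return the updated step count. */
unsigned on_single_step(FakeContext *ctx, unsigned steps_so_far)
{
    arm_single_step(ctx);
    return steps_so_far + 1;
}
```

Forgetting the re-arm in the handler means the target simply runs free after the first step.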

I am actually using the BeaEngine library for disassembly.  You pass it the EIP register (or actually any pointer) and it returns a structure containing information about the first instruction it encounters.  It also gives you a nice string so you can just print the instruction out if need be.  BeaEngine comes with python, pascal, C, NASM/MASM/FASM/GoASM headers, I've been using it and it seems accurate so far..

Tom
Title: Re: Reading raw debug symbol data
Post by: ecube on June 28, 2010, 07:33:24 PM
nice find brixton, it's under the LGPL license which rocks, and it looks very clean and fast, + has 64bit and GoAsm/MASM support :D even has an LDE version, wow dude you really hit the motherlode here! most other good engines are crappy GPL and difficult to use with GoAsm/MASM.
Title: Re: Reading raw debug symbol data
Post by: brixton on June 28, 2010, 08:19:00 PM
Hey ecube,

Yes, I did a lot of research on this subject - it does seem good, but I am yet to put it through more intensive tests  :bdg
Title: Re: Reading raw debug symbol data
Post by: donkey on June 29, 2010, 05:18:12 AM
Hi Brixton,

I have looked at BeaEngine (as well as DiStorm, but that was too language restrictive), a nice package with a very liberal license; however there are some issues with GoAsm and the _Disasm@4 export (DLL version). Specifically it requires that you use the /mix switch and that has some adverse effects on the headers, Jeremy is looking at the bug. The lib file that comes with the distribution could not be read by GoAsm, the format was unrecognized. Right now I have almost decided on Drizz's disassembler, I haven't looked it through very deeply but the license is great and the author is known around the forums:

Quote
Copy-left FOREVER by drizz.  No rights reserved.

All modules in this library are dedicated to the public domain by drizz.

Permission to use, copy, modify, reverse-engineer, crack,
patch and distribute this compilation for any purpose is hereby granted.

Gotta love it !

Edgar
Title: Re: Reading raw debug symbol data
Post by: ecube on July 02, 2010, 04:34:54 AM
donkey I got BeaEngine's goasm staticlib example to assemble fine. just added

#dynamiclinkfile msvcrt.dll to the source and assembled with


set INCLUDE=C:\GoAsm\include
\GoAsm\bin\GoAsm /x86 example.asm
\GoAsm\bin\GoLink /console /fo example.exe example.obj
pause
Title: Re: Reading raw debug symbol data
Post by: brixton on July 02, 2010, 01:34:49 PM
Incidentally, I am trying to tease out the length (size) of a symbol, if it is statically linked data.  Does anyone know how to do this?  I can find the location, but the size eludes me..
Title: Re: Reading raw debug symbol data
Post by: ecube on July 02, 2010, 02:35:04 PM
BeaEngine also has a 32bit/64bit length disassembler on their site, it's still crashing for me, but you can use it to read a certain number of bytes and it'll give back the instruction's length.
Title: Re: Reading raw debug symbol data
Post by: brixton on July 02, 2010, 03:50:11 PM
I actually mean variable (included in symbol table) lengths, not instruction lengths.  eg. if I have a global variable:

someString BYTE 5 DUP(?)

I can find the location of someString in the data section of memory (garnered from the symbol table), but I don't know how to find its length of 5.
Title: Re: Reading raw debug symbol data
Post by: ecube on July 02, 2010, 06:41:06 PM
if it's your own code you can use sizeof for a lot of things, if it's another process's code you can use Heap32First etc. to walk through allocated heap memory (GlobalAlloc, HeapAlloc etc.) or VirtualQuery to walk memory pages in general and get sizes.

for static libs, you can use the /l switch with GoAsm to generate a listing, i'm not sure how much help it'd be in that case but it may. if your static lib has debug symbols you can load your assembled program in Olly and it'll give you info, as well as the option to load the source code in, so it gives you line by line.
Title: Re: Reading raw debug symbol data
Post by: ecube on July 02, 2010, 07:26:27 PM
Donkey,

Iczelion does what you want here http://win32assembly.online.fr/tut30.html but he says single stepping large programs can take 10 mins, wtf!
Title: Re: Reading raw debug symbol data
Post by: drizz on July 02, 2010, 09:03:00 PM
Quote from: brixton on July 02, 2010, 03:50:11 PM
I actually mean variable (included in symbol table) lengths, not instruction lengths.  eg. if I have a global variable:

someString BYTE 5 DUP(?)

I can find the location of someString in the data section of memory (garnered from the symbol table), but I don't know how to find its length of 5.

You can't. You could guess the size by deducting label offsets (next-this). But there could be "align" directive for example which would add to size.

VCx0.pdb could have this info (i'm not sure) but only for c/c++.

Here's an old obj2asm utility i made. If you test it you can see that it just dumps the bytes no analysis on the data.



Title: Re: Reading raw debug symbol data
Post by: brixton on July 02, 2010, 09:16:24 PM
Quote from: drizz on July 02, 2010, 09:03:00 PM
Quote from: brixton on July 02, 2010, 03:50:11 PM
I actually mean variable (included in symbol table) lengths, not instruction lengths.  eg. if I have a global variable:

someString BYTE 5 DUP(?)

I can find the location of someString in the data section of memory (garnered from the symbol table), but I don't know how to find its length of 5.

You can't. You could guess the size by deducting label offsets (next-this). But there could be "align" directive for example which would add to size.

VCx0.pdb could have this info (i'm not sure) but only for c/c++.

Here's an old obj2asm utility i made. If you test it you can see that it just dumps the bytes no analysis on the data.

Thanks drizz, I didn't think there was a way (as far as I could tell).  Looks like I'm going to have to resort to guessing (educated guess).

Quote from: E^cube on July 02, 2010, 07:26:27 PM
Donkey,

Iczelion does what you want here http://win32assembly.online.fr/tut30.html but he says single stepping large programs can take 10 mins, wtf!

Not surprised by that, on the old hardware of the time.  Especially worse if you're printing something after each instruction.  I single-stepped theGUN.exe (from the MASM32 package), printing each instruction as it went.  It took about 150k instructions before a window started appearing (30 seconds or so on my machine).  Once loaded the only instructions processed are from GetMessage, like when my mouse moves in/out or it needs to repaint.
Title: Re: Reading raw debug symbol data
Post by: clive on July 02, 2010, 10:09:57 PM
Pretty sure there isn't a way to do strings, the TYPES data can be used to infer details of the fields within a RECORD/STRUCTURE

From the PDB file

sn     iBlk   Size     Stamp    Module
000A : 0029 : 0000019C 0037BA68 test26.obj
iBlk Blk  FileOffs Size
0029 0013 00004C00  19C

sstAlignSym - Module Symbols (Size 0120)

00000000: 0012 0009 S_OBJNAME        00000001 test26.obj
00000014: 0036 0001 S_COMPILE        00000303 Intel 80386 MASM Microsoft (R) Macro Assembler Version 6.15.8803
0000004C: 001A 1007 S_LDATA32_VS97   00000022     3.00000002 g_ArrayPtr
00000068: 000E 1003 S_UDT_VS97       00001001 Array
00000078: 0016 1007 S_LDATA32_VS97   00000020     3.00000006 String
00000090: 002E 100A S_LPROC32_VS97      1.00000000[0000001B] $$$00001
000000C0: 0002 0006 S_END
000000C4: 0012 0209 S_LABEL32                     1.00000000 Start
000000D8: 0012 1003 S_UDT_VS97       00001005 c_msvcrt
000000EC: 0016 1007 S_LDATA32_VS97   00001001     3.00000000 g_Array
00000104: 001A 1007 S_LDATA32_VS97   00000020     3.00000012 SomeString

11 Symbol(s)

sstSrcModule - Line Numbering (Size 0058)

cFile = 0001, cSeg = 0001
    1.00000000->0000001A
00000014 cSeg = 0001, pad = 0000, Name = test26.asm
   00000030    1.00000000->0000001A
               1.00000000    27
               1.00000000    27
               1.00000006    28
               1.00000008    32
               1.0000000B    34
               1.0000001A    36


Here SomeString (DB 5 DUP (?)) has a TYPE of 0x20 which is basically T_UCHAR. No size/length is inferred within the CodeView/PDB data. You'd just have to sort the symbols, and the object boundaries, and compute the total space taken up including any alignment that's thrown in by the compiler and/or linker.

From the OBJ File, the .debug$S (CodeView Symbols)

03  .debug$S Virtual Address         00000000
Physical Address        00000032
Raw Data Offset         0000015C
Raw Data Size           00000102
Relocation Offset       0000025E
Relocation Count        000C
Line Number Offset      00000000
Line Number Count       0000
Characteristics         42100040
Initialized Data
1  Byte Align
Discardable
Readable

00000053 0000000E 000B (SECREL  )    2.00000002 g_ArrayPtr
00000057 0000000E 000A (SECTION )    2.00000002 g_ArrayPtr
00000076 0000000F 000B (SECREL  )    2.00000006 String
0000007A 0000000F 000A (SECTION )    2.00000006 String
000000A1 00000012 000B (SECREL  )    1.00000000 _$$$00001@0
000000A5 00000012 000A (SECTION )    1.00000000 _$$$00001@0
000000BB 00000010 000B (SECREL  )    1.00000000 _Start
000000BF 00000010 000A (SECTION )    1.00000000 _Start
000000DB 00000011 000B (SECREL  )    2.00000000 g_Array
000000DF 00000011 000A (SECTION )    2.00000000 g_Array
000000EF 00000019 000B (SECREL  )    2.00000012 SomeString
000000F3 00000019 000A (SECTION )    2.00000012 SomeString

0011 - 0009 S_OBJNAME        test26.obj

00000000: 11 00 09 00 01 00 00 00 - 0A 74 65 73 74 32 36 2E  .........test26.
00000010: 6F 62 6A                                           obj

0036 - 0001 S_COMPILE        Microsoft (R) Macro Assembler Version 6.15.8803

00000000: 36 00 01 00 03 03 00 00 - 2F 4D 69 63 72 6F 73 6F  6......./Microso
00000010: 66 74 20 28 52 29 20 4D - 61 63 72 6F 20 41 73 73  ft (R) Macro Ass
00000020: 65 6D 62 6C 65 72 20 56 - 65 72 73 69 6F 6E 20 36  embler Version 6
00000030: 2E 31 35 2E 38 38 30 33                            .15.8803

0015 - 0201 S_LDATA32        T_ULONG g_ArrayPtr

00000000: 15 00 01 02 00 00 00 00 - 00 00 22 00 0A 67 5F 41  .........."..g_A
00000010: 72 72 61 79 50 74 72                               rrayPtr

000A - 0004 S_UDT            1003 Array

00000000: 0A 00 04 00 03 10 05 41 - 72 72 61 79              .......Array

0011 - 0201 S_LDATA32        T_UCHAR String

00000000: 11 00 01 02 00 00 00 00 - 00 00 20 00 06 53 74 72  .......... ..Str
00000010: 69 6E 67                                           ing

002C - 0204 S_LPROC32        1005 $$$00001

00000000: 2C 00 04 02 00 00 00 00 - 00 00 00 00 00 00 00 00  ,...............
00000010: 1B 00 00 00 00 00 00 00 - 1B 00 00 00 00 00 00 00  ................
00000020: 00 00 05 10 00 08 24 24 - 24 30 30 30 30 31        ......$$$00001

0002 - 0006 S_END

00000000: 02 00 06 00                                        ....

000F - 0209 S_LABEL32        Start

00000000: 0F 00 09 02 00 00 00 00 - 00 00 00 05 53 74 61 72  ............Star
00000010: 74                                                 t

000D - 0004 S_UDT            1000 c_msvcrt

00000000: 0D 00 04 00 00 10 08 63 - 5F 6D 73 76 63 72 74     .......c_msvcrt

0012 - 0201 S_LDATA32        1003 g_Array

00000000: 12 00 01 02 00 00 00 00 - 00 00 03 10 07 67 5F 41  .............g_A
00000010: 72 72 61 79                                        rray

0015 - 0201 S_LDATA32        T_UCHAR SomeString

00000000: 15 00 01 02 00 00 00 00 - 00 00 20 00 0A 53 6F 6D  .......... ..Som
00000010: 65 53 74 72 69 6E 67                               eString


Total size of .data object record is 0x17 (2.00000000 .. 2.00000016), with SomeString starting at 0x12 (2.00000012)

02  .data    Virtual Address         00000000
Physical Address        0000001B
Raw Data Offset         0000013A
Raw Data Size           00000017
Relocation Offset       00000152
Relocation Count        0001
Line Number Offset      00000000
Line Number Count       0000
Characteristics         C0300040
Initialized Data
4  Byte Align
Readable
Writeable


Title: Re: Reading raw debug symbol data
Post by: brixton on July 03, 2010, 09:47:26 AM
Hi clive,

Yes, so using the string's location in the data section, along with the locations of other data in there could give me an indication of its length (but that is all, it can't be 100% reliable, right?).
Title: Re: Reading raw debug symbol data
Post by: clive on July 04, 2010, 11:41:22 AM
Quote from: brixton
Yes, so using the string's location in the data section, along with the locations of other data in there could give me an indication of its length (but that is all, it can't be 100% reliable, right?).

That's pretty much it. The assembler knows this information but it doesn't export it in the object file, or listing for that matter. The debug records include the name, address, and type information. So you would know if it was an array of BYTE, WORD, or some custom RECORD/STRUCTURE, but you'd have to confine the symbol between other symbols, or the end of a section within the object file.
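
The "confine the symbol between other symbols" approach can be sketched in C as follows. It assumes you have already collected the section-relative offsets of the data symbols and the raw size of the section; the `DataSymbol` struct and `estimate_sizes` name are hypothetical. The result is only an upper bound, since any alignment padding is counted as part of the preceding symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one data symbol within a single section. */
typedef struct {
    const char *name;
    uint32_t    offset;    /* section-relative offset from the symbol table */
    uint32_t    max_size;  /* filled in by estimate_sizes() */
} DataSymbol;

/* qsort comparator: ascending by offset. */
static int by_offset(const void *a, const void *b)
{
    uint32_t x = ((const DataSymbol *)a)->offset;
    uint32_t y = ((const DataSymbol *)b)->offset;
    return (x > y) - (x < y);
}

/* Sort the symbols in place, then bound each one by the next symbol's
   offset; the last symbol is bounded by the end of the section. */
void estimate_sizes(DataSymbol *syms, size_t count, uint32_t section_size)
{
    if (count == 0)
        return;
    assert(syms != NULL);
    qsort(syms, count, sizeof syms[0], by_offset);
    for (size_t i = 0; i < count; i++) {
        uint32_t end = (i + 1 < count) ? syms[i + 1].offset : section_size;
        syms[i].max_size = end - syms[i].offset;
    }
}
```

Fed the four .data symbols from the test26 dump (g_Array at 0, g_ArrayPtr at 2, String at 6, SomeString at 0x12) and a section size of 0x17, this gives SomeString a bound of 5 bytes, which happens to match the 5 DUP(?) because it is the last item in the section.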
Title: Re: Reading raw debug symbol data
Post by: brixton on July 04, 2010, 03:27:55 PM
I'm trying to compile 'ed' (from cygwin) with symbol/debugging information.  I added the -g -gcoff switches to gcc, and indeed I get a (massive) symbol table - but I don't get most of the global variables listed in there at all.  Very strange.

From main.c:


static const char * invocation_name = 0;
static const char * const Program_name    = "GNU Ed";
static const char * const program_name    = "ed";
static const char * const program_year    = "2008";

static char _restricted = 0; /* invoked as "red" */
static char _scripted = 0; /* if set, suppress diagnostics */
static char _traditional = 0; /* if set, be backwards compatible */


I can find invocation_name in the symbol table, but cannot find any of the others at all, even using the famous PEDUMP.exe.  Stumped!
Title: Re: Reading raw debug symbol data
Post by: clive on July 04, 2010, 03:44:06 PM
"static" means that it is local to the object in question, so it isn't going to be exported to pollute the namespace.
Title: Re: Reading raw debug symbol data
Post by: brixton on July 04, 2010, 03:55:18 PM
Interesting - so how come invocation_name is included?
edit:  I removed the 'static const' keywords (eg. char * const Program_name    = "GNU Ed";) and it still doesn't export them..
Title: Re: Reading raw debug symbol data
Post by: beatrix on July 12, 2010, 06:28:49 PM
Hi,

Just an answer to donkey. He says :

Quote
I have looked at BeaEngine (as well as DiStorm, but that was too language restrictive), a nice package with a very liberal license; however there are some issues with GoAsm and the _Disasm@4 export (DLL version). Specifically it requires that you use the /mix switch and that has some adverse effects on the headers, Jeremy is looking at the bug. The lib file that comes with the distribution could not be read by GoAsm, the format was unrecognized

That's true. With the latest version (4.0) of BeaEngine, it is not possible to link the lib to a GoAsm program. It is quite annoying :) In fact, BeaEngine 4.0 is now compiled with MinGW (gcc). I don't know why, but GoAsm does not like libs built with gcc. I just tried a version compiled with PellesC (the original compiler used for BeaEngine) and now GoAsm links properly (and you don't need to use the /mix option!). Just try it :)

Title: Re: Reading raw debug symbol data
Post by: donkey on July 13, 2010, 03:25:08 AM
Thanks beatrix,

I will definitely take a look on the weekend.

Edgar
Title: Re: Reading raw debug symbol data
Post by: dedndave on July 13, 2010, 08:54:54 AM
very cool code, Beatrix - i may put it to use   :U
oh - and nice to have you visit us
Title: Re: Reading raw debug symbol data
Post by: donkey on July 13, 2010, 09:42:07 PM
Hi Beatrix,

Has the Disasm function been changed to C call from STDCALL ? It seems when I loop through and decode 10 instructions ESP is offset by 40 bytes. The code for the disassembly is :

The code section is read directly from a PE file and decoded from there (pMem points to the global buffer that holds the target code):
mov D[usedInstructionsCount],0

// Zero the _Disasm structure
mov ecx,SIZEOF _Disasm
lea edi,DisasmStruct
xor eax,eax
rep stosb

mov eax,[pMem]
mov [DisasmStruct.EIP],eax
mov eax,[dwAddress]
mov [DisasmStruct.VirtualAddr],eax
mov D[DisasmStruct.Options],GoAsmSyntax

mov ebx,[cbSize]

:
invoke beaengine.lib:Disasm,offset DisasmStruct
add esp,4
push eax

invoke wsprintf,offset OutputLine,offset CodeFormat,[DisasmStruct.VirtualAddr],offset DisasmStruct.CompleteInstr
add esp,16

pop eax
add [DisasmStruct.EIP],eax
sub ebx,eax
inc D[usedInstructionsCount]
add [DisasmStruct.VirtualAddr],eax

invoke SendMessage,[hwnd],EM_SETSEL,-1,-1
invoke SendMessage,[hwnd],EM_REPLACESEL,FALSE,offset OutputLine
cmp ebx,0
jg <

Title: Re: Reading raw debug symbol data
Post by: beatrix on July 14, 2010, 06:59:18 PM
Has the Disasm function been changed to C call from STDCALL

gloops. No, the lib released on www.beaengine.org uses stdcall convention. The one I gave you in this forum is using cdecl. Sorry for that, I just forgot to specify the desired output and by default, the compiled lib uses cdecl.

@dedndave : thanks :)
Title: Re: Reading raw debug symbol data
Post by: donkey on July 15, 2010, 03:08:41 AM
No problem beatrix,

I just wanted to make sure I was using it correctly, I will set it up as a C call. Nice lib by the way, fast and accurate.

Edgar
Title: Re: Reading raw debug symbol data
Post by: ecube on July 15, 2010, 03:38:41 AM
donkey can you explain why wsprintf seems to work fine with GoASM on 32bit not using your cdecl macro?
Title: Re: Reading raw debug symbol data
Post by: donkey on July 15, 2010, 05:35:12 AM
Quote from: E^cube on July 15, 2010, 03:38:41 AM
donkey can you explain why wsprintf seems to work fine with GoASM on 32bit not using your cdecl macro?

Hi E^cube,

Not sure what you mean, it has always worked however you have to adjust ESP directly by calculating the bytes pushed (4 * #parameters), the CInvoke macro calculates the amount to adjust ESP for you but it expands to exactly the same thing that I did manually.
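
The arithmetic is simple enough to write down. A small C sketch, assuming 32-bit code where every parameter occupies one 4-byte stack slot; both function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOT_BYTES 4u  /* one pushed parameter in 32-bit code */

/* Bytes the caller must add to ESP after a cdecl call. */
uint32_t cdecl_cleanup_bytes(uint32_t param_count)
{
    return param_count * STACK_SLOT_BYTES;
}

/* How far ESP has drifted if the caller forgets that cleanup
   for a number of identical calls. */
uint32_t esp_drift(uint32_t calls, uint32_t params_per_call)
{
    uint32_t per_call = cdecl_cleanup_bytes(params_per_call);
    assert(per_call == 0 || calls <= UINT32_MAX / per_call);  /* no overflow */
    return calls * per_call;
}
```

This matches the numbers earlier in the thread: the four-parameter wsprintf call needs `add esp,16`, and ten uncorrected one-parameter Disasm calls leave ESP off by 40 bytes.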

Edgar
Title: Re: Reading raw debug symbol data
Post by: ecube on July 15, 2010, 08:42:45 AM
yeah I meant it worked without adjusting anything, I suppose I should just use the macro :)
Title: Re: Reading raw debug symbol data
Post by: ecube on July 20, 2010, 07:21:24 AM
beatrix your lib is weird, it doesn't display the instruction strings correctly :\ idk why. you mind recompiling the latest versions or giving directions on how to?

Update: I just compiled to a lib myself using Visual Studio 6.0 and it's definitely not outputting correct strings :( it's missing mov, jmp etc... some strings are even empty, ugh.
Title: Re: Reading raw debug symbol data
Post by: donkey on July 21, 2010, 02:19:53 AM
Hi E^cube,

Could you post an example project and detail the lines that have problems, I have used the lib in GoP but now I'm worried that the disassembly might have problems.

Edgar
Title: Re: Reading raw debug symbol data
Post by: ecube on July 21, 2010, 06:15:40 AM
Donkey,

here's an example of the latest lib that I compiled myself, as well as the old lib found before on the site (that worked). you can try beatrix's lib too, neither outputs the right strings.

also I don't know why it's messing up, i'll have to try and find the reason in the c code.
Title: Re: Reading raw debug symbol data
Post by: donkey on July 23, 2010, 12:26:30 AM
Hi E^cube,

I'm in BC for the week but should be home Sunday, I will take a look at it then.
Title: Re: Reading raw debug symbol data
Post by: beatrix on July 23, 2010, 08:23:17 PM
Hi E^Cube,

I think it is only a compiler problem. The latest versions of BeaEngine are compiled with gcc and GoAsm can't use these versions. Just try to use the lib in the attached archive, it works fine.
Title: Re: Reading raw debug symbol data
Post by: ecube on July 24, 2010, 12:53:56 AM
Quote from: beatrix on July 23, 2010, 08:23:17 PM
Hi E^Cube,

I think it is only a compiler problem. The latest versions of BeaEngine are compiled with gcc and GoAsm can't use these versions. Just try to use the lib in the attached archive, it works fine.

Thanks beatrix, I appreciate all the hardwork you've put into BeaEngine, it really is amazing. Even more amazing is the license, most people just use regular GPL which makes their project completely useless to me, you on the other hand took the more intelligent, considerate route. I commend you. I hope you consider putting up a donation button on your site  :thumbu
Title: Re: Reading raw debug symbol data
Post by: brixton on August 04, 2010, 03:38:54 PM
Hi all,

I don't know if you've found this, but with different linkers comes different symbol table usage.  For instance, my main gripe is with Cygwin's GCC (may be the same in the original GCC) where the 'Value' field of an IMAGE_SYMBOL structure does not seem to give a value which is an offset from the image base, but rather from the start of a data section (ie. a value of 0x20 rather than something in the thousands).  This is confusing my symbol table parser and I cannot see an obvious way around it..

Tom
Title: Re: Reading raw debug symbol data
Post by: ecube on August 04, 2010, 05:42:24 PM
Quote from: brixton on August 04, 2010, 03:38:54 PM
Hi all,

I don't know if you've found this, but with different linkers comes different symbol table usage.  For instance, my main gripe is with Cygwin's GCC (may be the same in the original GCC) where the 'Value' field of an IMAGE_SYMBOL structure does not seem to give a value which is an offset from the image base, but rather from the start of a data section (ie. a value of 0x20 rather than something in the thousands).  This is confusing my symbol table parser and I cannot see an obvious way around it..

Tom

OffsetToRva function should fix that.
Title: Re: Reading raw debug symbol data
Post by: brixton on August 17, 2010, 04:28:24 PM
Quote from: E^cube on August 04, 2010, 05:42:24 PM
Quote from: brixton on August 04, 2010, 03:38:54 PM
Hi all,

I don't know if you've found this, but with different linkers comes different symbol table usage.  For instance, my main gripe is with Cygwin's GCC (may be the same in the original GCC) where the 'Value' field of an IMAGE_SYMBOL structure does not seem to give a value which is an offset from the image base, but rather from the start of a data section (ie. a value of 0x20 rather than something in the thousands).  This is confusing my symbol table parser and I cannot see an obvious way around it..

Tom

OffsetToRva function should fix that.

Hi E^cube,

Where is this function?
Title: Re: Reading raw debug symbol data
Post by: donkey on August 18, 2010, 01:41:55 AM
Hi Brixton,

Not sure about the OffsetToRva function, but there are many DbgHelp functions and I find ImageNtHeader / ImageRvaToVa very useful when reading PE files. I think it's what you're looking for, but if you need to go in the other direction SymFromAddr works just fine as well, though it requires a bit more setup.

http://msdn.microsoft.com/en-us/library/ms679291%28v=VS.85%29.aspx

This code is in GoAsm format and will enumerate symbols but it demonstrates how to use the symbol table and the Sym... functions:

EnumerateSymbols FRAME hFile, hProcess, ImageBase
LOCAL fsh:%DWORD32
LOCAL fsl:%DWORD32
LOCAL ProcessPath[2048]:%CHAR
LOCAL ihmod64:IMAGEHLP_MODULE64

mov D[ProcessPath],0

invoke SetLastError,0
invoke GetProcessImageFileName ,[hProcess],offset ProcessPath,MAX_PATH

invoke GetFileSize,[hFile],offset fsh
mov [fsl],eax

invoke SymInitialize,[hProcess], offset ProcessPath, FALSE

invoke SymLoadModuleEx,[hProcess],[hFile],offset ProcessPath,NULL,[ImageBase],0,[fsl],0,0

push eax,edx

mov D[ihmod64.SizeOfStruct],SIZEOF IMAGEHLP_MODULE64
invoke SymGetModuleInfo64,[hProcess],[ImageBase],0,offset ihmod64

cmp D[ihmod64.SymType],SymNone
je >>.NOSYMBOLS

pop edx,eax
invoke SymEnumSymbols,[hProcess],eax,edx,"*",offset SymEnumSymbolsProc, [hProcess]

invoke SymUnloadModule64,[hProcess],[ImageBase],0

invoke SymCleanup,[hProcess]
xor eax,eax
RET

.NOSYMBOLS
xor eax,eax
dec eax
ret
ENDF

SymEnumSymbolsProc FRAME pSymInfo, SymbolSize, UserContext

...

.CONTINUE
mov eax,TRUE
RET
ENDF

Title: Re: Reading raw debug symbol data
Post by: brixton on August 18, 2010, 07:58:47 AM
Hi donkey,

Thanks a lot for the reply, I will bear this in mind (and perhaps implement it, if I have time.. the deadline is fast approaching).  Yesterday I found the cause of the problem:  the .bss section:

Quote from: wiki
... the bss section typically includes all uninitialized variables declared at the file level (i.e., outside of any function) as well as uninitialized local variables declared with the static keyword. An implementation may also assign statically-allocated variables initialized with a value consisting solely of zero-valued bits to the bss section.

Hence, all of the global variables were actually present at (symbol table.Value+ImageBase+offset of .bss section) - this was found with some detective work and olly.
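
As a rough illustration of that calculation, here is a C sketch that decodes one 18-byte IMAGE_SYMBOL record from raw bytes and then applies ImageBase + section VirtualAddress + Value. The `CoffSymbol` struct and both function names are hypothetical, and the `section_rvas` array is assumed to have been filled from the section headers beforehand. It also assumes Value is section-relative, as observed above with Cygwin's GCC; other linkers may store something different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COFF_SYMBOL_SIZE 18  /* SIZEOF IMAGE_SYMBOL */

/* Hypothetical decoded form of one IMAGE_SYMBOL record. */
typedef struct {
    char     short_name[9];   /* inline name, NUL-terminated; empty if long */
    uint32_t string_offset;   /* string table offset for long names, else 0 */
    uint32_t value;
    int16_t  section_number;  /* 1-based; values <= 0 are special */
    uint16_t type;
    uint8_t  storage_class;
    uint8_t  aux_count;
} CoffSymbol;

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode COFF_SYMBOL_SIZE bytes at rec into *out. */
void parse_coff_symbol(const uint8_t *rec, CoffSymbol *out)
{
    assert(rec != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    if (rd32(rec) == 0)
        out->string_offset = rd32(rec + 4);  /* long name lives in string table */
    else
        memcpy(out->short_name, rec, 8);     /* out was zeroed, so NUL-terminated */
    out->value          = rd32(rec + 8);
    out->section_number = (int16_t)rd16(rec + 12);
    out->type           = rd16(rec + 14);
    out->storage_class  = rec[16];
    out->aux_count      = rec[17];
}

/* section_rvas[i] holds the VirtualAddress of section i + 1.
   Returns 0 when the symbol does not belong to a real section. */
uint32_t symbol_address(const CoffSymbol *sym, uint32_t image_base,
                        const uint32_t *section_rvas, size_t section_count)
{
    assert(sym != NULL && section_rvas != NULL);
    if (sym->section_number < 1 || (size_t)sym->section_number > section_count)
        return 0;
    return image_base + section_rvas[sym->section_number - 1] + sym->value;
}
```

A record with `aux_count` greater than zero is followed by that many 18-byte auxiliary records, which a table walker has to skip.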

Tom
Title: Re: Reading raw debug symbol data
Post by: ecube on August 18, 2010, 10:45:20 AM
Donkey why did you use %DWORD32? according to your definition it's just a structure defined as DD which doesn't change depending on 32bit or 64bit? %INT_PTR appears to change though. Also here are some mov/cmp macros you can use for 32bit/64bit to auto convert.


%MOV(%DESTN,%SOURCE) MACRO
#IFNDEF WIN64
mov D[%DESTN],%SOURCE
#ELSE
mov Q[%DESTN],%SOURCE
#ENDIF
ENDM

%CMP(%DESTN,%SOURCE) MACRO
#IFNDEF WIN64
cmp D[%DESTN],%SOURCE
#ELSE
cmp Q[%DESTN],%SOURCE
#ENDIF
ENDM




%MOV(myvar,53)
%CMP(myvar,54)
jne >
Title: Re: Reading raw debug symbol data
Post by: donkey on August 18, 2010, 10:55:52 PM
Hi E^cube,

It is a very early draft and I never noticed, nothing more sinister than that :) But I believe that file size high (fsh) and file size low (fsl) remain DWORDs even when in 64 bit mode:

DWORD WINAPI GetFileSize(
  __in       HANDLE hFile,
  __out_opt  LPDWORD lpFileSizeHigh
);
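
Since the two halves stay 32 bits wide on either platform, the full size is put together the same way in both. A small C sketch; `combine_file_size` and `check_combine` are hypothetical helpers, not part of the Win32 API:

```c
#include <assert.h>
#include <stdint.h>

/* Join the value returned by GetFileSize (low DWORD) with the value
   written to lpFileSizeHigh (high DWORD) into one 64-bit size. */
uint64_t combine_file_size(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* Small self-check of the bit arithmetic. */
void check_combine(void)
{
    assert(combine_file_size(0x10u, 0x1u) == 0x100000010ULL);
    assert(combine_file_size(1234u, 0u) == 1234u);
}
```

Keep in mind that a low DWORD of 0xFFFFFFFF (INVALID_FILE_SIZE) is ambiguous on its own; GetLastError has to be checked to tell a failure from a genuinely large file.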


Edgar