I need to obtain the symbols from a GoAsm executable file and have been trying to figure out how to decode the raw symbol table data. Finding the data is simple enough, but I can't figure out the header information. As far as I know (and I could be wrong) the symbols are stored in IMAGE_SYMBOL structures; if the symbol name is longer than 8 bytes, the union holds an offset into the symbol name table instead of the name itself. That would mean that these offsets or names should be found every 18 bytes (SIZEOF IMAGE_SYMBOL), but the first recognizable symbol is not at an exact multiple of 18, so there has to be some header data at the beginning of the raw data. Below is the definition from winnt.h for the IMAGE_SYMBOL structure and a dump of the raw symbol data from my test program.
In order to obtain a pointer to the symbol data I used the following:
// Get a pointer to IMAGE_NT_HEADERS
mov eax,[pMapFile]
mov edi,[eax+IMAGE_DOS_HEADER.e_lfanew]
add edi,[pMapFile]
// Store the offset of the DataDirectory in EBX
lea ebx,[edi+IMAGE_NT_HEADERS.OptionalHeader.DataDirectory]
// Get the debug directory entry (IMAGE_DIRECTORY_ENTRY_DEBUG, index 6, i.e. the 7th entry)
// SIZEOF IMAGE_DATA_DIRECTORY is always 8 BYTEs
mov eax,6
shl eax,3
add ebx,eax
// EBX now points to the IMAGE_DATA_DIRECTORY structure for the debug directory
// Convert the RVA to a file position
invoke RVAToFilePos,[pMapFile],[ebx+IMAGE_DATA_DIRECTORY.VirtualAddress]
mov esi,[pMapFile]
add esi,eax
mov eax,[ebx+IMAGE_DATA_DIRECTORY.Size]
mov ecx,SIZEOF IMAGE_DEBUG_DIRECTORY
xor edx,edx
div ecx
// EAX contains the number of IMAGE_DEBUG_DIRECTORY entries
// ESI now points to the IMAGE_DEBUG_DIRECTORY entry for the PE
// get a pointer to the raw data in the file
mov ebx,[esi+IMAGE_DEBUG_DIRECTORY.SizeOfData]
lea eax,[esi+IMAGE_DEBUG_DIRECTORY.PointerToRawData]
mov esi,[eax]
add esi,[pMapFile]
// ESI contains a memory pointer to the base of the raw data
// EBX contains the size of the data
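The RVAToFilePos helper called above is not shown in the post. As a rough illustration of what such a conversion usually does, here is a minimal Python sketch, assuming the section table has already been read into a list. The Section tuple, the rva_to_file_offset name and the sample section values are all hypothetical and are not taken from the post.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    virtual_address: int      # RVA at which the section is mapped
    pointer_to_raw_data: int  # file offset of the section's bytes
    size_of_raw_data: int     # number of bytes stored in the file


def rva_to_file_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """Return the file offset that backs rva, or None if no section holds it."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.size_of_raw_data:
            return s.pointer_to_raw_data + (rva - s.virtual_address)
    return None


# Hypothetical section table, for illustration only
SECTIONS = [
    Section(".text", 0x1000, 0x400, 0x200),
    Section(".data", 0x2000, 0x600, 0x800),
]

print(hex(rva_to_file_offset(0x2050, SECTIONS)))
```

An RVA that falls in no section (or in the zero-filled tail past the raw data) has no bytes in the file, which is why the sketch returns None rather than guessing.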
IMAGE_SYMBOL STRUCT
UNION
ShortName DB 8 DUP
Name STRUCT
Short DD
Long DD
ENDS
LongName DD 2 DUP
ENDUNION
Value DD
SectionNumber DW
Type DW
StorageClass DB
NumberOfAuxSymbols DB
ENDS
00380E00: 0B 00 00 00-20 00 00 00-00 00 00 00-00 00 00 00 .... ...........
00380E10: 00 10 00 00-00 12 00 00-00 20 00 00-00 28 00 00 ......... ...(..
00380E20: 00 00 00 00-04 00 00 00-50 20 00 00-02 00 00 00 ........P ......
00380E30: 02 00 73 7A-54 65 73 74-00 00 54 20-00 00 02 00 ..szTest..T ....
00380E40: 00 00 02 00-53 54 41 52-54 00 00 00-00 10 00 00 ....START.......
00380E50: 01 00 00 00-02 00 00 00-00 00 0E 00-00 00 00 20 ...............
00380E60: 00 00 02 00-00 00 02 00-00 00 00 00-1D 00 00 00 ................
00380E70: 00 20 00 00-02 00 00 00-02 00 00 00-00 00 31 00 . ............1.
00380E80: 00 00 1C 20-00 00 02 00-00 00 02 00-44 6C 67 50 ... ........DlgP
00380E90: 72 6F 63 00-E2 10 00 00-01 00 00 00-02 00 00 00 roc.â...........
00380EA0: 00 00 45 00-00 00 F0 10-00 00 01 00-00 00 02 00 ..E...ð.........
00380EB0: 00 00 00 00-58 00 00 00-14 11 00 00-01 00 00 00 ....X...........
00380EC0: 02 00 00 00-00 00 65 00-00 00 FB 10-00 00 01 00 ......e...û.....
00380ED0: 00 00 02 00-00 00 00 00-76 00 00 00-0B 11 00 00 ........v.......
00380EE0: 01 00 00 00-02 00 86 00-00 00 68 49-6E 73 74 61 ......†...hInsta
00380EF0: 6E 63 65 00-45 78 63 65-70 74 69 6F-6E 41 72 67 nce.ExceptionArg
00380F00: 73 31 00 45-78 63 65 70-74 69 6F 6E-41 72 67 73 s1.ExceptionArgs
00380F10: 31 2E 4E 61-6D 65 00 45-78 63 65 70-74 69 6F 6E 1.Name.Exception
00380F20: 41 72 67 73-31 2E 74 79-70 65 00 44-6C 67 50 72 Args1.type.DlgPr
00380F30: 6F 63 2E 57-4D 5F 43 4F-4D 4D 41 4E-44 00 44 6C oc.WM_COMMAND.Dl
00380F40: 67 50 72 6F-63 2E 45 58-49 54 00 44-6C 67 50 72 gProc.EXIT.DlgPr
00380F50: 6F 63 2E 57-4D 5F 43 4C-4F 53 45 00-44 6C 67 50 oc.WM_CLOSE.DlgP
00380F60: 72 6F 63 2E-44 45 46 50-52 4F 43 00- roc.DEFPROC.
As you can see, the first easily recognized symbol is szTest. Using that as a jumping-off point, it appears that the 18 byte rule holds, but since szTest is at offset 50 in the data and one record (18 bytes) precedes it, there must be a 32 byte header, which might indicate that the 2nd DWORD in the data (0x20) points to the first symbol. To verify that the structure seems correct, I look at the START symbol and count ahead 18 bytes, where I find the value 00 00-00 00 0E 00-00 00. If I take the start of the names section to be the 0x86 (†), which seems to be right as it is the number of bytes in the names section, I arrive at ExceptionArgs1 exactly 0E (IMAGE_SYMBOL.Name.Long) into the names, which would appear to be correct. Counting backward 18 bytes from szTest I find the data 00 00 00 00-04 00 00 00, the IMAGE_SYMBOL.Name.Long offset of hInstance, the first symbol in the table. So with multiple verifications I can be pretty confident that I have chosen the right struct to decode the entries.
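The same check can be reproduced outside the assembler. Below is a minimal Python sketch that decodes three consecutive 18 byte records copied from the dump above, starting at 00380E32 (szTest). The decode helper and its return shape are illustrative assumptions, not part of the original code.

```python
import struct

# IMAGE_SYMBOL: 8-byte name union, Value, SectionNumber, Type,
# StorageClass, NumberOfAuxSymbols (little-endian, no padding)
IMAGE_SYMBOL = struct.Struct("<8sIhHBB")

# Three records copied byte for byte from the dump, starting at 00380E32
raw = bytes.fromhex(
    "737A546573740000" "54200000" "0200" "0000" "02" "00"  # szTest
    "5354415254000000" "00100000" "0100" "0000" "02" "00"  # START
    "000000000E000000" "00200000" "0200" "0000" "02" "00"  # long name
)


def decode(record: bytes):
    """Split one record into (name info, Value, SectionNumber, class, aux)."""
    name, value, section, _type, storage, aux = IMAGE_SYMBOL.unpack(record)
    short, long_offset = struct.unpack("<II", name)
    if short == 0:
        # First DWORD is zero: second DWORD is an offset into the names
        label = ("strtab", long_offset)
    else:
        label = ("inline", name.rstrip(b"\0").decode("ascii"))
    return label, value, section, storage, aux


symbols = [decode(raw[i:i + IMAGE_SYMBOL.size])
           for i in range(0, len(raw), IMAGE_SYMBOL.size)]
for entry in symbols:
    print(entry)
```

The third record carries no inline name, only the offset 0x0E, which matches the position of ExceptionArgs1 in the names section.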
So, does anyone know the structure of the data in the 32 byte header? Mainly I am looking to obtain the offset to the names section, in this case 0xE6, or maybe an RVA.
Edgar
Mmmmm, IMAGE_COFF_SYMBOLS_HEADER might be it.
EDIT:
Yup, that's the answer to my question. IMAGE_COFF_SYMBOLS_HEADER is 32 bytes long and its second member is LvaToFirstSymbol, which is what I expected it to be. The LVA of the names section is not given but is easily calculated using the following:
mov eax,[esi+IMAGE_COFF_SYMBOLS_HEADER.NumberOfSymbols]
mov ecx, SIZEOF IMAGE_SYMBOL
mul ecx
EAX holds the offset to the names section from the first symbol entry. Now just to find their addresses and values :)
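As a cross-check of that calculation, here is a short Python sketch that unpacks the first 32 bytes of the dump as an IMAGE_COFF_SYMBOLS_HEADER (eight DWORDs) and derives the offset of the names section. The header bytes are copied from the dump; the variable names are only illustrative.

```python
import struct

# First 32 bytes of the raw data at 00380E00, copied from the dump
header = bytes.fromhex(
    "0B000000" "20000000" "00000000" "00000000"
    "00100000" "00120000" "00200000" "00280000"
)

FIELDS = (
    "NumberOfSymbols", "LvaToFirstSymbol",
    "NumberOfLinenumbers", "LvaToFirstLinenumber",
    "RvaToFirstByteOfCode", "RvaToLastByteOfCode",
    "RvaToFirstByteOfData", "RvaToLastByteOfData",
)
coff = dict(zip(FIELDS, struct.unpack("<8I", header)))

SIZEOF_IMAGE_SYMBOL = 18

# Names start right after the last symbol record; this offset is
# relative to the start of the raw debug data
names_offset = (coff["LvaToFirstSymbol"]
                + coff["NumberOfSymbols"] * SIZEOF_IMAGE_SYMBOL)

for key, val in coff.items():
    print(f"{key}: {val:08X}")
print(f"names at offset {names_offset:#x}")
```

With 11 symbols the array is 198 bytes long, so adding the 32 byte header lands on 0xE6, the position of the 0x86 size DWORD in the dump.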
this should help, Edgar :U
http://www.masm32.com/board/index.php?topic=13135.0
Thanks Dave,
It looks like it will help a lot. I have spent a few hours reverse engineering the PE file so I could get the raw symbol data. I can already read the data and now understand much of it, but it should help (I hope) to assign relocations to the symbols so I can find them in the executable by address. All a part of my grand idea for a profiling tool :) An exceedingly interesting project.
Edgar
it does sound interesting
i have thought some about the idea myself
although, i am a total n00b at windows gui apps - lol
but, i have been trying to do some of the low-level stuff like ID'ing processors (incl frequency) and OS's, etc
Hutch has detailed available memory and disk space for us
Michael has given us some insight to timing
but, to get into ring 0 and use the performance counters is the next big step - a little over my head :bg
someplace, i found a nice link that might be helpful - let me see if i can find it...
...well - this is one - i thought i had a more interesting one - maybe i saved the whole url page - i will keep my eyes open for it
http://perfinsp.sourceforge.net/
Thanks Dave, not too worried about accurate timing yet, just have to get the address of the code symbols figured out. BTW the above code was the long way to do it; once you have figured everything out and are sure of it, it boils down to this:
invoke Dbghelp.dll:ImageNtHeader,[pMapFile]
mov edi,eax
mov ebx,[edi+IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable]
add ebx,[pMapFile]
mov eax,[edi+IMAGE_NT_HEADERS.FileHeader.NumberOfSymbols]
mov esi,eax
mov ecx,SIZEOF IMAGE_SYMBOL
mul ecx
mov edi,eax
add edi,ebx
EDI contains the memory address of the names
EBX contains the base address of the array of IMAGE_SYMBOL structures
ESI contains the number of entries
Pretty easy to decode after that.
Edgar
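For comparison, the same shortcut through IMAGE_FILE_HEADER can be sketched in Python on a file that has been read into memory. This is only a sketch under stated assumptions: locate_coff_symbols is a hypothetical name, it reads the headers by hand instead of calling ImageNtHeader, and it assumes a well-formed image.

```python
import struct

SIZEOF_IMAGE_SYMBOL = 18


def locate_coff_symbols(image: bytes):
    """Return (symbol table offset, symbol count, names offset) or None.

    All offsets are file offsets, as stored in IMAGE_FILE_HEADER.
    """
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    # IMAGE_FILE_HEADER (20 bytes) follows the 4-byte signature
    (_machine, _n_sections, _timestamp, ptr_symtab, n_symbols,
     _opt_size, _characteristics) = struct.unpack_from(
        "<HHIIIHH", image, e_lfanew + 4)
    if ptr_symtab == 0:
        return None  # no COFF symbols embedded in the file
    names = ptr_symtab + n_symbols * SIZEOF_IMAGE_SYMBOL
    return ptr_symtab, n_symbols, names
```

A usage example would map or read the executable, call locate_coff_symbols on the bytes, and then slice the symbol array and the names out of the same buffer.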
well - the link i remember (or don't remember) had a lot of great info about programming the performance registers
it was over my head, of course, but i found it interesting
The COFF debug data isn't particularly rich with detail, you'll have to pull any section and relocation information out of the PE structure itself.
The CodeView and PDB files have significantly more information, but again you'd have to leverage data from the PE file as a whole. There is some pre-link segmentation data that comes from the object files, and this gets condensed into the PE sections, via a segment table. Another area of complication with Microsoft system files is the use of post-link optimization mapping (OMAP) where stuff is moved around, and the debug information doesn't get updated.
DumpPE should give a rough translation of the debug data within the file.
-Clive
CodeView and PDB are not an option, GoAsm embeds debug symbols in the file, I am currently trying to get the DbgHelp API functions working but not having much luck enumerating the symbols with SymEnumSymbols
The call from the CREATE_PROCESS_DEBUG_EVENT handler:
invoke EnumerateSymbols,[dbe.u.CreateProcessInfo.hProcess],[dbe.u.CreateProcessInfo.lpBaseOfImage]
My symbol enumerator:
EnumerateSymbols FRAME hProcess, ImageBase
LOCAL SymInfo:SYMBOL_INFO
LOCAL ProcessPath[MAX_PATH]:%CHAR
mov D[ProcessPath],0
invoke SetLastError,0
invoke GetProcessImageFileName ,[hProcess],offset ProcessPath,MAX_PATH
invoke SymInitialize,[hProcess], offset ProcessPath, FALSE
// BaseOfDll is a 64 bit number passed as 2 DWORDS ([ImageBase],0)
invoke SymEnumSymbols,[hProcess],[ImageBase],0,0,offset SymEnumSymbolsProc, NULL
invoke SymCleanup,[hProcess]
RET
ENDF
SymEnumSymbolsProc FRAME pSymInfo, SymbolSize, UserContext
mov eax,[pSymInfo]
add eax,SYMBOL_INFO.Name
PrintStringByAddr(eax)
mov eax,TRUE
RET
ENDF
SymInitialize, SymEnumSymbols and SymCleanup all return TRUE (successful), however the enumeration callback is never called, which leads me to believe that the DbgHelp API does not recognize the symbol data. I am not sure what is required but I am trying a few different things.
Edgar
Well, I have it working, sort of anyway:
The call from the CREATE_PROCESS_DEBUG_EVENT handler:
invoke EnumerateSymbols,[dbe.u.CreateProcessInfo.hFile],[dbe.u.CreateProcessInfo.hProcess],[dbe.u.CreateProcessInfo.lpBaseOfImage]
EnumerateSymbols FRAME hFile, hProcess, ImageBase
LOCAL fsh:%DWORD32
LOCAL fsl:%DWORD32
LOCAL ProcessPath[MAX_PATH]:%CHAR
mov D[ProcessPath],0
invoke SetLastError,0
invoke GetProcessImageFileName ,[hProcess],offset ProcessPath,MAX_PATH
invoke GetFileSize,[hFile],offset fsh
mov [fsl],eax
invoke SymInitialize,[hProcess], offset ProcessPath, FALSE
invoke SymLoadModuleEx,[hProcess],[hFile],offset ProcessPath,NULL,[ImageBase],0,[fsl],0,0
invoke SymEnumSymbols,[hProcess],[ImageBase],0,"*",offset SymEnumSymbolsProc, NULL
invoke SymUnloadModule64,[hProcess],[ImageBase],0
invoke SymCleanup,[hProcess]
RET
ENDF
SymEnumSymbolsProc FRAME pSymInfo, SymbolSize, UserContext
mov eax,[pSymInfo]
add eax,4
mov edx,[eax+SYMBOL_INFO.Address]
add eax,SYMBOL_INFO.Name
invoke AddSymbol,[hSymbolListview],eax,edx
mov eax,TRUE
RET
ENDF
This will enumerate the symbols, and everything is perfect as long as I add 4 to the pSymInfo address in the callback. I can't figure that one out at all, but the structure's address appears to be 4 bytes above the address in pSymInfo. Once I figured that out, which was no small puzzle, everything fell into place: I can get the address of the symbol, its name and a lot of other information about the symbol. This is definitely the way to go but I am worried that I'll be bitten in the ass because of the 4 byte offset thing.
Edgar
Great stuff. I see getting bored is a good incentive for becoming really productive :green
So we see a variable & labels dump function at the horizon, right?
:bg
Quote from: jj2007 on March 29, 2010, 06:40:49 AM
Great stuff. I see getting bored is a good incentive for becoming really productive :green
So we see a variable & labels dump function at the horizon, right?
:bg
The tool will do that but it is going to be a profiling tool (time profiler) when it grows up, pick 2 labels and measure the time to execute the code between them. Frequency of calls to a particular procedure or references to a particular memory location etc...
Quote from: donkey
CodeView and PDB are not an option, GoAsm embeds debug symbols in the file, I am currently trying to get the DbgHelp API functions working but not having much luck enumerating the symbols with SymEnumSymbols
Fair enough, the COFF stuff is pretty straightforward. Most of the relocation is done, just need to add in the base address. The symbol records are nominally 0x12 bytes long (SIZEOF IMAGE_SYMBOL), but can spill into multiple records (NumberOfAuxSymbols). Short symbols are stored within the symbol record (8 chars), with long ones indexed into the appended symbol table (at sizeof(IMAGE_SYMBOL) * NumberOfSymbols + LvaToFirstSymbol). With GoAsm you probably don't have to worry about OMAPing.
From Testbug3.dll
Debug Entry
Chars TimeDate Maj Min Type Size AddrRaw PtrRaw
-------- -------- ---- ---- ---------------------- -------- -------- --------
00000000 41587495 0000 0000 00000001 COFF 000000CA 00000000 00001000
COFF Debug Info Header
NumberOfSymbols: 00000008
LvaToFirstSymbol: 00000020
NumberOfLinenumbers: 00000000
LvaToFirstLinenumber: 00000000
RvaToFirstByteOfCode: 00001000
RvaToLastByteOfCode: 00001200
RvaToFirstByteOfData: 00002000
RvaToLastByteOfData: 00002A00
Val 00002000, Sec 0002, Typ 0000, Sto 02, Aux 00, DLLMESS1
Val 00002078, Sec 0002, Typ 0000, Sto 02, Aux 00, M1
Val 0000207F, Sec 0002, Typ 0000, Sto 02, Aux 00, DLLMESS2
Val 000020C0, Sec 0002, Typ 0000, Sto 02, Aux 00, DLLMESS3
Val 00002128, Sec 0002, Typ 0000, Sto 02, Aux 00, M2
Val 00001000, Sec 0001, Typ 0000, Sto 02, Aux 00, HEXROTATE4
Val 0000101F, Sec 0001, Typ 0000, Sto 02, Aux 00, START
Val 00001065, Sec 0001, Typ 0000, Sto 02, Aux 00, DLL_TEST3B
-Clive
Hi clive,
Thanks, with a bit of RE work I got the original one working though in the end I decided to use DbgHelp to do the symbol extraction for me, that way if I decide to support other formats it will be a minor addition. Also the amount of information returned from SymEnumSymbols would have been a lot of work to duplicate. The actual code I used to read the raw debug symbol table is this:
GetDebugSymbolsFromFile FRAME pMapFile
uses edi,esi,ebx
// Get a pointer to IMAGE_NT_HEADERS
mov edi,[pMapFile]
add edi,[edi+IMAGE_DOS_HEADER.e_lfanew]
// Get a pointer to the COFF symbol table
mov ebx,[edi+IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable]
// EBX will hold the memory address of the symbol table
add ebx,[pMapFile]
// Get the number of symbols
mov eax,[edi+IMAGE_NT_HEADERS.FileHeader.NumberOfSymbols]
// ESI will hold the symbol count
mov esi,eax
// Calculate the size of the IMAGE_SYMBOL array
mov ecx,SIZEOF IMAGE_SYMBOL
mul ecx
// The long names are stored right after the IMAGE_SYMBOL array
// EDI will hold the memory address of the long names array
mov edi,eax
add edi,ebx
:
mov edx,ebx
mov ecx,[ebx+IMAGE_SYMBOL.Name.Long]
add ecx,edi
cmp B[ebx],0
cmovz edx,ecx
movzx eax,W[ebx+IMAGE_SYMBOL.SectionNumber]
// So how do I get the symbols address ?????
invoke AddSymbol,[hSymbolListview],edx,NULL
add ebx,SIZEOF IMAGE_SYMBOL
dec esi
jnz <
RET
ENDF
However, getting the VA for a symbol would have been quite a lot of detective work added on to a day of RE'ing the PE symbol format, so DbgHelp was the only solution that fit into the schedule I had (I have to do this around work and other responsibilities). I may still tackle the raw way to do it, but that is for another Sunday; in this project the symbols and VA, though critical, are only a small part of the overall code. The 4 byte offset thing still has me worried though: the 4 skipped bytes always seem to contain the DWORD 0x58.
Edgar
EDIT: tidied up the code a bit.
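One plausible explanation for the 4 byte offset, not confirmed anywhere in this thread, is alignment padding. In the C declaration of SYMBOL_INFO the 64-bit Value member follows the 32-bit Flags member, so a C compiler inserts 4 bytes of padding there. If the assembler header declares the members back to back, Address and Name come out 4 bytes too low, and adding 4 to the pointer hides the difference. The Python sketch below computes both layouts; the layout helper is hypothetical, and the member list is taken from the DbgHelp declaration.

```python
# (member name, element size in bytes, element count), in declaration order
SYMBOL_INFO_FIELDS = [
    ("SizeOfStruct", 4, 1), ("TypeIndex", 4, 1), ("Reserved", 8, 2),
    ("Index", 4, 1), ("Size", 4, 1), ("ModBase", 8, 1), ("Flags", 4, 1),
    ("Value", 8, 1), ("Address", 8, 1), ("Register", 4, 1), ("Scope", 4, 1),
    ("Tag", 4, 1), ("NameLen", 4, 1), ("MaxNameLen", 4, 1), ("Name", 1, 1),
]


def layout(fields, natural_alignment):
    """Return ({member: offset}, total size) for the given packing rule."""
    offsets, pos, widest = {}, 0, 1
    for name, size, count in fields:
        align = size if natural_alignment else 1
        pos = (pos + align - 1) // align * align  # round up to alignment
        offsets[name] = pos
        pos += size * count
        widest = max(widest, align)
    total = (pos + widest - 1) // widest * widest  # tail padding
    return offsets, total


aligned, aligned_size = layout(SYMBOL_INFO_FIELDS, natural_alignment=True)
packed, packed_size = layout(SYMBOL_INFO_FIELDS, natural_alignment=False)

for member in ("Flags", "Value", "Address", "Name"):
    print(member, aligned[member], packed[member])
print("aligned size:", hex(aligned_size))
```

Under this reading the pointer passed to the callback is correct, and the DWORD 0x58 found there would simply be SizeOfStruct, since the aligned structure is 0x58 bytes long. If that is what is happening, correcting the structure definition would be safer than adjusting the pointer, because members before Value are at the same offset in both layouts.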
I'd probably look at the NumberOfAuxSymbols being zero, and the SectionNumber not being 0x0000 or 0xFFFF, as a quick junk filter.
Also when NumberOfAuxSymbols is non zero you have to skip Aux * IMAGE_SYMBOL additional records. Not sure if GoAsm would generate them, but regular COFF debug records from LINK do.
Most profiling type applications tend to instrument the prolog/epilog code, or use FPO records.
-Clive
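A minimal Python sketch of the filter and the aux skipping described above, assuming the symbol array and the names block have already been sliced out of the file. The iter_symbols name and the sample records are illustrative; the sample includes a .file style record with one aux record to show that it is stepped over.

```python
import struct

IMAGE_SYMBOL = struct.Struct("<8sIhHBB")


def iter_symbols(symtab: bytes, strtab: bytes):
    """Yield (name, value, section) for records that pass the junk filter."""
    count = len(symtab) // IMAGE_SYMBOL.size
    i = 0
    while i < count:
        raw_name, value, section, _type, _storage, aux = \
            IMAGE_SYMBOL.unpack_from(symtab, i * IMAGE_SYMBOL.size)
        i += 1 + aux  # always step over the aux records, even when filtering
        if section <= 0:  # 0 = undefined, negative = absolute or debug
            continue
        short, long_offset = struct.unpack("<II", raw_name)
        if short == 0:
            # Long name: offset counts from the start of the names block,
            # whose first DWORD is the block size
            end = strtab.index(b"\0", long_offset)
            name = strtab[long_offset:end]
        else:
            name = raw_name.rstrip(b"\0")
        yield name.decode("ascii"), value, section


# Hypothetical sample data, built only for this illustration
strtab = struct.pack("<I", 14) + b"hInstance\0"
symtab = b"".join([
    IMAGE_SYMBOL.pack(struct.pack("<II", 0, 4), 0x2050, 2, 0, 2, 0),
    IMAGE_SYMBOL.pack(b".file", 0, -2, 0, 0x67, 1),  # debug record, 1 aux
    b"test.asm".ljust(18, b"\0"),                    # its aux record
    IMAGE_SYMBOL.pack(b"START", 0x1000, 1, 0, 2, 0),
])

for sym in iter_symbols(symtab, strtab):
    print(sym)
```

Stepping over the aux records before applying the filter matters: an aux record has no fixed layout, so decoding one as a symbol would produce garbage names.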
Hi clive,
The prologue/epilogue is probably the way I am going to go but since GoAsm does not store the line number data I needed a way to allow the user to select specific parts of the code and also a way to identify parts of code that were profiled. Without line numbers source level profiling is pretty much out of the question so symbols are the only option left that I can think of short of a long list of meaningless addresses. As far as I know GoAsm does not include any aux symbols but if I decide to do a real (non-experimental) raw symbol data extractor I will have to make allowance for them.
Edgar
For the bad section entries, a bounds check can be added to the loop:
:
mov edx,ebx
mov ecx,[ebx+IMAGE_SYMBOL.Name.Long]
add ecx,edi
cmp B[ebx],0
cmovz edx,ecx
movsx eax,W[ebx+IMAGE_SYMBOL.SectionNumber]
cmp eax,1
jl >.BADSECTION // signed or 0
// So how do I get the symbols address ?????
// symbol name is in EDX
invoke AddSymbol,[hSymbolListview],edx,NULL
.BADSECTION
add ebx,SIZEOF IMAGE_SYMBOL
dec esi
jnz <
Thanks for the article in the earlier post Dave, it solved the biggest problem I had, the address of the symbol. The field IMAGE_SYMBOL.Value holds the RVA for the symbol, just add this value to the image base (usually 0x00400000) and you have an address in the loaded executable ! The value is found in [ebx+IMAGE_SYMBOL.Value], the normal load address can be found in IMAGE_NT_HEADERS.OptionalHeader.ImageBase. So the code with virtual addresses would look like this:
GetDebugSymbolsFromFile FRAME pMapFile
uses edi,esi,ebx
LOCAL ImageBase:%PTR
// Get a pointer to IMAGE_NT_HEADERS
mov edi,[pMapFile]
add edi,[edi+IMAGE_DOS_HEADER.e_lfanew]
mov eax,[edi+IMAGE_NT_HEADERS.OptionalHeader.ImageBase]
mov [ImageBase],eax
// Get a pointer to the COFF symbol table
mov ebx,[edi+IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable]
// EBX will hold the memory address of the symbol table
add ebx,[pMapFile]
// Get the number of symbols
mov eax,[edi+IMAGE_NT_HEADERS.FileHeader.NumberOfSymbols]
// ESI will hold the symbol count
mov esi,eax
// Calculate the size of the IMAGE_SYMBOL array
mov ecx,SIZEOF IMAGE_SYMBOL
mul ecx
// The long names are stored right after the IMAGE_SYMBOL array
// EDI will hold the memory address of the long names array
mov edi,eax
add edi,ebx
:
mov edx,ebx
mov ecx,[ebx+IMAGE_SYMBOL.Name.Long]
add ecx,edi
cmp B[ebx],0
cmovz edx,ecx
movsx eax,W[ebx+IMAGE_SYMBOL.SectionNumber]
cmp eax,1
jl >.BADSECTION // signed or 0
// symbol name is in EDX
// Calculate the VA of the symbol
mov eax,[ebx+IMAGE_SYMBOL.Value]
add eax,[ImageBase]
invoke AddSymbol,[hSymbolListview],edx,eax
.BADSECTION
// Skip any AUX records, including those that follow a filtered-out symbol
movzx ecx,B[ebx+IMAGE_SYMBOL.NumberOfAuxSymbols]
test ecx,ecx
jz >.NOAUX // No AUX symbols
// AUX records are counted in FileHeader.NumberOfSymbols,
// so adjust ESI to remove them from the count
sub esi,ecx
// skip past the AUX symbols
mov eax,SIZEOF IMAGE_SYMBOL
mul ecx
add ebx,eax
.NOAUX
add ebx,SIZEOF IMAGE_SYMBOL
dec esi
jnz <
RET
ENDF
This actually gets more information about the symbol than the DbgHelp API does: since I also have the section number, I can look up the type of that section, which tells me more reliably whether the symbol is in code or data, something I have found lacking in the DbgHelp API.
Edgar
Nice !!!
(http://img706.imageshack.us/img706/386/symbolsh.jpg)
Used the following to make an array holding the section types, the section number was passed to AddSymbol and the strings were chosen based on the section characteristics flags. The section number - 1 was the element of the array holding the characteristics for that section.
ReadSectionTable FRAME pMapFile
uses edi,esi,ebx
// All this function does is create an array of DWORDs to
// hold the characteristics of each section
// Get a pointer to IMAGE_NT_HEADERS
mov edi,[pMapFile]
add edi,[edi+IMAGE_DOS_HEADER.e_lfanew]
movzx esi,W[edi+IMAGE_NT_HEADERS.FileHeader.NumberOfSections]
// Get the size of the optional header and add it to EDI
movzx eax,W[edi+IMAGE_NT_HEADERS.FileHeader.SizeOfOptionalHeader]
add edi,eax
add edi,4 // This +4 is the PE signature DWORD (IMAGE_NT_HEADERS.Signature)
add edi, SIZEOF IMAGE_FILE_HEADER
// EDI holds a memory pointer to the section table
// ESI holds the number of IMAGE_SECTION_HEADER entries in the table
// Create the array of Characteristics values
mov eax,esi
shl eax,2
invoke GlobalAlloc,GMEM_FIXED,eax
mov [paSectionTypes],eax
mov edx,eax
xor ecx,ecx
:
mov eax,[edi+IMAGE_SECTION_HEADER.Characteristics]
mov [edx+ecx*4], eax
add edi,SIZEOF IMAGE_SECTION_HEADER
inc ecx
dec esi
jnz <
:
RET
ENDF
Edgar
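The same walk over the section table can be sketched in Python. Note that the +4 in this routine is the 4-byte PE signature that precedes IMAGE_FILE_HEADER, so it is unrelated to the SYMBOL_INFO offset seen earlier. The function names and the kind labels below are illustrative assumptions; the flag values are the IMAGE_SCN_CNT_* constants from winnt.h.

```python
import struct

IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_CNT_UNINITIALIZED_DATA = 0x00000080

SIZEOF_IMAGE_FILE_HEADER = 20
SIZEOF_IMAGE_SECTION_HEADER = 40
CHARACTERISTICS_OFFSET = 36  # last DWORD of IMAGE_SECTION_HEADER


def section_characteristics(image: bytes):
    """Return the Characteristics DWORD of every section, in table order."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    file_header = e_lfanew + 4  # skip the "PE\0\0" signature
    (n_sections,) = struct.unpack_from("<H", image, file_header + 2)
    (opt_size,) = struct.unpack_from("<H", image, file_header + 16)
    table = file_header + SIZEOF_IMAGE_FILE_HEADER + opt_size
    return [
        struct.unpack_from(
            "<I", image,
            table + i * SIZEOF_IMAGE_SECTION_HEADER + CHARACTERISTICS_OFFSET
        )[0]
        for i in range(n_sections)
    ]


def kind(flags: int) -> str:
    """Map section flags to a display string."""
    if flags & IMAGE_SCN_CNT_CODE:
        return "code"
    if flags & IMAGE_SCN_CNT_INITIALIZED_DATA:
        return "initialized data"
    if flags & IMAGE_SCN_CNT_UNINITIALIZED_DATA:
        return "uninitialized data"
    return "other"
```

As in the assembly version, a symbol with SectionNumber n would be looked up at index n - 1 of the returned list.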
Quote from: donkey on March 30, 2010, 06:03:44 AM
Nice !!!
Very nice indeed. However, I am worried about the window title. Until now, we had occasional rows on coding issues but at least politically we seemed to be very close. Now you make propaganda for GOP...?
:wink
GoP = GoAsm Profiler
Looking at Microsoft Link.exe, it seems that there is no way possible to have it include debug symbols in the executable so I have to go the PDB route for any MASM programs. That's a shame because I don't know how to extract section information from the symbol data (ie code, initialized data etc) . The link.exe command line option states here: (http://msdn.microsoft.com/en-us/library/xe4t6fc1%28v=VS.80%29.aspx)
Quote from: MSDN
It is not possible to create an .exe or .dll that contains debug information. Debug information is always placed in a .pdb file.
Since both GoAsm and MASM debug builds will work with the DbgHelp API I have chosen to use that route exclusively. The other option is to RE the pdb file format and since a look at that indicates there are quite a few different versions I think it would be far too much work and leave my program vulnerable to changes that might come later or some version I didn't have. Unfortunately this means that I have to rethink my method to determine which section a symbol resides in and that might be a bit of extra detective work.
EDIT: Well, the tag of the MASM symbol tells me what kind of symbol it is, but that is not available in the GoAsm enumeration, so I will have to use both and choose based on the IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable contents: if there is a pointer I get the info from the file, if not then I use the DbgHelp API.
Depends on the version of LINK, normally I would use /DEBUG /DEBUGTYPE:COFF
-Clive
Quote from: donkey on March 31, 2010, 02:18:27 AM
Looking at Microsoft Link.exe, it seems that there is no way possible to have it include debug symbols in the executable so I have to go the PDB route for any MASM programs. That's a shame because I don't know how to extract section information from the symbol data (ie code, initialized data etc) . The link.exe command line option states here: (http://msdn.microsoft.com/en-us/library/xe4t6fc1%28v=VS.80%29.aspx)
Quote from: MSDN
It is not possible to create an .exe or .dll that contains debug information. Debug information is always placed in a .pdb file.
That is a surprising statement. Attached an example that seems to prove the contrary - unless I completely misunderstand the meaning of "debug information". I have renamed pdb to xpdb and ilk to xilk, but Olly still sees the symbolic names.
Built with /Zi for the assembler and /debug for the linker (version 5.12).
Quote from: donkey
Since both GoAsm and MASM debug builds will work with the DbgHelp API I have chosen to use that route exclusively. The other option is to RE the pdb file format and since a look at that indicates there are quite a few different versions I think it would be far too much work and leave my program vulnerable to changes that might come later or some version I didn't have. Unfortunately this means that I have to rethink my method to determine which section a symbol resides in and that might be a bit of extra detective work.
Yes, the PDB file format is a bit of a bugger, there are around 5 variants, and any given version of Microsoft's helper tools can't read some of them because they are not clearly identified. It's not well documented. The file is a self contained file system that permits incremental changes as the source code is updated and recompiled/linked. Some of the symbol and type information has roots in the CodeView format(s) so a familiarity with those is helpful.
Depending on how they were built you can break the file down at the library/object/source/function level, and source line number.
-Clive
Quote from: jj2007 on March 31, 2010, 01:31:17 PM
Quote from: donkey on March 31, 2010, 02:18:27 AM
Looking at Microsoft Link.exe, it seems that there is no way possible to have it include debug symbols in the executable so I have to go the PDB route for any MASM programs. That's a shame because I don't know how to extract section information from the symbol data (ie code, initialized data etc) . The link.exe command line option states here: (http://msdn.microsoft.com/en-us/library/xe4t6fc1%28v=VS.80%29.aspx)
Quote from: MSDN
It is not possible to create an .exe or .dll that contains debug information. Debug information is always placed in a .pdb file.
That is a surprising statement. Attached an example that seems to prove the contrary - unless I completely misunderstand the meaning of "debug information". I have renamed pdb to xpdb and ilk to xilk, but Olly still sees the symbolic names.
Built with /Zi for the assembler and /debug for the linker (version 5.12).
Hi JJ2007,
I tried to build with /Zi and /Debug and the information is not present in the build, only the absolute path to the PDB file (use a hex editor and look at the very end of the file). I was using link 5.12.8078.0 which is the only version I have since I don't use MASM much at all I have never upgraded it. You can change the extension of the PDB file without losing your symbols as the search path is set by trying to open that file, if it is not found then the DbgHelp API (which Olly uses) will attempt to find any file in that path that matches the specs for a PDB and contains debug information for the executable. Try deleting the file completely or better still check the IMAGE_NT_HEADERS.FileHeader.PointerToSymbolTable entry, you will find it is NULL.
Hi Clive,
Yes, I have 2 examples of MS PDB files that vary widely. The file is structured as you said like a file system with a number of streams that contain different information, there appears to be a codeview stream but I am not sure if the information matches the codeview spec and the sheer complexity of RE'ing the file format is more than I am willing to do when the DbgHelp API can extract the information I need. I have pretty much moved past the symbol extraction since the dual method appears to work and have landed squarely in the procedure mapping part of the profiler, that looks like it will be mostly searching for patterns and taking some best guesses but they should be pretty accurate. Distinguishing data symbols in the code section will be much tougher though I am not sure I will need to do that for the profiler as the person running it will know the source of the symbol. I am also having problems finding information about single stepping a program, all of the information that I can find deals with checking an option in some MS tool, not how it is implemented.
With LINK 5.12.8078 (in MASM32) you need both the /DEBUG and /DEBUGTYPE:COFF to get symbols within the file.
It's been decades since I wrote code to single step an 8086, there you generated an interrupt (INT x), and in the handler you would set the trace/single step bit (TF) in the stacked flags, the exiting IRET would then start stepping instructions via INT 1 (interrupt would be called after each instruction, IRET to run the next one).
How to achieve this on a Windows box is beyond my level of tolerance, I suspect the mechanism is much the same and can probably be done in a driver with perhaps some assistance from the OS hooks put in for WinDbg and SoftICE.
Things like VTune had static analysis modes with really good internal models of the processor, the dynamic modes used the internal performance counters. I'd have to check if my MSR dumping application still works in XP/Vista/Win7.
Not sure single stepping is a good way to profile code, it is going to cause havoc with the pipelining. I would probably attack it with RDTSC in the prologue/epilogue, do some coarse analysis to pinpoint the hot spots and then drill down further. I rarely use single stepping to debug things (ARM/MIPS w/JTAG), it's a method of last resort as I can usually read the code and understand the flow, or register usage. If I want to tune some code I'll pull it out and put it in a test harness and either use hardware based cycle counters, or timers.
-Clive
here is an MSR tool, Clive
http://www.fileden.com/files/2008/3/3/1794507/MSR.zip
Attention: I am not responsible if someone ph_x up their machine with this program
Hi Clive,
I believe I can set single step through the eflags, in the profiled process I should be able to do it with the context structure passed to CREATE_PROCESS_DEBUG_EVENT but I have to wait til I get home to test it. As far as I can tell you just set the trace flag in the eflags register (0x100 - bit 8).
The single stepping is a way to track usage of data buffers which cannot be breakpointed, it will not be used for performance testing. Performance testing will be achieved by injecting a breakpoint and when it is reached writing the original data back before execution continues. A second breakpoint will signal the end of the profiled segment of code, when it is reached the breakpoint is restored.
Edgar
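The breakpoint bookkeeping described above (save the original byte, write the breakpoint, write the original back when it is hit) can be modelled on a plain byte buffer. This Python sketch is only a simulation: the Breakpoint class is hypothetical, and a real debugger would patch the target with ReadProcessMemory and WriteProcessMemory, flush the instruction cache, and back EIP up by one byte after the int3 is hit.

```python
INT3 = 0xCC  # one-byte x86 breakpoint opcode


class Breakpoint:
    """Tracks one patched byte in a buffer standing in for target memory."""

    def __init__(self, memory: bytearray, address: int):
        self.memory = memory
        self.address = address
        self.original = None  # saved byte while the breakpoint is armed

    def arm(self):
        """Save the original byte and overwrite it with int3."""
        if self.original is None:
            self.original = self.memory[self.address]
            self.memory[self.address] = INT3

    def disarm(self):
        """Write the original byte back so execution can continue."""
        if self.original is not None:
            self.memory[self.address] = self.original
            self.original = None


# push ebp / mov ebp,esp / ret, used here as stand-in code bytes
code = bytearray(b"\x55\x8B\xEC\xC3")
start = Breakpoint(code, 0)
start.arm()
print(code.hex())
start.disarm()
print(code.hex())
```

Keeping the saved byte inside the object makes arm and disarm safe to call twice, which helps when the start and end breakpoints of a profiled region are toggled repeatedly.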
Quote from: dedndave
here is an MSR tool, Clive
Had to blow the dust off this one. Well this still works in 32-bit XP, will try in Vista.
Edit : Well Vista doesn't like the dynamic injection of a driver.
-Clive
Hi clive,
Thanks very much. The program will be single stepped if you add the following to CREATE_PROCESS_DEBUG_EVENT:
mov D[context.ContextFlags],CONTEXT_CONTROL
invoke SuspendThread, [dbe.u.CreateProcessInfo.hThread]
invoke GetThreadContext,[dbe.u.CreateProcessInfo.hThread],offset context
or D[context.EFlags],0x100
invoke SetThreadContext,[dbe.u.CreateProcessInfo.hThread],offset context
invoke ResumeThread, [dbe.u.CreateProcessInfo.hThread]
The single step is handled in the EXCEPTION_SINGLE_STEP handler. A quick test here shows that the program goes into single step though it appears that I may have to reset the EFlags for each instruction. Not a terribly difficult thing to do but very time consuming, I will have to experiment a bit more when I get back to my dev box. I ran it on a Vista box without the OS objecting at all.
Edgar
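The observation that the flag may have to be set again for each instruction can be illustrated with a toy model in Python. This is not debugger code: run_steps is a hypothetical helper that only models the behaviour reported in this thread, namely that the trap flag comes back clear after each EXCEPTION_SINGLE_STEP.

```python
TRAP_FLAG = 1 << 8  # EFLAGS bit 8, i.e. 0x100


def run_steps(n_instructions: int, rearm: bool) -> int:
    """Count how many instructions raise a single-step event in the model."""
    eflags = 0x202 | TRAP_FLAG  # arbitrary starting flags with TF set
    stepped = 0
    for _ in range(n_instructions):
        if eflags & TRAP_FLAG:
            stepped += 1
            eflags &= ~TRAP_FLAG      # flag is clear when the event arrives
            if rearm:
                eflags |= TRAP_FLAG   # handler sets it again (Get/SetThreadContext)
    return stepped


print(run_steps(5, rearm=False), run_steps(5, rearm=True))
```

Without re-arming, only the first instruction is trapped; with it, every instruction is, at the cost of two context calls per step.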
Quote from: donkey on April 01, 2010, 05:55:56 AM
Hi JJ2007,
I tried to build with /Zi and /Debug and the information is not present in the build, only the absolute path to the PDB file (use a hex editor and look at the very end of the file). I was using link 5.12.8078.0 which is the only version I have since I don't use MASM much at all I have never upgraded it. You can change the extension of the PDB file without losing your symbols as the search path is set by trying to open that file, if it is not found then the DbgHelp API (which Olly uses) will attempt to find any file in that path that matches the specs for a PDB and contains debug information for the executable.
Actually, I was tricked by Olly:
When FileName.pdb is present, Olly loads it and creates a file named \masm32\OllyDbg\FileName.udd
Afterwards, you can delete FileName.pdb without losing the symbolic names.
Hi donkey,
Did you get any further with this? I have completed the exact same task (parsing the Symbol Table in a COFF header) in order to find locations of variables once loaded into memory. I am also single stepping through instructions. This is to a completely different end to you - and I resorted to C, although it's pretty API and structure intensive, so not much different in the code!
Tom
Quote from: brixton on June 27, 2010, 10:47:23 PM
Hi donkey,
Did you get any further with this? I have completed the exact same task (parsing the Symbol Table in a COFF header) in order to find locations of variables once loaded into memory. I am also single stepping through instructions. This is to a completely different end to you - and I resorted to C, although it's pretty API and structure intensive, so not much different in the code!
Tom
Hi Tom,
Yes, I have completed the parsing of the symbol table and have also sorted through the sections, etc. I am currently working towards either using a prepackaged disassembly library or writing my own, in order to help guess at where procedures start and end. As for the single stepping, I haven't implemented that completely yet, but it appears to be viable. There is another solution using page guards that I am hoping to try soon.
Edgar
Hi donkey,
Sounds promising! The single stepping is not a problem. For my application, I set an int3 breakpoint on the first instruction of the target binary, and when I process this (not the initial Windows EXCEPTION_BREAKPOINT) I do a GetThreadContext, bitwise inclusive-or the trap flag (bit 8, mask 0x100) into the EFLAGS register and then SetThreadContext. Unfortunately yes, after each EXCEPTION_SINGLE_STEP you need to GetThreadContext, set the trap flag and then SetThreadContext again.
I am actually using the BeaEngine library for disassembly. You pass it the EIP register (or actually any pointer) and it returns a structure containing information about the first instruction it encounters. It also gives you a nice string so you can just print the instruction out if need be. BeaEngine comes with Python, Pascal, C and NASM/MASM/FASM/GoAsm headers. I've been using it and it seems accurate so far.
Tom
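A minimal sketch of the flag arithmetic described above, written in Python for brevity rather than the C or GoAsm used in this thread. It only models the EFLAGS value; reading and writing the real thread context (GetThreadContext / SetThreadContext on Windows) is left out, and the helper names are hypothetical.

```python
TRAP_FLAG = 0x100  # TF is bit 8 of EFLAGS


def set_trap_flag(eflags: int) -> int:
    """Return the EFLAGS value with the trap flag set (bitwise inclusive-or)."""
    return (eflags | TRAP_FLAG) & 0xFFFFFFFF


def trap_flag_is_set(eflags: int) -> bool:
    """True if the trap flag is set in the given EFLAGS value."""
    return bool(eflags & TRAP_FLAG)


# Example value only: 0x202 is an EFLAGS image with TF clear.
eflags_before = 0x202
eflags_after = set_trap_flag(eflags_before)
```

In a real debugger loop this would run once per EXCEPTION_SINGLE_STEP, between fetching and storing the context, since the flag has to be set again for every instruction.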
nice find brixton, it's under the LGPL license which rocks, and it looks very clean and fast, plus it has 64-bit and GoAsm/MASM support :D It even has an LDE version. wow dude, you really hit the mother lode here! most other good engines are crappy GPL and difficult to use with GoAsm/MASM.
Hey ecube,
Yes, I did a lot of research on this subject - it does seem good, but I am yet to put it through more intensive tests :bdg
Hi Brixton,
I have looked at BeaEngine (as well as DiStorm, but that was too language restrictive). It is a nice package with a very liberal license; however, there are some issues with GoAsm and the _Disasm@4 export (DLL version). Specifically, it requires that you use the /mix switch and that has some adverse effects on the headers; Jeremy is looking at the bug. The lib file that comes with the distribution could not be read by GoAsm, the format was unrecognized. Right now I have almost decided on Drizz's disassembler. I haven't looked through it very deeply, but the license is great and the author is known around the forums:
Quote:
Copy-left FOREVER by drizz. No rights reserved.
All modules in this library are dedicated to the public domain by drizz.
Permission to use, copy, modify, reverse-engineer, crack,
patch and distribute this compilation for any purpose is hereby granted.
Gotta love it !
Edgar
donkey, I got BeaEngine's GoAsm staticlib example to assemble fine. I just added
#dynamiclinkfile msvcrt.dll
to the source and assembled with
set INCLUDE=C:\GoAsm\include
\GoAsm\bin\GoAsm /x86 example.asm
\GoAsm\bin\GoLink /console /fo example.exe example.obj
pause
Incidentally, I am trying to tease out the length (size) of a symbol, if it is statically linked data. Does anyone know how to do this? I can find the location, but the size eludes me.
BeaEngine also has a 32-bit/64-bit length disassembler on their site. It's still crashing for me, but you can use it to read a certain number of bytes and it'll give back the instruction's length.
I actually mean variable (included in symbol table) lengths, not instruction lengths. eg. if I have a global variable:
someString BYTE 5 DUP(?)
I can find the location of someString in the data section of memory (garnered from the symbol table), but I don't know how to find its length of 5.
If it's your own code you can use sizeof for a lot of things. If it's another process's code you can use Heap32First etc. to walk through allocated heap memory (GlobalAlloc, HeapAlloc etc.) or VirtualQueryEx to walk memory pages in general and get sizes.
For static libs, you can use the /l switch with GoAsm to generate a listing. I'm not sure how much help it'd be in that case, but it may. If your static lib has debug symbols you can load your assembled program in Olly and it'll give you info, as well as the option to load the source code in, so it gives you line by line.
Donkey,
Iczelion does what you want here http://win32assembly.online.fr/tut30.html but he says single stepping large programs can take 10 mins, wtf!
Quote from: brixton on July 02, 2010, 03:50:11 PM
I actually mean variable (included in symbol table) lengths, not instruction lengths. eg. if I have a global variable:
someString BYTE 5 DUP(?)
I can find the location of someString in the data section of memory (garnered from the symbol table), but I don't know how to find its length of 5.
You can't. You could guess the size by subtracting label offsets (next minus this). But there could be an "align" directive, for example, whose padding would be counted in the size.
VCx0.pdb could have this info (I'm not sure) but only for C/C++.
Here's an old obj2asm utility I made. If you test it you can see that it just dumps the bytes, with no analysis of the data.
Quote from: drizz on July 02, 2010, 09:03:00 PM
Quote from: brixton on July 02, 2010, 03:50:11 PM
I actually mean variable (included in symbol table) lengths, not instruction lengths. eg. if I have a global variable:
someString BYTE 5 DUP(?)
I can find the location of someString in the data section of memory (garnered from the symbol table), but I don't know how to find its length of 5.
You can't. You could guess the size by deducting label offsets (next-this). But there could be "align" directive for example which would add to size.
VCx0.pdb could have this info (i'm not sure) but only for c/c++.
Here's an old obj2asm utility i made. If you test it you can see that it just dumps the bytes no analysis on the data.
Thanks drizz, I didn't think there was a way (as far as I could tell). Looks like I'm going to have to resort to guessing (educated guess).
Quote from: E^cube on July 02, 2010, 07:26:27 PM
Donkey,
Iczelion does what you want here http://win32assembly.online.fr/tut30.html but he says single stepping large programs can take 10 mins, wtf!
Not surprised by that, on the old hardware of the time. It's even worse if you're printing something after each instruction. I single-stepped theGUN.exe (from the MASM32 package), printing each instruction as it went. It took about 150k instructions before a window started appearing (30 seconds or so on my machine). Once loaded, the only instructions processed are from GetMessage, like when my mouse moves in/out or it needs to repaint.
Pretty sure there isn't a way to do strings. The TYPES data can be used to infer details of the fields within a RECORD/STRUCTURE.
From the PDB file
sn iBlk Size Stamp Module
000A : 0029 : 0000019C 0037BA68 test26.obj
iBlk Blk FileOffs Size
0029 0013 00004C00 19C
sstAlignSym - Module Symbols (Size 0120)
00000000: 0012 0009 S_OBJNAME 00000001 test26.obj
00000014: 0036 0001 S_COMPILE 00000303 Intel 80386 MASM Microsoft (R) Macro Assembler Version 6.15.8803
0000004C: 001A 1007 S_LDATA32_VS97 00000022 3.00000002 g_ArrayPtr
00000068: 000E 1003 S_UDT_VS97 00001001 Array
00000078: 0016 1007 S_LDATA32_VS97 00000020 3.00000006 String
00000090: 002E 100A S_LPROC32_VS97 1.00000000[0000001B] $$$00001
000000C0: 0002 0006 S_END
000000C4: 0012 0209 S_LABEL32 1.00000000 Start
000000D8: 0012 1003 S_UDT_VS97 00001005 c_msvcrt
000000EC: 0016 1007 S_LDATA32_VS97 00001001 3.00000000 g_Array
00000104: 001A 1007 S_LDATA32_VS97 00000020 3.00000012 SomeString
11 Symbol(s)
sstSrcModule - Line Numbering (Size 0058)
cFile = 0001, cSeg = 0001
1.00000000->0000001A
00000014 cSeg = 0001, pad = 0000, Name = test26.asm
00000030 1.00000000->0000001A
1.00000000 27
1.00000000 27
1.00000006 28
1.00000008 32
1.0000000B 34
1.0000001A 36
Here SomeString (DB 5 DUP (?)) has a TYPE of 0x20, which is basically T_UCHAR. No size/length is recorded within the CodeView/PDB data. You'd just have to sort the symbols and the object boundaries, and compute the total space taken up, including any alignment padding thrown in by the compiler and/or linker.
From the OBJ File, the .debug$S (CodeView Symbols)
03 .debug$S Virtual Address 00000000
Physical Address 00000032
Raw Data Offset 0000015C
Raw Data Size 00000102
Relocation Offset 0000025E
Relocation Count 000C
Line Number Offset 00000000
Line Number Count 0000
Characteristics 42100040
Initialized Data
1 Byte Align
Discardable
Readable
00000053 0000000E 000B (SECREL ) 2.00000002 g_ArrayPtr
00000057 0000000E 000A (SECTION ) 2.00000002 g_ArrayPtr
00000076 0000000F 000B (SECREL ) 2.00000006 String
0000007A 0000000F 000A (SECTION ) 2.00000006 String
000000A1 00000012 000B (SECREL ) 1.00000000 _$$$00001@0
000000A5 00000012 000A (SECTION ) 1.00000000 _$$$00001@0
000000BB 00000010 000B (SECREL ) 1.00000000 _Start
000000BF 00000010 000A (SECTION ) 1.00000000 _Start
000000DB 00000011 000B (SECREL ) 2.00000000 g_Array
000000DF 00000011 000A (SECTION ) 2.00000000 g_Array
000000EF 00000019 000B (SECREL ) 2.00000012 SomeString
000000F3 00000019 000A (SECTION ) 2.00000012 SomeString
0011 - 0009 S_OBJNAME test26.obj
00000000: 11 00 09 00 01 00 00 00 - 0A 74 65 73 74 32 36 2E .........test26.
00000010: 6F 62 6A obj
0036 - 0001 S_COMPILE Microsoft (R) Macro Assembler Version 6.15.8803
00000000: 36 00 01 00 03 03 00 00 - 2F 4D 69 63 72 6F 73 6F 6......./Microso
00000010: 66 74 20 28 52 29 20 4D - 61 63 72 6F 20 41 73 73 ft (R) Macro Ass
00000020: 65 6D 62 6C 65 72 20 56 - 65 72 73 69 6F 6E 20 36 embler Version 6
00000030: 2E 31 35 2E 38 38 30 33 .15.8803
0015 - 0201 S_LDATA32 T_ULONG g_ArrayPtr
00000000: 15 00 01 02 00 00 00 00 - 00 00 22 00 0A 67 5F 41 .........."..g_A
00000010: 72 72 61 79 50 74 72 rrayPtr
000A - 0004 S_UDT 1003 Array
00000000: 0A 00 04 00 03 10 05 41 - 72 72 61 79 .......Array
0011 - 0201 S_LDATA32 T_UCHAR String
00000000: 11 00 01 02 00 00 00 00 - 00 00 20 00 06 53 74 72 .......... ..Str
00000010: 69 6E 67 ing
002C - 0204 S_LPROC32 1005 $$$00001
00000000: 2C 00 04 02 00 00 00 00 - 00 00 00 00 00 00 00 00 ,...............
00000010: 1B 00 00 00 00 00 00 00 - 1B 00 00 00 00 00 00 00 ................
00000020: 00 00 05 10 00 08 24 24 - 24 30 30 30 30 31 ......$$$00001
0002 - 0006 S_END
00000000: 02 00 06 00 ....
000F - 0209 S_LABEL32 Start
00000000: 0F 00 09 02 00 00 00 00 - 00 00 00 05 53 74 61 72 ............Star
00000010: 74 t
000D - 0004 S_UDT 1000 c_msvcrt
00000000: 0D 00 04 00 00 10 08 63 - 5F 6D 73 76 63 72 74 .......c_msvcrt
0012 - 0201 S_LDATA32 1003 g_Array
00000000: 12 00 01 02 00 00 00 00 - 00 00 03 10 07 67 5F 41 .............g_A
00000010: 72 72 61 79 rray
0015 - 0201 S_LDATA32 T_UCHAR SomeString
00000000: 15 00 01 02 00 00 00 00 - 00 00 20 00 0A 53 6F 6D .......... ..Som
00000010: 65 53 74 72 69 6E 67 eString
Total size of the .data object record is 0x17 (2.00000000 .. 2.00000016), with SomeString starting at 0x12 (2.00000012), which leaves 0x17 - 0x12 = 5 bytes for SomeString.
02 .data Virtual Address 00000000
Physical Address 0000001B
Raw Data Offset 0000013A
Raw Data Size 00000017
Relocation Offset 00000152
Relocation Count 0001
Line Number Offset 00000000
Line Number Count 0000
Characteristics C0300040
Initialized Data
4 Byte Align
Readable
Writeable
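As a rough illustration of how the S_LDATA32 record for SomeString in the hex dump above breaks down, here is a small Python sketch (Python rather than GoAsm, purely for brevity). The field layout (length, record type, offset, segment, type index, length-prefixed name) is read off the dump; the function name and the two-entry type table are hypothetical. In the object file the offset and segment fields are still zero, because the SECREL and SECTION relocations listed above fill them in at link time.

```python
import struct

# Bytes of the SomeString record, copied from the .debug$S hex dump.
RECORD = bytes.fromhex(
    "15 00 01 02 00 00 00 00 00 00 20 00 0A 53 6F 6D "
    "65 53 74 72 69 6E 67"
)

S_LDATA32 = 0x0201
# Only the two primitive types that appear in the dump.
TYPE_NAMES = {0x0020: "T_UCHAR", 0x0022: "T_ULONG"}


def parse_ldata32(rec: bytes) -> dict:
    """Decode one S_LDATA32 record (length-prefixed name variant)."""
    length, kind, offset, segment, type_index = struct.unpack_from("<HHIHH", rec, 0)
    if kind != S_LDATA32:
        raise ValueError("not an S_LDATA32 record: 0x%04X" % kind)
    name_len = rec[12]  # single length byte, then the name itself
    name = rec[13:13 + name_len].decode("ascii")
    return {
        "length": length,  # bytes following the length field
        "offset": offset,
        "segment": segment,
        "type": TYPE_NAMES.get(type_index, hex(type_index)),
        "name": name,
    }


info = parse_ldata32(RECORD)
```

Nothing in the decoded fields carries the 5-byte size of the variable, which is the point made above.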
Hi clive,
Yes, so using the string's location in the data section, along with the locations of other data in there could give me an indication of its length (but that is all, it can't be 100% reliable, right?).
Quote from: brixton
Yes, so using the string's location in the data section, along with the locations of other data in there could give me an indication of its length (but that is all, it can't be 100% reliable, right?).
That's pretty much it. The assembler knows this information but it doesn't export it in the object file, or listing for that matter. The debug records include the name, address, and type information. So you would know if it was an array of BYTE, WORD, or some custom RECORD/STRUCTURE, but you'd have to confine the symbol between other symbols, or the end of a section within the object file.
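A minimal sketch of that "confine the symbol between other symbols" guess, in Python for brevity. The offsets and the 0x17 section size are taken from the .data dump above; the function name is hypothetical. The result is only an upper bound, since alignment padding between symbols is counted as part of the preceding symbol, and two labels at the same offset would get a size of zero.

```python
def estimate_sizes(symbols, section_size):
    """Guess each symbol's size as the gap to the next symbol.

    symbols: dict mapping name -> offset within the section.
    section_size: raw size of the section, bounding the last symbol.
    """
    ordered = sorted(symbols.items(), key=lambda item: item[1])
    sizes = {}
    for index, (name, start) in enumerate(ordered):
        if index + 1 < len(ordered):
            end = ordered[index + 1][1]  # next symbol's offset
        else:
            end = section_size           # last symbol runs to section end
        sizes[name] = end - start
    return sizes


# Offsets of the .data symbols as listed in the relocation dump.
DATA_SYMBOLS = {
    "g_Array": 0x00,
    "g_ArrayPtr": 0x02,
    "String": 0x06,
    "SomeString": 0x12,
}

sizes = estimate_sizes(DATA_SYMBOLS, 0x17)
```

For this object file the guess for SomeString comes out as 5 bytes, matching the DB 5 DUP (?) declaration, but that is only because nothing follows it in the section.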
I'm trying to compile 'ed' (from cygwin) with symbol/debugging information. I added the -g -gcoff switches to gcc, and indeed I get a (massive) symbol table - but I don't get most of the global variables listed in there at all. Very strange.
From main.c:
static const char * invocation_name = 0;
static const char * const Program_name = "GNU Ed";
static const char * const program_name = "ed";
static const char * const program_year = "2008";
static char _restricted = 0; /* invoked as "red" */
static char _scripted = 0; /* if set, suppress diagnostics */
static char _traditional = 0; /* if set, be backwards compatible */
I can find invocation_name in the symbol table, but cannot find any of the others at all, even using the famous PEDUMP.exe. Stumped!
"static" means that it is local to the object in question, so it isn't going to be exported to pollute the namespace.
Interesting - so how come invocation_name is included?
edit: I removed the 'static const' keywords (eg. char * const Program_name = "GNU Ed";) and it still doesn't export them..
Hi,
Just an answer to donkey. He says:
Quote:
I have looked at BeaEngine, (as well as DiStorm but that was too language restrictive) a nice package with a very liberal license however there are some issues with GoAsm and the _Disasm@4 export (DLL version). Specifically it requires that you use the /mix switch and that has some adverse affects on the headers, Jeremy is looking at the bug. The lib file that comes with the distribution could not be read by GoAsm, the format was unrecognized
That's true. With the latest version (4.0) of BeaEngine, it is not possible to link the lib into a GoAsm program, which is quite annoying :) In fact, BeaEngine 4.0 is now compiled with MinGW (gcc), and for some reason GoAsm does not like libs built with gcc. I just tried a version compiled with PellesC (the original compiler used for BeaEngine) and now GoAsm links properly (and you don't need the /mix option!). Just try it :)
Thanks beatrix,
I will definitely take a look on the weekend.
Edgar
very cool code, Beatrix - i may put it to use :U
oh - and nice to have you visit us
Hi Beatrix,
Has the Disasm function been changed to C call from STDCALL? It seems that when I loop through and decode 10 instructions, ESP is offset by 40 bytes (4 bytes, i.e. one uncleaned DWORD argument, per call).
The code section is read directly from a PE file and decoded from there (pMem points to the global buffer that holds the target code). The code for the disassembly is:
mov D[usedInstructionsCount],0
// Zero the _Disasm structure
mov ecx,SIZEOF _Disasm
lea edi,DisasmStruct
xor eax,eax
rep stosb
mov eax,[pMem]
mov [DisasmStruct.EIP],eax
mov eax,[dwAddress]
mov [DisasmStruct.VirtualAddr],eax
mov D[DisasmStruct.Options],GoAsmSyntax
mov ebx,[cbSize]
:
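invoke beaengine.lib:Disasm,offset DisasmStruct
// One DWORD argument was pushed; with a cdecl build the caller removes it
add esp,4
// Preserve the instruction length returned in EAX across the wsprintf call
push eax
invoke wsprintf,offset OutputLine,offset CodeFormat,[DisasmStruct.VirtualAddr],offset DisasmStruct.CompleteInstr
// wsprintf is cdecl: 4 arguments * 4 bytes
add esp,16
pop eax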
add [DisasmStruct.EIP],eax
sub ebx,eax
inc D[usedInstructionsCount]
add [DisasmStruct.VirtualAddr],eax
invoke SendMessage,[hwnd],EM_SETSEL,-1,-1
invoke SendMessage,[hwnd],EM_REPLACESEL,FALSE,offset OutputLine
cmp ebx,0
jg <
Quote:
Has the Disasm function been changed to C call from STDCALL
Oops. No, the lib released on www.beaengine.org uses the stdcall convention. The one I gave you in this forum uses cdecl. Sorry for that, I just forgot to specify the desired output and, by default, the compiled lib uses cdecl.
@dedndave : thanks :)
No problem beatrix,
I just wanted to make sure I was using it correctly, I will set it up as a C call. Nice lib by the way, fast and accurate.
Edgar
donkey can you explain why wsprintf seems to work fine with GoASM on 32bit not using your cdecl macro?
Quote from: E^cube on July 15, 2010, 03:38:41 AM
donkey can you explain why wsprintf seems to work fine with GoASM on 32bit not using your cdecl macro?
Hi E^cube,
Not sure what you mean. It has always worked; however, you have to adjust ESP directly by calculating the bytes pushed (4 * number of parameters). The CInvoke macro calculates the amount to adjust ESP for you, but it expands to exactly the same thing that I did manually.
Edgar
yeah I meant it worked without adjusting anything, I suppose I should just use the macro :)
beatrix, your lib is weird, it doesn't display the instruction strings correctly :\ idk why. Would you mind recompiling the latest version or giving directions on how to?
Update: I just compiled to a lib myself using Visual Studio 6.0 and it's definitely not outputting correct strings :( It's missing mov, jmp etc., and some strings are even empty, ugh.
Hi E^cube,
Could you post an example project and detail the lines that have problems, I have used the lib in GoP but now I'm worried that the disassembly might have problems.
Edgar
Donkey,
here's an example of the latest lib that I compiled myself, as well as the old lib found before on the site (that worked). You can try beatrix's lib too; neither outputs the right strings.
Also, I don't know why it's messing up. I'll have to try and find the reason in the C code.
Hi E^cube,
I'm in BC for the week but should be home Sunday, I will take a look at it then.
Hi E^Cube,
I think it is only a compiler problem. The latest versions of BeaEngine are compiled with gcc and GoAsm can't use these versions. Just try the lib in the attached archive, it works fine.
Quote from: beatrix on July 23, 2010, 08:23:17 PM
Hi E^Cube,
I think it is only a problem of compiler. Last version of BeaEngine are compiled with gcc and GoAsm can't use these versions. Just try to use the lib in the joined archive, it works fine.
Thanks beatrix, I appreciate all the hard work you've put into BeaEngine, it really is amazing. Even more amazing is the license: most people just use regular GPL, which makes their project completely useless to me; you, on the other hand, took the more intelligent, considerate route. I commend you. I hope you consider putting up a donation button on your site :thumbu
Hi all,
I don't know if you've found this, but with different linkers comes different symbol table usage. For instance, my main gripe is with Cygwin's GCC (may be the same in the original GCC) where the 'Value' field of an IMAGE_SYMBOL structure does not seem to give a value which is an offset from the image base, but rather from the start of a data section (ie. a value of 0x20 rather than something in the thousands). This is confusing my symbol table parser and I cannot see an obvious way around it..
Tom
Quote from: brixton on August 04, 2010, 03:38:54 PM
Hi all,
I don't know if you've found this, but with different linkers comes different symbol table usage. For instance, my main gripe is with Cygwin's GCC (may be the same in the original GCC) where the 'Value' field of an IMAGE_SYMBOL structure does not seem to give a value which is an offset from the image base, but rather from the start of a data section (ie. a value of 0x20 rather than something in the thousands). This is confusing my symbol table parser and I cannot see an obvious way around it..
Tom
OffsetToRva function should fix that.
Quote from: E^cube on August 04, 2010, 05:42:24 PM
Quote from: brixton on August 04, 2010, 03:38:54 PM
Hi all,
I don't know if you've found this, but with different linkers comes different symbol table usage. For instance, my main gripe is with Cygwin's GCC (may be the same in the original GCC) where the 'Value' field of an IMAGE_SYMBOL structure does not seem to give a value which is an offset from the image base, but rather from the start of a data section (ie. a value of 0x20 rather than something in the thousands). This is confusing my symbol table parser and I cannot see an obvious way around it..
Tom
OffsetToRva function should fix that.
Hi E^cube,
Where is this function?
Hi Brixton,
Not sure about the OffsetToRva function, but there are many DbgHelp functions, and I find ImageNtHeader / ImageRvaToVa very useful when reading PE files. I think it's what you're looking for, but if you need to go in the other direction, SymFromAddr works just fine as well, though it requires a bit more setup.
http://msdn.microsoft.com/en-us/library/ms679291%28v=VS.85%29.aspx
This code is in GoAsm format and will enumerate symbols but it demonstrates how to use the symbol table and the Sym... functions:
EnumerateSymbols FRAME hFile, hProcess, ImageBase
LOCAL fsh:%DWORD32
LOCAL fsl:%DWORD32
LOCAL ProcessPath[2048]:%CHAR
LOCAL ihmod64:IMAGEHLP_MODULE64
mov D[ProcessPath],0
invoke SetLastError,0
invoke GetProcessImageFileName ,[hProcess],offset ProcessPath,MAX_PATH
invoke GetFileSize,[hFile],offset fsh
mov [fsl],eax
invoke SymInitialize,[hProcess], offset ProcessPath, FALSE
invoke SymLoadModuleEx,[hProcess],[hFile],offset ProcessPath,NULL,[ImageBase],0,[fsl],0,0
push eax,edx
mov D[ihmod64.SizeOfStruct],SIZEOF IMAGEHLP_MODULE64
invoke SymGetModuleInfo64,[hProcess],[ImageBase],0,offset ihmod64
cmp D[ihmod64.SymType],SymNone
je >>.NOSYMBOLS
pop edx,eax
invoke SymEnumSymbols,[hProcess],eax,edx,"*",offset SymEnumSymbolsProc, [hProcess]
invoke SymUnloadModule64,[hProcess],[ImageBase],0
invoke SymCleanup,[hProcess]
xor eax,eax
RET
.NOSYMBOLS
xor eax,eax
dec eax
ret
ENDF
SymEnumSymbolsProc FRAME pSymInfo, SymbolSize, UserContext
...
.CONTINUE
mov eax,TRUE
RET
ENDF
Hi donkey,
Thanks a lot for the reply, I will bear this in mind (and perhaps implement it, if I have time; the deadline is fast approaching). Yesterday I found the cause of the problem: the .bss section:
Quote from: wiki... the bss section typically includes all uninitialized variables declared at the file level (i.e., outside of any function) as well as uninitialized local variables declared with the static keyword. An implementation may also assign statically-allocated variables initialized with a value consisting solely of zero-valued bits to the bss section.
Hence, all of the global variables were actually present at (symbol table Value + ImageBase + offset of the .bss section). This was found with some detective work and Olly.
Tom
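A hedged sketch of the calculation described above, in Python for brevity. It unpacks one 18-byte IMAGE_SYMBOL record, resolves a long name through the string table, and adds the image base and the owning section's RVA to a section-relative Value. The sample image base, section RVAs and symbol bytes are made up for illustration, the function names are hypothetical, and whether Value really is section-relative depends on the toolchain, as found above.

```python
import struct

# Name (8 bytes), Value, SectionNumber, Type, StorageClass, NumberOfAuxSymbols
IMAGE_SYMBOL = struct.Struct("<8sIhHBB")  # 18 bytes per record


def parse_symbol(raw: bytes, string_table: bytes = b""):
    """Return (name, value, section_number) for one symbol record."""
    name_field, value, section_number, _type, _cls, _aux = IMAGE_SYMBOL.unpack(raw)
    zeroes, str_offset = struct.unpack("<II", name_field)
    if zeroes == 0:
        # Long name: offset into the string table, counted from the
        # start of the table (which begins with its 4-byte size).
        end = string_table.index(b"\0", str_offset)
        name = string_table[str_offset:end].decode("ascii")
    else:
        name = name_field.rstrip(b"\0").decode("ascii")
    return name, value, section_number


def symbol_address(image_base, section_rvas, section_number, value):
    """Address = ImageBase + RVA of the symbol's section + Value.

    section_rvas is indexed by the 1-based SectionNumber.
    """
    return image_base + section_rvas[section_number - 1] + value


# Hypothetical sample data, not taken from a real binary.
names = b"invocation_name\0"
string_table = struct.pack("<I", 4 + len(names)) + names
raw_symbol = IMAGE_SYMBOL.pack(struct.pack("<II", 0, 4), 0x20, 3, 0, 3, 0)

name, value, section_number = parse_symbol(raw_symbol, string_table)
address = symbol_address(0x400000, [0x1000, 0x3000, 0x5000], section_number, value)
```

Symbols with a SectionNumber of zero or below (undefined, absolute, debug) have no section to add and would need to be skipped before calling symbol_address.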
Donkey, why did you use %DWORD32? According to your definition it's just a structure defined as DD, which doesn't change depending on 32-bit or 64-bit. %INT_PTR appears to change though. Also, here are some mov/cmp macros you can use to auto-convert between 32-bit and 64-bit.
%MOV(%DESTN,%SOURCE) MACRO
#IFNDEF WIN64
mov D[%DESTN],%SOURCE
#ELSE
mov Q[%DESTN],%SOURCE
#ENDIF
ENDM
%CMP(%DESTN,%SOURCE) MACRO
#IFNDEF WIN64
cmp D[%DESTN],%SOURCE
#ELSE
cmp Q[%DESTN],%SOURCE
#ENDIF
ENDM
%MOV(myvar,53)
%CMP(myvar,54)
jne >
Hi E^cube,
It is a very early draft and I never noticed, nothing more sinister than that :) But I believe that file size high (fsh) and file size low (fsl) remain DWORDs even when in 64 bit mode:
DWORD WINAPI GetFileSize(
__in HANDLE hFile,
__out_opt LPDWORD lpFileSizeHigh
);
Edgar
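For reference, a small Python sketch of how the two DWORDs combine into one 64-bit size; the function name is hypothetical and the values are examples only.

```python
def combine_file_size(low: int, high: int) -> int:
    """Join the low and high DWORDs of a file size into one 64-bit value."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


small_file = combine_file_size(0x19C, 0)  # high DWORD is zero for small files
five_gib = combine_file_size(0x40000000, 1)
```

Note that a low DWORD of 0xFFFFFFFF is also what GetFileSize returns on failure, so with a non-NULL high pointer the caller has to check GetLastError to tell the two cases apart.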