Reading raw debug symbol data

Started by donkey, March 28, 2010, 07:34:12 PM

brixton

Quote from: drizz on July 02, 2010, 09:03:00 PM
Quote from: brixton on July 02, 2010, 03:50:11 PM
I actually mean the lengths of variables (those included in the symbol table), not instruction lengths. E.g. if I have a global variable:

someString BYTE 5 DUP(?)

I can find the location of someString in the data section of memory (garnered from the symbol table), but I don't know how to find its length of 5.

You can't. You could guess the size by subtracting label offsets (next - this), but there could be an "align" directive, for example, which would add to the size.

VCx0.pdb could have this info (I'm not sure), but only for C/C++.

Here's an old obj2asm utility I made. If you test it, you can see that it just dumps the bytes, with no analysis of the data.

Thanks drizz, I didn't think there was a way (as far as I could tell).  Looks like I'm going to have to resort to guessing (educated guess).

Quote from: E^cube on July 02, 2010, 07:26:27 PM
Donkey,

Iczelion does what you want here http://win32assembly.online.fr/tut30.html but he says single-stepping large programs can take 10 minutes, wtf!

I'm not surprised by that, given the hardware of the time, and it's even worse if you're printing something after each instruction. I single-stepped theGUN.exe (from the MASM32 package), printing each instruction as it went. It took about 150k instructions before a window started appearing (30 seconds or so on my machine). Once it has loaded, the only instructions processed are those driven by GetMessage, e.g. when my mouse moves in/out or the window needs to repaint.
If you love somebody, set them free.
If they return, they were always yours. If they don't, they never were..

clive

Pretty sure there isn't a way to do it for strings; the TYPES data can only be used to infer details of the fields within a RECORD/STRUCTURE.

From the PDB file

sn     iBlk   Size     Stamp    Module
000A : 0029 : 0000019C 0037BA68 test26.obj
iBlk Blk  FileOffs Size
0029 0013 00004C00  19C

sstAlignSym - Module Symbols (Size 0120)

00000000: 0012 0009 S_OBJNAME        00000001 test26.obj
00000014: 0036 0001 S_COMPILE        00000303 Intel 80386 MASM Microsoft (R) Macro Assembler Version 6.15.8803
0000004C: 001A 1007 S_LDATA32_VS97   00000022     3.00000002 g_ArrayPtr
00000068: 000E 1003 S_UDT_VS97       00001001 Array
00000078: 0016 1007 S_LDATA32_VS97   00000020     3.00000006 String
00000090: 002E 100A S_LPROC32_VS97      1.00000000[0000001B] $$$00001
000000C0: 0002 0006 S_END
000000C4: 0012 0209 S_LABEL32                     1.00000000 Start
000000D8: 0012 1003 S_UDT_VS97       00001005 c_msvcrt
000000EC: 0016 1007 S_LDATA32_VS97   00001001     3.00000000 g_Array
00000104: 001A 1007 S_LDATA32_VS97   00000020     3.00000012 SomeString

11 Symbol(s)

sstSrcModule - Line Numbering (Size 0058)

cFile = 0001, cSeg = 0001
    1.00000000->0000001A
00000014 cSeg = 0001, pad = 0000, Name = test26.asm
   00000030    1.00000000->0000001A
               1.00000000    27
               1.00000000    27
               1.00000006    28
               1.00000008    32
               1.0000000B    34
               1.0000001A    36


Here SomeString (DB 5 DUP (?)) has a TYPE of 0x20, which is basically T_UCHAR. No size/length is recorded in the CodeView/PDB data. You'd have to sort the symbols and the object boundaries, then compute the space between them, bearing in mind that this includes any alignment padding thrown in by the compiler and/or linker.

From the OBJ File, the .debug$S (CodeView Symbols)

03  .debug$S Virtual Address         00000000
Physical Address        00000032
Raw Data Offset         0000015C
Raw Data Size           00000102
Relocation Offset       0000025E
Relocation Count        000C
Line Number Offset      00000000
Line Number Count       0000
Characteristics         42100040
Initialized Data
1  Byte Align
Discardable
Readable

00000053 0000000E 000B (SECREL  )    2.00000002 g_ArrayPtr
00000057 0000000E 000A (SECTION )    2.00000002 g_ArrayPtr
00000076 0000000F 000B (SECREL  )    2.00000006 String
0000007A 0000000F 000A (SECTION )    2.00000006 String
000000A1 00000012 000B (SECREL  )    1.00000000 _$$$00001@0
000000A5 00000012 000A (SECTION )    1.00000000 _$$$00001@0
000000BB 00000010 000B (SECREL  )    1.00000000 _Start
000000BF 00000010 000A (SECTION )    1.00000000 _Start
000000DB 00000011 000B (SECREL  )    2.00000000 g_Array
000000DF 00000011 000A (SECTION )    2.00000000 g_Array
000000EF 00000019 000B (SECREL  )    2.00000012 SomeString
000000F3 00000019 000A (SECTION )    2.00000012 SomeString

0011 - 0009 S_OBJNAME        test26.obj

00000000: 11 00 09 00 01 00 00 00 - 0A 74 65 73 74 32 36 2E  .........test26.
00000010: 6F 62 6A                                           obj

0036 - 0001 S_COMPILE        Microsoft (R) Macro Assembler Version 6.15.8803

00000000: 36 00 01 00 03 03 00 00 - 2F 4D 69 63 72 6F 73 6F  6......./Microso
00000010: 66 74 20 28 52 29 20 4D - 61 63 72 6F 20 41 73 73  ft (R) Macro Ass
00000020: 65 6D 62 6C 65 72 20 56 - 65 72 73 69 6F 6E 20 36  embler Version 6
00000030: 2E 31 35 2E 38 38 30 33                            .15.8803

0015 - 0201 S_LDATA32        T_ULONG g_ArrayPtr

00000000: 15 00 01 02 00 00 00 00 - 00 00 22 00 0A 67 5F 41  .........."..g_A
00000010: 72 72 61 79 50 74 72                               rrayPtr

000A - 0004 S_UDT            1003 Array

00000000: 0A 00 04 00 03 10 05 41 - 72 72 61 79              .......Array

0011 - 0201 S_LDATA32        T_UCHAR String

00000000: 11 00 01 02 00 00 00 00 - 00 00 20 00 06 53 74 72  .......... ..Str
00000010: 69 6E 67                                           ing

002C - 0204 S_LPROC32        1005 $$$00001

00000000: 2C 00 04 02 00 00 00 00 - 00 00 00 00 00 00 00 00  ,...............
00000010: 1B 00 00 00 00 00 00 00 - 1B 00 00 00 00 00 00 00  ................
00000020: 00 00 05 10 00 08 24 24 - 24 30 30 30 30 31        ......$$$00001

0002 - 0006 S_END

00000000: 02 00 06 00                                        ....

000F - 0209 S_LABEL32        Start

00000000: 0F 00 09 02 00 00 00 00 - 00 00 00 05 53 74 61 72  ............Star
00000010: 74                                                 t

000D - 0004 S_UDT            1000 c_msvcrt

00000000: 0D 00 04 00 00 10 08 63 - 5F 6D 73 76 63 72 74     .......c_msvcrt

0012 - 0201 S_LDATA32        1003 g_Array

00000000: 12 00 01 02 00 00 00 00 - 00 00 03 10 07 67 5F 41  .............g_A
00000010: 72 72 61 79                                        rray

0015 - 0201 S_LDATA32        T_UCHAR SomeString

00000000: 15 00 01 02 00 00 00 00 - 00 00 20 00 0A 53 6F 6D  .......... ..Som
00000010: 65 53 74 72 69 6E 67                               eString


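Total size of the .data object record is 0x17 bytes (2.00000000 .. 2.00000016), with SomeString starting at 0x12 (2.00000012). Since SomeString is the last symbol in the section, it can occupy at most 0x17 - 0x12 = 5 bytes.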

02  .data    Virtual Address         00000000
Physical Address        0000001B
Raw Data Offset         0000013A
Raw Data Size           00000017
Relocation Offset       00000152
Relocation Count        0001
Line Number Offset      00000000
Line Number Count       0000
Characteristics         C0300040
Initialized Data
4  Byte Align
Readable
Writeable


It could be a random act of randomness. Those happen a lot as well.

brixton

Hi clive,

Yes, so using the string's location in the data section, along with the locations of other data in there could give me an indication of its length (but that is all, it can't be 100% reliable, right?).

clive

Quote from: brixton
Yes, so using the string's location in the data section, along with the locations of other data in there could give me an indication of its length (but that is all, it can't be 100% reliable, right?).

That's pretty much it. The assembler knows this information but it doesn't export it in the object file, or listing for that matter. The debug records include the name, address, and type information. So you would know if it was an array of BYTE, WORD, or some custom RECORD/STRUCTURE, but you'd have to confine the symbol between other symbols, or the end of a section within the object file.

brixton

I'm trying to compile 'ed' (from Cygwin) with symbol/debugging information. I added the -g -gcoff switches to gcc, and I do get a (massive) symbol table, but most of the global variables aren't listed in it at all. Very strange.

From main.c:


static const char * invocation_name = 0;
static const char * const Program_name    = "GNU Ed";
static const char * const program_name    = "ed";
static const char * const program_year    = "2008";

static char _restricted = 0; /* invoked as "red" */
static char _scripted = 0; /* if set, suppress diagnostics */
static char _traditional = 0; /* if set, be backwards compatible */


I can find invocation_name in the symbol table, but cannot find any of the others at all, even using the famous PEDUMP.exe.  Stumped!

clive

"static" means that it is local to the object in question, so it isn't going to be exported to pollute the namespace.

brixton

Interesting - so how come invocation_name is included?
Edit: I removed the 'static const' keywords (e.g. char * const Program_name = "GNU Ed";) and it still doesn't export them.

beatrix

Hi,

Just an answer to donkey. He says:

Quote
I have looked at BeaEngine (as well as DiStorm, but that was too language-restrictive), a nice package with a very liberal license; however, there are some issues with GoAsm and the _Disasm@4 export (DLL version). Specifically, it requires that you use the /mix switch, and that has some adverse effects on the headers; Jeremy is looking at the bug. The lib file that comes with the distribution could not be read by GoAsm; the format was unrecognized.

That's true. With the latest version (4.0) of BeaEngine, it is not possible to link the lib into a GoAsm program. It is quite annoying :) In fact, BeaEngine 4.0 is now compiled with MinGW (gcc). I don't know why, but GoAsm does not like libs built with gcc. I just tried a version compiled with PellesC (the original compiler used for BeaEngine), and now GoAsm links properly (and you don't need the /mix option!). Just try it :)


donkey

Thanks beatrix,

I will definitely take a look on the weekend.

Edgar
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

dedndave

very cool code, Beatrix - i may put it to use   :U
oh - and nice to have you visit us

donkey

Hi Beatrix,

Has the Disasm function been changed from STDCALL to C call? It seems that when I loop through and decode 10 instructions, ESP is offset by 40 bytes. The code for the disassembly is:

The code section is read directly from a PE file and decoded from there (pMem points to the global buffer that holds the target code).
mov D[usedInstructionsCount],0

// Zero the _Disasm structure
mov ecx,SIZEOF _Disasm
lea edi,DisasmStruct
xor eax,eax
rep stosb

// Decode from the buffer (EIP) but display the original address (VirtualAddr)
mov eax,[pMem]
mov [DisasmStruct.EIP],eax
mov eax,[dwAddress]
mov [DisasmStruct.VirtualAddr],eax
mov D[DisasmStruct.Options],GoAsmSyntax

// EBX = number of bytes left to decode
mov ebx,[cbSize]

:
invoke beaengine.lib:Disasm,offset DisasmStruct
// Caller removes the 1 DWORD parameter; only balanced if Disasm is cdecl
add esp,4
// Save the instruction length returned in EAX
push eax

invoke wsprintf,offset OutputLine,offset CodeFormat,[DisasmStruct.VirtualAddr],offset DisasmStruct.CompleteInstr
// wsprintf is cdecl: 4 parameters * 4 bytes
add esp,16

pop eax
// Advance to the next instruction
add [DisasmStruct.EIP],eax
sub ebx,eax
inc D[usedInstructionsCount]
add [DisasmStruct.VirtualAddr],eax

// Append the formatted line to the edit control
invoke SendMessage,[hwnd],EM_SETSEL,-1,-1
invoke SendMessage,[hwnd],EM_REPLACESEL,FALSE,offset OutputLine
// Loop back to the anonymous label while bytes remain
cmp ebx,0
jg <


beatrix

Has the Disasm function been changed to C call from STDCALL

Oops. No, the lib released on www.beaengine.org uses the stdcall convention. The one I gave you in this forum uses cdecl. Sorry about that; I forgot to specify the desired calling convention, and by default the compiled lib uses cdecl.

@dedndave : thanks :)

donkey

No problem beatrix,

I just wanted to make sure I was using it correctly; I will set it up as a C call. Nice lib, by the way: fast and accurate.

Edgar

ecube

donkey, can you explain why wsprintf seems to work fine with GoAsm on 32-bit without using your cdecl macro?

donkey

Quote from: E^cube on July 15, 2010, 03:38:41 AM
donkey, can you explain why wsprintf seems to work fine with GoAsm on 32-bit without using your cdecl macro?

Hi E^cube,

Not sure what you mean; it has always worked. However, you have to adjust ESP directly by calculating the bytes pushed (4 * number of parameters). The CInvoke macro calculates the amount to adjust ESP for you, but it expands to exactly the same thing that I did manually.

Edgar