Hello GoAsm community,
I'm new to this list, so I apologize if I have posted in the wrong forum. I have a question about what appears to be a PDB age discrepancy.
Taking this article as a starting point:
http://www.godevtool.com/Other/pdb.htm
To summarize, this article has an example that illustrates the structure of the PDB match information stored inside the second PDB stream in a PDB file (the age+GUID is used to match a PDB to its corresponding EXE to load symbols into a debugger). In this article, the age that was found in the exe's CODEVIEW data (age 7) matched the age found in the second stream in the PDB file (again, age 7). However, I am finding pdb's whose internal age does not match the corresponding executable. This is what I did...
dumpbin /headers c:\windows\system32\ntoskrnl.exe
Time Type Size RVA Pointer
-------- ------ -------- -------- --------
4E02AAA3 cv 25 001A300C 1A260C Format: RSDS, {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}, 2, ntkrnlmp.pdb
4E02AAA3 ( A) 4 001A3008 1A2608 BB03197E
Here we see the age is 2 and the GUID is {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}
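For anyone following along, here is a minimal sketch of how the "cv" debug entry above could be decoded by hand. It assumes the commonly described RSDS layout: a 4-byte 'RSDS' tag, a 16-byte GUID, a 4-byte age, then a NUL-terminated PDB name. The record is rebuilt from the dumpbin values rather than read from the real file, and `parse_rsds` is just an illustrative name.

```python
import struct
import uuid

def parse_rsds(record: bytes):
    """Split an RSDS CodeView record into (GUID, age, PDB name).

    Assumed layout: 'RSDS' tag, 16-byte GUID (first three fields
    little-endian), 4-byte little-endian age, NUL-terminated name.
    """
    magic, guid_bytes, age = struct.unpack_from("<4s16sI", record, 0)
    if magic != b"RSDS":
        raise ValueError("not an RSDS record")
    # The name starts right after the 24-byte fixed part.
    name = record[24:].split(b"\0", 1)[0].decode("ascii")
    return uuid.UUID(bytes_le=guid_bytes), age, name

# Synthetic record built from the dumpbin output (not the real file bytes).
guid = uuid.UUID("47F5C3BF-9E0A-493C-9F63-BB8F6413358B")
record = b"RSDS" + guid.bytes_le + struct.pack("<I", 2) + b"ntkrnlmp.pdb\0"

parsed_guid, parsed_age, parsed_name = parse_rsds(record)
print(hex(len(record)), str(parsed_guid).upper(), parsed_age, parsed_name)
```

The rebuilt record comes out at 0x25 bytes, which agrees with the Size column dumpbin reports for the cv entry.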
So then I asked the MS Symbol server for the PDB symbol file for the same ntoskrnl.exe using symchk:
symchk c:\downloads\ntos\ntoskrnl.exe /v /s SRV*c:\downloads\ntos*http://msdl.microsoft.com/download/symbols
...
DBGHELP: ntoskrnl - public symbols
c:\downloads\ntos\ntkrnlmp.pdb\47F5C3BF9E0A493C9F63BB8F6413358B2\ntkrnlmp.pdb
[SYMCHK] MODULE64 Info ----------------------
[SYMCHK] Struct size: 1680 bytes
[SYMCHK] Base: 0x0000000140000000
[SYMCHK] Image size: 6197248 bytes
[SYMCHK] Date: 0x4e02aaa3
[SYMCHK] Checksum: 0x0055c228
[SYMCHK] NumSyms: 0
[SYMCHK] SymType: SymPDB
[SYMCHK] ModName: ntoskrnl
[SYMCHK] ImageName: c:\downloads\ntos\ntoskrnl.exe
[SYMCHK] LoadedImage: c:\downloads\ntos\ntoskrnl.exe
[SYMCHK] PDB: "c:\downloads\ntos\ntkrnlmp.pdb\47F5C3BF9E0A493C9F63BB8F6413358B2\ntkrnlmp.pdb"
[SYMCHK] CV: RSDS
[SYMCHK] CV DWORD: 0x53445352
[SYMCHK] CV Data: ntkrnlmp.pdb
[SYMCHK] PDB Sig: 0
[SYMCHK] PDB7 Sig: {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}
[SYMCHK] Age: 2
[SYMCHK] PDB Matched: TRUE
[SYMCHK] DBG Matched: TRUE
[SYMCHK] Line nubmers: FALSE
[SYMCHK] Global syms: FALSE
[SYMCHK] Type Info: TRUE
[SYMCHK] ------------------------------------
SymbolCheckVersion 0x00000002
Result 0x00130001
DbgFilename
DbgTimeDateStamp 0x4e02aaa3
DbgSizeOfImage 0x005e9000
DbgChecksum 0x0055c228
PdbFilename
c:\downloads\ntos\ntkrnlmp.pdb\47F5C3BF9E0A493C9F63BB8F6413358B2\ntkrnlmp.pdb
PdbSignature {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}
PdbDbiAge 0x00000002
[SYMCHK] [ 0x00000000 - 0x00130001 ] Checked "c:\downloads\ntos\ntoskrnl.exe"
SYMCHK: FAILED files = 0
SYMCHK: PASSED + IGNORED files = 1
So, symchk got the PDB with the same GUID and age as what was stored in the binary. Cool.
However, when I dump the streams from that pdb (using any number of free tools that do so) and view the second stream in a hex editor, the age is not 2:
94 2E 31 01 A7 AA 02 4E [05 00 00 00] [BF C3 F5 47
0A 9E 3C 49 9F 63 BB 8F 64 13 35 8B] 0A 00 00 00
2F 4C 69 6E 6B 49 6E 66 6F 00 01 00 00 00 02 00
00 00 01 00 00 00 02 00 00 00 00 00 00 00 00 00
The GUID in brackets matches the one that was downloaded, but the age is 5 -- not 2. Can anyone explain this discrepancy?
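To show how I am reading those bytes, here is a small sketch that decodes the first 28 bytes of the dump. The field order (version, timestamp, age, GUID) is my assumption based on the article linked above; the variable names are only illustrative.

```python
import struct
import uuid

# First 28 bytes of the stream, copied from the hex dump above.
header = bytes.fromhex(
    "94 2E 31 01 A7 AA 02 4E 05 00 00 00 BF C3 F5 47 "
    "0A 9E 3C 49 9F 63 BB 8F 64 13 35 8B"
)

# Assumed layout: three little-endian DWORDs followed by a 16-byte GUID.
version, timestamp, age, guid_bytes = struct.unpack("<III16s", header)
guid = uuid.UUID(bytes_le=guid_bytes)

print("version  :", version)
print("timestamp: 0x%08X" % timestamp)
print("age      :", age)
print("guid     : {%s}" % str(guid).upper())
```

Read this way, the GUID comes out as {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}, but the age field holds 5.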
Thanks in advance,
Quote from: lilhoser
Can anyone explain this discrepancy?
Probably not totally satisfactorily, but the data on the symbol server and the executables have been heavily post-processed. That is to say, they have been altered by tools you don't have, which strip out information Microsoft doesn't want you to see, and the code has been profiled and had optimization mapping performed on it after the PDB was created at link time.
The discrepancy is observable in quite a number of files. I personally use the time of compilation/linkage to confirm that A goes with B. I'm pretty confident that Microsoft provides the appropriate/usable file when the symbol server is queried.
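As a rough sketch of that timestamp check, using the two values from the original post (the PE debug entry stamp 4E02AAA3 and the stamp in the PDB stream bytes, A7 AA 02 4E): both are treated here as seconds since 1970 in UTC, and `stamp_to_utc` is just a helper name for illustration.

```python
from datetime import datetime, timezone

def stamp_to_utc(stamp: int) -> datetime:
    """Interpret a 32-bit time/date stamp as seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)

pe_stamp = 0x4E02AAA3   # from the dumpbin debug directory entry
pdb_stamp = 0x4E02AAA7  # from the second PDB stream (little-endian DWORD)

delta = pdb_stamp - pe_stamp
print("EXE:", stamp_to_utc(pe_stamp).isoformat())
print("PDB:", stamp_to_utc(pdb_stamp).isoformat())
print("difference in seconds:", delta)
```

For that pair the stamps are a few seconds apart rather than identical, so a comparison like this is a sanity check on whether the two came out of the same build, not an exact match rule.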
DUMPPE...
c:\windows\system32\ntoskrnl.exe (hex) (dec)
..
Portable Executable starts at 278
Signature 00004550 (PE)
Machine 014C (Intel 386)
Sections 0016
Time Date Stamp 4E02A381 Wed Jun 22 21:22:57 2011
..
Debug Entry
Chars TimeDate Maj Min Type Size AddrRaw PtrRaw
-------- -------- ---- ---- ---------------------- -------- -------- --------
00000000 4E02A381 0000 0000 00000002 CODEVIEW 00000025 00115DFC 001155FC
00000000 4E02A381 0235 197E 0000000A RESERVED10 00000004 00115DF8 001155F8
CODEVIEW Debug Info
ntkrnlmp.pdb - 5BADD44F6B58A34D931DBDFE9100357D - 00000002
GUID {4FD4AD5B-586B-4DA3-931D-BDFE9100357D}
Opening Socket..
Getting Host..
msdl.microsoft.com -> 65.55.10.11
Connecting..
Sending..
Receiving..
Content-Length 2110369
DUMPPDB...
sn iBlk Size Module
0001 : 00000019 : 00000059
iBlk Blk FileOffs Size
00000019 00001A18 00686000 59
4E02A381 - 00000005 - TDS/Version - Wed Jun 22 21:22:57 2011
5BADD44F6B58A34D931DBDFE9100357D - 00000005 - GUID/Version
00000000 /LinkInfo
0000000A /names
00000002 00000004 00000001 00000006
0 00000000 0000000A
1 00000009 00000000
2 00000004 00000000
00000000: 94 2E 31 01 81 A3 02 4E - 05 00 00 00 5B AD D4 4F ..1....N....[..O
00000010: 6B 58 A3 4D 93 1D BD FE - 91 00 35 7D 11 00 00 00 kX.M......5}....
00000020: 2F 4C 69 6E 6B 49 6E 66 - 6F 00 2F 6E 61 6D 65 73 /LinkInfo./names
00000030: 00 02 00 00 00 04 00 00 - 00 01 00 00 00 06 00 00 ................
00000040: 00 00 00 00 00 0A 00 00 - 00 09 00 00 00 00 00 00 ................
00000050: 00 04 00 00 00 00 00 00 - 00 .........
this may or may not be relevant, but...
there are different versions of ntoskrnl
one of them is selected to become the ntoskrnl that is actually used
ntoskrnl.exe - "standard"
ntkrnlmp.exe - multiprocessor
ntkrnlpa.exe - physical address extension
ntkrpamp.exe - multiprocessor and physical address extension
also, there are different versions of the linker that generate different pdb formats
I appreciate the ideas.
@clive, Even if the executables were post-processed/stripped/whatever, windbg/symchk still have to know how to match the executable in my windows installation with a PDB file on a symbol server. It was my understanding that this is done by combining the GUID with the age into a unique string that you see in the symbol path (C:\symbols\<file>\<guid+age>\<file>.pdb). In both your sample below and my sample, the symbol server retrieves the correct PDB despite the age value in the second PDB stream not matching what's reported by both dumpbin and symchk/windbg. So my question remains, how is it doing that? Subtracting some pre-determined value from the pdb age?
@dedndave, correct. I am testing on ntoskrnl.exe and the symbol server retrieves the symbols for the mp version. So, I guess that's correct? There actually aren't that many differences in the two binaries, so it makes sense that the symbol files would be the same.
The symbol fetch, whose URL I redacted, pulls the PDB based on the GUID/Version (nee Age) in the PE executable's RSDS tag.
The name/path on the symbol server is unique based on those codes, as the PDB name is rather ambiguous. The local caching replicates the tree structure of the server as I recall.
So it pulls the 5BADD44F6B58A34D931DBDFE9100357D/00000002 version, you get the 5BADD44F6B58A34D931DBDFE9100357D/00000005 variant which is usable.
Quote from: lilhoser on February 21, 2012, 06:41:24 PM
@dedndave, correct. I am testing on ntoskrnl.exe and the symbol server retrieves the symbols for the mp version. So, I guess that's correct? There actually aren't that many differences in the two binaries, so it makes sense that the symbol files would be the same.
well - there are more differences than one might think :P
i cannot say that one would have more or different symbols than another
but, it would seem possible or even likely
i ran across this issue when i noticed that the ntoskrnl.exe file in my
System32 folder did not match the one in my System32\dllcache folder
the one in System32 named ntoskrnl.exe
does match the
one in System32\dllcache named ntkrnlmp (whew !)
i thought this was a strange way to handle it
it would seem more logical for both files to be named ntoskrnl,
as there is little need for the other versions unless i change out the motherboard
but, the one that gets loaded is selected by the boot.ini file
i guess, if i wanted to, i could boot up with the "standard" version and not have multiprocessing :P
the naming method they use allows all versions to be modified during hotfix or service pack updates
NT classically booted the MP version during the setup from CD, and then installed the appropriate NTOSKRNL based on the BIOS, APIC, etc. I also seem to vaguely recall Compaq having their own version for high processor count systems, which may have been NUMA vs SMP, but definitely related to the localization of threads/processes/memory in a manner more appropriate to the hardware than the bog-standard Microsoft release which didn't scale as well.
Basically the difference in builds were how the locking/mutex/semaphore type operations were in-lined in a system appropriate way, as a single processor version could assume a serialized execution of the instruction stream without worrying about coherency across processors/cache/memory, and would consequently run much faster than the multi-processor version which had to care about such things.
Quote from: clive on February 21, 2012, 06:53:54 PM
The symbol fetch, whose URL I redacted, pulls the PDB based on the GUID/Version (nee Age) in the PE executable's RSDS tag.
I kind of feel like we are going in circles here..my apologies.
All of the research I can find on this topic indicates this is not how the debugger matches a PDB to EXE. It uses the GUID+age, because those are the only two bits of information that are both stored in the EXE PE debug info AND in the PDB stream data (ie, there is no timestamp in PDB format). Yet, I am seeing Microsoft EXE's whose debug entry references a PDB age that does not match what is stored in the PDB retrieved by the symbol server. That is my question.
Quote from: clive on February 21, 2012, 06:53:54 PM
The name/path on the symbol server is unique based on those codes, as the PDB name is rather ambiguous. The local caching replicates the tree structure of the server as I recall.
So it pulls the 5BADD44F6B58A34D931DBDFE9100357D/00000002 version, you get the 5BADD44F6B58A34D931DBDFE9100357D/00000005 variant which is usable.
The folder name is <GUID>+<age>. Ie, for my original example with ntoskrnl.exe:
[SYMCHK] PDB: "c:\downloads\ntos\ntkrnlmp.pdb\47F5C3BF9E0A493C9F63BB8F6413358B2\ntkrnlmp.pdb"
...
PdbSignature {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}
PdbDbiAge 0x00000002
The GUID is 47F5C3BF-9E0A-493C-9F63-BB8F6413358B and the age is 2, so the folder name is 47F5C3BF9E0A493C9F63BB8F6413358B2
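In code form, this is roughly how I understand the folder name to be built. It is only a sketch: `symbol_key` and `symbol_path` are made-up helper names, and formatting the age as unpadded uppercase hex is my assumption (with an age of 2 it is indistinguishable from decimal).

```python
import uuid

def symbol_key(guid: uuid.UUID, age: int) -> str:
    """GUID as 32 uppercase hex digits (no dashes) followed by the age."""
    # Assumption: the age is appended as unpadded uppercase hex.
    return f"{guid.hex.upper()}{age:X}"

def symbol_path(root: str, pdb_name: str, guid: uuid.UUID, age: int) -> str:
    """Local cache path: <root>\\<pdb>\\<guid+age>\\<pdb>."""
    return "\\".join([root, pdb_name, symbol_key(guid, age), pdb_name])

guid = uuid.UUID("47F5C3BF-9E0A-493C-9F63-BB8F6413358B")
key = symbol_key(guid, 2)
path = symbol_path(r"c:\downloads\ntos", "ntkrnlmp.pdb", guid, 2)
print(key)
print(path)
```

With the GUID and age from the EXE this reproduces the path symchk printed, which is why I expected the PDB inside that folder to carry age 2 as well.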
Quote from: lilhoser on February 21, 2012, 08:33:44 PM
I kind of feel like we are going in circles here..my apologies.
It is cataloguing and caching the PDB files based on the values in the EXE file, i.e. 2.
You, in this case, request version 2, it sends you a PDB image with version 5 inside. The treeing/naming all reflects version 2.
The age/version values in the PDB don't always match the EXE values; Microsoft provides you with an equivalent file with a level of detail that mirrors your need-to-know.
PDB files have always had time date stamps; the GUID method only appeared in later versions of the PDB format. As Dave noted there are several, and I'll note that some of Microsoft's own tools are incapable of parsing older versions properly.
PdbDbiAge 0x00000002 <<<<
This is a ruse, the file is 0x00000005
Notice also the discrepancy between when the PDB was created by LINK and its file date from the server.
Microsoft (R) Cabinet Extraction Tool - Version 6.1.7600.16385
Copyright (c) Microsoft Corporation. All rights reserved..
Cabinet symget.cab
06-23-2011 11:43:50p A--- 8,752,128 ntkrnlmp.pdb
1 File 8,752,128 bytes
06/23/2011 11:43 PM 8,752,128 ntkrnlmp.pdb
sn iBlk Size Module
0001 : 00000020 : 00000059
iBlk Blk FileOffs Size
00000020 0000213D 0084F400 59
4E02AAA7 - 00000005 - TDS/Version - Wed Jun 22 21:53:27 2011
BFC3F5470A9E3C499F63BB8F6413358B - 00000005 - GUID/Version
All that matters to the debugger, or more importantly you the user, is that the symbolic information contained in the PDB is in sync with the executable being debugged. ie a direct and correct correlation between addresses and symbols in the two pieces.
I should probably improve the byte swizzling of the GUID in DumpPDB; I just dumped the 16 linear bytes because it was easier.
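For reference, the swizzle amounts to reading the first three GUID fields as little-endian and leaving the last eight bytes in order. A minimal sketch in Python (not the DumpPDB source; `swizzle_guid` is a hypothetical name), checked against the two linear dumps shown in this thread:

```python
import struct

def swizzle_guid(raw: bytes) -> str:
    """Format 16 linear GUID bytes in the usual {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} form."""
    if len(raw) != 16:
        raise ValueError("a GUID is exactly 16 bytes")
    # Data1 (DWORD), Data2 (WORD), Data3 (WORD) are stored little-endian.
    data1, data2, data3 = struct.unpack_from("<IHH", raw, 0)
    # Data4 is a plain byte array and is printed in storage order.
    data4 = raw[8:]
    return "{%08X-%04X-%04X-%s-%s}" % (
        data1, data2, data3,
        data4[:2].hex().upper(), data4[2:].hex().upper(),
    )

first = swizzle_guid(bytes.fromhex("5BADD44F6B58A34D931DBDFE9100357D"))
second = swizzle_guid(bytes.fromhex("BFC3F5470A9E3C499F63BB8F6413358B"))
print(first)
print(second)
```

The first result agrees with the GUID line in the DUMPPE output, and the second with the GUID dumpbin and symchk report for the original poster's file.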