
Doc Comment Parsing

Started by Neo, June 12, 2008, 04:08:43 AM

Neo

It sounds strange to be asking about documentation here, but I'm wondering what sorts of formats people use for in-code documentation, so that I can try to extract that documentation.  Examples, descriptions, or advice would be very helpful.  :toothy  Javadoc is fairly standardized, but I'm talking about not-so-standardized formats, like the one I use for assembly, which is compatible with Natural Docs.

From the example below, I'd want to extract the fact that what follows is a procedure named OpenFileNTFS (from the PwnOS code), its description text, the parameter / local variable names and their descriptions, and the return description.  I can parse this format, but I'm wondering how to go about parsing doc comments when they aren't in any particular format.

;********************************************************************************************************************************
;*                                                                                                                              *
;*  Procedure: OpenFileNTFS                                                                                                     *
;*                                                                                                                              *
;*  This procedure opens a file from an NTFS partition.  It should only be called from <OpenFile>.                              *
;*                                                                                                                              *
;*  *TODO:* Add handling for Access value and Creation value.                                                                   *
;*                                                                                                                              *
;*  Parameters:                                                                                                                 *
;*      pPartition      - address of <PARTITIONINFO_NTFS> structure for the NTFS partition                                      *
;*      pName           - address of unicode filename with no preceding protocol                                                *
;*      pDirectory      - address of <FILE> structure for the directory to which the filename is relative, or NULL if absolute  *
;*      Access          - access options                                                                                        *
;*      Creation        - creation options                                                                                      *
;*      Flags           - miscellaneous                                                                                         *
;*                                                                                                                              *
;*  Local Variables:                                                                                                            *
;*      pHeap           - address of the heap on which to allocate memory                                                       *
;*      MFTRecordNum    - <NTFSMFTREF> to keep track of the MFT record number of the file found                                 *
;*                                                                                                                              *
;*  Returns:                                                                                                                    *
;*      - address of the <FILE> structure or NULL if the file doesn't exist or couldn't be opened                               *
;*                                                                                                                              *
;********************************************************************************************************************************
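
For reference, here is a minimal sketch of how a block in the strict format above could be picked apart. It is written in Python rather than assembly to keep it short. The function name parse_strict_block and the dictionary layout are made up for illustration and are not PwnIDE's actual code. It assumes every content line starts with ";*" and ends with the box's closing "*".

```python
import re

_TITLE = re.compile(r"^(\w[\w ]*):\s+(\S.*)$")   # "Procedure: OpenFileNTFS"
_SECTION = re.compile(r"^(\w[\w ]*):$")          # "Parameters:"
_ITEM = re.compile(r"^(?:(\S+)\s+)?-\s+(.*)$")   # "pName - text" or "- text"

def parse_strict_block(lines):
    """Parse one boxed doc comment (hypothetical helper, not PwnIDE code)."""
    doc = {"kind": None, "name": None, "description": [], "sections": {}}
    section = None
    for raw in lines:
        text = raw.rstrip()
        if not text.startswith(";*"):
            continue                      # not part of the box
        text = text[2:]
        if text.endswith("*"):
            text = text[:-1]              # drop the right-hand border
        body = text.strip()
        if not body.strip("*"):
            continue                      # border row or empty row
        title = _TITLE.match(body)
        header = _SECTION.match(body)
        item = _ITEM.match(body)
        if doc["kind"] is None and title:
            doc["kind"], doc["name"] = title.group(1), title.group(2)
        elif header:
            section = header.group(1)
            doc["sections"][section] = []
        elif section and item:
            # item.group(1) is None for unnamed entries such as "Returns"
            doc["sections"][section].append((item.group(1), item.group(2)))
        else:
            doc["description"].append(body)
    return doc
```

Fed the OpenFileNTFS block, this should give kind "Procedure", name "OpenFileNTFS", two description lines, and one list of (name, text) pairs per section, with the name left as None for the unnamed "Returns" entry.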

hutch--

Neo,

I tend to get the same problem when I have to write help files, and it depends a lot on how I wrote the code in the first place. With an example of the type you have shown, it's simple enough to extract each comment block out of the source code, but how you format the results will depend a lot on how your display engine works.

Stripping out the comments is simple enough, but you have to make some decisions on what is displayed as fixed formatted text and what you allow to word-wrap. I still do it all manually, as I have a habit when writing code collections of adding the description of each algo to a separate text file, or alternatively in a commented section at the start of the algo, in much the same manner as your example.
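
As a rough illustration of the "stripping out the comments" step, here is a short Python sketch that groups consecutive full-line ";" comments into blocks. The name comment_blocks and the min_lines threshold are assumptions made for the example, not part of any existing tool. Trailing comments on code lines are deliberately ignored.

```python
def comment_blocks(source, min_lines=2):
    """Return runs of consecutive full-line ';' comments as lists of lines.

    Hypothetical helper: runs shorter than min_lines are dropped, on the
    assumption that a lone comment line is not a doc comment.
    """
    blocks, current = [], []
    for line in source.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(";"):
            current.append(stripped)
        else:
            # A code line or blank line ends the current run.
            if len(current) >= min_lines:
                blocks.append(current)
            current = []
    if len(current) >= min_lines:
        blocks.append(current)
    return blocks
```

Deciding which of the extracted lines are fixed-format and which may word-wrap is still left to whatever displays them.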

Neo

Sorry, I think I asked quite ambiguously.   :red  What I meant was that I'm trying to write code for PwnIDE that will scan through other people's code to find doc comments and try to sort out as much information as reasonable.  I've already written code to scan through and understand the strict format I showed (since that's the format it gets output in), but I'm wondering what formats other people use so that I can try to have the code figure out those formats too.  That's why I'm looking for some examples of other people's doc comments.

The current way that the data is displayed in the IDE is actually quite cheesy and generally crummy-looking, but just this weekend, a friend and I came up with a much better way of displaying it that beats the crap out of the current stuff, though it'll take a while to implement.  It'll be exciting to see how it plays out, since it should scale up to much more advanced documentation like diagrams and charts.  :8)

Tedd

I think you'll find it very difficult to get it to work for arbitrary free-flow text - there are just too many variations and arrangements, not to mention language, spelling/typing errors, abbreviations, etc.

I suppose I would go for trying to identify the 'heading' keywords ("parameters", "arguments", "returns", "comments", "description", "note" - plus 100 variations of each) and take those as section markers.
But even then, it will only work when the comment block is arranged in such a logical way. Although anyone who takes the time to write a comment block in the first place will probably have some kind of logical structure for it.
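
Something along these lines, as a minimal Python sketch of the heading-keyword idea. The alias table is only a small made-up sample of the "100 variations", and split_sections is a hypothetical name. Any text that does not follow a recognized heading simply stays in whichever section is current, starting with the description.

```python
import re

# Illustrative aliases only; a real table would be much longer.
HEADING_ALIASES = {
    "parameters": {"parameters", "parameter", "params", "arguments", "args", "inputs"},
    "returns": {"returns", "return", "result", "results", "output"},
    "notes": {"note", "notes", "comments", "comment", "remarks"},
    "description": {"description", "desc", "summary", "purpose"},
}
_LOOKUP = {alias: canon
           for canon, aliases in HEADING_ALIASES.items()
           for alias in aliases}
_HEADING = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.*)$")

def split_sections(lines):
    """Split ';' comment lines into sections keyed by canonical heading."""
    sections = {"description": []}
    current = "description"
    for line in lines:
        text = line.lstrip(";").rstrip()
        m = _HEADING.match(text)
        canon = _LOOKUP.get(m.group(1).lower()) if m else None
        if canon:
            current = canon
            sections.setdefault(current, [])
            rest = m.group(2).strip()
            if rest:
                sections[current].append(rest)   # text on the heading line
        elif text.strip():
            sections[current].append(text.strip())
    return sections
```

A line such as "Caveat: ..." with an unknown heading word is kept verbatim in the current section, so nothing is thrown away when the guess fails.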

But after saying that, looking through some of my commented code, my 'section headings' are largely inherent.
The general structure seems to be:

;function declaration (semi C-style)
;one-line summary/description of the function
; argument1        paragraph explaining the purpose of the argument,
;                  with extra lines aligned underneath by whitespace
; argument2        ..etc..
;Returns: description of return values, etc
;Note: any notes, special cases to mention

Some headings may not be present, and new ones may appear as required.
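
A layout like this can still be parsed by position and indentation alone. The Python sketch below is one possible reading of it, with assumptions that go beyond the description above: the first unindented line is the declaration, the second is the summary, an argument line has exactly one space after the ";" and at least two spaces between name and text, and deeper-indented lines continue the previous entry. The name parse_tedd_block and the output layout are made up for illustration.

```python
import re

_HEAD = re.compile(r"^;(\w+):\s*(.*)$")        # ";Returns: text"
_ARG = re.compile(r"^;\s(\S+)\s{2,}(\S.*)$")   # "; name    text"
_CONT = re.compile(r"^;\s{2,}(\S.*)$")         # ";         more text"

def parse_tedd_block(lines):
    doc = {"declaration": None, "summary": None, "arguments": [], "sections": {}}
    target = None   # (container, key) that a continuation line extends
    for raw in lines:
        line = raw.rstrip()
        head, arg, cont = _HEAD.match(line), _ARG.match(line), _CONT.match(line)
        if head and doc["declaration"] is not None:
            doc["sections"][head.group(1)] = head.group(2)
            target = (doc["sections"], head.group(1))
        elif arg:
            entry = {"name": arg.group(1), "text": arg.group(2)}
            doc["arguments"].append(entry)
            target = (entry, "text")
        elif cont and target:
            holder, key = target
            holder[key] = (holder[key] + " " + cont.group(1)).strip()
        elif line.startswith(";") and line[1:].strip():
            text = line[1:].strip()
            if doc["declaration"] is None:
                doc["declaration"] = text
            elif doc["summary"] is None:
                doc["summary"] = text
            else:
                doc["summary"] += " " + text   # extra unindented text
            target = None
    return doc
```

Headings that are missing just leave their entry out of doc["sections"], and new ones show up under whatever word precedes the colon.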

Neo

Quote from: Tedd on June 18, 2008, 01:04:18 PM
I think you'll find it very difficult to get it to work for arbitrary free-flow text - there are just too many variations and arrangements, not to mention language, spelling/typing errors, abbreviations, etc.

I suppose I would go for trying to identify the 'heading' keywords ("parameters", "arguments", "returns", "comments", "description", "note" - plus 100 variations of each) and take those as section markers.

Actually, after thinking about this a bit more, you're right that it really is an option just to not handle every case, because when in doubt, I can always just treat stuff as part of the main description.  The documentation doesn't get completely lost, it just won't be perfectly and automatically associated with a parameter or other item.

Quote: But even then, it will only work when the comment block is arranged in such a logical way. Although anyone who takes the time to write a comment block in the first place will probably have some kind of logical structure for it.

My IDE automatically formats doc comments in the format I posted earlier, and it's as easy as just filling in the blanks, so it should shave a ton of time off writing documentation.   :wink

Quote: But after saying that, looking through some of my commented code, my 'section headings' are largely inherent.
The general structure seems to be:

;function declaration (semi C-style)
;one-line summary/description of the function
; argument1        paragraph explaining the purpose of the argument,
;                  with extra lines aligned underneath by whitespace
; argument2        ..etc..
;Returns: description of return values, etc
;Note: any notes, special cases to mention

Some headings may not be present, and new ones may appear as required.

Thanks for the example!  :bg