
File Parsing... request (Text)

Started by Draakie, March 19, 2007, 09:37:33 AM


Draakie

Feeling a bit blonder than usual today + I'd like to get one up on the hubby. :bdg

Some advice or your thoughts around text file parsing and subsequent string manipulation, please. My
direct targets are implementations of 3ds or XML file parsers. These always seem overly bothersome
to code, possibly because I lack a better approach. Currently I have something like this:

1. Load the source into a memory-mapped file so that we've got fast memory-based access to the various components of interest
      (multiple scans possible without disk-access overhead - plus handy memory-based indexing - see [3])
2. Use a search algorithm to scan the set - "till end of file" - for header structures.
3. Use .CASE or .IF to identify particular header structures and build a memory-based index array.
4. Use string-based inspection - char for char - to determine and then convert values to user/program arrays and constants.

On second thought - I'm actually really looking for a better way of dealing with point [4]. Wildcard pattern matching on the
per-character level - without the overhead of generic search algorithms. Someone got one lying about? :lol A binary dword
mask search perhaps... something like
Mark Larson's "Different tricks you can do with masks created by comparison instructions using MMX/SSE/SSE2"
as found on http://www.mark.masmcode.com/ ...

Thanx
Draakie



Does this code make me look bloated ? (wink)

gabor

Hi!

I made an XML parser some time ago; it is based on finite state machine theory. You can get it from my website, with all sources and a little documentation. With luck it will work too :) However, I would advise only having a look at it, to learn about this way of parsing, since it lacks some features (the special characters of some languages are not handled correctly). If you use it to process XML data consisting of simple English markup then it will work nicely...
The parsing approach used there (based on an FSM) is a universal method, no matter what input is processed. FSMs were created for parsing and recognizing the words and sentences of any language in the first place!
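
To make the state-machine idea a bit more concrete, here is a minimal sketch in C. It is not gabor's parser, just an illustration under simplifying assumptions: the state names and the max_depth function are made up for this example, and attributes, comments, self-closing tags and entities are ignored. Each input character only drives a transition between a handful of states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states for a tiny markup scanner (names are illustrative). */
typedef enum { ST_TEXT, ST_LT, ST_OPEN_TAG, ST_CLOSE_TAG } State;

/* Walks the NUL-terminated input one character at a time and returns the
   deepest element nesting seen, or -1 if the tags are unbalanced or the
   input ends inside a tag. */
int max_depth(const char *s)
{
    State st = ST_TEXT;
    int depth = 0, deepest = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        switch (st) {
        case ST_TEXT:                 /* plain character data */
            if (*s == '<') st = ST_LT;
            break;
        case ST_LT:                   /* the char after '<' picks the tag kind */
            if (*s == '>') return -1; /* "<>" has no name */
            st = (*s == '/') ? ST_CLOSE_TAG : ST_OPEN_TAG;
            break;
        case ST_OPEN_TAG:             /* inside <name ... */
            if (*s == '>') {
                if (++depth > deepest) deepest = depth;
                st = ST_TEXT;
            }
            break;
        case ST_CLOSE_TAG:            /* inside </name ... */
            if (*s == '>') {
                if (--depth < 0) return -1;
                st = ST_TEXT;
            }
            break;
        }
    }
    return (st == ST_TEXT && depth == 0) ? deepest : -1;
}
```

A real parser would attach actions to the transitions (collect the tag name, emit an event, and so on); the point here is only that the control flow is a switch on the current state.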

As to your other question, I'm sorry but I can't say anything because I didn't fully understand it  :red

Greets, Gábor

Draakie

Thanx Gabor - will have a look.... :P

As to my rather chaotic query ->
Example : 3ds (3dstudio) text files contain such data as vertex lists, face lists etc. delimited by ';'
I am trying to find out if there is a better way of searching for and finding this delimiter.

0.99282 ; 0.56261 ; 0.7827737 etc.   //  1;2;3;3;2;1;3;4 etc.
^          ^            ^

Thus being able to convert and save the floats and constants. At the moment I'm using a totally
inefficient lodsb / test / stosb / etc. I understand that a bitmask method would be better - but what
something like that would look like escapes me.
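
For what a dword bitmask search might look like, here is a rough sketch in C rather than MASM. It uses the well-known "has a zero byte" trick on a 32-bit word: XOR the word with the delimiter repeated four times so matching bytes become zero, then test for any zero byte in one go. The function names has_byte and find_delim are invented for this example, and whether it actually beats a plain byte loop on short fields is not something measured here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes in w equals c. */
static uint32_t has_byte(uint32_t w, unsigned char c)
{
    uint32_t x = w ^ (0x01010101u * c);          /* matching bytes become 0x00 */
    return (x - 0x01010101u) & ~x & 0x80808080u; /* flags a zero byte, if any */
}

/* Returns the offset of the first c in buf[0..len), or len if it is absent. */
size_t find_delim(const char *buf, size_t len, unsigned char c)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i + 4 <= len) {
        uint32_t w;
        memcpy(&w, buf + i, 4);      /* alignment-safe dword load */
        if (has_byte(w, c)) break;   /* the match is somewhere in these 4 bytes */
        i += 4;
    }
    /* Pin down the exact byte, and handle the last 0..3 tail bytes. */
    while (i < len && (unsigned char)buf[i] != c) ++i;
    return i;
}
```

With the sample line above, find_delim(line, strlen(line), ';') gives the offset of the first ';'; calling it again from one past that offset walks the remaining delimiters. The final byte loop keeps the result independent of byte order.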

zooba

A bitmask method may help, but for short runs like this I doubt it would provide much in terms of speed improvement (and it would definitely send you backwards in terms of code readability).

About the best optimisation you can get here is to buffer the text into memory before parsing it. I believe that using a file map actually does this, so you're probably fine.

Best of luck,

Zooba :U

u

I've never seen such a 3ds file (though I've only looked at about a hundred) - could you attach a small one?
I usually read arrays by first putting the data pointer in esi, and then use custom procs like "ReadFloat", "ReadDword", "ReadString" that modify esi and return whether they succeeded (or reached end of line/file, or hit some incorrect syntax).
This way I've done .obj (WaveFront object, for 3D data) and .lws (LightWave scene, similar to XML, but looks like C++), apart from custom text file formats (mesh materials and props, assembler pre-processor, and the like).
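
The cursor-advancing reader idea can be sketched in C roughly as follows. This is only an illustration, not the actual procs: a const char ** cursor stands in for esi, the names read_float, read_dword and read_delim are made up, and the standard strtof/strtoul functions do the conversion. Each reader returns 1 and moves the cursor on success, or returns 0 and leaves the cursor alone.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Reads a float at the cursor (leading whitespace allowed). */
int read_float(const char **cur, float *out)
{
    char *stop;
    float v;

    assert(cur != NULL && *cur != NULL && out != NULL);
    v = strtof(*cur, &stop);
    if (stop == *cur) return 0;      /* nothing numeric here */
    *out = v;
    *cur = stop;
    return 1;
}

/* Reads an unsigned decimal integer at the cursor (no sign accepted). */
int read_dword(const char **cur, unsigned long *out)
{
    const char *p;
    char *stop;

    assert(cur != NULL && *cur != NULL && out != NULL);
    p = *cur;
    while (*p == ' ' || *p == '\t') ++p;
    if (!isdigit((unsigned char)*p)) return 0;
    *out = strtoul(p, &stop, 10);
    *cur = stop;
    return 1;
}

/* Skips blanks, then consumes one expected delimiter character. */
int read_delim(const char **cur, char delim)
{
    const char *p;

    assert(cur != NULL && *cur != NULL);
    p = *cur;
    while (*p == ' ' || *p == '\t') ++p;
    if (*p != delim) return 0;
    *cur = p + 1;
    return 1;
}
```

A face list like "1;2;3" then becomes a loop of read_dword followed by read_delim until one of them returns 0, at which point the cursor shows where the syntax stopped matching.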

How big are the files you expect to open? Or are you actually targeting opening many 1MB files at once? Because a simple ReadFile (of the whole file) usually does the trick best (at least it is the easiest - you put a 0 character at the end, and it simplifies the code!).
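
As a rough illustration of the read-everything-and-terminate approach, here is a sketch in portable C. Note the substitution: it uses stdio's fopen/fread instead of the Win32 ReadFile mentioned above, and load_text_file is a made-up name. The extra byte for the 0 terminator is what lets the parsing code stop on '\0' without carrying a length around.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file into a malloc'd buffer and appends a '\0'.
   Returns NULL on failure. On success the byte count (without the
   terminator) is stored in *len_out if len_out is not NULL.
   The caller frees the buffer. */
char *load_text_file(const char *path, size_t *len_out)
{
    FILE *f;
    long size;
    char *buf;
    size_t got;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (f == NULL) return NULL;

    /* Find the file size by seeking to the end and back. */
    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    buf = malloc((size_t)size + 1);  /* +1 for the terminator */
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }

    got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    buf[got] = '\0';                 /* parsers can now stop on 0 */
    if (len_out != NULL) *len_out = got;
    return buf;
}
```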

Please use a smaller graphic in your signature.

PBrennick

Ultrano,
I think that is a nice idea. With a little work, an editor can be modified to do a dedicated task. That way, a lot of the boring coding is already done.  :U

Paul
The GeneSys Project is available from:
The Repository or My crappy website

Draakie

I've never seen such a 3ds file (though I've only looked at about a hundred) - could you attach a small one?


Oops - got confused with DeleD (text)  :red - Erm...... Blonde ! - although I do remember 3D studio having a
Text based object model output....but anyway....

Thanks Ultrano.....will try your method with 3ds (binary) files.

Biterider

Hi
I use a CAD system to generate my models. The output is an XML-based file format called 3DXML from Dassault Systèmes. MS has adopted this format too (if you believe the press).
I wrote a parser using the MSXML libraries. Attached you can find the parser and the code that uses it. It is written using OOP, but you can extract the code you need. Take a closer look at the D3D_Mesh.LoadFromXMLFile method.

Regards,

Biterider


[attachment deleted by admin]

Draakie

Thanx master BiteRider... :bg - will peruse tonight ta...