Unstring CSV-Records

Started by UlliN, January 25, 2007, 09:44:19 AM

UlliN

Hi all,

here's an "unstring" procedure that divides a string into substrings. It breaks the string at a given field separator and handles field delimiters such as " (which allow embedded separators). A typical application is splitting a standard CSV record into its individual items.
The proc's inputs are the string to be divided, a one-character field separator, a one-character field delimiter (or a space if none is used), and a sufficiently large table, which the proc fills with the start position and length of each substring. It returns the number of substrings. The attachment contains a self-explanatory example.
I hope it's useful to someone.

As always, improvements are appreciated ;-)

Ulli


[attachment deleted by admin]
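
Since the attachment is no longer available, here is a minimal C sketch of the interface described in the post. It is not Ulli's MASM routine. The names `unstring`, `field_pos`, and `max_fields` are hypothetical, and two behaviours are assumptions: the surrounding delimiters are excluded from the reported start and length, and a delimiter doubled inside a field is not treated specially.

```c
#include <stddef.h>

/* One table entry: where a field starts and how long it is (assumed layout). */
typedef struct {
    size_t start;   /* 0-based offset into the record */
    size_t length;  /* length in bytes, surrounding delimiters excluded */
} field_pos;

/* Hypothetical C counterpart of the proc described in the post.
   delim == ' ' means "no delimiter in use".
   Returns the number of fields found; at most max_fields entries are written. */
size_t unstring(const char *rec, char sep, char delim,
                field_pos *table, size_t max_fields)
{
    size_t count = 0;      /* fields seen so far */
    size_t start = 0;      /* offset where the current field begins */
    int in_delims = 0;     /* nonzero while inside a delimited field */
    size_t i;

    for (i = 0; ; i++) {
        char c = rec[i];

        /* A delimiter toggles "inside delimiters"; separators inside are data. */
        if (c != '\0' && delim != ' ' && c == delim) {
            in_delims = !in_delims;
            continue;
        }

        /* End of field: end of record, or a separator outside delimiters. */
        if (c == '\0' || (c == sep && !in_delims)) {
            size_t s = start;
            size_t len = i - start;

            /* Assumption: report the field without its enclosing delimiters. */
            if (delim != ' ' && len >= 2 &&
                rec[s] == delim && rec[s + len - 1] == delim) {
                s++;
                len -= 2;
            }
            if (count < max_fields) {
                table[count].start = s;
                table[count].length = len;
            }
            count++;
            if (c == '\0')
                break;
            start = i + 1;
        }
    }
    return count;
}
```

For the record `a;"b;c";d` with separator `;` and delimiter `"`, this sketch reports three fields, the second one being `b;c` at offset 3 with length 3.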

MichaelW

Your prototype needs two more parameters.

Is there some reason for having both separators and delimiters, beyond allowing embedded separators? If not, then I think using separators with an escape character preceding each embedded separator would be easier to understand and use, and easier to code. The escape character would function as an escape character only when it preceded a separator, so it could be hard coded, eliminating one parameter and further simplifying the code.
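
A rough C sketch of the escape-character idea follows. The backslash as escape character and the names `unstring_esc`, `esc_field`, and `max_fields` are assumptions made for illustration; only the behaviour (the escape counts only when it directly precedes a separator) comes from the suggestion above.

```c
#include <stddef.h>

#define ESC '\\'  /* assumed hard-coded escape character */

/* One table entry: start offset and length of a raw field (assumed layout). */
typedef struct {
    size_t start;
    size_t length;
} esc_field;

/* Splits rec at sep, except where sep is directly preceded by ESC.
   Returns the number of fields; at most max_fields entries are written. */
size_t unstring_esc(const char *rec, char sep,
                    esc_field *table, size_t max_fields)
{
    size_t count = 0;   /* fields seen so far */
    size_t start = 0;   /* offset where the current field begins */
    size_t i = 0;

    for (;;) {
        char c = rec[i];

        /* ESC acts as an escape only in front of a separator: skip both. */
        if (c == ESC && sep != '\0' && rec[i + 1] == sep) {
            i += 2;
            continue;
        }

        /* End of field: end of record or an unescaped separator. */
        if (c == '\0' || c == sep) {
            if (count < max_fields) {
                table[count].start = start;
                table[count].length = i - start;
            }
            count++;
            if (c == '\0')
                break;
            start = i + 1;
        }
        i++;
    }
    return count;
}
```

In this sketch the reported field still contains the escape characters, so the caller would have to drop them when copying the field out. The separator stays a parameter, and the delimiter parameter disappears.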
eschew obfuscation

UlliN

Hello Michael,

it's our daily business to process large address data files (up to 80 million records) in this CSV style! Most of these files are exported from mainframes or from databases such as Excel or MSSQL. Common separators are hex 09 (tab), comma, or semicolon, but we've also seen |, #, &, %, / and other nice characters used as separators. The same applies to the field "decoration". In our COBOL environment we've replaced the COBOL verb "unstring" (http://www.csis.ul.ie/COBOL/Course/Unstring.htm#verb) with this MASM routine, which gave our apps a pretty good speed-up.
All told, we have to provide both the delimiter and the separator as variable input parameters to keep the customers satisfied ;-) But feel free to modify/simplify the proc to fit your needs. Any speed-up of the MASM proc is appreciated.


Regards,
Ulli