checking for end of line

NightWare · June 28, 2009, 11:24:10 PM

Quote from: ramguru on June 28, 2009, 10:46:32 PM
yeah .. it's the same thing
you should check not for \00 byte
but when the last byte is read

yeah, yeah, and i suppose you will re-write all your string functions for the case where it's a file ? just because you're too lazy to increment the file size when you allocate the memory ? very efficient strategy, no doubt...

dedndave · June 28, 2009, 11:45:51 PM

i think you misunderstand him, Night
he is saying that EOF is a condition when you have read the entire contents of the file
that is correct for many (probably most) file types
for text files, however, once the EOF character has been recognized, the remainder of the file is "don't care"
many text editors (especially old ones - lol) store stuff beyond the EOF character like margin/tab lines and fonts that aren't rendered

Mark Jones · June 28, 2009, 11:56:38 PM

Travis, perhaps try testing for all base conditions---do not worry about all the possible permutations. "Newline" on any CR or LF and skip any of the other; "EOF" on any null or (size of file - current position == 0.) If the source is unicode (should detect this when first opened) use appropriately adjusted routines. Get ASCII working long before attempting Unicode. English ASCII-in-Unicode is nothing more than "ASCII byte, null" so that is easy to test, but other languages utilize that second byte and will have to be handled appropriately, which (IMO) can get very complicated due to the large number of codepages and languages. YMMV.
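As a rough illustration of the "ASCII byte, null" remark above, here is a minimal sketch in Python (not assembly, so it can be run as-is). The function name is hypothetical, and the handling of a leading FF FE byte-order mark is an added assumption that is not discussed in the thread. It only recognizes the easy English-ASCII case and says nothing about other languages or codepages.

```python
def looks_like_ascii_in_utf16le(data: bytes) -> bool:
    """Heuristic: True if data is a sequence of (ASCII byte, null byte) pairs."""
    # Assumption: skip a UTF-16 little-endian byte-order mark if one is present.
    if data[:2] == b"\xff\xfe":
        data = data[2:]
    if len(data) < 2 or len(data) % 2 != 0:
        return False
    # Even offsets must hold a non-null 7-bit character.
    low_ok = all(0 < data[i] < 0x80 for i in range(0, len(data), 2))
    # Odd offsets must hold the null "second byte".
    high_ok = all(data[i] == 0 for i in range(1, len(data), 2))
    return low_ok and high_ok
```

A plain ASCII buffer fails the check because its odd-offset bytes are not null, so the ASCII routines would be used for it.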

Planning is 9/10ths of the law here, like Huch suggested. You could consider using an XLAT table to branch on characters; for instance, "newline" on 13 or 10, "end" on 00, "command" on ".", "label" on ":", and a "default handler" for the remaining chars, although admittedly, this sounds like it could be rather slow. Experiment, find what works first, then try to improve it later.
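The table idea above could look roughly like this. It is a sketch in Python rather than assembly, with a 256-entry table standing in for the one XLAT would index; the class names and the `classify` helper are made up for illustration.

```python
# Hypothetical character classes; the names are illustrative only.
DEFAULT, NEWLINE, END, COMMAND, LABEL = range(5)

# One class per possible byte value, defaulting to DEFAULT (0).
CLASS_TABLE = bytearray([DEFAULT] * 256)
CLASS_TABLE[13] = NEWLINE        # CR
CLASS_TABLE[10] = NEWLINE        # LF
CLASS_TABLE[0] = END             # null byte
CLASS_TABLE[ord(".")] = COMMAND
CLASS_TABLE[ord(":")] = LABEL

def classify(data: bytes) -> list:
    """Look up the class of each byte, stopping after the first END byte."""
    classes = []
    for byte in data:
        cls = CLASS_TABLE[byte]
        classes.append(cls)
        if cls == END:
            break
    return classes
```

In a real parser each class would select a handler; returning the class list here just makes the lookup easy to inspect.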

dedndave · June 29, 2009, 12:14:39 AM

that is something i never really thought about
i suppose they have unicode ASM files, though

the problem Travis was having originally sounded like something was amiss
as though, perhaps, his EBX register had been overwritten

but Mark has the right idea - look for either CR or LF or EOF as EOL's
parse until the char is NOT CR or LF to find the start of next line (assuming no EOF)
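A minimal sketch of that scan, written in Python so it can be run directly. The function name is hypothetical, and a null byte is treated as the end of the data, as suggested earlier in the thread.

```python
EOL_BYTES = (13, 10)  # CR, LF

def split_lines_skipping_breaks(data: bytes) -> list:
    """Split on CR/LF runs; stop at a null byte or at the end of the buffer."""
    lines = []
    pos = 0
    size = len(data)
    while pos < size and data[pos] != 0:
        # Skip any run of CR/LF to find the start of the next line.
        while pos < size and data[pos] in EOL_BYTES:
            pos += 1
        start = pos
        # Scan forward to the next CR, LF, null, or end of buffer.
        while pos < size and data[pos] not in EOL_BYTES and data[pos] != 0:
            pos += 1
        if pos > start:
            lines.append(data[start:pos])
    return lines
```

Note that skipping every CR and LF collapses blank lines, so this form is not suitable if line numbers need to be reported.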

many compilers and assemblers have a "beautifier" pass prior to assembling/compiling passes
they strip out remarks/tabs/extra spaces, terminate lines with nulls (perhaps), in some cases, they "tokenize" instructions and/or directives
this makes the source much smaller to work with and simplifies the assembly/compile pass parsers
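A very small sketch of such a clean-up pass, in Python. It assumes ";" starts a remark (as in MASM) and does not handle ";" inside quoted strings; the `beautify` name is hypothetical, and tokenizing is left out.

```python
def beautify(source: str) -> list:
    """Strip remarks, tabs, extra spaces and blank lines from source text."""
    cleaned = []
    # splitlines() accepts CR, LF and CRLF line endings (among other separators).
    for line in source.splitlines():
        # Assumption: everything from ';' onward is a remark.
        code = line.split(";", 1)[0]
        # split() with no arguments treats tabs and spaces alike,
        # so joining with one space removes the extra whitespace.
        code = " ".join(code.split())
        if code:
            cleaned.append(code)
    return cleaned
```

The later passes then only see short, uniformly spaced lines.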

Mark Jones · June 29, 2009, 12:46:17 AM

I've actually put a lot of time into thinking of new assembler ideas, Dave. Even started one, a multi-threaded package which had all the hope of being something new and unique with advanced features and all that jazz, but alas couldn't bring myself to finish "yet another assembler" and further divide the already thin assembler programming community. So I started it, and stopped. Then I thought, I would finish it for my own use, but even then can't justify writing a new dialect. So I'm back to using MASM and GoASM. Even that is stretching it, sometimes I feel like I'm getting dumber instead of smarter... I would use GoASM exclusively if I could, but it is not the "de-facto" standard, so sharing code has a 50/50 chance of being ignored... :lol

Note that semantically, there is no such distinction as a "newline" when parsing a file, even though our brains try to segment files into "lines" for simplicity. Someone will undoubtedly argue the point, however the file is read as bytes+offset, so the "newline" character(s) are only a "newline" in the context of human-readable text, and not parsed data. This is not to say that newlines may be ignored, certain things must be handled in their presence, but I would try to think of newline characters as any other conditional encountered when parsing the file data. Treat everything literally, and make it as simple as possible to start with.

dedndave · June 29, 2009, 01:48:38 AM

well, i am sure most of us have considered it at one time or another
you just got farther than we did - lol
i had considered writing one that would support a few of the more popular syntaxes
so, you could have .MASM, .FASM, .NASM directives (and so on) and switch on the fly
i think GoAsm, PoAsm, and RadAsm have us covered, though
they may not do what i was thinking of, but they do a great job and the authors listen to programmers' wishes
i, like many, merely wanted to remove ms from the picture
i think if i was to write an assembler, i would start with the basics and design expansion into it
with the variety of CPUs to support, that would be a major task on its own
then add things like macro support, and other features, one step at a time

assemblers are not really that difficult - there is just a lot to cover
debuggers/disassemblers are another matter
i don't feel i am qualified to attack that one, yet
i need to build on my knowledge of windows OS's, first
but, i see a demand for certain things in that area

Farabi · June 29, 2009, 11:09:54 AM

Hi travis,
I've made a text file handler here: http://www.masm32.com/board/index.php?topic=11392.0
It is simple to use, and it is fast enough. Just ask me if you don't understand how to use it.

travism · June 29, 2009, 11:25:14 AM

Guys, I was asking about the end of a line, not the end of file. The end of file is easy to check for. Oh, and btw this isn't for the assembler; I'm not going to be starting that for a very long while. Thanks everyone for your help. I'm sure I'll figure something out sooner or later.

Tedd · June 29, 2009, 11:30:36 AM

Windows/DOS = CR,LF (13,10 decimal = 0Dh,0Ah)
Linux/Unix = LF (10 decimal = 0Ah)
Mac OSX = LF (10 decimal = 0Ah)
{Old Mac = CR (13 decimal = 0Dh)}

[LF,CR is sometimes used to insert 'soft' linebreaks for word-wrapping, but you shouldn't see these saved in a text file - i.e. treat it as two linebreaks.]

If you just want to know where the line ends, you can simply check for either CR or LF, and end the line there.
If you want to parse line-by-line, then the next line starts after a LF, or if CR it starts after the CR unless the next character is LF then it starts after that.

And since you can't guarantee the file ends on a newline, you need to check against the file size. (This is why some HLLs, C for example, require that a source file end with a newline - so you only need to check against the file size after each line read, and not on every character.)
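The rules above (LF ends a line, CR ends a line, CR followed by LF counts as one break, and the file may stop without a final newline) can be sketched as follows. This is Python rather than assembly so it can be run as-is, and the `line_spans` name is made up for illustration.

```python
CR, LF = 13, 10

def line_spans(data: bytes) -> list:
    """Return (start, end) offsets of each line, excluding the line break."""
    spans = []
    size = len(data)
    start = 0
    pos = 0
    while pos < size:                 # the check against the file size
        byte = data[pos]
        if byte == LF:
            spans.append((start, pos))
            pos += 1
            start = pos
        elif byte == CR:
            spans.append((start, pos))
            pos += 1
            # CR followed by LF is one line break, not two.
            if pos < size and data[pos] == LF:
                pos += 1
            start = pos
        else:
            pos += 1
    # The file may not end on a newline: keep the final partial line.
    if start < size:
        spans.append((start, size))
    return spans
```

With these rules an LF,CR pair comes out as two line breaks with an empty line between them, matching the note above.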

travism · June 29, 2009, 11:55:37 AM

Thank you all for your help. I got it working perfectly; I needed to check for 13, 10 and zero lol.

ToutEnMasm · June 29, 2009, 01:09:25 PM

and here are the best methods to find them.
http://www.masm32.com/board/index.php?topic=11061.msg81555#msg81555
