Putting a null at the end of a string

Started by allynm, July 29, 2009, 01:37:52 AM

allynm

Hi dedndave -

The [ecx -2] thing was extremely sweet.
Ollydbg loved it.

Mark

ToutEnMasm


Quote
The [ecx -2] thing was extremely sweet.
Ollydbg loved it.

I agree.
Apart from the code itself, there is no particular reason to prefer one method over the other.
Both work.

sinsi

Remember that a lot of text files (e.g. Linux .c files, grrr) use only 0Ah as the newline, not 0Dh 0Ah.
If you are using ReadFile and assume a two-byte terminator, that can be a problem: you might lose the last character.

I read somewhere recently about the 'end of line' character: Unix, Mac and PC all use different ones, and then there's Unicode...
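If the [ecx-2] trick writes the zero two bytes back from the end of the data, it assumes a CR LF pair, so on an LF-only line it overwrites the last real character. A minimal C sketch (not code from the thread; `terminate_line` is a hypothetical name) that instead stops at whichever of 0Dh or 0Ah comes first, assuming `buf` holds `n` bytes from ReadFile plus one spare byte:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: put the zero on the first CR or LF in buf[0..n),
   so PC (CR LF), Unix (LF) and Mac (CR) lines all end up as the same text.
   buf needs room for n + 1 bytes in case no terminator is found.
   Returns the length of the line without its terminator. */
size_t terminate_line(char *buf, size_t n)
{
    assert(buf != NULL);
    size_t i = 0;
    while (i < n && buf[i] != '\r' && buf[i] != '\n')
        i++;                 /* ordinary character: keep scanning */
    buf[i] = '\0';           /* replaces the first 13 or 10, or terminates at n */
    return i;
}
```

Calling it with `"abc\n"` and `n = 4` leaves `"abc"`, where a fixed `n - 2` would have left `"ab"`.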
Light travels faster than sound, that's why some people seem bright until you hear them.

hutch--

They surely do. The PC historically uses 13 10 (CR LF), Unix uses 10 (LF) and the (classic) Mac uses 13 (CR). A RichEdit control version 2.0 and up uses the Mac-style 13, so text fished out of a later rich edit control has to be handled differently.
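Because the three conventions differ, a routine may need to know which one a buffer uses before it edits it. A small C sketch, with hypothetical names (`detect_eol`, `eol_kind`), that classifies the first line break it finds:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three conventions plus "no break found". */
typedef enum { EOL_NONE, EOL_CRLF, EOL_LF, EOL_CR } eol_kind;

/* Report the terminator used by the first line break in buf[0..n). */
eol_kind detect_eol(const char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\n')
            return EOL_LF;                              /* Unix: 10 */
        if (buf[i] == '\r')
            return (i + 1 < n && buf[i + 1] == '\n')
                       ? EOL_CRLF                       /* PC: 13 10 */
                       : EOL_CR;                        /* Mac / RichEdit 2.0+: 13 */
    }
    return EOL_NONE;
}
```

One caveat of this sketch: a CR that is the last byte of one read is reported as a lone CR even if its LF arrives in the next read.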
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

if you want to write code that travels across all those platforms, then scasb is a good idea
still, for line input, i would think you could start the scas at the end of the string
you could look at the last 4 chars, scan for 0Dh, scan for 0Ah, and replace the earliest with a 0
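The "start at the end" idea can be sketched in C like this (the function name is hypothetical, and it steps back over the trailing run rather than doing two separate scans): look at no more than the last 4 bytes, step back over any CR or LF, and put the zero on the earliest one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: walk backwards over CR/LF bytes, examining at most
   the last 4, and terminate on the earliest one found. buf must have room
   for n + 1 bytes, because with no terminator the zero goes at buf[n].
   Returns the length of the remaining text. */
size_t terminate_from_tail(char *buf, size_t n)
{
    assert(buf != NULL);
    size_t limit = n > 4 ? n - 4 : 0;   /* window: the last 4 bytes at most */
    size_t i = n;
    while (i > limit && (buf[i - 1] == '\r' || buf[i - 1] == '\n'))
        i--;                             /* back over any mix of 13 and 10 */
    buf[i] = '\0';
    return i;
}
```

Because only the tail is examined, the cost does not depend on the line length; the trade-off is that terminator runs longer than 4 bytes are only partly removed.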

hutch--

Dave,

The linear scanner code to handle any combination of 13 10 is easy enough to write: a normal character-scanning loop with a subloop on either 13 or 10 that does not exit until a different character is found. On exit from the subloop, just write whatever line terminator you want, if in fact you need one; otherwise you can use the technique to do your own conversion from hard-coded line feed combinations to word-wrap format.
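A C sketch of the scanner described above (not the algo posted in the Lab; `normalize_eol` and its `eol` parameter are hypothetical). Taken literally, the subloop also swallows blank lines, since CR LF CR LF is a single run of 13/10 bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src to dst. On CR or LF, a subloop skips the whole run of 13/10
   bytes, then the chosen terminator is written once. The caller must size
   dst for the worst case (each 1-byte break can grow to strlen(eol) bytes).
   Returns the number of bytes written, excluding the final zero. */
size_t normalize_eol(char *dst, const char *src, const char *eol)
{
    assert(dst != NULL && src != NULL && eol != NULL);
    size_t eol_len = strlen(eol);
    char *d = dst;
    while (*src != '\0') {
        if (*src == '\r' || *src == '\n') {
            while (*src == '\r' || *src == '\n')   /* subloop: any mix of 13 and 10 */
                src++;
            memcpy(d, eol, eol_len);               /* whatever terminator is wanted */
            d += eol_len;
        } else {
            *d++ = *src++;                         /* ordinary character */
        }
    }
    *d = '\0';
    return (size_t)(d - dst);
}
```

Passing `"\r\n"` gives standard PC line ends, `"\n"` gives Unix ones, and `""` strips the breaks entirely.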

dedndave

in other words, make it flexible, eh, Hutch ?

allynm

Hi everyone -

I didn't know that systems differ in whether they append 0D 0A. That was quite informative. Thanks, Sinsi and Hutch. To the comments made by Hutch and Dedndave: I also realized that flexibility was an important feature, and that the poor old 0A hanging out there all by itself after I expunged the 0D must be kind of lonely. So in my SCASB proc I added the flexibility to get rid of either character. In the process I got familiar again with EDI during string processing, so it was a good drill.

I think I'd like to start a new thread concerning the availability of code that can count the number of cycles different instructions require. 

Regards all,

Mark A

dedndave

hi Mark
the 0Ah can be lonely and doesn't mind - lol
what was meant by "flexible" was that the routine should be usable to terminate the line in whatever fashion the need dictates (not just with a 0)

as for timing routines, refer to the Laboratory subforum
the first post of the first thread (sticky) has the timers.asm and counters2.asm macros we use for timing code, written by MichaelW
as asm programmers, we love to spend much toil over that very subject
playing with routines again and again until the optimal result is found is a lot of fun for us
it is a great way to learn to write good code
or, maybe, we are just geeks - lol

hutch--

I posted an algo in the Lab that converts any CR LF combination to standard PC CRLF. May be useful to someone and thank God it does not use SCASB.  :bg

dedndave

lol Hutch
i saw that - you could add a dword-aligned word-sized parm to that routine and let the caller name the terminator
whatever value the caller passes in becomes the terminator

hutch--

Zero is the standard terminator for strings that do not record their length. The alternative is a Pascal-style string, as Basic uses, where the length is stored in the 4 bytes in front of the string data; that is how OLE strings do it in either ANSI or Unicode. For what it's worth, zero-terminated strings are simply faster.

BATSoftware

A couple of notes from 25+ years of programming experience:

1. ASCIZ strings are slow to manage, but since C uses them and Windows is C-based, you should use them primarily when dealing with the Win32 API.
2. OLE/BSTRs/ASCIC are the fastest to manage and are becoming far more important as OLE takes over. BASIC/Pascal and to some extent FORTRAN use counted strings. A very important guideline for engineering fast code is never to do the same thing twice unless you have to. This rule is fundamental to procedural programming and is violated massively by OOP programming.

Example of why counted strings are superior - strlen function:

1. Null terminated string:

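LEA EDI,STRADR   ; EDI -> start of string (direction flag assumed clear)
XOR EAX,EAX      ; AL = 0, the byte to search for
MOV ECX,-1       ; maximum count
PUSH EDI         ; save the start address
REPNE SCASB      ; scan until [EDI] = AL; EDI ends one past the zero
MOV EAX,EDI
POP ECX          ; ECX = start address
DEC EAX          ; EAX -> the zero byte
SUB EAX,ECX      ; EAX = length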

or
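LEA ESI,STRADR   ; ESI -> start of string
XOR ECX,ECX      ; counter = 0
@@: LODSB        ; AL = [ESI], ESI + 1
INC ECX
TEST AL,AL       ; reached the zero?
JNE @B
DEC ECX          ; do not count the zero itself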


(Note the number of registers used, at least 3: address, accumulator, counter.)
versus:
2. counted strings

LEA ESI,STRADR   ; ESI -> count prefix
LODSD            ; EAX = DWORD count (for a BYTE count: XOR EAX,EAX then LODSB)
(two registers used)

Which do you think is faster?
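Both sides of that comparison in C, as a rough sketch (hypothetical names; the 4-byte prefix is read in host byte order, which is little-endian on x86). The first walks the string with a plain incremented pointer; the second is one load whatever the length:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* NUL-terminated: every byte up to the zero must be visited. */
size_t asciz_len(const char *s)
{
    assert(s != NULL);
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Counted: assumes a 4-byte count stored directly in front of the text,
   so the length is read, not computed. */
uint32_t counted_len(const unsigned char *s)
{
    assert(s != NULL);
    uint32_t n;
    memcpy(&n, s, sizeof n);   /* memcpy avoids unaligned-access issues */
    return n;
}
```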

Try strcat and the performance gain of counted strings is even better. Think about where the performance gain is... For NULL-terminated strings, the CPU overhead grows in direct proportion to the string size, and appending repeatedly to the same buffer makes the total cost grow quadratically. Counted strings have a fixed overhead cost.
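A sketch of the strcat side in C, using a hypothetical fixed-capacity `counted256` type that keeps both a count and a trailing zero. Since the current length is stored, the copy starts immediately; strcat would first have to scan the destination for its zero, and that scan gets longer with every piece appended:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counted string with a fixed 256-byte text buffer. */
typedef struct {
    uint32_t len;
    char text[256];
} counted256;

/* Append src_len bytes without scanning dst. Keeps the text zero-terminated
   so it can still be handed to C-style functions. Returns 0 (and leaves dst
   unchanged) if the result would not fit. */
int counted_append(counted256 *dst, const char *src, uint32_t src_len)
{
    assert(dst != NULL && src != NULL);
    if ((size_t)dst->len + src_len + 1 > sizeof dst->text)
        return 0;
    memcpy(dst->text + dst->len, src, src_len);
    dst->len += src_len;
    dst->text[dst->len] = '\0';
    return 1;
}
```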

Universally useful tip: NULL-terminate counted strings. This method (which is also adopted by the STRINGTABLE resource and OLE) allows counted strings to be used wherever NULL-terminated strings are required: just pass the starting address + 1 (BYTE count) or + 4 (DWORD count).
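The layout behind that tip, sketched in C under stated assumptions: the helper names are hypothetical, the count is a 4-byte value in host byte order, and real OLE BSTRs must still be created with SysAllocString and friends rather than malloc. The returned pointer is "start address + 4", so it can go straight to any function that expects a zero-terminated string:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Lay out [4-byte count][text][zero] in one block; return a pointer to the text. */
char *counted_new(const char *text, uint32_t len)
{
    assert(text != NULL);
    unsigned char *block = malloc(sizeof(uint32_t) + (size_t)len + 1);
    if (block == NULL)
        return NULL;
    memcpy(block, &len, sizeof len);            /* count prefix */
    memcpy(block + sizeof len, text, len);      /* characters */
    block[sizeof len + len] = '\0';             /* zero for C-style callers */
    return (char *)(block + sizeof len);
}

/* The count lives in the 4 bytes just before the text. */
uint32_t counted_get_len(const char *s)
{
    uint32_t len;
    memcpy(&len, s - sizeof len, sizeof len);
    return len;
}

/* Free the whole block, starting from the prefix. */
void counted_free(char *s)
{
    free(s - sizeof(uint32_t));
}
```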

I use the following strings types in my programming:

1. NULL-terminated strings: used when receiving/sending variable-sized strings from/to WINAPI or C-style functions (mainly user input or WINAPI results).
2. Counted strings, also NULL-terminated: used for static string usage, display, prefixing/suffixing, control, resources.
3. OLE strings (DWORD count prefix): interfacing to OLE.
4. Descriptor strings (passed as a 2-DWORD descriptor: size and address): when the start or length is a variable/calculated value.
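Type 4 is the least common of the four. A minimal C sketch, assuming a hypothetical `str_desc` struct that carries the size and address together, shows why it suits calculated starts and lengths: a substring is just a new descriptor into the same memory, with no copy and no terminator needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: length and address travel together. */
typedef struct {
    size_t len;
    const char *ptr;
} str_desc;

str_desc desc_from_cstr(const char *s)
{
    assert(s != NULL);
    str_desc d = { strlen(s), s };
    return d;
}

/* Slice [start, start + count), clamped to the source; shares memory. */
str_desc desc_slice(str_desc s, size_t start, size_t count)
{
    assert(s.ptr != NULL);
    if (start > s.len)
        start = s.len;
    if (count > s.len - start)
        count = s.len - start;
    str_desc d = { count, s.ptr + start };
    return d;
}

/* Compare by length first, then bytes; no zero terminator involved. */
int desc_equal(str_desc a, str_desc b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```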

Learn the benefits of all four types of strings and you will master efficient string management.


Good luck!

hutch--

OLE strings may be flexible if you bother to set up the support framework for them, but they are fundamentally slow, as they involve memory allocation for any change of length, greater or smaller. You can in fact treat an allocated OLE buffer as just a memory buffer, but that gives it no advantage over any other memory strategy.

With BYTE-sized characters, zero-terminated strings may be crude, but they are also fast; the price is that you must know what you are doing when setting your buffer size. In this world you get nothing for nothing: to create an OLE string you must know its length, much as with any other BYTE or WORD array. With zero-terminated strings you only need a buffer that is big enough, so if you know the maximum range, you spare yourself the hassle of getting the length each time you allocate the string.

REPNE SCASB is a particularly slow combination, and LODSB without the prefix is even slower. Unless you are writing 16-bit DOS software or targeting an obscure couple of old AMD processors, treat them like the plague and use registers with incremented address pointers; the latter are much faster on Intel hardware from the 486 upwards.