
Title: Putting a null at the end of a string
Post by: allynm on July 29, 2009, 01:37:52 AM

Hello everyone -

When we initialize a string using the conventional notation var db " ", 0 the assembler kindly places a null at the end of the string in memory. Let's change the scenario slightly. Suppose I read into memory a string of ASCII characters using, for example, ReadFile. When I look at memory I will discover that 0D, 0A have been appended to the string, because I hit the carriage return at the end of the input, a newline was generated, and ReadFile counts these characters as if they were part and parcel of the string. I would like to know how you folks think I could get rid of the newline characters and replace them with a NULL so that the string looks like it "should" if it were initialized as such.

A small, probably silly, question.

Thanks as always,
Mark Allyn

Title: Re: Putting a null at the end of a string
Post by: dedndave on July 29, 2009, 01:46:28 AM

for line input from the console, they are always there, i think (for real mode DOS, it was only a 0Dh)
probably the fastest way is to use the NumberOfBytesRead value as an index and replace the 0Dh with a 0

mov ecx,NumberOfBytesRead
mov byte ptr InpReadBuffer[ecx-2],0

i think that will work - lol
well, you get the idea, anyway
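
The index trick above can be sketched in C for anyone who wants to check the arithmetic. This is only a sketch: `terminate_input`, `buf` and `bytes_read` are hypothetical names, and it assumes the data ends in the two-byte 0Dh 0Ah pair that console line input appends. It also adds a guard the two-instruction version does not have, since with fewer than two bytes read `[ecx-2]` would index before the buffer.

```c
#include <stddef.h>

/* Hypothetical helper: overwrite the CR of a trailing CR LF pair with 0.
   buf        - buffer filled by a read call (for example ReadFile)
   bytes_read - the count that the read call reported
   Returns the resulting string length, or 0 if there was no room for a
   CR LF pair (in that case the buffer is left untouched). */
size_t terminate_input(char *buf, size_t bytes_read)
{
    if (bytes_read < 2)
        return 0;                 /* [ecx-2] would index before the buffer */
    buf[bytes_read - 2] = '\0';   /* same store as mov byte ptr buf[ecx-2],0 */
    return bytes_read - 2;
}
```

Called with a buffer holding "abc", 0Dh, 0Ah and a count of 5, the helper writes the zero at index 3 and returns 3.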

Title: Re: Putting a null at the end of a string
Post by: hutch-- on July 29, 2009, 01:59:30 AM

Mark,

Just scan the string and when you find the ASCII 13, write an ASCII zero in its place.

Title: Re: Putting a null at the end of a string
Post by: dedndave on July 29, 2009, 02:01:55 AM

i was gonna say that, too, Hutch
but it is slower, and requires use of edi
and, you still wind up putting the count in ecx - may as well just go there and terminate it
i suppose you could use std and start at the end to speed it up a bit

Title: Re: Putting a null at the end of a string
Post by: allynm on July 29, 2009, 02:09:58 AM

Hi dedndave and hutch -

I will try both of your suggestions. Thanks for your help. As dedndave points out it does require edi, but I am still experimenting with the Intel instructions so it will be fun to see what happens. This discussion does remind me of something that I think JJ (jochen?) mentioned (might have been MichaelW) in a recent posting concerning the number of cycles consumed by various instructions. I think he reported results for a Celeron processor...but, more generally, is there some code around that can compute cycles for a 386?

Mark

Title: Re: Putting a null at the end of a string
Post by: hutch-- on July 29, 2009, 02:12:04 AM

The "input" macro already removes the CRLF.



; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    .data?
      value dd ?

    .data
      item dd 0

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL inp$  :DWORD

    mov inp$, input("Hmmmm, type something")

    print inp$,13,10

    print str$(len(inp$))," characters long",13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start

It uses this library module.



; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    .486
    .model flat, stdcall
    option casemap :none   ; case sensitive

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE 
OPTION EPILOGUE:NONE 

StripLF proc src:DWORD

    mov eax, [esp+4]
    sub eax, 1
  @@:
    add eax, 1
    cmp BYTE PTR [eax], 0
    je tlfout
    cmp BYTE PTR [eax], 13
    jne @B

    mov BYTE PTR [eax], 0
    ret 4

  tlfout:
    ret 4

StripLF endp

OPTION PROLOGUE:PrologueDef 
OPTION EPILOGUE:EpilogueDef 

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end

Title: Re: Putting a null at the end of a string
Post by: allynm on July 29, 2009, 02:41:11 AM

Hi Hutch -

Thanks for clarifying what INPUT will do. I really did not know that this macro would accomplish this. I am reading the code to mean that I don't actually need to use edi. Is this correct?
Mark

Title: Re: Putting a null at the end of a string
Post by: hutch-- on July 29, 2009, 03:39:12 AM

Mark,

The only time you are required to use either EDI or ESI specifically is when an instruction requires it, mainly the older string instructions like movsb, scasb etc .... and their WORD and DWORD counterparts. In that context ESI and EDI are respectively the source and destination indexes. Apart from these usages they can be used as general purpose registers like any of the others. Just remember if you need to use either in a proc that you must preserve and restore their content at the beginning and end of the proc.

Title: Re: Putting a null at the end of a string
Post by: allynm on July 29, 2009, 12:26:49 PM

Hi Hutch -

This question may deserve a new thread....but, I'll take the plunge anyway.

I knew the ESI/EDI requirement on the string instructions. I'm curious to know what you meant when you characterized movsb, scasb, etc as "older"...if you have a moment, what were you thinking of as "newer"?

Regards,
Mark

Title: Re: Putting a null at the end of a string
Post by: Slugsnack on July 29, 2009, 12:40:48 PM

as time went on the x86 instruction set was expanded to add functionality. a few examples of 'newer' instructions, off the top of my head, are PUSHAD/POPAD and MOVZX

here you go though :
http://en.wikipedia.org/wiki/X86_instruction_listings

Title: Re: Putting a null at the end of a string
Post by: ToutEnMasm on July 29, 2009, 01:10:53 PM

another solution (if you know the number of bytes read):
get the address of the buffer
add the number of bytes read to this address.
subtract 2 from this address and put the zero at the resulting address; this replaces the 0D with zero.

Title: Re: Putting a null at the end of a string
Post by: dedndave on July 29, 2009, 01:16:43 PM

i tried that one, Yves - nobody seems to like that idea - lol
http://www.masm32.com/board/index.php?topic=11969.msg90899#msg90899

Title: Re: Putting a null at the end of a string
Post by: hutch-- on July 29, 2009, 01:22:18 PM

Mark,

The string instructions (MOVSB, SCASB etc ....) were designed in the 8086 days and required specific registers to function. Since the beginning of 32 bit x86 there have usually been faster ways to do this type of function, usually by loading the address into a register and performing the operation on each data size (BYTE WORD DWORD) then incrementing the pointer to the next data item. There are a couple of exceptions with MOVSD etc ... but only with the prefix REP and only over a certain data length.

Apart from the stack pointer ESP and most of the time the base pointer EBP you can use any register for anything, you can use EBP if you write a "no stack frame" proc and if you are really desperate and know what you are doing you can occasionally even use ESP but the general drift is with freestyle code that uses the registers available without being limited by instruction choice, you can write anything you like.

Another factor is chip silicon space, many of the slower instructions live in microcode where the fast simpler instructions live on direct silicon pathways and this gives you a pseudo RISC range if you need to keep your code fast. In most other things it does not matter if you use the older slower opcodes as long as your code is reliable and properly preserves registers.

Title: Re: Putting a null at the end of a string
Post by: allynm on July 31, 2009, 01:33:31 AM

Hi dedndave, Yves (tout en Masm), and Hutch--

I actually LIKED dedndave's solution quite a bit. I coded up dedndave's solution and also the search via scasb, and dedndave's solution was quite elegant in comparison. ON THE OTHER HAND, for those of us (Me!) still coming to grips with how to use the string instructions that Hutch has described, it is worth going thru the scasb thing. I wrote the code both ways and profited each time I did it.

Thanks,

Mark Allyn

Title: Re: Putting a null at the end of a string
Post by: dedndave on July 31, 2009, 01:55:57 AM

i am a bit behind the times, Mark
i have years of experience with all the microprocessors that you could think of that are obsolete - lol
so, these guys have a definite edge on me when it comes to knowing when to use which instructions
but, i thought the [ecx-2] thing was a sure winner ! - lol

Title: Re: Putting a null at the end of a string
Post by: allynm on July 31, 2009, 02:04:21 AM

Hi dedndave -

The [ecx -2] thing was extremely sweet.
Ollydbg loved it.

Mark

Title: Re: Putting a null at the end of a string
Post by: ToutEnMasm on July 31, 2009, 06:34:08 AM

Quote
The [ecx -2] thing was extremely sweet.
Ollydbg loved it.

I agree.
Outside of a particular piece of code, there is no particular reason to prefer one method to the other.
Both work.

Title: Re: Putting a null at the end of a string
Post by: sinsi on July 31, 2009, 06:55:10 AM

Remember that a lot of text files (e.g. linux .c files grrr) only have 0a as a new line, not 0d 0a.
If you are using ReadFile, it could be a problem, you might lose the last character.

I read somewhere recently about the 'end of line' char, unix/mac/pc all use different chars, and then there's unicode...

Title: Re: Putting a null at the end of a string
Post by: hutch-- on July 31, 2009, 06:59:54 AM

They surely do, PC historically uses 13 10, Unix uses 10 and a Mac uses 13. A richedit control 2.0 and up uses the Mac 13 so editing bits fished out of a late rich edit control has to be done differently.

Title: Re: Putting a null at the end of a string
Post by: dedndave on July 31, 2009, 08:45:31 AM

if you want to write code that travels across all those platforms, then scasb is a good idea
still, for line input, i would think you could start the scas at the end of the string
you could look at the last 4 chars, scan for 0Dh, scan for 0Ah, and replace the earliest with a 0
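
A rough C sketch of the "start at the end" idea, offered as one possible variant rather than the exact last-4-chars scan described above: it steps back from the end of the data while the last byte is 0Dh or 0Ah, then writes the zero there. The name `trim_line_end` is hypothetical, and the sketch assumes the buffer has at least one spare byte beyond `len` for the case where no line-ending byte is present.

```c
#include <stddef.h>

/* Hypothetical helper: walk back from the end of the data while the last
   byte is CR (0Dh) or LF (0Ah), then write the terminator at that point.
   The caller must guarantee that buf holds at least len + 1 bytes.
   Returns the new string length. */
size_t trim_line_end(char *buf, size_t len)
{
    while (len > 0 && (buf[len - 1] == '\r' || buf[len - 1] == '\n'))
        len--;
    buf[len] = '\0';
    return len;
}
```

This copes with 0D 0A, a lone 0A and a lone 0D without losing the last real character. Note that it also swallows several trailing blank lines in one go, which may or may not be what is wanted.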

Title: Re: Putting a null at the end of a string
Post by: hutch-- on July 31, 2009, 01:05:09 PM

Dave,

The linear scanner code to handle any combination of 13 10 is easy enough to write, a normal character scanning loop with a subloop on either 13 or 10 that does not exit until a different char is tested. On exit from the subloop just write whatever line terminator you want if in fact you need one otherwise you can use the technique to do your own conversion from hardcoded LF combinations to wordwrap format.
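
The scanner described above might look roughly like this in C. It is a sketch under stated assumptions, not the algo posted in the Lab: `normalize_line_ends` and its parameters are hypothetical names, and, following the description literally, a whole run of 13/10 bytes is replaced by a single terminator, so consecutive blank lines collapse into one line break.

```c
#include <stddef.h>

/* Copy src to dst, replacing each run of CR/LF bytes with one copy of the
   terminator string term (for example "\r\n").
   dst must hold the worst case: strlen(src) * max(1, strlen(term)) + 1.
   Returns the length of the output string. */
size_t normalize_line_ends(const char *src, char *dst, const char *term)
{
    size_t out = 0;
    while (*src != '\0') {
        if (*src == '\r' || *src == '\n') {
            /* subloop: stay here until a different character is seen */
            while (*src == '\r' || *src == '\n')
                src++;
            /* on exit, write whatever line terminator the caller wants */
            for (const char *t = term; *t != '\0'; t++)
                dst[out++] = *t;
        } else {
            dst[out++] = *src++;   /* ordinary character, copy it through */
        }
    }
    dst[out] = '\0';
    return out;
}
```

Passing the terminator in as a string keeps the routine flexible: "\r\n" gives PC line ends, "\n" gives Unix ones, and an empty string strips the line breaks altogether.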

Title: Re: Putting a null at the end of a string
Post by: dedndave on July 31, 2009, 01:33:48 PM

in other words, make it flexible, eh, Hutch ?

Title: Re: Putting a null at the end of a string
Post by: allynm on July 31, 2009, 02:03:18 PM

Hi everyone -

I didn't know about system differences in appending 0D 0A. That was quite informative. Thanks, Sinsi and Hutch. To the comments made by Hutch and Dedndave: I realized also that flexibility was an important feature and that the poor old 0A hanging out there all by itself after I expunged the 0D must be kind of lonely. So in my SCASB proc I stuck in the flexibility to get rid of either character. In the process I got familiar again with the EDI during string processing. So it was a good drill.

I think I'd like to start a new thread concerning the availability of code that can count the number of cycles different instructions require.

Regards all,

Mark A

Title: Re: Putting a null at the end of a string
Post by: dedndave on July 31, 2009, 02:21:26 PM

hi Mark
the 0Ah can be lonely and doesn't mind - lol
what was meant by "flexible" was the routine should be usable to terminate the line in any fashion that needs dictate (not just 0)

as for timing routines, refer to the Laboratory subforum
the first post of the first thread (sticky) has the timers.asm and counters2.asm macros we use for timing code, written by MichaelW
as asm programmers, we love to spend much toil over that very subject
playing with routines again and again until the optimal result is found is a lot of fun for us
it is a great way to learn to write good code
or, maybe, we are just geeks - lol

Title: Re: Putting a null at the end of a string
Post by: hutch-- on July 31, 2009, 02:29:51 PM

I posted an algo in the Lab that converts any CR LF combination to standard PC CRLF. May be useful to someone and thank God it does not use SCASB. :bg

Title: Re: Putting a null at the end of a string
Post by: dedndave on July 31, 2009, 02:40:09 PM

lol Hutch
i saw that - you could add a dword-aligned word-sized parm to that routine and let the caller name the terminator
whatever the user calls in that value becomes the term

Title: Re: Putting a null at the end of a string
Post by: hutch-- on July 31, 2009, 02:46:49 PM

Zero is the spec for non length recorded strings, the alternative is a Pascal style string like Basic uses where you store the length in the leading 4 bytes of the string, that is how OLE strings do it in either ANSI or UNICODE. For what it's worth, zero terminated strings are simply faster.

Title: Re: Putting a null at the end of a string
Post by: BATSoftware on August 01, 2009, 01:09:39 PM

A couple of notes from 25+ years programming experience:

1. ASCIZ strings are slow to manage but since "C" uses them and Windows is "C" based you should use them primarily when dealing with the WIN32 API.
2. OLE/BSTRs/ASCIC are the fastest to manage and are becoming far more important since OLE is taking over. BASIC/PASCAL and to some extent FORTRAN use counted strings. A very important guideline to engineering fast code is never do the same thing twice unless you have to. This rule is fundamental to procedural programming and violated massively by OOP programming.

Example of why counted strings are superior - strlen function:

1. Null terminated string:

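LEA EDI,STRADR    ; EDI -> start of string
XOR EAX,EAX       ; AL = 0, the byte to search for
MOV ECX,-1        ; maximum scan count
PUSH EDI          ; save the start address
REPNE SCASB       ; scan until the 0 is found, EDI ends one past it
MOV EAX,EDI
POP ECX           ; ECX = start address
DEC EAX           ; EAX -> the terminating 0
SUB EAX,ECX       ; EAX = length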

or
LEA ESI,STRADR
XOR ECX,ECX
@@: LODSB
INC ECX
TEST AL,AL
JNE @B
DEC ECX

(note the number of registers used - at least 3 - address, accumulator, counter.)
versus:
2. counted strings

LEA ESI,STRADR
LODSD/B
(two registers used)

Which do you think is faster?

Try strcat and the performance gain from counted strings is even better. Think about where the performance gain is.... For NULL terminated strings, the overhead cost in CPU grows in direct proportion to the string size. Counted strings have a fixed overhead cost.

Universally useful tip: NULL terminate counted strings. Using this method (which is also adopted by the STRINGTABLE resource and OLE) allows counted strings to also be used wherever NULL terminated strings are required - just use the starting address + 1 or + 4, depending on the size of the count prefix.

I use the following strings types in my programming:

1. NULL terminated strings - used when receiving/sending variable sized strings to WINAPI or C styled functions (mainly user input or WINAPI results).
2. Counted strings/NULL terminated - used for static string usage, display, prefixing/suffixing, control, resources.
3. OLE strings (counted, DWORD prefix count) - interfacing to OLE
4. Descriptor strings (strings are passed as a 2 DWORD descriptor - size and address) - when the start or length is a variable/calculated value.

Learn the benefits of all four types of strings and you will master efficient string management.

Good luck!
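
To make the counted-plus-NULL-terminated idea above concrete, here is a minimal C sketch. Everything in it is an assumption for illustration: the `CountedStr` layout (a DWORD count followed by the characters and a zero), the fixed 60-byte capacity, and the helper names `counted_set` and `counted_cat`. It is not the OLE BSTR layout or API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a 4-byte length prefix, the characters, then a 0.
   The address of the struct + 4 can be handed to any code that expects
   a zero-terminated string. */
typedef struct {
    uint32_t len;        /* character count, excluding the terminator */
    char     text[60];   /* fixed capacity, chosen only for this sketch */
} CountedStr;

/* Fill a CountedStr from a C string; returns 0 on success, -1 if too long. */
int counted_set(CountedStr *s, const char *src)
{
    size_t n = strlen(src);            /* the one scan, done once */
    if (n >= sizeof s->text)
        return -1;
    memcpy(s->text, src, n + 1);       /* copies the terminator too */
    s->len = (uint32_t)n;
    return 0;
}

/* Append b to a without scanning either string to find its end. */
int counted_cat(CountedStr *a, const CountedStr *b)
{
    if ((size_t)a->len + b->len >= sizeof a->text)
        return -1;
    memcpy(a->text + a->len, b->text, (size_t)b->len + 1);
    a->len += b->len;
    return 0;
}
```

The length is read straight from `len`, and the append jumps directly to `text + len`, whereas a plain strcat has to walk the destination to find its end on every call.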

Title: Re: Putting a null at the end of a string
Post by: hutch-- on August 02, 2009, 02:10:16 PM

OLE strings may be flexible if you bother to set up the support framework for them but they are fundamentally slow as they involve memory allocation for any change of length, greater or smaller. You can in fact treat an allocated OLE buffer as just a memory buffer but that gives it no advantage over any other memory strategy.

In BYTE sized characters zero terminated strings may be crude but they are also fast, it comes at the price that you must know what you are doing in terms of setting your buffer size. Now in this world you get nothing for nothing, to create an OLE string you must know its length in much the same way as any other BYTE or WORD array, the difference with zero terminated strings is you only have to have a buffer that is big enough so if you know the maximum range, you spare yourself the hassle of having to get the length each time you allocate the string.

REPNE SCASB is a particularly slow combination but LODSB without the prefix is even slower. Unless you are writing 16 bit DOS software or for an obscure couple of old AMD processors, treat them like the plague and use registers with incremented address pointers, the latter being much faster on Intel hardware from the 486 upwards.

The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: allynm on July 29, 2009, 01:37:52 AM