News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

My first useful Function, comments wanted

Started by rags, March 22, 2005, 05:49:29 PM

Previous topic - Next topic

rags

Ok Guys, ive finally got around to making something useful for me in asm. I know its been done before ,
But I wanted to see if I could do it myself.
It's a function that strips leading tabs and spaces from lines of text in a buffer.
You just need to supply a pointer to the buffer that holds the text, and the characters are stripped if any, and returned to the same buffer.

TrimLeadingSpaces PROTO :DWORD


TrimLeadingSpaces PROC USES EBX EDI ESI LpBuffer:DWORD

LOCAL LinBuffer[512] :BYTE ; A buffer to hold a line of text
                           ; to parse


mov ebx, LpBuffer
lea edi, LinBuffer
mov edx, ebx           ; edx holds the starting position in the
xor esi, esi           ; buffer to put the parsed text at.

GetLine:
mov al, [ebx]     ; move characters from input buffer to the
mov [edi + esi], al     ; LinBuffer until a LF or end of buffer
add ebx, 1              ; is found.
add esi, 1
test al, al             ; at end of buffer?
jnz NotEnd
jmp over
NotEnd:    
cmp al, 10 ; at end of line?
jne GetLine
over:
xor esi, esi
sub esi, 1
ParseBuff:
    add esi ,1
mov al, [edi + esi]
cmp al, 32 ; is the char a space?
ja DoneParse            ; Done is char is a num,letter,or punctuaton
je ParseBuff            ; get next char if a space

cmp al, 9 ; is the char a tab?
je ParseBuff            ; get next char if a tab

DoneParse:
    mov ecx, edi            ; get start offset in LinBuff
    add ecx , esi
    xor esi, esi
   
MoveBuff:
mov al, [ecx + esi] ; put the line back into the input buffer,
mov [edx], al     ; minus the leading tabs and spaces.
add edx, 1
add esi,1

test al,al              ; are we at the end of input?
jz Done
cmp al, 10              ; are we at end of line?
jne MoveBuff

xor esi, esi
jmp GetLine

Done:
ret


Ok  Give me your honest opinions on it, things I did wrong, things that need improvement, its the only way i'll learn :bg
also im including a little program to show it working.
just type text into the  edit control  and click trim to see it work.
the clear button clears the edit control


Rags

[attachment deleted by admin]
God made Man, but the monkey applied the glue -DEVO

Mark Jones

 Nice job Rags, this seems markedly familiar to the Spaces/Tab conversion tool I recently made over in the WINDOWS.INC Project thread.
http://www.masmforum.com/simple/index.php?topic=1044.0

I'm curious, is there a de-facto line length standard of 255 characters? If so, can you decrease LinBuffer? Also you could try a memory allocation API such as VirtualAlloc or LocalAlloc to create a runtime buffer. Benefit is, it uses no space in the compiled executable. Tradeoff is, you have to Free the memory religously, else you get memory pooling problems. There is also a way to directly access an edit control's memory buffer. I couldn't get it to work properly with my existing design but maybe you can. Search the WIN32.HLP for "allocating a text buffer" and you should land on the details.

Careful, a "simple" project like this can end up taking weeks.  :wink
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

raymond

The first observation I would have is that you made the app a lot more complicated than you had to.

For instance, you already have a buffer where your text is located. You want the "modified" text to be returned there and it will definitely not be any larger. Using that buffer would simplify a lot of your code. (I'll post my version of it within the next day.)

Raymond
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

raymond

Sooner than I thought.

TrimLeadingSpaces PROTO :DWORD

TrimLeadingSpaces PROC USES EDI ESI LpBuffer:DWORD

      mov   esi,LpBuffer
      mov   edi,esi           ; Use the supplied buffer
                              ; to condense the text

;check start of line

linestart:
      lodsb
      cmp   al,20h            ;space?
      jz    linestart
      cmp   al,9              ;tab?
      jz    linestart

;find crlf or end of text

   @@:
      stosb
      or    al,al             ;end of string?
      jz    done
      .if   al == 10          ;lf?
            .if byte ptr[esi+1] == 13   ;cr?
                  movsb               ;(this is a precaution
                                      ;against a reversal of the
                                      ;crlf bytes by some editors
            .endif
            jmp   linestart
      .endif
      lodsb
      jmp   @B

done:
      ret


Raymond
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

MichaelW

Both versions seem to produce the same output. Running on my P3, and with my sample buffer, the cycle counts were ~1280 for Rag's version and ~620 for Raymond's. When I substituted the MASM32 macros for LODSB and STOSB in Raymond's version, the cycle count dropped to ~570.


[attachment deleted by admin]
eschew obfuscation

hutch--

Rags,

It looks fine and apparently tests up OK. What you can do later when you can be bothered is to have a play at optimising it to get its speed up but as always, it better to have it working properly first.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

rags

Thanks everyone for the comments/sugestions, they are most welcome.
I will have another go at it, and try to improve it.
:bg
Mark:
  I just looked at the thread and checked out your tool. Nice job.
I don't know about a defacto line length, but in that case, you could most
definately reduce the size of LinBuffer. I'll look through the PSDK for that
info.
Raymond:
   Thanks for taking the time for showing me a faster, better way, of getting the job
done.
MichaelW:
   I'd better go have a better look at the masm macros, and see what other 'goodies'
are hidden in there, that could help me produce faster code.


Rags
God made Man, but the monkey applied the glue -DEVO

Randall Hyde

Here's the version from the HLA Standard Library.
Of course, it cheats and calls str.substr to do the actual string move, but that's the advantage of having library routines handy :-).
This one's a bit different that the original code insofar as it assumes you're passing it an HLA string rather than just an array of characters followed by a new line character (LF). Therefore, the code already knows the length of the string, for example.

This routine isn't particularly optimized, or anything (it would do the "str.substr" in-line if it were, for example). Just another example of this function.

unit stringUnit;

#include( "strings.hhf" );
#include( "excepts.hhf" );
#include( "memory.hhf" );

#include( "stdio.hhf" );



/********************************************************/
/*                                                      */
/* delspace,                                            */
/*                                                      */
/* Deletes any leading spaces from a string.            */
/*                                                      */
/* delspace is destructive, it modifies the string      */
/* passed as a  parameter.                              */
/*                                                      */
/*                                                      */
/********************************************************/


procedure str.delspace( dest: string ); @nodisplay; @noalignstack;
begin delspace;

    push( edi );
    push( eax );
   
    mov( dest, edi );
    if( edi = 0 ) then
   
        raise( ex.AttemptToDerefNULL );
       
    endif;
    mov( edi, eax );    // Save so we can compute index of 1st non-blank char.
   
    // Skip over all leading spaces.
   
    dec( edi );
    repeatWhlWS:
   
        inc( edi );
        cmp( (type char [edi]), ' ' );
        je repeatWhlWS;
        cmp( (type char [edi]), stdio.tab );
        je repeatWhlWS;
   
   
    // Remove the leading spaces by taking the substring
    // consisting of all characters beyond the last
    // leading space.
   
    sub( eax, edi );    // Compute index of first non-blank.
    str.substr( dest, dest, edi, (type str.strRec [eax]).length );
   
    pop( eax );
    pop( edi );

end delspace;

end stringUnit;


Cheers,
Randy Hyde

Mark_Larson


  I downloaded Michael's test.zip, and I made some changes to the code for a P4.  I added a new routine called "_TrimLeadingspace_mark".  I am still optimizing, but this is where I am at at the moment.  I used Rarymond's code as a starting point. 

_TrimLeadingSpaces_mark: 685 cycles
_TrimLeadingSpaces: 759 cycles
TrimLeadingSpaces: 1308 cycles
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

raymond

rags

As you may have concluded from MichaelW's reported timings, "optimization" doesn't only imply the use of a few more efficient instructions. That approach could even backfire in some cases depending on the computer architecture at hand.

The primary effort should be to analyze the problem thoroughly before starting any coding and use the simplest approach. That will generally be more efficient and much less prone to errors.

Raymond
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

Mark_Larson

#10
  Now I am going to post the one I am getting faster timings with than what Michael posted ( on my P4).  I was getting close to 100 cycles faster.  First Michael changed STOSB and LODSB to STB and LOB repsectively.  I pulled up the macros they simply do this:


;LOB
mov al,[esi]
inc esi

;STB
mov [edi],al
inc edi



  The reason that it is faster is ever since the Pentium processor LODSB and STOSB have been slower.  They were trying to make the Pentium more RISC-like.  So a lot of times breaking up a complex instruction into it's smaller components makes it run faster.  The LOOP instruction is the same way.  It's not as fast as a DEC/JNZ.  So Hutch's macro makes sense.  The problem is that on P4's INC/DEC is not the fastest way to change a registers value by 1.  You need to use ADD/SUB.  So that was the first thing I changed in the code below.  The second thing that was overlooked was the MOVSB instruction.  Hutch-- can we get a MOB macro for that one?  I changed it to "mov al,[esi]/add esi,1/mov [edi],al/add edi,1".  Michael's fastest modified version ( the one with LOB and STB), ran in 759 cycles on my P4.  The new version below runs in 685 cycles.  That's 74 cycles faster.  I added the code to Michael's test.asm, and used the same test string and everything ( I wanted to keep it apples to apples).  I tried a few other optimizations.  I tried a lookup table, but that was slower.  I also tried pre-reading the value that [esi] points to into another register.  That also ran slower. 


_TrimLeadingSpaces_mark PROC USES EDI ESI LpBuffer:DWORD
    mov   esi,LpBuffer
    mov   edi,esi               ; Use the supplied buffer

;xor eax,eax
    ;check start of line   
  linestart:
    ;;lodsb
    ;lob           ; MASM32 macro, fast replacement for lodsb
    mov   al,[esi]
    add   esi,1
    cmp   al,20h                ;space?
    jz    linestart
    cmp   al,9                  ;tab?
    jz    linestart
   
    ;find crlf or end of text
   @@:
    ;;stosb
    ;stb           ; MASM32 macro, fast replacement for stosb
    mov   [edi],al
    add   edi,1
    or    al,al                 ;end of string?
    jz    done
    .if   al == 10              ;lf?
      .if byte ptr[esi+1] == 13 ;cr?

;        movsb                   ;(this is a precaution
                                ;against a reversal of the
                                ;crlf bytes by some editors
mov al,[esi]
add esi,1
mov [edi],al
add edi,1

      .endif
      jmp   linestart
    .endif
    ;;lodsb
    ;lob           ; MASM32 macro, fast replacement for lodsb
    mov   al,[esi]
    add   esi,1
    jmp   @B
  done:
    ret
_TrimLeadingSpaces_mark endp



EDIT: I accidentally left a XOR EAX, EAX in there from when I tried to do a table lookup.  I Just commented it out.
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

lingo

#11
To be faster must be longer  :lol


OPTION PROLOGUE:NONE         ;turn it off
OPTION EPILOGUE:NONE
Align 16
db 8Dh,0A4h,24h,0,0,0,0,8Dh,64h,24h,0
TrimLe PROC lapBuffer:DWORD
mov eax,[esp] ;eax->return address
mov ecx,[esp+4] ;ecx->lpBuffer parameter
mov [esp],ebx ;save ebx register
mov [esp+4],eax ;save return address
mov ebx,1         ;ebx = 1
mov edx,ecx         ;edx = ecx->lpBuffer parameter
NewLine:
mov al,[ecx]
add ecx,ebx
ZeroTest:
cmp al,9
je NewLine
cmp al,20h
je NewLine
mov [edx],al
add edx,ebx
test al,al
je Ende
CopyText:
mov al,[ecx]
add ecx,ebx
mov [edx],al
add edx,ebx
cmp al,0Ah
jg CopyText
mov al,[ecx]
jne Ende
add ecx,ebx
cmp al,0Dh
jne ZeroTest
mov [edx],al
add edx,ebx
jno NewLine
Ende:
mov ebx,[esp] ;restore ebx register
add esp,4
ret         ;should be faster than ret 4
TrimLe  ENDP
OPTION PROLOGUE:PROLOGUEDEF ;turn back on the defaults
OPTION EPILOGUE:EPILOGUEDEF


Regards,
Lingo


Mark_Larson

 
  Nice code Lingo but you have a few bugs.  To give you a hint as to what one of them are.  Try a string that starts with "13, 10,9,9".  You'll see your code doesn't handle it right.  It's supposed to keep the 13,10, and your code throws it away.  Another bug is with the string "h",0.  Your code writes an extra byte.  It writes "h",0,0 ( a double zero).  That means your code overwrites whatever variable comes after the string.  I think there might be other bugs, but I have to go run errands.  Nice idea.
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

hutch--

Here is a quicky that seems to work OK. It overwrites the original buffer with the result.


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

align 4

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

block_ltrim proc src:DWORD

    xor eax, eax    ; zero EAX to prevent stall using AL

    mov edx, [esp+4]
    mov ecx, [esp+4]
    sub edx, 1
    jmp trimleft

  align 4
  trimleft:
    add edx, 1
    cmp BYTE PTR [edx], 32
    je trimleft
    cmp BYTE PTR [edx], 9
    je trimleft
    jmp nxt

  align 4
  store:
    add edx, 1
  nxt:
    mov al, [edx]
    mov [ecx], al
    add ecx, 1
    test al, al             ; test for written terminator
    jz bl_out
    sub al, 10              ; test for ascii 10
    jnz store
    jmp trimleft

  bl_out:

    ret 4

block_ltrim endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««


PS: "quicky" in Australian idiom means done quickly, it may go fast as well.  :bg
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

MichaelW

I modified _TrimLeadingSpaces replacing MOVSB, added an alignment directive to TrimLeadingSpaces, and added Hutch's and lingo's procedure. All four procedures seem to produce the same results, including the same returned buffer length. On the timing tests, I encountered an anomaly that I cannot explain. I had made a copy of Hutch's procedure because it was the fastest (on my P3), and before I starting trying to further optimize it, I measured the cycle counts for all five procedures as:

_TrimLeadingSpaces: 563 cycles
TrimLeadingSpaces: 1235 cycles
TrimLe: 575 cycles
block_ltrim: 485 cycles
_block_ltrim: 664 cycles

The strange thing is that _block_ltrim is an exact copy of block_ltrim with only the name changed. If I swap the location of the procedure definitions, with _block_ltrim defined first, it returns the lower cycle count and block_ltrim the higher cycle count. If I call the same procedure from both timing loops, I get essentially the same cycle count for both loops, and the count is again determined by the relative position of the procedure definitions. I tried changing the alignment, and using a local copy of the MASM32 szCopy procedure, hopefully to get the procedures closer together address-wise, and neither produced any significant change. Does anyone have any ideas on what the problem might be?


[attachment deleted by admin]
eschew obfuscation