The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: rags on March 22, 2005, 05:49:29 PM

Title: My first useful Function, comments wanted
Post by: rags on March 22, 2005, 05:49:29 PM
Ok Guys, ive finally got around to making something useful for me in asm. I know its been done before ,
But I wanted to see if I could do it myself.
It's a function that strips leading tabs and spaces from lines of text in a buffer.
You just need to supply a pointer to the buffer that holds the text, and the characters are stripped if any, and returned to the same buffer.

TrimLeadingSpaces PROTO :DWORD


TrimLeadingSpaces PROC USES EBX EDI ESI LpBuffer:DWORD

LOCAL LinBuffer[512] :BYTE ; A buffer to hold a line of text
                           ; to parse


mov ebx, LpBuffer
lea edi, LinBuffer
mov edx, ebx           ; edx holds the starting position in the
xor esi, esi           ; buffer to put the parsed text at.

GetLine:
mov al, [ebx]     ; move characters from input buffer to the
mov [edi + esi], al     ; LinBuffer until a LF or end of buffer
add ebx, 1              ; is found.
add esi, 1
test al, al             ; at end of buffer?
jnz NotEnd
jmp over
NotEnd:    
cmp al, 10 ; at end of line?
jne GetLine
over:
xor esi, esi
sub esi, 1
ParseBuff:
    add esi ,1
mov al, [edi + esi]
cmp al, 32 ; is the char a space?
ja DoneParse            ; Done is char is a num,letter,or punctuaton
je ParseBuff            ; get next char if a space

cmp al, 9 ; is the char a tab?
je ParseBuff            ; get next char if a tab

DoneParse:
    mov ecx, edi            ; get start offset in LinBuff
    add ecx , esi
    xor esi, esi
   
MoveBuff:
mov al, [ecx + esi] ; put the line back into the input buffer,
mov [edx], al     ; minus the leading tabs and spaces.
add edx, 1
add esi,1

test al,al              ; are we at the end of input?
jz Done
cmp al, 10              ; are we at end of line?
jne MoveBuff

xor esi, esi
jmp GetLine

Done:
ret


Ok  Give me your honest opinions on it, things I did wrong, things that need improvement, its the only way i'll learn :bg
also im including a little program to show it working.
just type text into the  edit control  and click trim to see it work.
the clear button clears the edit control


Rags

[attachment deleted by admin]
Title: Re: My first useful Function, comments wanted
Post by: Mark Jones on March 23, 2005, 02:43:11 AM
 Nice job Rags, this seems markedly familiar to the Spaces/Tab conversion tool I recently made over in the WINDOWS.INC Project thread.
http://www.masmforum.com/simple/index.php?topic=1044.0

I'm curious, is there a de-facto line length standard of 255 characters? If so, can you decrease LinBuffer? Also you could try a memory allocation API such as VirtualAlloc or LocalAlloc to create a runtime buffer. Benefit is, it uses no space in the compiled executable. Tradeoff is, you have to Free the memory religously, else you get memory pooling problems. There is also a way to directly access an edit control's memory buffer. I couldn't get it to work properly with my existing design but maybe you can. Search the WIN32.HLP for "allocating a text buffer" and you should land on the details.

Careful, a "simple" project like this can end up taking weeks.  :wink
Title: Re: My first useful Function, comments wanted
Post by: raymond on March 23, 2005, 03:43:13 AM
The first observation I would have is that you made the app a lot more complicated than you had to.

For instance, you already have a buffer where your text is located. You want the "modified" text to be returned there and it will definitely not be any larger. Using that buffer would simplify a lot of your code. (I'll post my version of it within the next day.)

Raymond
Title: Re: My first useful Function, comments wanted
Post by: raymond on March 23, 2005, 05:24:46 AM
Sooner than I thought.

TrimLeadingSpaces PROTO :DWORD

TrimLeadingSpaces PROC USES EDI ESI LpBuffer:DWORD

      mov   esi,LpBuffer
      mov   edi,esi           ; Use the supplied buffer
                              ; to condense the text

;check start of line

linestart:
      lodsb
      cmp   al,20h            ;space?
      jz    linestart
      cmp   al,9              ;tab?
      jz    linestart

;find crlf or end of text

   @@:
      stosb
      or    al,al             ;end of string?
      jz    done
      .if   al == 10          ;lf?
            .if byte ptr[esi+1] == 13   ;cr?
                  movsb               ;(this is a precaution
                                      ;against a reversal of the
                                      ;crlf bytes by some editors
            .endif
            jmp   linestart
      .endif
      lodsb
      jmp   @B

done:
      ret


Raymond
Title: Re: My first useful Function, comments wanted
Post by: MichaelW on March 23, 2005, 07:10:11 AM
Both versions seem to produce the same output. Running on my P3, and with my sample buffer, the cycle counts were ~1280 for Rag's version and ~620 for Raymond's. When I substituted the MASM32 macros for LODSB and STOSB in Raymond's version, the cycle count dropped to ~570.


[attachment deleted by admin]
Title: Re: My first useful Function, comments wanted
Post by: hutch-- on March 23, 2005, 09:08:51 AM
Rags,

It looks fine and apparently tests up OK. What you can do later when you can be bothered is to have a play at optimising it to get its speed up but as always, it better to have it working properly first.
Title: Re: My first useful Function, comments wanted
Post by: rags on March 23, 2005, 02:37:47 PM
Thanks everyone for the comments/sugestions, they are most welcome.
I will have another go at it, and try to improve it.
:bg
Mark:
  I just looked at the thread and checked out your tool. Nice job.
I don't know about a defacto line length, but in that case, you could most
definately reduce the size of LinBuffer. I'll look through the PSDK for that
info.
Raymond:
   Thanks for taking the time for showing me a faster, better way, of getting the job
done.
MichaelW:
   I'd better go have a better look at the masm macros, and see what other 'goodies'
are hidden in there, that could help me produce faster code.


Rags
Title: Re: My first useful Function, comments wanted
Post by: Randall Hyde on March 23, 2005, 02:50:23 PM
Here's the version from the HLA Standard Library.
Of course, it cheats and calls str.substr to do the actual string move, but that's the advantage of having library routines handy :-).
This one's a bit different that the original code insofar as it assumes you're passing it an HLA string rather than just an array of characters followed by a new line character (LF). Therefore, the code already knows the length of the string, for example.

This routine isn't particularly optimized, or anything (it would do the "str.substr" in-line if it were, for example). Just another example of this function.

unit stringUnit;

#include( "strings.hhf" );
#include( "excepts.hhf" );
#include( "memory.hhf" );

#include( "stdio.hhf" );



/********************************************************/
/*                                                      */
/* delspace,                                            */
/*                                                      */
/* Deletes any leading spaces from a string.            */
/*                                                      */
/* delspace is destructive, it modifies the string      */
/* passed as a  parameter.                              */
/*                                                      */
/*                                                      */
/********************************************************/


procedure str.delspace( dest: string ); @nodisplay; @noalignstack;
begin delspace;

    push( edi );
    push( eax );
   
    mov( dest, edi );
    if( edi = 0 ) then
   
        raise( ex.AttemptToDerefNULL );
       
    endif;
    mov( edi, eax );    // Save so we can compute index of 1st non-blank char.
   
    // Skip over all leading spaces.
   
    dec( edi );
    repeatWhlWS:
   
        inc( edi );
        cmp( (type char [edi]), ' ' );
        je repeatWhlWS;
        cmp( (type char [edi]), stdio.tab );
        je repeatWhlWS;
   
   
    // Remove the leading spaces by taking the substring
    // consisting of all characters beyond the last
    // leading space.
   
    sub( eax, edi );    // Compute index of first non-blank.
    str.substr( dest, dest, edi, (type str.strRec [eax]).length );
   
    pop( eax );
    pop( edi );

end delspace;

end stringUnit;


Cheers,
Randy Hyde
Title: Re: My first useful Function, comments wanted
Post by: Mark_Larson on March 23, 2005, 07:03:42 PM

  I downloaded Michael's test.zip, and I made some changes to the code for a P4.  I added a new routine called "_TrimLeadingspace_mark".  I am still optimizing, but this is where I am at at the moment.  I used Rarymond's code as a starting point. 

_TrimLeadingSpaces_mark: 685 cycles
_TrimLeadingSpaces: 759 cycles
TrimLeadingSpaces: 1308 cycles
Title: Re: My first useful Function, comments wanted
Post by: raymond on March 23, 2005, 07:17:24 PM
rags

As you may have concluded from MichaelW's reported timings, "optimization" doesn't only imply the use of a few more efficient instructions. That approach could even backfire in some cases depending on the computer architecture at hand.

The primary effort should be to analyze the problem thoroughly before starting any coding and use the simplest approach. That will generally be more efficient and much less prone to errors.

Raymond
Title: Re: My first useful Function, comments wanted
Post by: Mark_Larson on March 23, 2005, 07:50:11 PM
  Now I am going to post the one I am getting faster timings with than what Michael posted ( on my P4).  I was getting close to 100 cycles faster.  First Michael changed STOSB and LODSB to STB and LOB repsectively.  I pulled up the macros they simply do this:


;LOB
mov al,[esi]
inc esi

;STB
mov [edi],al
inc edi



  The reason that it is faster is ever since the Pentium processor LODSB and STOSB have been slower.  They were trying to make the Pentium more RISC-like.  So a lot of times breaking up a complex instruction into it's smaller components makes it run faster.  The LOOP instruction is the same way.  It's not as fast as a DEC/JNZ.  So Hutch's macro makes sense.  The problem is that on P4's INC/DEC is not the fastest way to change a registers value by 1.  You need to use ADD/SUB.  So that was the first thing I changed in the code below.  The second thing that was overlooked was the MOVSB instruction.  Hutch-- can we get a MOB macro for that one?  I changed it to "mov al,[esi]/add esi,1/mov [edi],al/add edi,1".  Michael's fastest modified version ( the one with LOB and STB), ran in 759 cycles on my P4.  The new version below runs in 685 cycles.  That's 74 cycles faster.  I added the code to Michael's test.asm, and used the same test string and everything ( I wanted to keep it apples to apples).  I tried a few other optimizations.  I tried a lookup table, but that was slower.  I also tried pre-reading the value that [esi] points to into another register.  That also ran slower. 


_TrimLeadingSpaces_mark PROC USES EDI ESI LpBuffer:DWORD
    mov   esi,LpBuffer
    mov   edi,esi               ; Use the supplied buffer

;xor eax,eax
    ;check start of line   
  linestart:
    ;;lodsb
    ;lob           ; MASM32 macro, fast replacement for lodsb
    mov   al,[esi]
    add   esi,1
    cmp   al,20h                ;space?
    jz    linestart
    cmp   al,9                  ;tab?
    jz    linestart
   
    ;find crlf or end of text
   @@:
    ;;stosb
    ;stb           ; MASM32 macro, fast replacement for stosb
    mov   [edi],al
    add   edi,1
    or    al,al                 ;end of string?
    jz    done
    .if   al == 10              ;lf?
      .if byte ptr[esi+1] == 13 ;cr?

;        movsb                   ;(this is a precaution
                                ;against a reversal of the
                                ;crlf bytes by some editors
mov al,[esi]
add esi,1
mov [edi],al
add edi,1

      .endif
      jmp   linestart
    .endif
    ;;lodsb
    ;lob           ; MASM32 macro, fast replacement for lodsb
    mov   al,[esi]
    add   esi,1
    jmp   @B
  done:
    ret
_TrimLeadingSpaces_mark endp



EDIT: I accidentally left a XOR EAX, EAX in there from when I tried to do a table lookup.  I Just commented it out.
Title: Re: My first useful Function, comments wanted
Post by: lingo on March 24, 2005, 01:08:36 AM
To be faster must be longer  :lol


OPTION PROLOGUE:NONE         ;turn it off
OPTION EPILOGUE:NONE
Align 16
db 8Dh,0A4h,24h,0,0,0,0,8Dh,64h,24h,0
TrimLe PROC lapBuffer:DWORD
mov eax,[esp] ;eax->return address
mov ecx,[esp+4] ;ecx->lpBuffer parameter
mov [esp],ebx ;save ebx register
mov [esp+4],eax ;save return address
mov ebx,1         ;ebx = 1
mov edx,ecx         ;edx = ecx->lpBuffer parameter
NewLine:
mov al,[ecx]
add ecx,ebx
ZeroTest:
cmp al,9
je NewLine
cmp al,20h
je NewLine
mov [edx],al
add edx,ebx
test al,al
je Ende
CopyText:
mov al,[ecx]
add ecx,ebx
mov [edx],al
add edx,ebx
cmp al,0Ah
jg CopyText
mov al,[ecx]
jne Ende
add ecx,ebx
cmp al,0Dh
jne ZeroTest
mov [edx],al
add edx,ebx
jno NewLine
Ende:
mov ebx,[esp] ;restore ebx register
add esp,4
ret         ;should be faster than ret 4
TrimLe  ENDP
OPTION PROLOGUE:PROLOGUEDEF ;turn back on the defaults
OPTION EPILOGUE:EPILOGUEDEF


Regards,
Lingo

Title: Re: My first useful Function, comments wanted
Post by: Mark_Larson on March 24, 2005, 02:57:58 AM
 
  Nice code Lingo but you have a few bugs.  To give you a hint as to what one of them are.  Try a string that starts with "13, 10,9,9".  You'll see your code doesn't handle it right.  It's supposed to keep the 13,10, and your code throws it away.  Another bug is with the string "h",0.  Your code writes an extra byte.  It writes "h",0,0 ( a double zero).  That means your code overwrites whatever variable comes after the string.  I think there might be other bugs, but I have to go run errands.  Nice idea.
Title: Re: My first useful Function, comments wanted
Post by: hutch-- on March 24, 2005, 04:05:48 AM
Here is a quicky that seems to work OK. It overwrites the original buffer with the result.


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

align 4

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

block_ltrim proc src:DWORD

    xor eax, eax    ; zero EAX to prevent stall using AL

    mov edx, [esp+4]
    mov ecx, [esp+4]
    sub edx, 1
    jmp trimleft

  align 4
  trimleft:
    add edx, 1
    cmp BYTE PTR [edx], 32
    je trimleft
    cmp BYTE PTR [edx], 9
    je trimleft
    jmp nxt

  align 4
  store:
    add edx, 1
  nxt:
    mov al, [edx]
    mov [ecx], al
    add ecx, 1
    test al, al             ; test for written terminator
    jz bl_out
    sub al, 10              ; test for ascii 10
    jnz store
    jmp trimleft

  bl_out:

    ret 4

block_ltrim endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««


PS: "quicky" in Australian idiom means done quickly, it may go fast as well.  :bg
Title: Re: My first useful Function, comments wanted
Post by: MichaelW on March 24, 2005, 07:44:00 AM
I modified _TrimLeadingSpaces replacing MOVSB, added an alignment directive to TrimLeadingSpaces, and added Hutch's and lingo's procedure. All four procedures seem to produce the same results, including the same returned buffer length. On the timing tests, I encountered an anomaly that I cannot explain. I had made a copy of Hutch's procedure because it was the fastest (on my P3), and before I starting trying to further optimize it, I measured the cycle counts for all five procedures as:

_TrimLeadingSpaces: 563 cycles
TrimLeadingSpaces: 1235 cycles
TrimLe: 575 cycles
block_ltrim: 485 cycles
_block_ltrim: 664 cycles

The strange thing is that _block_ltrim is an exact copy of block_ltrim with only the name changed. If I swap the location of the procedure definitions, with _block_ltrim defined first, it returns the lower cycle count and block_ltrim the higher cycle count. If I call the same procedure from both timing loops, I get essentially the same cycle count for both loops, and the count is again determined by the relative position of the procedure definitions. I tried changing the alignment, and using a local copy of the MASM32 szCopy procedure, hopefully to get the procedures closer together address-wise, and neither produced any significant change. Does anyone have any ideas on what the problem might be?


[attachment deleted by admin]
Title: Re: My first useful Function, comments wanted
Post by: hutch-- on March 24, 2005, 08:38:01 AM
Michael,

I have usually found you get anomalies like tis due to alignment of the procedure entry point. Its probably a good idea to align all of the procs so they are less order dependent but the difference may in fact be a larger alignment based on where it is in the code section.
Title: Re: My first useful Function, comments wanted
Post by: Mark_Larson on March 24, 2005, 12:21:27 PM
Quote from: MichaelW on March 24, 2005, 07:44:00 AM
I modified _TrimLeadingSpaces replacing MOVSB, added an alignment directive to TrimLeadingSpaces, and added Hutch's and lingo's procedure. All four procedures seem to produce the same results, including the same returned buffer length. On the timing tests, I encountered an anomaly that I cannot explain. I had made a copy of Hutch's procedure because it was the fastest (on my P3), and before I starting trying to further optimize it, I measured the cycle counts for all five procedures as:

_TrimLeadingSpaces: 563 cycles
TrimLeadingSpaces: 1235 cycles
TrimLe: 575 cycles
block_ltrim: 485 cycles
_block_ltrim: 664 cycles

The strange thing is that _block_ltrim is an exact copy of block_ltrim with only the name changed. If I swap the location of the procedure definitions, with _block_ltrim defined first, it returns the lower cycle count and block_ltrim the higher cycle count. If I call the same procedure from both timing loops, I get essentially the same cycle count for both loops, and the count is again determined by the relative position of the procedure definitions. I tried changing the alignment, and using a local copy of the MASM32 szCopy procedure, hopefully to get the procedures closer together address-wise, and neither produced any significant change. Does anyone have any ideas on what the problem might be?


  Go Hutch--!!!  I also noticed weird behavior Michael.  For some reason TrimLeadingSpaces is reporting in the 1600 cycle range on my home P4 but the 1200 range at my work P4.  And _TrimLeadingSpaces was running around 685, and now it's running around 1000.  I tried bumping up the number of loops by 10 and that didn't work.  So I was wondering if Norton Antivirus has anything to do with it.  It's running on my home laptop but not on my work system.  If you also have NAV maybe you should try keeping it from loading to see how it effects things.


Quote from: hutch-- on March 24, 2005, 08:38:01 AM
Michael,

I have usually found you get anomalies like tis due to alignment of the procedure entry point. Its probably a good idea to align all of the procs so they are less order dependent but the difference may in fact be a larger alignment based on where it is in the code section.

  I missed Hutch's reply before I wrote up mine.  For the P3 he's probably right.  But the P4 doesn't have code alignment issues.  So I am not sure why I am getting such different numbers between here and work ( It's about 400 cycles longer at home, which means it is running 50% slower).
Title: Re: My first useful Function, comments wanted
Post by: lingo on March 24, 2005, 01:42:20 PM
Michael,  :bdg

I corrected a bit my proc TrimLe
so pls include it in your tests

Regards,
Lingo
Title: Re: My first useful Function, comments wanted
Post by: hutch-- on March 24, 2005, 01:52:56 PM
Mark,

Something I have found timing different algos over time on two different PIVs is that alignment does effect the timings on a PIV. The problem is it have never been predictable and there are times when I have aligned a label and timed it and it dropped dead and there are other times when it made the algo faster. It has also been effected by the size of the alignment, often I found that align 4 was faster than align 16.

There is probably a way to test this, copy the algo to an aligned executable piece of memory and test it with big alignments like 4k, 16k etc .... I have also found that preceding code at times effects the algo timings even though I have jumped over it to an aligned label. I get the impression that the innards of a PIV are more complex than the technical data suggests.
Title: Re: My first useful Function, comments wanted
Post by: MichaelW on March 24, 2005, 02:47:08 PM
Mark,

Yes I do have NAV running. I'll run some tests without it later today.
Title: Re: My first useful Function, comments wanted
Post by: Mark Jones on March 24, 2005, 05:53:52 PM
Rags, you need to make a TrimTrailingSpaces now. :)
Title: Re: My first useful Function, comments wanted
Post by: rags on March 24, 2005, 06:37:58 PM
In these snippets from Marks code I have a question:
Snippet 1: 

  linestart:
    ;;lodsb
    ;lob           ; MASM32 macro, fast replacement for lodsb
    mov   al,[esi]
    add   esi,1

   
Snippet 2:

    .if   al == 10              ;lf?
      .if byte ptr[esi+1] == 13 ;cr?

;        movsb                   ;(this is a precaution
                                ;against a reversal of the
                                ;crlf bytes by some editors
mov al,[esi]
add esi,1
mov [edi],al
add edi,1

      .endif
      jmp   linestart
    .endif


First off, in the first snippet, a character gets moved to the al and esi gets incremented by 1.

In the second snippet, al is checked to see if it holds a LF. If it does, the next byte  in the buffer is checked
for a CR, by using .if byte ptr[esi+1] == 13, to see if the crlf byte order has been reversed.

My question is, since esi has already been incremented past the LF in the first snippet,
would'nt checking esi +1 be checking 2 bytes past the LF, and a reversed crlf never be detected?


Rags

Title: Re: My first useful Function, comments wanted
Post by: MichaelW on March 24, 2005, 08:27:31 PM
Rags,

I think you found a potential problem in not just Raymond's code. When I reverse the order of CR and LF in my test buffer, lingo's procedure is the only one that doesn't break:

                                         I'm Larry,
                         this is my brother darryl,
and this is my other brother darryl.
  _TrimLeadingSpaces:
I'm Larry,
                         this is my brother darryl,
and this is my other brother darryl.
  buffer1 length:85       TrimLeadingSpaces:
I'm Larry,
                         this is my brother darryl,
and this is my other brother darryl.
  buffer1 length:85       TrimLe:
I'm Larry,
this is my brother darryl,
and this is my other brother darryl.
  buffer1 length:78       block_ltrim:
I'm Larry,
                         this is my brother darryl,
and this is my other brother darryl.
  buffer1 length:85       _block_ltrim:
I'm Larry,
                         this is my brother darryl,
and this is my other brother darryl.
  buffer1 length:85


And if I change the statement to:

.if byte ptr[esi] == 13 ;cr?

then Raymond's version also produces the correct result. I counted the bytes and the correct length is 78.

Mark,

The problem is not NAV. I also ran the test on my K5 system under Windows XP HE and got cycle counts of:
61
1375
-58
-9
18

I think the odd counts can be attributed to a very oddball processor, but the pattern seems to be similar.
Title: Re: My first useful Function, comments wanted
Post by: raymond on March 24, 2005, 10:10:53 PM
I apologize for my error. :red  ESI is effectively pointing to the next character in my snippet and the instruction should have been:

.if byte ptr[esi] == 13

That, unfortunately, is what may occasionally happen with untested code. :boohoo:

Raymond
Title: Re: My first useful Function, comments wanted
Post by: Mark_Larson on March 24, 2005, 11:12:15 PM
Quote from: MichaelW on March 24, 2005, 08:27:31 PM
Rags,

I think you found a potential problem in not just Raymond's code. When I reverse the order of CR and LF in my test buffer, lingo's procedure is the only one that doesn't break:

Good spot, rags :)


Quote from: MichaelW on March 24, 2005, 08:27:31 PM
Mark,

The problem is not NAV. I also ran the test on my K5 system under Windows XP HE and got cycle counts of:
61
1375
-58
-9
18

I think the odd counts can be attributed to a very oddball processor, but the pattern seems to be similar.


Crud.  I also disabled NAV after posting this morning.  I got identical #'s.  I will try what Hutch-- suggested and try aligning the code (grumbles).  I've seen aligned code on the P4 be slower, but not faster.  Slower due to the extra code that gets added from the align.  If it does make it faster, I am going to go kick Intel.  They have a facility in Austin.

I am going to be so thankful when I switch to my Athlon ( they also have a facility in town).  I guess Donkey and I are going to have to write optimized Athlon-64 code together.  Ah bliss...



Title: Re: My first useful Function, comments wanted
Post by: MichaelW on April 02, 2005, 05:24:44 AM
Before this got lost completely, I went back and re-tested with Lingo's updated version:

_TrimLeadingSpaces: 568 cycles
TrimLeadingSpaces: 1238 cycles
TrimLe: 531 cycles
block_ltrim: 486 cycles
_block_ltrim: 651 cycles

Closer, but Hutch's version is still faster. I guess we'll never know why an identical procedure at the same alignment (I tried up to ALIGN 16) can take 34% more cycles to execute.
Title: Re: My first useful Function, comments wanted
Post by: hutch-- on April 02, 2005, 08:45:59 AM
Michael,

It can also be a factor of what has been run before it so it may be worth changing the order of which procedures are tested. I guess you could set all of the procedures so they followed a seperate identical algo so they all had the same lead in.
Title: Re: My first useful Function, comments wanted
Post by: lingo on April 02, 2005, 02:29:11 PM
Michael,  :lol

"Before this got lost completely, I went back and re-tested with Lingo’s updated version:

Code:
_TrimLeadingSpaces: 568 cycles
TrimLeadingSpaces: 1238 cycles
TrimLe: 531 cycles
block_ltrim: 486 cycles
_block_ltrim: 651 cycles"


Thank you Michael,   :clap:
but I received different results:
CPU Pentium 4 3.6GHz Prescott
and Windows XP SP2


"I'm Larry,
this is my brother darryl,
and this is my other brother darryl.
  _TrimLeadingSpaces:
I'm Larry,
this is my brother darryl,
and this is my other brother darryl.
  buffer1 length:78       TrimLeadingSpaces:
I'm Larry,
this is my brother darryl,
and this is my other brother darryl.
  buffer1 length:78       TrimLe:
I'm Larry,
this is my brother darryl,
and this is my other brother darryl.
  buffer1 length:78       block_ltrim:
I'm Larry,
this is my brother darryl,
and this is my other brother darryl.
  buffer1 length:78       _block_ltrim:
I'm Larry,
this is my brother darryl,
and this is my other brother darryl.
  buffer1 length:78
Press enter to continue...

_TrimLeadingSpaces: 977 cycles
TrimLeadingSpaces: 1609 cycles
TrimLe: 699 cycles
block_ltrim: 962 cycles
_block_ltrim: 959 cycles

Press enter to exit...


Regards,
Lingo
Title: Re: My first useful Function, comments wanted
Post by: MichaelW on April 02, 2005, 05:12:28 PM
Quote from: lingo on April 02, 2005, 02:29:11 PM
Thank you Michael,   :clap:
but I received different results:
CPU Pentium 4 3.6GHz Prescott
and Windows XP SP2

Sorry, it was late and I forgot to add that I was running the test on a P3. I should have expected that you would optimize for P4.

Title: Re: My first useful Function, comments wanted
Post by: Mark_Larson on April 02, 2005, 05:15:56 PM
Quote from: lingo on April 02, 2005, 02:29:11 PM
but I received different results:
CPU Pentium 4 3.6GHz Prescott
and Windows XP SP2


That's because Michael has a P3.  The P4 is complelely different from the P3.  So a number of things that are fast on the P3 are slow on the P4 and vice versa.
Title: Re: My first useful Function, comments wanted
Post by: MichaelW on April 02, 2005, 05:40:17 PM
Quote from: hutch-- on April 02, 2005, 08:45:59 AM
It can also be a factor of what has been run before it so it may be worth changing the order of which procedures are tested. I guess you could set all of the procedures so they followed a seperate identical algo so they all had the same lead in.

I changed the code so TrimLe was running before both copies of block_ltrim:

_TrimLeadingSpaces: 567 cycles
TrimLeadingSpaces: 1254 cycles
TrimLe: 531 cycles
block_ltrim: 486 cycles
TrimLe: 535 cycles
_block_ltrim: 676 cycles

Then I changed to align 16 for both copies:

_TrimLeadingSpaces: 566 cycles
TrimLeadingSpaces: 1258 cycles
TrimLe: 531 cycles
block_ltrim: 678 cycles
TrimLe: 525 cycles
_block_ltrim: 673 cycles

Then I eliminated the extra call to TrimLe:

_TrimLeadingSpaces: 565 cycles
TrimLeadingSpaces: 1246 cycles
TrimLe: 525 cycles
block_ltrim: 649 cycles
_block_ltrim: 648 cycles

My memory regarding the align 16 test was obviously wrong. The block_ltrim code seems to be very sensitive to alignment (running on a P3).
Title: Re: My first useful Function, comments wanted
Post by: BoR0 on April 03, 2005, 02:38:43 PM
Hello, I've written a similar algo in 5 minutes, so I decided to post it here too to check your opinions  :lol


.data
mystring db 9, 32, 32, 9, 32, 9, 9, 32, 9, "hi, BoR0 is cool :D",0

; 9 = tab
; 32 = spacing

.code
TrimStartSpacing PROC lpBuffer:DWORD
mov eax, lpBuffer

@@:
cmp byte ptr [eax], 32
jne tabcheck
inc eax
jmp @B
tabcheck:
cmp byte ptr [eax], 9
jne @F
inc eax
jmp @B
@@:

ret
TrimStartSpacing ENDP

start:

invoke TrimStartSpacing, ADDR mystring
invoke MessageBox, 0, eax, 0, 0

invoke ExitProcess, 0
end start


This function doesn't actually delete the tabs and spaces, but returns a valid pointer to the start of the text without the tabs and spaces. I think this is the fastest way.

Greets, BoR0  :U
Title: Re: My first useful Function, comments wanted
Post by: hutch-- on April 03, 2005, 03:36:46 PM
The idea is a good one but it only trims a single line, not a block of lines like the topic and examples.

Here is how to do it quickly.


OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

ltrimx proc src:DWORD

    mov eax, [esp+4]
    sub eax, 1
  @@:
    add eax, 1
    cmp BYTE PTR [eax], 32
    je @B
    cmp BYTE PTR [eax], 9
    je @B

    ret 4

ltrimx endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef