Title: Improved CAT$ macro
Post by: jj2007 on June 27, 2008, 11:02:21 PM

I am trying to get a somewhat faster and handier version of cat$. Here are some timings on a Celeron:

672 clocks CAT$a        113 bytes       7143 LAMPs
853 clocks CAT$aHA      113 bytes       9068 LAMPs
710 clocks CAT$b        127 bytes       8001 LAMPs
728 clocks CAT$bHA      127 bytes       8204 LAMPs
1115 clocks old cat$    99 bytes        11094 LAMPs

LAMPs = Lean And Mean Points = cycles * sqrt(size)
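The LAMPs column can be recomputed from the other two columns. A minimal Python sketch, assuming the result is rounded to the nearest integer (the thread does not say how the figure is rounded, but this reproduces the table above); the function name `lamps` is made up for illustration:

```python
from math import sqrt

def lamps(cycles, size_bytes):
    """Lean And Mean Points = cycles * sqrt(size), rounded to nearest."""
    return round(cycles * sqrt(size_bytes))

# (label, clocks, code size in bytes) taken from the Celeron timings above
timings = [
    ("CAT$a",    672, 113),
    ("CAT$aHA",  853, 113),
    ("CAT$b",    710, 127),
    ("CAT$bHA",  728, 127),
    ("old cat$", 1115, 99),
]

for label, clocks, size in timings:
    print(f"{clocks} clocks {label:<10} {size} bytes  {lamps(clocks, size)} LAMPs")
```

Lower is better: a routine is penalised both for being slow and for being large, with size counting less than speed because of the square root.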

The "a" version is typically 33% faster than the old cat$, but it slows down drastically if there is a high-ASCII byte, such as the ä in " with thä new CAT$ macro", at the beginning of a string; those runs are marked "HA" above.
The "b" version is typically 20-25% faster than the old cat$ and does not have the high-ASCII problem described, inter alia, in this post (http://www.masm32.com/board/index.php?topic=1589.msg12333#msg12333).
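The attachment is gone, so the exact cause of the "HA" slowdown cannot be checked here. One common source of this kind of behaviour, assumed here and not confirmed by the thread, is a dword-at-a-time zero-byte test that also fires on bytes above 0x80 and then falls back to a slow byte-by-byte check. A small Python sketch of that effect (all names are hypothetical):

```python
import struct

ONES = 0x01010101
HIGHS = 0x80808080

def dword_at(data, offset=0):
    """Read 4 bytes as a little-endian 32-bit value, like a mov from memory."""
    return struct.unpack_from("<I", data, offset)[0]

def quick_test(dword):
    """Cheap test: non-zero for a zero byte, but also for high-ASCII bytes."""
    return (dword - ONES) & HIGHS

def exact_test(dword):
    """Costs one extra NOT/AND per dword, but only fires on a real zero byte."""
    return (dword - ONES) & ~dword & HIGHS

plain = dword_at(b"test")
umlaut = dword_at("th\u00e4 ".encode("cp1252"))   # contains 0xE4
ended = dword_at(b"ab\x00c")                      # contains a real terminator

for name, value in (("plain", plain), ("umlaut", umlaut), ("ended", ended)):
    print(name, hex(quick_test(value)), hex(exact_test(value)))
```

With the cheap test, the dword holding the ä looks like a possible end of string even though it is not, which is the kind of false alarm that would cost extra cycles.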

Grateful for comments and timings, especially on non-Celerons.

[attachment deleted by admin]

Title: Re: Improved CAT$ macro
Post by: hutch-- on June 28, 2008, 02:41:02 AM

JJ,

The library cat$ must be able to do this.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    .code

start:
   
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    call main
    inkey
    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

main proc

    LOCAL hMem  :DWORD
    LOCAL flen  :DWORD
    LOCAL buff  :DWORD

    mov hMem, InputFile("\masm32\include\windows.inc")
    mov flen, ecx               ; InputFile leaves the file length in ECX

    mov eax, flen
    add eax, eax
    add eax, eax
    add eax, eax
    add eax, eax                ; flen * 16

    mov buff, alloc$(eax)       ; allocate buffer

    invoke GetTickCount
    push eax                    ; save start time in ms

  ; -------------------------------
  ; cat 16 copies of file to "buff"
  ; -------------------------------
    mov buff, cat$(buff,hMem,hMem,hMem,hMem,hMem,hMem,hMem,hMem,hMem,hMem,hMem,hMem,hMem,hMem,hMem,hMem)

    invoke GetTickCount
    pop ecx                     ; restore start time
    sub eax, ecx                ; elapsed ms = end - start

    print str$(eax),13,10

    free$ buff

    ret

main endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end start

Here are the timings on my old PIV; it's typical of most late PIVs.

696 clocks CAT$a        113 bytes       7399 LAMPs
909 clocks CAT$aHA      113 bytes       9663 LAMPs
743 clocks CAT$b        127 bytes       8373 LAMPs
755 clocks CAT$bHA      127 bytes       8508 LAMPs
863 clocks old cat$     99 bytes        8587 LAMPs

Title: Re: Improved CAT$ macro
Post by: jj2007 on June 28, 2008, 06:49:42 AM

Quote from: hutch-- on June 28, 2008, 02:41:02 AM
JJ,

The library cat$ must be able to do this.

You are really demanding! See attachment.

[attachment deleted by admin]

Title: Re: Improved CAT$ macro
Post by: Jimg on June 28, 2008, 04:15:15 PM

JJ-
What did you use to assemble your last version? When I assemble the source, it is consistently slower than the exe you provided.

Title: Re: Improved CAT$ macro
Post by: jj2007 on June 28, 2008, 05:41:47 PM

Quote from: Jimg on June 28, 2008, 04:15:15 PM
JJ-
What did you use to assemble your last version? When I assemble the source, it is consistently slower than the exe you provided.

I used jjTurboAsm with the option /AfterBurner=ON

Seriously, I just downloaded the zip, and the time stamps are 10:53:10 for the exe and 10:52:50 for the asm - unlikely that I was able to make drastic improvements in 20 seconds. So this is somewhat mysterious. I use polink, but that shouldn't affect speed afaik. Can you post your version, maybe with some timings?

Title: Re: Improved CAT$ macro
Post by: Jimg on June 28, 2008, 06:02:03 PM

This is the typical difference in execution-

The one you assembled-
 ----------------------- CAT$ timings: -------------

566 clocks CAT$a        113 bytes       6017 LAMPs
775 clocks CAT$aHA      113 bytes       8238 LAMPs
571 clocks CAT$b        127 bytes       6435 LAMPs
645 clocks CAT$bHA      127 bytes       7269 LAMPs
759 clocks old cat$     99 bytes        7552 LAMPs

LAMPs = Lean And Mean Points = cycles * sqrt(size)

The one I assembled-
 ----------------------- CAT$ timings: -------------

584 clocks CAT$a        113 bytes       6208 LAMPs
772 clocks CAT$aHA      113 bytes       8206 LAMPs
587 clocks CAT$b        127 bytes       6615 LAMPs
657 clocks CAT$bHA      127 bytes       7404 LAMPs
830 clocks old cat$     99 bytes        8258 LAMPs

LAMPs = Lean And Mean Points = cycles * sqrt(size)

[attachment deleted by admin]

Title: Re: Improved CAT$ macro
Post by: jj2007 on June 28, 2008, 06:36:02 PM

The files look pretty similar in OllyDbg, although there is a slight variation at address 4016BE - polink?? But apart from that, your timings are almost identical - differences are within the normal "statistical noise".

Title: Re: Improved CAT$ macro
Post by: jj2007 on June 29, 2008, 01:13:38 AM

I have good and bad news on the CAT$ macro.
First, the bad news: StdOut teases me with non-standard chars; where I expect an ä, I get õ; any hints why?

Ciao bellissimo, this is test B with thõ new CAT$ macro

Second bad news: The previous version had a bug - it would not accept mov eax, CAT$(addr MyBuffer, ...); only CAT$("Here I am", ...) or CAT$(0, addr MyBuffer, ...) worked. OK, that is fixed now.

Now the good news: I taught the beast something common in BASIC, i.e. having the destination string as one or more of the sources:

   mov eax, CAT$(addr mcDefBuffer, "this is test X", addr strN)
   print CAT$(addr mcDefBuffer, "Ciao caro, ", addr mcDefBuffer, CrLf$, "Fantastic ", str$(127), " bytes short")

Ciao caro, this is test X with the new CAT$ macro
A cute little routine allowing you to concatenate
multiple strings of different origins
Fantastic 127 bytes short

Etc., full source attached. Suggestions for improving it?
Cheers, jj

----------------------- CAT$ usage: -----------------------

1. Like old cat$ (but no zero-ing of MyBuffer needed):
mov eax, CAT$(addr MyBuffer, "Test", addr Src2)

2. Write strings to a default buffer:
invoke MessageBox, 0, CAT$(0, "Ciao ", addr YourName),
chr$("Title"), MB_OK

invoke MessageBox, 0, CAT$("Ciao ", addr YourName),
chr$("Title"), MB_OK
(Oops, no zero after CAT$? The macro knows
that "Ciao" is not a destination ...)

mov eax, CAT$(0, "Test1: ", addr Src2, str$(eax), "bytes")

3. Append to last position after a CAT$(0) or CAT$(addr MyBuffer):
mov eax, CAT$(1, "Test3", addr Src4)
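The three calling conventions listed above can be mimicked with a small run-time model. This is only a sketch: the real macro makes these decisions at assembly time from the argument text, and the class name `CatModel`, its attributes, and the use of Python byte strings are hypothetical stand-ins, not taken from the attached source.

```python
class CatModel:
    """Toy model of the CAT$ usage modes: explicit buffer, default buffer, append."""

    def __init__(self):
        self.default = bytearray()   # stands in for the macro's default buffer
        self.last = self.default     # destination used by the previous call

    def cat(self, first, *rest):
        append = False
        if isinstance(first, bytearray):     # 1. explicit destination buffer
            dest, sources = first, rest
        elif isinstance(first, bytes):       # 2. "lazy" form: first arg is a source
            dest, sources = self.default, (first,) + rest
        elif first == 0:                     # 2. default buffer requested with 0
            dest, sources = self.default, rest
        elif first == 1:                     # 3. append to the previous destination
            dest, sources, append = self.last, rest, True
        else:
            raise ValueError("first argument must be a buffer, a string, 0 or 1")
        if not append:
            dest.clear()                     # no zeroing needed by the caller
        for s in sources:
            dest += s
        self.last = dest
        return dest

m = CatModel()
my_buffer = bytearray(b"stale content")
print(m.cat(my_buffer, b"Test", b"Src2"))   # old content is discarded
print(m.cat(1, b"Test3", b"Src4"))          # continues in my_buffer
print(m.cat(b"Ciao ", b"YourName"))         # lands in the default buffer
```

Treating a leading string as a source is what lets the second MessageBox call omit the 0.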

----------------------- CAT$ timings: ---------------------

973 clocks CAT$a        113 bytes       10343 LAMPs
870 clocks CAT$aHA      113 bytes       9248 LAMPs
706 clocks CAT$b        127 bytes       7956 LAMPs
764 clocks CAT$bHA      127 bytes       8610 LAMPs
849 clocks old cat$     99 bytes        8447 LAMPs

LAMPs = Lean And Mean Points = cycles * sqrt(size)

EDIT: New version attached, allows destination in sources with this "lazy" syntax:

         mov eax, CAT$("this is test X", addr strN)   ; concat 2 strings into mcDefBuffer
         print CAT$("Ciao caro, ", 0, CrLf$, "Fantastic ", str$(127), " bytes short!", CrLf$, CrLf$)

Here 0 stands for the current content of the destination.
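Using the destination as one of its own sources only works if the old content is read before it gets overwritten. The thread does not show how the macro handles this internally; the Python sketch below assumes the simplest approach, a snapshot of the old content, and the function name `cat_lazy` is hypothetical.

```python
def cat_lazy(dest, *sources):
    """Concatenate sources into dest; an integer 0 means 'old content of dest'."""
    old = bytes(dest)        # snapshot before anything is overwritten
    out = bytearray()
    for s in sources:
        if isinstance(s, int) and s == 0:
            out += old       # placeholder: insert what dest held before the call
        else:
            out += s
    dest[:] = out            # replace the buffer content in place
    return dest

buf = bytearray()
cat_lazy(buf, b"this is test X", b" with the new CAT$ macro")
cat_lazy(buf, b"Ciao caro, ", 0, b"\r\n", b"Fantastic ", b"127", b" bytes short!")
print(buf.decode("ascii"))
```

An assembly version would more likely move the old content to its final position first and then fill in around it, which avoids the temporary copy.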

[attachment deleted by admin]

Title: Re: Improved CAT$ macro
Post by: GregL on June 29, 2008, 03:20:11 AM

jj,

Regarding the non-standard characters, it has to do with the code page and font you are using for the console. You can use the CHCP command at the command-prompt to get and set the code page. It's kind of a mess.

Keep your eye on the code page (http://blogs.msdn.com/oldnewthing/archive/2005/03/08/389527.aspx)

Console Code Pages (http://msdn.microsoft.com/en-us/library/ms682064(VS.85).aspx)

CHCP (http://technet2.microsoft.com/windowsserver/en/library/6556a0bb-29ba-4489-876e-852344661cbe1033.mspx?mfr=true)
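The ä/õ swap reported earlier in the thread fits this explanation. A short Python sketch, assuming the source was saved in a Windows ANSI code page (1252) and the console was using OEM code page 850; both are guesses, since the thread does not state which code pages were active:

```python
# Byte that an editor using Windows-1252 stores for "ä"
raw = "ä".encode("cp1252")
print(raw)                        # b'\xe4'

# The same byte interpreted by a console set to OEM code page 850
print(raw.decode("cp850"))        # õ

# Byte the console would need in order to show "ä" under code page 850
print("ä".encode("cp850"))        # b'\x84'
```

So the string itself is intact; the byte 0xE4 simply maps to a different glyph in the console's code page.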

Title: Re: Improved CAT$ macro
Post by: Jimg on June 29, 2008, 05:19:50 AM

jj-
may I have a copy of the timers.asm you are using? It seems to call ultoa from msvcrt where mine doesn't. That seems to produce alignment differences that account for the timing differences I am seeing.

Title: Re: Improved CAT$ macro
Post by: jj2007 on June 29, 2008, 07:55:13 AM

Quote from: Jimg on June 29, 2008, 05:19:50 AM
may I have a copy of the timers.asm you are using?

JimG: Here they are.
Greg: Thanks for the code page hint.

[attachment deleted by admin]

Title: Re: Improved CAT$ macro
Post by: Jimg on June 29, 2008, 04:42:46 PM

Thanks jj. I found the actual difference: I'm using the older macros.asm.
ustr$ was changed to use the C runtime. The older macro called dwtoa, which added just enough bytes to shift the code to a place that runs about 5% slower on my machine.
Amazing how touchy these AMD's are about placement in memory. I can often make one routine faster than another just by moving it up in memory.

Title: Re: Improved CAT$ macro
Post by: jj2007 on July 03, 2008, 08:45:08 PM

Quote from: Jimg on June 29, 2008, 04:42:46 PM
Amazing how touchy these AMD's are about placement in memory. I can often make one routine faster than another just by moving it up in memory.

Check in Olly if calls and jumps and memory accesses change from near to far...

Title: Re: Improved CAT$ macro
Post by: Jimg on July 04, 2008, 01:01:46 AM

Quote from: jj2007 on July 03, 2008, 08:45:08 PM
Quote from: Jimg on June 29, 2008, 04:42:46 PM
Amazing how touchy these AMD's are about placement in memory. I can often make one routine faster than another just by moving it up in memory.
Check in Olly if calls and jumps and memory accesses change from near to far...

No, it's not the jumps. I've seen this so many times, where just one extra byte makes a large difference, that I had to give up on presenting my best optimizations because it's just totally different on an Intel chip. Sad, because I really enjoy it.

Title: Re: Improved CAT$ macro
Post by: jj2007 on July 04, 2008, 11:05:54 PM

New version of CAT$ attached. The macro has become pretty flexible but also awfully complex. Usage is simple but debugging is a nightmare... let me know where it crashes please.

   MsgBox CAT$(\
   "We found 'FR_MatchAlefHamza' in", CrLf$, LastPath$, CrLf$,\
   "at pos ", str$(ecx), CrLf$, CrLf$, FifRet$, CrLf$,\
   "Case-insensitive search took only ", str$(esi), " ms"),\
   addr AppName, MB_OK

----------------------- CAT$ usage: -----------------------

1. Like old cat$ (but no zero-ing of MyBuffer needed):
mov eax, CAT$(addr MyBuffer, "Test", addr Src2)

2. Write strings to a default buffer:
invoke MessageBox, 0, CAT$(0, "Ciao ", addr YourName),
chr$("Title"), MB_OK

invoke MessageBox, 0, CAT$("Ciao ", addr YourName),
chr$("Title"), MB_OK
(Oops, no zero after CAT$? Well, an intelligent macro
knows that "Ciao" is not a destination buffer ...)

mov eax, CAT$(0, "Test1: ", addr Src2, str$(eax), "bytes")

3. Append to last position after a CAT$(0) or CAT$(addr MyBuffer):
mov eax, CAT$(1, "Test3", addr Src4)

[attachment deleted by admin]

The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: jj2007 on June 27, 2008, 11:02:21 PM