Dynamic string array benchmark

Started by hutch--, April 06, 2008, 09:04:42 AM

hutch--

No,

You are missing the latest beta that the example came from. It builds under version 10i, which is why I posted the working binary as well, so it could be run without building it.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

5 million element test

31 ms array create
2563 ms array load data
31 ms array read
1859 ms array delete

Celeron 2.4 GHz, 448 MB Ram

herge


Hi hutch-:


5 million element test
401 ms array create
31625 ms array load data
681 ms array read
29392 ms array delete
Press any key to continue ...


Thanks.
// Herge born  Brussels, Belgium May 22, 1907
// Died March 3, 1983
// Cartoonist of Tintin and Snowy

lingo

core2duo E8500 3.16Ghz 1111Mhz/2M  Vista64

5 million element test

15 ms array create
687 ms array load data
0 ms array read
437 ms array delete
Press any key to continue ...

cmpxchg

I am getting either 15 or 0 ms for the same test, and it doesn't matter which one (create or read). So I'm thinking maybe it's Windows switching tasks every 15 ms. Maybe a test on a smaller number of elements needs to be devised?
If such a flaw exists, then the other tests don't really tell much either, because nobody knows how often Windows switches tasks with all your drivers and user apps running.
Thanks for reading.

hutch--

Hi cmpxchg,

Welcome on board. The results at such a low timing interval are not reliable, as the GetTickCount() API does not have fine enough granularity. It's OK for intervals over a quarter of a second but nearly useless for figures that small. All it demonstrates is that the create and read functions are almost negligible compared with the others, which do more work.
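
A rough model of why the small figures come out as 0, 15 or 31 ms, written in C for brevity. It assumes the counter behind GetTickCount() only advances about every 15.625 ms, which is a common default but an assumption here, not something measured in this thread. The helper names are made up for the sketch.

```c
#include <assert.h>

/* Hypothetical model, not the real API: the value in whole milliseconds
   that a counter updated once every tick_us microseconds would show at
   true time t_us. */
unsigned long long coarse_ms(unsigned long long t_us, unsigned long long tick_us)
{
    assert(tick_us > 0);
    /* round the true time down to the last tick, then convert to ms */
    return (t_us / tick_us) * tick_us / 1000ULL;
}

/* Elapsed time as seen through the coarse counter. */
unsigned long long measured_ms(unsigned long long start_us,
                               unsigned long long end_us,
                               unsigned long long tick_us)
{
    return coarse_ms(end_us, tick_us) - coarse_ms(start_us, tick_us);
}
```

Under that assumption the same 10 ms of work reads as 0 ms if it falls inside one tick and as 15 ms if it straddles a tick boundary, while a job of a couple of seconds is only off by about one tick.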

azdps

Windows XP SP3 / Vostro 1400 Laptop / Core 2 Duo T9300 @2.50GHz  / 4GB of ram


Benchmarking array methods on 5 million members
1000 ms array create
547 ms array load data
16 ms array read
1000 ms array delete
Press any key to continue ...

5 million element test
31 ms array create
1281 ms array load data
16 ms array read
1000 ms array delete
Press any key to continue ...

shill

Windows XP SP2 / Core 2 Duo E4500 @ 2.31 GHz (overclocked 5%) / 2 GB of RAM

Benchmarking array methods on 5 million members
1234 ms array create
860 ms array load data
15 ms array read
1360 ms array delete

Mark_Larson

 Hey Hutch,

  I downloaded J and put it on a different drive. I got your code to compile. I am currently looking at $arrset to speed it up, and I have already found several speedups. How do I manually recompile m32lib? I thought there was a batch file I could run. I have to run now, but I will be back later.

BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

herge

 Hi Mark:


@echo off

copy masm32.inc \masm32\include\masm32.inc

rem delete any existing MASM32 Library
del masm32.lib

rem create a response file for ML.EXE
dir /b *.asm > ml.rsp
\masm32\bin\ml /c /coff @ml.rsp
rem "if errorlevel 0" is true for every exit code, so test for failure instead
if not errorlevel 1 goto okml
del ml.rsp
echo ASSEMBLY ERROR BUILDING LIBRARY MODULES
goto theend

:okml
\masm32\bin\link -lib *.obj /out:masm32.lib
if exist masm32.lib goto oklink

echo LINK ERROR BUILDING LIBRARY
echo The MASM32 Library was not built
goto theend

:oklink
copy masm32.lib \masm32\lib\masm32.lib

:theend
if exist masm32.lib del *.obj

dir \masm32\lib\masm32.lib
dir \masm32\include\masm32.inc

This is make.bat and is usually in \masm32\m32lib directory

Regards herge.

Mark_Larson

arralloc is fast already, so I am starting with arrset.

I am getting 1188 with the unmodified code.

With my modifications I am getting 922. I still have some bugs to work out.

I tried all the different allocators (in place of SysAllocStringByteLen), manually copying the memory. HeapAlloc was the fastest. I did GetHeapMemory in Main().

hutch--

Mark,

Grab the current beta of masm32. I had to fix the original arralloc function, as I had made a mistake with ESI that made it dangerous. It's been fixed and no longer has the problem. In the beta it's now part of the masm32 library, so any mods can be built by running the make.bat file.

I looked at HeapAlloc early in the development but was wary of using it due to fragmentation problems, which increase over time as array members are added and removed. OLE is reasonably well geared here, but it has always been a bit slower than the lower level allocation methods. I wanted the characteristic of the length stored 4 bytes below the start address, as it saves any length calculations.
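
To show what "length stored 4 bytes below the start address" means in practice, here is a minimal C sketch of that layout on top of malloc. It is not the OLE string allocator and not the masm32 library code; lp_alloc, lp_len and lp_free are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Block layout: [uint32 length][len bytes of data][terminating 0].
   The caller only ever sees a pointer to the data part. */
char *lp_alloc(const char *data, uint32_t len)
{
    assert(data != NULL);
    unsigned char *block = malloc(sizeof(uint32_t) + (size_t)len + 1);
    if (block == NULL)
        return NULL;
    memcpy(block, &len, sizeof len);            /* length goes in front */
    char *s = (char *)block + sizeof(uint32_t); /* data starts 4 bytes in */
    memcpy(s, data, len);
    s[len] = '\0';
    return s;
}

/* Read the length back from 4 bytes below the data pointer: no scan needed. */
uint32_t lp_len(const char *s)
{
    uint32_t len;
    memcpy(&len, s - sizeof(uint32_t), sizeof len);
    return len;
}

/* Free the whole block, which begins 4 bytes below the data pointer. */
void lp_free(char *s)
{
    if (s != NULL)
        free(s - sizeof(uint32_t));
}
```

The point of the layout is that getting the length is one fixed-offset read instead of a scan for the terminator, which matters when it happens millions of times.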

Mark_Larson

Quote from: hutch-- on August 09, 2008, 02:09:10 AM
Mark,

Grab the current beta of masm32. I had to fix the original arralloc function, as I had made a mistake with ESI that made it dangerous. It's been fixed and no longer has the problem. In the beta it's now part of the masm32 library, so any mods can be built by running the make.bat file.

I looked at HeapAlloc early in the development but was wary of using it due to fragmentation problems, which increase over time as array members are added and removed. OLE is reasonably well geared here, but it has always been a bit slower than the lower level allocation methods. I wanted the characteristic of the length stored 4 bytes below the start address, as it saves any length calculations.

  I grabbed (J) the same day I posted this. It was the very top beta in the MASM32 forum. I decided not to do the makeit.bat and my c.bat to compile. To save a step I cut and pasted the code into bmark.asm.

  I also got better speeds using GlobalAlloc. It was 20 ms slower than HeapAlloc.

  Have you thought about removing the prologue and epilogue code for strset? It gets called 5 million times, and that adds up.

  This is what I am doing to copy the data. I am definitely not using REP (it was slower, since it only starts getting fast around 64 bytes, and most of the strings were a LOT smaller than that).

  This is the main loop where I copy the data. I copy a dword at a time. The buffer will be dword aligned, so I copy all the dwords first, and then, after the dword loop, I transfer any odd bytes that don't fit into a dword. Here is the dword loop.


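align 16
my_loop:
mov ecx,[esi]      ; load a dword from the source
mov [edx],ecx      ; store it to the destination
add esi,4          ; advance the source pointer
add edx,4          ; advance the destination pointer

sub edi,4          ; edi = bytes still to copy
jg my_loop         ; loop while the count is above zero
pop ecx            ; restore ecx (pushed before this fragment)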
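
For anyone who wants the overall shape without reading the asm, here is a rough C sketch of the same idea: whole dwords first, then the odd bytes. It is only an illustration and not the code being benchmarked; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes: 4 bytes at a time while a full dword remains,
   then the 0 to 3 leftover bytes one at a time. */
void copy_dwords_then_tail(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;

    /* dword loop: a fixed 4-byte memcpy is normally compiled
       to a single 32-bit load and store */
    for (; i + 4 <= len; i += 4) {
        uint32_t w;
        memcpy(&w, src + i, 4);
        memcpy(dst + i, &w, 4);
    }

    /* tail: whatever did not fit into a dword */
    for (; i < len; i++)
        dst[i] = src[i];
}
```

Unlike the asm fragment, this version never writes past len, so it does not rely on the buffers being padded to a dword multiple.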


jj2007

Quote from: Mark_Larson on August 09, 2008, 12:49:02 PM
Here is the Dword loop.


align 16
my_loop:
mov ecx,[esi]
mov [edx],ecx
add esi,4
add edx,4

sub edi,4
jg my_loop


About 3% faster (Celeron M) and 4 bytes shorter:


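@@: mov eax,[esi+ecx]   ; ecx = byte offset, assumed to start at the last dword
mov [edi+ecx],eax       ; the same offset indexes source and destination
sub ecx,4               ; step down one dword
jnc @b                  ; repeat until the subtraction borrows (offset 0 is done)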
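
The trick is that one offset register serves both buffers and also acts as the loop counter, so there is only one update per iteration. A C sketch of the same shape, counting the index down to zero, might look like this (hypothetical function name, and it assumes the length is a whole number of dwords):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy ndwords 32-bit words from the last one down to the first,
   using a single index for both source and destination. */
void copy_dwords_descending(uint32_t *dst, const uint32_t *src, size_t ndwords)
{
    assert(dst != NULL && src != NULL);
    size_t i = ndwords;
    while (i-- > 0)          /* the last pass handles index 0, then stops */
        dst[i] = src[i];
}
```

Whether a compiler turns this into the same four instructions is not guaranteed; the sketch only shows the loop structure.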