File Compare

Started by MichaelW, May 12, 2006, 03:57:22 PM


MichaelW

I needed a file compare procedure, so I used this as an excuse to compare the run time of a conventional version that uses buffers allocated from the heap against that of a version that uses file mapping. On my Windows 2000/P3 system, using the same windows.inc for both files, the file-mapping version runs in ~59% of the time required by the conventional version.

Based on these results, I would expect file mapping to produce similar decreases in run time for the MASM32 procedures that read and write files. Except for a potential problem with the MapViewOfFile function when the swap file cannot grow (unusual circumstances?), the required functions appear to be supported starting with Windows 95.
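
The attachments are long gone, so purely as an illustration, here is a minimal C sketch of the file-mapping approach (the function name, return convention and error handling are mine, not taken from the attachment):

Code:
#include <windows.h>
#include <string.h>

/* Compare two files by mapping both into memory.
   Returns 0 if identical, 1 if different, -2 on error (sketch only;
   the attached procedure used a different return convention). */
int filecmp_filemapping(const char *name1, const char *name2)
{
    int result = -2;
    HANDLE hFile1 = CreateFileA(name1, GENERIC_READ, FILE_SHARE_READ, NULL,
                                OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    HANDLE hFile2 = CreateFileA(name2, GENERIC_READ, FILE_SHARE_READ, NULL,
                                OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile1 != INVALID_HANDLE_VALUE && hFile2 != INVALID_HANDLE_VALUE)
    {
        DWORD size1 = GetFileSize(hFile1, NULL);
        DWORD size2 = GetFileSize(hFile2, NULL);
        if (size1 != size2)
            result = 1;                         /* sizes differ, cannot match */
        else if (size1 == 0)
            result = 0;                         /* both empty */
        else
        {
            HANDLE hMap1 = CreateFileMappingA(hFile1, NULL, PAGE_READONLY, 0, 0, NULL);
            HANDLE hMap2 = CreateFileMappingA(hFile2, NULL, PAGE_READONLY, 0, 0, NULL);
            if (hMap1 && hMap2)
            {
                void *p1 = MapViewOfFile(hMap1, FILE_MAP_READ, 0, 0, 0);
                void *p2 = MapViewOfFile(hMap2, FILE_MAP_READ, 0, 0, 0);
                if (p1 && p2)
                    result = memcmp(p1, p2, size1) ? 1 : 0;
                if (p1) UnmapViewOfFile(p1);
                if (p2) UnmapViewOfFile(p2);
            }
            if (hMap1) CloseHandle(hMap1);
            if (hMap2) CloseHandle(hMap2);
        }
    }
    if (hFile1 != INVALID_HANDLE_VALUE) CloseHandle(hFile1);
    if (hFile2 != INVALID_HANDLE_VALUE) CloseHandle(hFile2);
    return result;
}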




[attachment deleted by admin]
eschew obfuscation

Mark Jones

Interesting!

Quote from: AMD XP 2500+ / WinXP
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_heapbuffers: 1045ms
filecmp_filemapping: 508ms

48.6%
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

Phoenix

Results oscillate by at most ±0.6%...

Quote from: AMD XP 3000+ / WinXP SP2
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_heapbuffers: 962ms
filecmp_filemapping: 222ms

23.1%

P1

For a P4 2 GHz, W2K SP4:
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_heapbuffers: 1582ms
filecmp_filemapping: 3281ms

Press any key to exit...


Regards,  P1  :8)

Ian_B

I wrote a simple file-joiner program recently. I was comparing its speed against a standard and widely used utility, HJSplit, and despite being absolutely stripped down to bare ASM wrapped around the API calls, it was depressingly slower by around 10%, whether I used synchronous or asynchronous code. I was advised to try file mapping, on the basis that the buffering into memory is all done by the kernel and should therefore be faster than standard reads/writes to a buffer, but while it made a bit of difference, it still wasn't beating HJSplit on the join I was testing.

What allowed me to finally beat HJSplit by about 10% on speed was, to my surprise, to stick with standard I/O but use the FILE_FLAG_NO_BUFFERING flag. It needs a little more care dealing with file ends because reads must always be sector-sized multiples, but it's by far the fastest way of doing I/O, even over file mapping, according to the testing I did with my app.
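
To make the constraint concrete, here is a minimal C sketch of an unbuffered sequential read loop (chunk size and function name are mine; Ian_B's joiner obviously did this in ASM):

Code:
#include <windows.h>

/* Read a whole file opened with FILE_FLAG_NO_BUFFERING.
   Each request must be a multiple of the sector size and the buffer must
   be sector-aligned; VirtualAlloc returns page-aligned memory, which is
   enough. The last request still asks for a full chunk and ReadFile
   simply returns the smaller count left before end-of-file. */
BOOL read_unbuffered(const char *name)
{
    HANDLE hFile = CreateFileA(name, GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING,
                               FILE_FLAG_NO_BUFFERING | FILE_FLAG_SEQUENTIAL_SCAN,
                               NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return FALSE;

    const DWORD chunk = 1024 * 1024;   /* 1MB, a multiple of any common sector size */
    BYTE *buffer = (BYTE *)VirtualAlloc(NULL, chunk, MEM_COMMIT, PAGE_READWRITE);
    BOOL ok = (buffer != NULL);
    DWORD bytesRead = chunk;

    while (ok && bytesRead == chunk)
    {
        ok = ReadFile(hFile, buffer, chunk, &bytesRead, NULL);
        /* ... use the bytesRead valid bytes here; the final block is usually short ... */
    }

    if (buffer) VirtualFree(buffer, 0, MEM_RELEASE);
    CloseHandle(hFile);
    return ok;
}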

Ian_B

MichaelW

The attachment this time includes a no-buffering version. On my Windows 2000/P3-500 system the run time for the no-buffering version varies more than the run times for the other versions, with an average somewhere close to the run time of the heap buffer version.



[attachment deleted by admin]
eschew obfuscation

six_L

XP SP2, 1.4 GHz

filecmp
Quote
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_heapbuffers: 845ms
filecmp_filemapping: 140ms

Press any key to exit...
16.6%

filecmp2
Quote
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_heapbuffers: 863ms
filecmp_filemapping: 141ms

filecmp_nobuffering: 2840ms

Press any key to exit...
16.3%
regards

MichaelW

#7
I have now added a no-buffering asynchronous version, and while it is consistently faster than the heap buffer version, it does not come close to the file mapping version.
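
Again the attachment is gone; as a rough C sketch of the overlapped read pattern such a version would use (details are assumptions, not MichaelW's actual code):

Code:
#include <windows.h>

/* Issue one asynchronous read on a handle opened with
   FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED, do other work while it
   is in flight (e.g. compare the block read previously), then wait for
   completion. offset and chunk must be sector-aligned / a sector multiple. */
BOOL read_block_async(HANDLE hFile, BYTE *buffer, DWORD chunk,
                      DWORD offset, DWORD *bytesRead)
{
    OVERLAPPED ov;
    ZeroMemory(&ov, sizeof(ov));
    ov.Offset = offset;
    ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);

    BOOL ok = ReadFile(hFile, buffer, chunk, NULL, &ov);
    if (!ok && GetLastError() == ERROR_IO_PENDING)
    {
        /* ... overlap useful work here while the disk read proceeds ... */
        ok = GetOverlappedResult(hFile, &ov, bytesRead, TRUE);   /* wait for it */
    }
    else if (ok)
        ok = GetOverlappedResult(hFile, &ov, bytesRead, FALSE);  /* completed at once */

    CloseHandle(ov.hEvent);
    return ok;
}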

EDIT: Replaced attachment with new version

http://www.masmforum.com/simple/index.php?topic=4768.msg35790#msg35790



[attachment deleted by admin]
eschew obfuscation

Ghirai

Athlon 64 3000+, XP Pro SP2 (32b):

filecmp3:
Quote
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_nobuffering_async: x x x x  x  x 1

filecmp_heapbuffers: 991ms
filecmp_filemapping: 197ms

19.9%

filecmp_nobuffering: 6382ms

filecmp_nobuffering_async: 8788ms
MASM32 Project/RadASM mirror - http://ghirai.com/hutch/mmi.html

six_L

Quote
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_nobuffering_async: x x x x  x  x 1

filecmp_heapbuffers: 909ms
filecmp_filemapping: 141ms

filecmp_nobuffering: 2844ms

filecmp_nobuffering_async: 2888ms

Press any key to exit...
15.5%

regards

Phoenix

Quote from: Athlon 64 3000+ / WinXP SP2
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_nobuffering_async: x x x x  x  x 1

filecmp_heapbuffers: 967ms
filecmp_filemapping: 225ms

23.3%

filecmp_nobuffering: 2133ms

filecmp_nobuffering_async: 2093ms

Same CPU and OS as Ghirai, but very different results for filecmp_nobuffering versions?

dsouza123

Athlon 1190 Mhz, Windows XP SP2
Quote
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_nobuffering_async: x x x x  x  x 1

filecmp_heapbuffers: 2307ms
filecmp_filemapping: 487ms

21.1%

filecmp_nobuffering: 4310ms

filecmp_nobuffering_async: 4125ms

Ossa

Athlon 64 Mobile 3400+, Windows XP SP2:

correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_nobuffering_async: x x x x  x  x 1

filecmp_heapbuffers: 804ms
filecmp_filemapping: 176ms

21.9%

filecmp_nobuffering: 2780ms

filecmp_nobuffering_async: 2514ms


Ossa
Website (very old): ossa.the-wot.co.uk

MichaelW

Quote from: Phoenix on May 13, 2006, 01:38:10 PM
Same CPU and OS as Ghirai, but very different results for filecmp_nobuffering versions?

I suspect the no-buffering versions are strongly affected by hard disk performance, and the buffered versions strongly affected by CPU/memory performance. My system has a relatively fast hard disk (a Western Digital JB model with an 8MB cache) and relatively slow CPU and memory (a 500 MHz P3 with single-channel 133 MHz SDR SDRAM), so the heap buffer and no-buffering versions end up with similar run times.

filecmp_heapbuffers: 5554ms
filecmp_filemapping: 3290ms

59.2%

filecmp_nobuffering: 5326ms

filecmp_nobuffering_async: 5231ms


Compared to my system, these newer systems have much higher CPU/memory performance, but may or may not have higher hard disk performance. Perhaps Ghirai's system has a relatively slow hard disk.

Testing on my system, I was somewhat disappointed that the file mapping version was less than twice as fast as the conventional version; I was expecting it to be 3-4 times faster, as it is on the newer/faster systems.

eschew obfuscation

Ian_B

#14
Michael

I'd note a few things briefly from looking at your code. First, the no-buffering version doesn't actually compare ALL of the files: you truncate rather than over-running past the end to the next sector boundary, so it isn't doing the same thing as the other two procs. Second, you've hard-coded the sector multiple as 512, which may or may not be an appropriate fudge. The correct way to find it, as I'm sure you know, is on a per-disk basis using GetDiskFreeSpace (it's the second returned parameter, bytes per sector).
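
For reference, a minimal C sketch of that query (the drive root is just an example):

Code:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD sectorsPerCluster, bytesPerSector, freeClusters, totalClusters;

    /* Bytes per sector is the second output parameter of GetDiskFreeSpace. */
    if (GetDiskFreeSpaceA("C:\\", &sectorsPerCluster, &bytesPerSector,
                          &freeClusters, &totalClusters))
        printf("bytes per sector: %lu\n", bytesPerSector);
    return 0;
}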

The third thing is that you are also comparing, rightly or wrongly, the varying speeds of allocating the memory, which has been thrashed over here before. Using VirtualAlloc, you are always going to get a slower result than with the heap functions, surely? So it's hardly a fair comparison. I realise, though, that this was part of the point of your test: to find the speed of the routine when using a heap memory allocation.

For what it's worth, in my app I isolated the memory allocation from the I/O. Because I am writing apps that assume they will do a lot of I/O, I don't cripple them by allocating buffer memory on an as-needed basis every time. I make a 2MB fixed buffer (generally used as a 1MB read buffer and a 1MB write buffer) with VirtualAlloc on startup and reuse it. It's guaranteed to be a sector multiple in size, only needs creating once, and I always know where it is.  :P  That is probably why my app sped up the no-buffering I/O compared to the file-mapping functions, which must surely do their own allocation "under the hood".
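
Something along these lines, in C for brevity (sizes and names are illustrative, not Ian_B's actual code):

Code:
#include <windows.h>

/* One reusable I/O buffer allocated at startup. VirtualAlloc commits
   page-aligned memory, so the buffer automatically satisfies the
   sector-alignment requirement of FILE_FLAG_NO_BUFFERING. */
#define IO_BUFFER_SIZE (2 * 1024 * 1024)   /* 2MB: 1MB read half + 1MB write half */

static BYTE *g_ioBuffer = NULL;

BOOL init_io_buffer(void)
{
    g_ioBuffer = (BYTE *)VirtualAlloc(NULL, IO_BUFFER_SIZE,
                                      MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    return g_ioBuffer != NULL;
}

BYTE *read_half(void)  { return g_ioBuffer; }                       /* first 1MB  */
BYTE *write_half(void) { return g_ioBuffer + IO_BUFFER_SIZE / 2; }  /* second 1MB */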

I should therefore qualify my previous comment by saying that, in my experience, if you already have an appropriately aligned buffer you can use, no-buffering I/O should be faster from an end-user point of view. Making a single buffer once at startup for reuse is a better real-world optimisation for the entire app, looking at it from a wider perspective than this macro-oriented local usage. The extra time taken for memory allocation is also subsumed into startup time, where the end user won't "notice" it as much as when it's part of a work procedure.

Ian_B