The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: MichaelW on May 12, 2006, 03:57:22 PM

Title: File Compare
Post by: MichaelW on May 12, 2006, 03:57:22 PM
I needed a file compare procedure, so I used this as an excuse to compare the run time of a conventional version that uses buffers allocated from the heap with the run time of a version that uses file mapping. On my Windows 2000/P3 system, using the same windows.inc for both files, the version that uses file mapping runs in ~59% of the time required by the conventional version.

Based on these results, I would expect file mapping to produce similar decreases in run time for the MASM32 procedures that read and write files. Except for a potential problem with the MapViewOfFile (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/mapviewoffile.asp) function and a swap file that cannot get larger (unusual circumstances?), the required functions appear to be supported starting with Windows 95.
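The two approaches can be sketched in portable C. This is a hedged illustration only: POSIX open/fstat/mmap stand in for the Win32 CreateFile/CreateFileMapping/MapViewOfFile calls the attachment uses, fread stands in for ReadFile into heap buffers, and the return value is a simplified 1 (match) / 0 (no match) rather than the 1/0/-1 results shown in the replies.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Conventional version: read both files into heap buffers, then memcmp.
   Returns 1 if the files have the same size and contents, else 0. */
int filecmp_buffers(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb"), *fb = fopen(b, "rb");
    int same = 0;
    if (fa && fb) {
        fseek(fa, 0, SEEK_END); long la = ftell(fa); rewind(fa);
        fseek(fb, 0, SEEK_END); long lb = ftell(fb); rewind(fb);
        if (la == lb) {
            char *ba = malloc((size_t)la + 1), *bb = malloc((size_t)lb + 1);
            if (ba && bb &&
                fread(ba, 1, la, fa) == (size_t)la &&
                fread(bb, 1, lb, fb) == (size_t)lb)
                same = (memcmp(ba, bb, la) == 0);
            free(ba); free(bb);
        }
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}

/* Mapping version: let the kernel page the files in instead of copying
   them through read().  Same result, no explicit buffers.
   (Zero-length files would need a special case; mmap of 0 bytes fails.) */
int filecmp_mapping(const char *a, const char *b)
{
    int fda = open(a, O_RDONLY), fdb = open(b, O_RDONLY);
    int same = 0;
    struct stat sa, sb;
    if (fda >= 0 && fdb >= 0 &&
        fstat(fda, &sa) == 0 && fstat(fdb, &sb) == 0 &&
        sa.st_size == sb.st_size) {
        void *ma = mmap(NULL, sa.st_size, PROT_READ, MAP_PRIVATE, fda, 0);
        void *mb = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fdb, 0);
        if (ma != MAP_FAILED && mb != MAP_FAILED)
            same = (memcmp(ma, mb, sa.st_size) == 0);
        if (ma != MAP_FAILED) munmap(ma, sa.st_size);
        if (mb != MAP_FAILED) munmap(mb, sb.st_size);
    }
    if (fda >= 0) close(fda);
    if (fdb >= 0) close(fdb);
    return same;
}
```

The mapping version avoids the explicit copy of each file through user-space buffers, which is the likely source of the speedups reported in this thread.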




[attachment deleted by admin]
Title: Re: File Compare
Post by: Mark Jones on May 12, 2006, 04:03:32 PM
Interesting!

Quote from: AMD XP 2500+ / WinXP
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_heapbuffers: 1045ms
filecmp_filemapping: 508ms

48.6%
Title: Re: File Compare
Post by: Phoenix on May 12, 2006, 06:34:31 PM
Results oscillate by at most ±0.6%...

Quote from: AMD XP 3000+ / WinXP SP2
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_heapbuffers: 962ms
filecmp_filemapping: 222ms

23.1%
Title: Re: File Compare
Post by: P1 on May 12, 2006, 07:20:13 PM
For P4 2GHz, W2K SP4
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_heapbuffers: 1582ms
filecmp_filemapping: 3281ms

Press any key to exit...


Regards,  P1  :8)
Title: Re: File Compare
Post by: Ian_B on May 12, 2006, 07:40:29 PM
I wrote a simple file-joiner program recently. I was comparing its speed against a standard and widely-used utility, HJSplit, and despite it being absolutely stripped-down to bare ASM wrapping round the API calls it was depressingly slower by around 10%, whether I used synchronous or asynchronous code. I was advised to try file-mapping, on the basis that the buffering into memory was all being done by the kernel and should therefore be faster than standard read/writes to buffer, but while it made a bit of difference it still wasn't beating HJSplit on the join I was testing.

What allowed me to finally beat HJSplit by about 10% on speed, to my surprise, was to stick with standard I/O but use the FILE_FLAG_NO_BUFFERING flag. It needs a little more care dealing with file ends because of the need to do sector-sized reads at all times, but it's by far the fastest way of doing I/O, even over file mapping, according to the testing I did with my app.
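The extra care with file ends comes from FILE_FLAG_NO_BUFFERING's rule that read lengths, file offsets, and buffer addresses must all be sector multiples. Since sector sizes are powers of two, the round-up is a mask operation; a minimal C sketch (the helper name is illustrative, not from any attachment):

```c
#include <assert.h>
#include <stdint.h>

/* Round a read length up to the next multiple of the drive's sector
   size, as FILE_FLAG_NO_BUFFERING requires.  Sector sizes are powers
   of two, so masking does the job without division. */
static uint32_t round_up_to_sector(uint32_t len, uint32_t sector)
{
    uint32_t mask = sector - 1;      /* e.g. 511 for 512-byte sectors */
    return (len + mask) & ~mask;     /* unchanged if already aligned  */
}
```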

Ian_B
Title: Re: File Compare
Post by: MichaelW on May 12, 2006, 09:40:05 PM
The attachment this time includes a no-buffering version. On my Windows 2000/P3-500 system the run time for the no-buffering version varies more than the run times for the other versions, with an average somewhere close to the run time of the heap buffer version.



[attachment deleted by admin]
Title: Re: File Compare
Post by: six_L on May 13, 2006, 12:25:32 AM
xp sp2 1.4GHz

filecmp
Quote
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_heapbuffers: 845ms
filecmp_filemapping: 140ms

Press any key to exit...
16.6%

filecmp2
Quote
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_heapbuffers: 863ms
filecmp_filemapping: 141ms

filecmp_nobuffering: 2840ms

Press any key to exit...
16.3%
Title: Re: File Compare
Post by: MichaelW on May 13, 2006, 06:36:45 AM
I have now added a no-buffering asynchronous version, and while it is consistently faster than the heap buffer version, it does not come close to the file mapping version.

EDIT: Replaced attachment with new version

http://www.masmforum.com/simple/index.php?topic=4768.msg35790#msg35790



[attachment deleted by admin]
Title: Re: File Compare
Post by: Ghirai on May 13, 2006, 10:03:16 AM
Athlon 64 3000+, XP Pro SP2 (32b):

filecmp3:
Quote
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_nobuffering_async: x x x x  x  x 1

filecmp_heapbuffers: 991ms
filecmp_filemapping: 197ms

19.9%

filecmp_nobuffering: 6382ms

filecmp_nobuffering_async: 8788ms
Title: Re: File Compare
Post by: six_L on May 13, 2006, 12:08:58 PM
Quote
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_nobuffering_async: x x x x  x  x 1

filecmp_heapbuffers: 909ms
filecmp_filemapping: 141ms

filecmp_nobuffering: 2844ms

filecmp_nobuffering_async: 2888ms

Press any key to exit...
15.5%

Title: Re: File Compare
Post by: Phoenix on May 13, 2006, 01:38:10 PM
Quote from: Athlon 64 3000+ / WinXP SP2
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_nobuffering_async: x x x x  x  x 1

filecmp_heapbuffers: 967ms
filecmp_filemapping: 225ms

23.3%

filecmp_nobuffering: 2133ms

filecmp_nobuffering_async: 2093ms

Same CPU and OS as Ghirai, but very different results for filecmp_nobuffering versions?
Title: Re: File Compare
Post by: dsouza123 on May 13, 2006, 04:53:09 PM
Athlon 1190 Mhz, Windows XP SP2
Quote
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_nobuffering_async: x x x x  x  x 1

filecmp_heapbuffers: 2307ms
filecmp_filemapping: 487ms

21.1%

filecmp_nobuffering: 4310ms

filecmp_nobuffering_async: 4125ms
Title: Re: File Compare
Post by: Ossa on May 13, 2006, 05:02:52 PM
Athlon 64 Mobile 3400+, Windows XP SP2:

correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_nobuffering_async: x x x x  x  x 1

filecmp_heapbuffers: 804ms
filecmp_filemapping: 176ms

21.9%

filecmp_nobuffering: 2780ms

filecmp_nobuffering_async: 2514ms


Ossa
Title: Re: File Compare
Post by: MichaelW on May 13, 2006, 05:22:35 PM
Quote from: Phoenix on May 13, 2006, 01:38:10 PM
Same CPU and OS as Ghirai, but very different results for filecmp_nobuffering versions?

I suspect the no-buffering versions are being strongly affected by hard disk performance, and the buffered versions strongly affected by CPU/memory performance. My system has a relatively fast hard disk (a Western Digital JB model with an 8MB cache), and a relatively slow CPU and memory (a 500MHz P3 and SDR, single channel, 133MHz SDRAM), so as a result the heap buffer and no-buffering versions have similar run times.

filecmp_heapbuffers: 5554ms
filecmp_filemapping: 3290ms

59.2%

filecmp_nobuffering: 5326ms

filecmp_nobuffering_async: 5231ms


Compared to my system, these newer systems have much higher CPU/memory performance, but may or may not have higher hard disk performance. Perhaps Ghirai's system has a relatively slow hard disk.

Testing on my system I was somewhat disappointed that the file mapping version was less than twice as fast as the conventional version. I was expecting 3-4 times faster, as it is for the newer/faster systems.

Title: Re: File Compare
Post by: Ian_B on May 13, 2006, 07:41:39 PM
Michael

I'd note a few things briefly from looking at your code. First, the no-buffering version doesn't actually compare ALL of the files: you truncate rather than over-running the end to the next sector boundary, so it's not actually doing the same thing as the other two procs. Second, you've hard-coded the sector size as 512, which may or may not be an appropriate fudge. The correct way to find it, as I'm sure you know, is on a per-disk basis using GetDiskFreeSpace (it's the second returned parameter, Bytes Per Sector).

The third thing is that you are also comparing, rightly or wrongly, the varying speeds of allocating the memory, which has been thrashed over before. By using VirtualAlloc you are always going to get a slower result than using the heap functions, surely? It's hardly a fair comparison therefore. I realise this was part of the point of your test, though, to find the speed of the routine when using a heap memory allocation.

For what it's worth, in my app I isolated the memory allocation from the I/O. Because I am writing apps that assume they will do a lot of I/O tasks, I don't cripple them by allocating buffer memory on an as-needed basis every time. I make a 2Mb fixed buffer (generally used as a 1Mb read and a 1Mb write buffer) with VirtualAlloc on startup and reuse that. It's guaranteed sector-multiple-sized and only needs creating once, and I always know where it is.  :P  That is probably why my app speeded up the no-buffering I/O compared to the filemapping functions which must surely do their own allocation "under the hood".

I should, therefore, probably qualify my previous comment by saying my experience shows that if you already have an appropriately aligned buffer you can use, the no-buffering I/O should be faster from an end-user POV. Making a single buffer once on startup for reuse is a better real-world optimisation for the entire app, looking at it from a wider perspective than this macro-oriented local usage. The extra time taken for memory allocation is also subsumed into startup time, where the end user won't "notice" it as much as when it's part of a work procedure.

Ian_B
Title: Re: File Compare
Post by: MichaelW on May 14, 2006, 07:00:38 PM
Thanks Ian, I was hoping you would take a look at the code and comment on it. The truncation, the choice of using virtual allocate, and the fixed sector size were all just sloppy shortcuts to save time. My focus was on reasonably verifying that file mapping is the best method for this application. I was assuming that the normal sector size is still 512 bytes, is this no longer so?

I modified the filecmp_nobuffering and filecmp_nobuffering_async procedures to allocate from the heap and calculate and use sector-aligned start addresses for the buffers. As a matter of convenience I am still truncating the files to a multiple of the sector size, but given the size of the test file this should not have any significant effect. On my system filecmp_nobuffering is a little faster and filecmp_nobuffering_async much slower. I have replaced the last attachment with the new version.
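The sector-aligned start address can be carved out of an ordinary heap allocation by over-allocating and rounding the pointer up; a small C sketch of the idea (names are illustrative, not taken from the attachment):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Given a heap block allocated with (size + sector - 1) bytes, return
   the first sector-aligned address inside it, so a plain heap buffer
   can satisfy the no-buffering address requirement.  sector must be a
   power of two. */
static void *sector_align(void *base, uintptr_t sector)
{
    uintptr_t mask = sector - 1;
    return (void *)(((uintptr_t)base + mask) & ~mask);
}
```

The caller must keep and later free the original base pointer, not the aligned one.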
Title: Re: File Compare
Post by: Ian_B on May 15, 2006, 12:30:43 AM
OK, this is bugging me. I've rewritten the non-async no_buffering code to strip out all the slow shifts (the sector-alignment can be done with ANDs using a MOD value made from the sector-size -1) and I still can't get the elapsed time down from nearly 6000ms compared to the simple heap code at 1400ms. That's even with the sector-size check done out of the loop. I can't believe those few extra instructions are causing such an enormous slow-up, apart from those it's doing exactly the same as the simple code, allocating memory X 2, reading in 2 files to memory and doing exactly the same compare.  :eek

However... logic is now returning and is telling me that the reason may be the simple fact of reading using FILE_FLAG_NO_BUFFERING. If you do 100 reads in a loop WITHOUT using that flag, the kernel will cache it, and it will not read from disk 99 of those times but will fetch from memory. And when you test the filemapping code straight after it doesn't even have to read it once from disk!  :bdg  Whereas the no_buffered version almost certainly has no such luck. Doing the test using multiple repeat reads of the same file is not going to work to give a clear answer of which may be faster in general usage, when you tend to read a file only once.  :wink

Maybe for a better test copy the file 40 times and rename the copies with a split number. (This could be automated in the app set-up, just make sure the files are written no-buffered!! Don't forget to delete afterwards...) Then either put all these names in an array and work through them using a pointer that updates by the length of the name, or use one name and update the split number in it between tests. That way you can do 10 tests per proc (comparing against the original file) and every test gets a different but identical file that won't be cached. Doing it this way means you can simplify the code to only read in the test file too, you may as well keep the original in memory through the tests.

For interest, here's my version of the sector-alignment code; it works identically for rounding up the memory allocation startpoint. Note that there was an error in your code: you released the wrong memory pointers in the no-buffering section (the rounded numbers, not the membase values), and in the two no-buffering timer loops there's a superfluous PUSH EAX.

    invoke GetDiskFreeSpace,NULL,    ; inside or outside loop
                            ADDR junkreturn,
                            ADDR bytesPerSector,
                            ADDR junkreturn,
                            ADDR junkreturn

...

    mov ebx, bytesPerSector
    sub ebx, 1              ; make MOD value from power of 2 returned

    ; Get file size and extend to multiple of sector size

    invoke GetFileSize,hFile1,NULL
    mov ecx, ebx
    mov lof1, eax           ; save real length of file, use for compare function
    test eax, ebx           ; is there a remainder
    jz  @F

    not ecx
    add eax, ebx            ; add sector-length - 1
    and eax, ecx            ; round up to next sector-length
@@:
    mov secbytes1, eax      ; save correct sector-multiple read length
; you need this value as an extra LOCAL, must be same for second file

    invoke GetFileSize,hFile2,NULL
    cmp lof1, eax
    jne quit_nobuffer
; but you can save yourself lof2 as this must be identical to lof1

    ; Allocate memory and calc sector-aligned start address

    invoke GetProcessHeap
    mov ecx, secbytes1
    xor edx, edx
    add ecx, ebx            ; max needed memsize is rounded length + (sector-1)
    push ecx                ; set params now for next HeapAlloc call
    push edx
    push eax
    invoke HeapAlloc,eax,edx,ecx
    mov memBase1, eax
    mov ecx, ebx
    test eax, ebx           ; is mem allocation sector-aligned
    jz  @F

    not ecx
    add eax, ebx            ; add sector-length - 1
    and eax, ecx            ; round up to next sector-multiple start
@@:
    mov lpMem1, eax
    call HeapAlloc          ; params already on stack
    mov memBase2, eax
    mov ecx, ebx
    test eax, ebx           ; is mem allocation sector-aligned
    jz  @F

    not ecx
    add eax, ebx            ; add sector-length - 1
    and eax, ecx            ; round up to next sector-multiple start
@@:
    mov lpMem2, eax

    invoke ReadFile,hFile1,lpMem1,secbytes1,ADDR bytesRead,NULL
    invoke ReadFile,hFile2,lpMem2,secbytes1,ADDR bytesRead,NULL

    invoke cmpmem,lpMem1,lpMem2,lof1    ; whole file compared!

...

    ret

  quit_nobuffer:
    xor ebx, ebx
    jmp @B


Ian_B
Title: Re: File Compare
Post by: MichaelW on May 15, 2006, 06:49:47 AM
Thanks for the improved code, and for finding my mistake releasing the wrong memory pointers. Now that I look at it, using the rounded up value for the buffers and the reads and the actual length for the compare should have been a no-brainer. I made no real effort to use efficient code because I couldn't see how a few hundred clock cycles could make a significant difference with a 1,127,716-byte test file. You have a good point about the caching effects, but I coded this for an application where the files normally would be cached. I think the most common use for a file comparison would be after performing some operation on the files being compared, a COPY /V for example.

Title: Re: File Compare
Post by: ic2 on May 15, 2006, 03:34:12 PM
I just started reading this thread. This is what I get with filecmp4 on a P3 846, 128MB RAM, XP with no service pack, but stripped down with nLite. I don't think nLite should be an issue. Is this strange or what: 3ms each...


correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_nobuffering_async: x x x x  x  x 1

filecmp_heapbuffers: 3ms
filecmp_filemapping: 3ms

100.9%

filecmp_nobuffering: 3ms

filecmp_nobuffering_async: 3ms

Title: Re: File Compare
Post by: ic2 on May 15, 2006, 03:38:28 PM
The 100.9% is a typing error; it should be 100.0%.
Title: Re: File Compare
Post by: ic2 on May 15, 2006, 03:57:06 PM
This is what I get with filecmp4 on my Intel Celeron 498 MHz, 384MB RAM, XP with no service pack, but stripped down with nLite just like on the other machine.


correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x  -1

filecmp_nobuffering_async: x x x x  x  x  -1

filecmp_heapbuffers: 9ms
filecmp_filemapping: 10ms

111.1%

filecmp_nobuffering: 10ms

filecmp_nobuffering_async: 10ms
Title: Re: File Compare
Post by: P1 on May 15, 2006, 04:52:57 PM
P4 2GHz W2KSP4
correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 1
filecmp_filemapping: 1 0 0 0 -1 -1 1

filecmp_nobuffering: x x x x  x  x 1

filecmp_nobuffering_async: x x x x  x  x 1

filecmp_heapbuffers: 1639ms
filecmp_filemapping: 3226ms

filecmp_nobuffering: 6701ms

filecmp_nobuffering_async: 11401ms

Press any key to exit...


Regards,  P1  :8)
Title: Re: File Compare
Post by: Mark Jones on May 15, 2006, 07:01:34 PM
Quote from: Ian_B on May 15, 2006, 12:30:43 AM
Maybe for a better test copy the file 40 times and rename the copies with a split number.

Consider allocating contiguous blocks to eliminate file fragmentation latency.
Title: Re: File Compare
Post by: Ian_B on May 16, 2006, 12:23:20 AM
I wrote a version of Michael's test routine that worked like I suggested, saving 80 copies of windows.inc (unbuffered) and reading each one in as a separate file comparison. It's the only way to avoid caching effects and get a realistic estimate of the different speeds of handling file input. As I suspected, my previous experience was vindicated and FILE_FLAG_NO_BUFFERING is indeed the fastest way of reading files (and writing them, although there's a nasty wrinkle on the end where you have to close/reopen the file without the flag to set the filepointer to truncate the final write to the real EOF as you have been forced to over-write to a sector boundary). Having measured the substantial difference over reading/writing almost 900 1Mb files while I was testing my joiner app, I was pretty sure this more modest test would support me.  :toothy  The big surprise is the poor performance of the file-mapping code.

This code strips out everything unnecessary and keeps a single copy of the original file through the test. It is therefore getting an accurate estimate as Michael originally wanted of the "cost" of setting up a memory buffer if needed and doing the read into the buffer, and only that. Note that the asynchronous code performs pretty well, although this is a really trivial example of its use, reading in an entire file at once! To make the extra overhead of handling events and overlapped structures worthwhile, it's most generally used for dealing with chunks of a file so that you can process one data chunk while the next few chunks are being read in, so that in an ideal case the process thread(s) never has to wait for further reads to complete to get new data to process, or wait for any writes of the processed data to complete before starting a new chunk.

Michael, I understand your view of the caching working in your favour depending on what you want to do subsequently with the file, but the point of your original test was to compare read times, and non-buffering is provably the fastest way of doing that for the simple examples in the library or for most other applications where the important thing is simply to get the data in memory for some sort of processing.

Reading test file...
Creating comparison files...

filecmp_heapbuffers: 554ms

filecmp_filemapping: 687ms - 124.0%

filecmp_nobuffering: 528ms - 95.3%

filecmp_nobuffering_async: 537ms - 96.9%

Press ENTER to exit...


Make sure you alter the drive location of the test file before testing. There may be a few easily-fixed wrinkles when assembling, due to small personalised variations in my MASM setup.

Ian_B

EDIT! Minor bugfix, doesn't affect functionality/timing, just cleanup for unmatched stack pushes on error.


[attachment deleted by admin]
Title: Re: File Compare
Post by: six_L on May 16, 2006, 01:00:23 AM
Quote
Reading test file...
Creating comparison files...

filecmp_heapbuffers: 1084ms

filecmp_filemapping: 1027ms - 94.7%

filecmp_nobuffering: 1128ms - 104.1%

filecmp_nobuffering_async: 993ms - 91.6%


Press ENTER to exit...
;;;
After the minor bugfix
;;;
Quote
Reading test file...
Creating comparison files...

filecmp_heapbuffers: 1252ms

filecmp_filemapping: 1015ms - 81.1% 

filecmp_nobuffering: 1004ms - 80.2%

filecmp_nobuffering_async: 1453ms - 116.1%

Press ENTER to exit...

Title: Re: File Compare
Post by: Ian_B on May 16, 2006, 01:03:38 AM
Guess the variation might be platform-dependent - my system is P4 Northwood, what's yours six_L?
Title: Re: File Compare
Post by: six_L on May 16, 2006, 02:01:15 AM
Home xp sp2
Intel(R) Pentium(R) 1.4GHz
Title: Re: File Compare
Post by: MichaelW on May 16, 2006, 06:58:25 AM
Ian,

You have convinced me. At least on most recent systems, non-buffering is the fastest method of moving disk data that is not in the cache into a memory buffer.

Thinking about your suggestion of using multiple test files it occurred to me that rotating such a small number of copies would be unlikely to eliminate all caching effects because the system cache is typically large, greater than 300MB on my 512MB system, for example.

For the asynchronous procedure and large files, might there be some benefit to breaking up the reads into smaller chunks, comparing each pair of chunks while the next pair was being read?

Results on my P3-500, Windows 2000 SP4 system:

filecmp_heapbuffers: 1016ms

filecmp_filemapping: 821ms - 80.8%

filecmp_nobuffering: 728ms - 71.7%

filecmp_nobuffering_async: 724ms - 71.3%

Title: Re: File Compare
Post by: Ian_B on May 16, 2006, 09:44:34 AM
Quote from: MichaelW on May 16, 2006, 06:58:25 AM
You have convinced me. At least on most recent systems, non-buffering is the fastest method of moving disk data that is not in the cache into a memory buffer.

I wouldn't like to say it's cut and dried, especially from the small set of results posted here so far. It definitely seems to be so on my puter; that's all I'm sure of now. Having said that, running the test repeatedly shows a wide variation of results: although the initial heapbuffering is remarkably consistent in absolute timing for me, with the filemapping always slower, the nobuffering ranges from 95% to 100% and the async code has reached 125% in one run. This really isn't a final "proof" of the relative merits of these different approaches, but it's a good indicator and a start if someone wants to make further experiments.

Quote
Thinking about your suggestion of using multiple test files it occurred to me that rotating such a small number of copies would be unlikely to eliminate all caching effects because the system cache is typically large, greater than 300MB on my 512MB system, for example.

True. But the odd thing about your original test was that the non-buffering flag even seemed to be forcing the read to ignore the cached data where it existed, which was why your non-buffering examples were comparing so appallingly, so I'm reasonably sure that by writing the test files non-buffered we should eliminate all pre-caching for the test fairly. And the results being so much closer together in this test tend to support that theory.

Quote
For the asynchronous procedure and large files, might there be some benefit to breaking up the reads into smaller chunks, comparing each pair of chunks while the next pair was being read?

Exactly how it should be done to make the most of asynchronous I/O. When I was first introduced to that, my ASM mentor described his own experiments in testing speed against other apps, and he was sure from his testing that the optimum size of data chunk for async reading was 64Kb, as this was an internal buffer size for Windows. I've used that size in my own code successfully (my 2Mb read/write buffer has an array of 32 extended overlaps to control it), but I can't definitely confirm whether it is still the best. Obviously setting up that amount of overhead is something best done once at startup rather than on an ad-hoc basis every time you need to read a file, though. To help the management code, so I know what should have been read and whether I have a result pending in the overlap, I extend my overlaps like this (plus having 8 DWORDs helps wrap around from the end of the overlap array to the beginning, by using a MOD value):

OVERLAPPED_PLUS STRUCT
                        OVERLAPPED <> ; as normal OVERLAPPED plus extra 3 DWORDs
; OVERLAPPED STRUCT
;   Internal            DWORD ?
;   InternalHigh        DWORD ?
;   OffsetLow           DWORD ?   ; this is better naming!
;   OffsetHigh          DWORD ?
;   hEvent              DWORD ?
; OVERLAPPED ENDS

  Transferred           DWORD ?   ; this is the output variable for the read/write call
  Requested             DWORD ?   ; this should match after a read/write call
                                  ; top bits used as flags to show last block in file
                                  ; and overlap operation pending for this block
  BufferAddress         DWORD ?   ; pointer to the 64K buffer this overlap controls
OVERLAPPED_PLUS ENDS


Without going into multi-threading, the point of asynchronous I/O is to eliminate all the waiting for reads/writes to take place. In this case, by the time you have finished comparing the first chunk the rest of the data should be stacking up nicely in the later sub-buffers ready to test. This is a perfect use of async code, where the processing of one chunk is not dependent on other parts of the file. Zipping, encoding/decoding, CRCing, doing MD5s etc. are all prime candidates for this approach where there is a lot of processing that can be done while you'd otherwise be sitting waiting for the reads to supply new data.
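Stripped of the overlapped plumbing, the chunk-by-chunk compare looks like the sketch below. This is a synchronous C illustration of the pattern only; in real async code the reads for the next pair of 64Kb chunks are issued before the memcmp on the current pair, via the extended-overlap array described above, so the compare runs while the disk is busy. The function name is mine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CHUNK 65536          /* the 64Kb chunk size suggested above */

/* Compare two files chunk by chunk.  Returns 1 if they have identical
   contents (and length), 0 otherwise.  In the overlapped version each
   fread below would be a pending ReadFile issued one chunk ahead. */
int filecmp_chunked(const char *a, const char *b)
{
    static char ba[CHUNK], bb[CHUNK];
    FILE *fa = fopen(a, "rb"), *fb = fopen(b, "rb");
    int same = (fa && fb);
    while (same) {
        size_t na = fread(ba, 1, CHUNK, fa);
        size_t nb = fread(bb, 1, CHUNK, fb);
        if (na != nb || memcmp(ba, bb, na) != 0)
            same = 0;            /* length or content mismatch */
        if (na < CHUNK)
            break;               /* short read means EOF reached */
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}
```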

Ian_B
Title: Re: File Compare
Post by: Mark Jones on May 16, 2006, 05:03:32 PM
Quote from: AMD XP 2500+ / XP Pro SP2 / nVIDIA Chipset
Reading test file...
Creating comparison files...

filecmp_heapbuffers: 743ms

filecmp_filemapping: 1133ms - 152.5%

filecmp_nobuffering: 514ms - 69.2%

filecmp_nobuffering_async: 457ms - 61.5%
Title: Re: File Compare
Post by: P1 on May 17, 2006, 02:22:21 PM
P4 2GHz, W2K-SP4
Reading test file...
Creating comparison files...

filecmp_heapbuffers: 897ms

filecmp_filemapping: 870ms - 97.0%

filecmp_nobuffering: 875ms - 97.5%

filecmp_nobuffering_async: 876ms - 97.7%

Press ENTER to exit...


Regards,  P1  :8)
Title: Re: File Compare
Post by: mnemonic on May 17, 2006, 03:25:25 PM
Turion 64 / XP Home (32bit) SP2

Reading test file...
Creating comparison files...

filecmp_heapbuffers: 1364ms

filecmp_filemapping: 841ms - 61.7%

filecmp_nobuffering: 840ms - 61.6%

filecmp_nobuffering_async: 834ms - 61.1%


Press ENTER to exit...
Title: Re: File Compare
Post by: VlasRad on May 17, 2006, 07:51:42 PM
You forgot something important.

First, avoid fragmentation: test on defragmented disks, and use SetFilePointer to allocate the entire output size before starting WriteFile.

Also, are you only comparing absolute time? You should also check CPU usage; file mapping might surprise you!

When testing a buffered approach (heap buffer, file mapping), make sure to test a new file or flush the filesystem cache. Note that a mapped file has no chance of being uncached; it will always cache.
Title: Re: File Compare
Post by: Ian_B on May 17, 2006, 08:09:38 PM
Quote from: VlasRad on May 17, 2006, 07:51:42 PM
You forgot something important.

First, avoid fragmentation: test on defragmented disks, and use SetFilePointer to allocate the entire output size before starting WriteFile.

VlasRad, there are many variables. The point of Michael's original test and my followup was to compare the different times of loading a file that a small, normal program might use. Most normal programs have to cope with fragmented disks; they don't defragment before they start reading/writing files! What we have now is something approaching a real-world test with real-world settings. I don't believe for a moment it's perfect, though, and I hope people will develop it for better results.

Quote
When testing a buffered approach (heap buffer, file mapping), make sure to test a new file or flush the filesystem cache. Note that a mapped file has no chance of being uncached; it will always cache.

That's why in my test I make sure the copy files are written unbuffered. All 80 test files are new on every run and should be equally uncached for the four tests.

Ian_B
Title: Re: File Compare
Post by: VlasRad on May 18, 2006, 06:00:32 PM
Quote
Quote
When testing a buffered approach (heap buffer, file mapping), make sure to test a new file or flush the filesystem cache. Note that a mapped file has no chance of being uncached; it will always cache.
That's why in my test I make sure the copy files are written unbuffered. All 80 test files are new on every run and should be equally uncached for the four tests.
Good, sorry I did not look completely into your test methods :(
Still, fragmentation can have a big effect and punish one method needlessly. If you use a defragmented drive and reserve the file size before the test (a simple SetFilePointer should be sufficient and fast on NTFS), the raw test will be fairer.

Quote
VlasRad, there are many variables. The point of Michael's original test and my followup was to compare the different times of loading a file that a small, normal program might use.
Yes, the raw speed of the routine is not the only important factor; you need to consider the environment and how it is typically used. Unbuffered+async is good for massive file hashing/checksumming, but not necessarily best for "normal" use.

Quote
Most normal programs have to cope with fragmented disks; they don't defragment before they start reading/writing files!
Yes, but if you want to measure the speed of the actual routines and not of a fragmented filesystem, you need to defragment the files. Of course it might be interesting to test the routines on fragmented files, but then you need exactly the same fragmented file, and no filesystem caching.
Quote
What we have now is something approaching a real-world test with real-world settings. I don't believe for a moment it's perfect, though, and I hope people will develop it for better results.
Please test CPU usage while reading the file; it is another important factor. In the end you should really get similar task completion speeds, because the task is I/O bound by the slow hard drive, but CPU usage is not bounded like that. Page faults are expensive, and you get many when using a mapped file.
Title: Re: File Compare
Post by: dsouza123 on May 18, 2006, 08:21:36 PM
Will these various I/O routines work with a RAM disk (such as RAMDisk from Cenatek)?
If so, how will the routines perform, and what would affect performance?

I don't have one so I couldn't test it.

There are also hardware memory based drives, such as the Rocket Drive also from Cenatek,
and the i-RAM from Gigabyte, that are PCI cards with sticks of RAM to hold the data.
Title: Re: File Compare
Post by: VlasRad on May 18, 2006, 08:45:56 PM
I see no reason they would not work with a RAM disk or i-RAM, but the performance difference would likely be hard to measure then.

But CPU usage might be more interesting with those kinds of drives, since they are not I/O bound.
Title: Re: File Compare
Post by: sixleafclover on May 26, 2006, 02:57:18 AM
Errm if you cared...

correct return vals: 1 0 0 0 -1 -1 1
filecmp_heapbuffers: 1 0 0 0 -1 -1 -1
filecmp_filemapping: 1 0 0 0 -1 -1 -1

filecmp_heapbuffers: 1ms
filecmp_filemapping: 1ms

100.0%

Press any key to exit...

seems to be no difference on my system.

2.8ghz opteron 144, 250gb maxline3 HD, xp sp2
Title: Re: File Compare
Post by: paranoidx on May 27, 2006, 05:45:11 AM
filecmp4, P4 2.8 HT/160 HDD seagate/512 RAM


                            1st run            2nd run            3rd run
correct return vals:        1 0 0 0 -1 -1 1    1 0 0 0 -1 -1 1    1 0 0 0 -1 -1 1

filecmp_heapbuffers:        1 0 0 0 -1 -1 1    1 0 0 0 -1 -1 1    1 0 0 0 -1 -1 1
filecmp_filemapping:        1 0 0 0 -1 -1 1    1 0 0 0 -1 -1 1    1 0 0 0 -1 -1 1

filecmp_nobuffering:        x x x x  x  x 1    x x x x  x  x 1    x x x x  x  x 1

filecmp_nobuffering_async:  x x x x  x  x 1    x x x x  x  x 1    x x x x  x  x 1

filecmp_heapbuffers:        623ms              662ms              632ms
filecmp_filemapping:        307ms              306ms              279ms

                            49.3%              46.2%              44.1%

filecmp_nobuffering:        6524ms             8434ms             5928ms

filecmp_nobuffering_async:  12202ms            13022ms            9108ms


After the 1st/2nd time I ran Filecmp4, my comp struggles to revive and lags terribly for about 10secs.

3rd Run: After I switch Avast AV/Tiny personal FW OFF