I was looking through the masm32 library and macros to learn how best to read a file. In the read_disk_file proc, it does:
mov fl, FUNC(GetFileSize,hFile,NULL) ; get the file length
mov hMem, alloc(fl) ; allocate a buffer of that size
invoke ReadFile,hFile,hMem,fl,ADDR bRead,NULL ; read file into buffer
invoke CloseHandle,hFile ; close the handle
The alloc macro does an invoke GlobalAlloc, GMEM_FIXED or GMEM_ZEROINIT, bytecount.
Is it the case that GlobalAlloc virtually never fails so we don't have to test the return?
GlobalAlloc virtually never fails for small sizes, but it's bad coding practice not to check the return value. Besides, if the main heap is corrupted for some reason (a buffer overrun, perhaps) then GlobalAlloc will fail no matter how much memory you request. Since on failure the return value is zero, the app using this macro will crash on a NULL pointer dereference.
I think adding an .IF is definitely worth the few extra bytes it will take. :)
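Something along these lines would do it. This is only a minimal sketch using the names from the read_disk_file fragment above, not the actual library code, and the cleanup on failure would depend on the surrounding proc:

invoke GlobalAlloc, GMEM_FIXED or GMEM_ZEROINIT, fl
.IF eax == 0
    ; allocation failed - clean up and return zero instead of letting
    ; ReadFile be called later with a NULL buffer
    invoke CloseHandle, hFile
    xor eax, eax
    ret
.ENDIF
mov hMem, eax
invoke ReadFile, hFile, hMem, fl, ADDR bRead, NULL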
GlobalAlloc or VirtualAlloc will certainly fail if you try to read a 4 GB file into memory when you have only 256 MB of RAM and 128 MB of virtual memory, if you set the virtual memory to a fixed size. If not, Windows will expand the swap file, but what if you run out of space on the HDD? It fails then too. Better to use CreateFileMapping: you get an address where you have the whole file, and Windows handles the reading and paging of whichever part of the file the pointer actually touches.
Does anyone have a feel for the relative performance of CreateFileMapping/MapViewOfFile vs. GlobalAlloc/ReadFile ?
Unless I screwed the test up (or totally don't understand what I'm trying to do :toothy), CreateFile-CreateFileMapping-MapViewOfFile is roughly 20 times faster than CreateFile-GlobalAlloc-ReadFile.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
.686
include timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
hMem dd 0
hFile dd 0
hFMO dd 0
lpView dd 0
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
LOOP_COUNT EQU 1000
; time method 1: GlobalAlloc + ReadFile (the masm32 InputFile macro)
counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
mov hMem, InputFile("\masm32\include\windows.inc")
free hMem
counter_end
print ustr$(eax)
print chr$(" cycles",13,10)
; time method 2: CreateFile / CreateFileMapping / MapViewOfFile
counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
invoke CreateFile,chr$("\masm32\include\windows.inc"),GENERIC_READ,
NULL,NULL,OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,NULL
mov hFile, eax
;print uhex$(eax),13,10
invoke CreateFileMapping,hFile,NULL,PAGE_READONLY,0,0,NULL
mov hFMO, eax
;print uhex$(eax),13,10
invoke MapViewOfFile,hFMO,FILE_MAP_READ,0,0,0
mov lpView, eax
;print uhex$(eax),13,10
invoke UnmapViewOfFile,lpView
invoke CloseHandle,hFMO
fclose hFile
counter_end
print ustr$(eax)
print chr$(" cycles",13,10)
mov eax, input(13,10,"Press enter to exit...")
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
Timings on a P3:
6137155 cycles
308841 cycles
Quote
6133502 cycles
92443 cycles
Tested on my P4 2.66 GHz
I must admit I have never had a problem with GlobalAlloc(), and I have used it to allocate over a gigabyte many times. Some time ago I played with allocating memory using a 1 GB VOB file and it happily chomped that much memory every time. The last time I looked, and this was some time ago, file mapping was reasonably slow in terms of memory allocation speed. Its main virtue is its global scope for passing data between applications with different memory address spaces.
GlobalAlloc() can easily be tested for a return value, but it's the thin edge of the wedge of appending a never-ending set of tests for every possible condition, and this eventually makes for big, slow code. I don't hold the view that you can successfully idiot-proof anything, as the idiot's skill level will always exceed your efforts. The reason why I supply this stuff as source code is so that anyone can modify a copy of it to suit their needs.
For doing a big file copy, GlobalAlloc and reading the whole file is better. File mapping is otherwise faster because it doesn't read the whole file into memory, just the part you need. So if you read some indexes from the beginning and then jump somewhere else, using for example 10% of the file and updating it, which is faster: a 10% read or a 100% read? Another thing: Windows creates pages and virtual addresses, and you can only allocate about 2 GB, because from 7FFFFFFFh up the kernel is already there. With file mapping you can read a 4 GB file, and I'm not sure but maybe even more if you fill in the additional structures. That is only really resolved on x64, where you have a 64-bit address space of 16 petabytes or more; on a 32-bit system with virtual memory the maximum address space is 16 terabytes.
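For what it's worth, here is a rough sketch of mapping just a slice of a big file instead of the whole thing. The file name, the 64 KB offset and the 1 MB view size are made-up values for illustration (the offset must be a multiple of the 64 KB allocation granularity), and the return values would of course need checking:

invoke CreateFile, chr$("huge.dat"), GENERIC_READ, FILE_SHARE_READ,
       NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL
mov hFile, eax
invoke CreateFileMapping, hFile, NULL, PAGE_READONLY, 0, 0, NULL
mov hFMO, eax
; map a 1 MB view starting 64 KB into the file - the offset is passed as
; high and low dwords, so files over 4 GB stay reachable on a 32-bit system
invoke MapViewOfFile, hFMO, FILE_MAP_READ, 0, 65536, 1048576
mov lpView, eax
; ... read through lpView; only the 4 KB pages actually touched get paged in ...
invoke UnmapViewOfFile, lpView
invoke CloseHandle, hFMO
invoke CloseHandle, hFile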
Also, read performance depends on many things: is your drive defragmented, is there enough RAM so it doesn't need to swap to the pagefile, and how much RAM you gave to the file cache. XP, for example, is bad at that; it can eat the whole RAM for caching. A long time ago I read a 1.4 GB MPG file into VirtualDub and my system was choking because Windows ate all my RAM and used the pagefile to cache the rest, just because VDub indexed the MPG. On Win9x you can add MinFileCache and MaxFileCache to system.ini to set the file cache minimum and maximum; since Win2k that doesn't exist, and the only solution is CacheBoost, a little program that runs a service in the background and takes care of the cache. It doesn't allow Windows to cache files beyond the maximum you set, and you can also set the interval for flushing data to disk: on 9x it was 2 s after no read/write operation, in CacheBoost it's 1 to 25 s.
AMD 3000+, XP SP2, 1 GB RAM, 3 GB pagefile
41467 cycles
47149 cycles
???
later....
Ok, when I run the program on the drive that actually has windows.inc-
5568281 cycles
68626 cycles
duh.....
8112544 cycles
61510 cycles
My System:
CPU-Z version 1.30
------------------------------------------------------
Specification AMD Athlon(tm) XP 2000+
Clock Speed 1667.2 MHz
Memory Size 256 MBytes
Memory Frequency 133.4 MHz (1:1)
Software
------------------------------------------------------
Windows Version Microsoft Windows XP Professional Service Pack 2 (Build 2600)
DirectX Version 9.0c
That test isn't reliable because it doesn't access the data, so Windows doesn't actually read anything. The same is true today with memory allocation: we can allocate memory, but Windows only commits it when we first access the data. A line after the mapping such as mov eax,[eax] should help a little, or allocate memory the size of windows.inc and do a rep movsd.
Quote from: Human on December 17, 2005, 01:32:41 PM
That test isn't reliable because it doesn't access the data, so Windows doesn't actually read anything. The same is true today with memory allocation: we can allocate memory, but Windows only commits it when we first access the data. A line after the mapping such as mov eax,[eax] should help a little, or allocate memory the size of windows.inc and do a rep movsd.
Thanks, that explains the extreme speed difference. But when I copy the entire mapped file to a local buffer, CreateFile-CreateFileMapping-MapViewOfFile is still significantly faster (for a file of this size), even on my 500MHz P3.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
.686
include timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
hMem dd 0
hFile dd 0
hFMO dd 0
lpView dd 0
fSize dd 0
hBuffer dd 0
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
fn filesize,"\masm32\include\windows.inc"
mov fSize, eax
mov hBuffer, alloc(fSize)
print ustr$(fSize)," bytes",13,10
LOOP_COUNT EQU 1000
; time method 1: GlobalAlloc + ReadFile (the masm32 InputFile macro)
counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
mov hMem, InputFile("\masm32\include\windows.inc")
free hMem
counter_end
print ustr$(eax)
print chr$(" cycles",13,10)
; time method 2: CreateFile / CreateFileMapping / MapViewOfFile
counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
invoke CreateFile,chr$("\masm32\include\windows.inc"),GENERIC_READ,
NULL,NULL,OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,NULL
mov hFile, eax
invoke CreateFileMapping,hFile,NULL,PAGE_READONLY,0,0,NULL
mov hFMO, eax
invoke MapViewOfFile,hFMO,FILE_MAP_READ,0,0,0
mov lpView, eax
; eax still holds the view pointer here; walk the mapped file a dword at
; a time so Windows actually pages the whole file in
mov ecx, fSize
shr ecx, 2
@@:
mov edx, [eax]
add eax, 4
sub ecx, 1
jnz @B
;invoke MemCopy, lpView, hBuffer, fSize
invoke UnmapViewOfFile,lpView
invoke CloseHandle,hFMO
fclose hFile
counter_end
print ustr$(eax)
print chr$(" cycles",13,10)
free hBuffer
mov eax, input(13,10,"Press enter to exit...")
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
1140616 bytes
6182922 cycles
3096312 cycles
Hi Michael-
I still don't get this filemapping stuff. How did referencing hMem access the file gotten from doing the MapViewOfFile? And wouldn't it be better to just write a small loop in each test that just loaded each byte or dword into eax rather than writing it to another buffer to test the access time?
Quote
How did referencing hMem access the file gotten from doing the MapViewOfFile? And wouldn't it be better to just write a small loop in each test that just loaded each byte or dword into eax rather than writing it to another buffer to test the access time?
Duh...It didn't. Thanks for pointing that out. I apparently don't get it either. Correcting the code, the advantage was less, but CreateFile-CreateFileMapping-MapViewOfFile was still significantly faster.
1140616 bytes
6163511 cycles
5411255 cycles
I was not attempting to check the access time; I was trying to force Windows to actually read the file into the allocated memory, instead of just allocating the memory and doing the setup. I used MemCopy because it was convenient. It didn't seem likely to me that a dummy access would be significantly faster, given that MemCopy uses rep movsd, but I was apparently wrong on this too. Substituting a dummy access put the cycle count for CreateFile-CreateFileMapping-MapViewOfFile at less than half of that for CreateFile-GlobalAlloc-ReadFile. I have updated my second post.
If I do as Human seems to be suggesting, and just do a single access of the memory with a mov eax,[eax], the cycle counts return to ~20:1
1140616 bytes
6147532 cycles
317582 cycles
Which makes me doubt that this single access is sufficient to cause Windows to read the entire file into memory. The only way I can think of to verify this is to compare the access times for the two buffers.
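Something like the following, bolted onto the end of the test, is roughly what I have in mind; it is only a sketch that reuses hMem, lpView and fSize from the code above and assumes both the buffer and the view are still live at that point. Note also that whichever loop runs second benefits from whatever is already in the CPU and file caches, so the order matters:

; time a pass over the GlobalAlloc/ReadFile buffer
counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
    mov eax, hMem
    mov ecx, fSize
    shr ecx, 2
  @@:
    mov edx, [eax]
    add eax, 4
    sub ecx, 1
    jnz @B
counter_end
print ustr$(eax)
print chr$(" cycles (ReadFile buffer)",13,10)
; time the same pass over the mapped view
counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
    mov eax, lpView
    mov ecx, fSize
    shr ecx, 2
  @@:
    mov edx, [eax]
    add eax, 4
    sub ecx, 1
    jnz @B
counter_end
print ustr$(eax)
print chr$(" cycles (mapped view)",13,10)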
When I try your new one, I get-
1136561 bytes
5747439 cycles
3959038 cycles
The mapping is faster, even when it accesses every byte in a loop. If I also add the access loop to the first one, which is only fair, I get-
1136561 bytes
8367308 cycles
4008251 cycles
So it's more than twice as fast. Ugly though it is, it bears looking into ....
Trying smaller files, it seems that under about 20K the old method is faster; above 20K, the mapping method is faster.
Well, I think one simple explanation will help: for a memory-mapped file Windows doesn't read the whole file into memory, it just swaps 4 KB pages in and out, so in that mode it fits the L1/L2 cache better. Also, on a second run the same test will be faster because Windows already has the file in the file cache. For better test results we should also copy the whole mapped memory somewhere, because with write-back caching we might be reading data that is still in the L1/L2 cache and has not yet been written out to the memory where the file is mapped. But that is just what I assume.
GlobalAlloc and VirtualAlloc shouldn't experience much of a difference for big alloc sizes, since the Windows heap implementation itself calls VirtualAlloc in such cases. I think the limit for determining when to call VirtualAlloc and when to use the heap is around 20k.
One of our members a while ago did some benchmarking on a range of memory allocation techniques, and different strategies seem to have different advantages. HeapAlloc() was the fastest on repeated small allocations, while the old GlobalAlloc() with fixed memory was very fast on big allocations. OLE string memory has become slower, and VirtualAlloc seems to be somewhere in the middle. There is of course a very wide range of overlap between most of these methods, so in practice it tends to come down to whichever is the most convenient to use.
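For reference, the basic calls being compared look roughly like this. This is just a sketch: BUFSIZE and the destination dwords (pGlobal, hHeap, pHeap, pVirtual) are hypothetical, and as discussed above the return values should really be checked:

BUFSIZE EQU 1000000h                 ; hypothetical 16 MB request

; fixed global memory, as the masm32 alloc macro uses (freed with GlobalFree)
invoke GlobalAlloc, GMEM_FIXED or GMEM_ZEROINIT, BUFSIZE
mov pGlobal, eax

; the default process heap (freed with HeapFree)
invoke GetProcessHeap
mov hHeap, eax
invoke HeapAlloc, hHeap, HEAP_ZERO_MEMORY, BUFSIZE
mov pHeap, eax

; page-granular virtual memory (freed with VirtualFree)
invoke VirtualAlloc, NULL, BUFSIZE, MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE
mov pVirtual, eax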
Quote from: QvasiModo on December 19, 2005, 04:02:03 PM
GlobalAlloc and VirtualAlloc shouldn't experience much of a difference for big alloc sizes
I assume you're responding to Human, since nobody else has mentioned VirtualAlloc.
By paging, he was not suggesting that the entire file is copied into virtual memory (which is what would happen with the GlobalAlloc method). Instead, only 4 KB of the file is in memory at any one time. If another part is wanted, the current 4 KB is discarded and the new 4 KB is read in. This should be much quicker for random-starting-point sequential access because it will only read/copy 8 KB.
Copying the entire file into memory (and then possibly back to the disk :eek) is the alternative, and for big files this will obviously take time.
958869 bytes
10571368 cycles
12050214 cycles
does this mean I need a new computer?