


Started by Jimg, December 16, 2005, 03:15:52 PM



I was looking through the masm32lib and macros to learn how best to read a file.  In the read_disk_file proc, it does-

    mov fl, FUNC(GetFileSize,hFile,NULL)            ; get the file length
    mov hMem, alloc(fl)                             ; allocate a buffer of that size
    invoke ReadFile,hFile,hMem,fl,ADDR bRead,NULL   ; read file into buffer
    invoke CloseHandle,hFile                        ; close the handle

The alloc macro does a -  invoke GlobalAlloc,GMEM_FIXED or GMEM_ZEROINIT,bytecount

Is it the case that GlobalAlloc virtually never fails so we don't have to test the return?


GlobalAlloc virtually never fails for small sizes, but it's a bad coding practice not to check the return value. Besides, if the main heap is corrupted for some reason (a buffer overrun perhaps) then GlobalAlloc will fail no matter how much memory you request. Since on failure the return value is zero, the app using this macro will crash due to a NULL pointer reference.

I think adding an .IF is definitely worth the few extra bytes it will take. :)
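For what it's worth, a checked version of that sequence might look something like the sketch below. It wraps the same four lines in a hypothetical proc (read_checked is not part of masm32lib), relies on the MASM32 FUNC and alloc macros quoted above, and assumes that returning 0 in eax is an acceptable way to signal failure.

```asm
; Sketch only. read_checked is a made-up name, and "0 in eax means failure"
; is an assumed convention, not something taken from masm32lib.
read_checked proc hFile:DWORD

    LOCAL fl    :DWORD
    LOCAL hMem  :DWORD
    LOCAL bRead :DWORD

    mov fl, FUNC(GetFileSize,hFile,NULL)            ; get the file length
    mov hMem, alloc(fl)                             ; alloc leaves the GlobalAlloc result in eax
    .if eax == 0                                    ; GlobalAlloc returns NULL on failure
        invoke CloseHandle,hFile                    ; do not leak the file handle
        xor eax, eax                                ; report failure to the caller
        ret
    .endif

    invoke ReadFile,hFile,hMem,fl,ADDR bRead,NULL   ; read file into buffer
    invoke CloseHandle,hFile                        ; close the handle
    mov eax, hMem                                   ; return the buffer address
    ret

read_checked endp
```

The caller would test eax for zero before touching the buffer and release it with free when done. The ReadFile result is still unchecked here, to keep the sketch short.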


GlobalAlloc or VirtualAlloc will always fail if you try to read a 4 GB file into memory when you have just 256 MB of RAM and 128 MB of virtual memory, at least if the swap file is set to a fixed size. If it is not fixed, Windows will expand the swap file, but what if you run out of space on the hard disk? Then it fails as well. Better to use CreateFileMapping: you get an address where the file appears in memory, and Windows handles reading and paging in whichever part of the file your pointer is actually touching.
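A minimal sketch of that approach with every return value tested, assuming a MASM32 setup where windows.inc and kernel32.inc are already included. The proc name map_file_ro and its two output parameters are made up for illustration.

```asm
; Sketch only. Maps a whole file read-only.
; Returns the view address in eax, or 0 if any step fails.
; map_file_ro, phFile and phMap are hypothetical names.
map_file_ro proc lpName:DWORD, phFile:DWORD, phMap:DWORD

    LOCAL hFile :DWORD
    LOCAL hMap  :DWORD

    invoke CreateFile,lpName,GENERIC_READ,FILE_SHARE_READ,NULL,
           OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,NULL
    .if eax == INVALID_HANDLE_VALUE         ; CreateFile fails with -1, not 0
        xor eax, eax
        ret
    .endif
    mov hFile, eax

    invoke CreateFileMapping,hFile,NULL,PAGE_READONLY,0,0,NULL
    .if eax == 0                            ; NULL on failure (a zero-length file, for one)
        invoke CloseHandle,hFile
        xor eax, eax
        ret
    .endif
    mov hMap, eax

    invoke MapViewOfFile,hMap,FILE_MAP_READ,0,0,0   ; size 0 = map the whole file
    .if eax == 0
        invoke CloseHandle,hMap
        invoke CloseHandle,hFile
        xor eax, eax
        ret
    .endif

    mov ecx, phFile                         ; hand both handles back for later cleanup
    mov edx, hFile
    mov [ecx], edx
    mov ecx, phMap
    mov edx, hMap
    mov [ecx], edx
    ret                                     ; eax = address of the mapped view

map_file_ro endp
```

Cleanup is the reverse order: UnmapViewOfFile on the returned address, then CloseHandle on the mapping handle and the file handle.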


Does anyone have a feel for the relative performance of CreateFileMapping/MapViewOfFile vs. GlobalAlloc/ReadFile ?


Unless I screwed the test up (or totally don't understand what I'm trying to do), CreateFile-CreateFileMapping-MapViewOfFile is roughly 20 times faster than CreateFile-GlobalAlloc-ReadFile.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\
    include timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
        hMem    dd 0
        hFile   dd 0
        hFMO    dd 0
        lpView  dd 0
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
      mov   hMem, InputFile("\masm32\include\")
      free  hMem
    print ustr$(eax)
    print chr$(" cycles",13,10)

      invoke CreateFile,chr$("\masm32\include\"),GENERIC_READ,
      mov   hFile, eax
      ;print uhex$(eax),13,10
      invoke CreateFileMapping,hFile,NULL,PAGE_READONLY,0,0,NULL
      mov   hFMO, eax
      ;print uhex$(eax),13,10
      invoke MapViewOfFile,hFMO,FILE_MAP_READ,0,0,0
      mov   lpView, eax
      ;print uhex$(eax),13,10
      invoke UnmapViewOfFile,lpView
      invoke CloseHandle,hFMO
      fclose hFile
    print ustr$(eax)
    print chr$(" cycles",13,10)

    mov   eax, input(13,10,"Press enter to exit...")
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

Timings on a P3:

6137155 cycles
308841 cycles

eschew obfuscation


6133502 cycles
92443 cycles

Tested on my P4 2.66 GHz


I must admit I have never had a problem with GlobalAlloc() and I have used it to allocate over a gigabyte many times. Some time ago I played with allocating memory using a 1 gig VOB file and it happily chomped that much memory every time. The last time I looked, and this was some time ago, File Mapping was reasonably slow in terms of memory allocation speed. Its main virtue is its global scope for passing data between applications with different memory address spaces.

GlobalAlloc() can easily be tested for a return value, but it's the thin edge of the wedge: appending a never-ending set of tests for every possible condition eventually makes big, slow code. I don't hold the view that you can successfully idiot-proof anything, as the idiot-level skill will always exceed your efforts. The reason why I supply this stuff as source code is so that anyone can modify a copy of it to suit their needs.


For copying a big file, GlobalAlloc and reading the whole file is better. Otherwise file mapping is faster, because it doesn't read the whole file into memory, just the part you need. If you read some indexes from the beginning, then jump somewhere else, and end up using and updating, say, 10% of the file, which is faster: a 10% read or a 100% read? Another thing: Windows creates pages and virtual addresses, and you can allocate only 2 GB because everything above 7FFFFFFFh belongs to the kernel. With file mapping you can work on a 4 GB file a view at a time, and I think even larger ones if you fill in the high-order size and offset arguments. Only x64 really resolves this, since a 64-bit address space is vastly larger (16 exabytes in theory), while a 32-bit process has only 4 GB of address space no matter how much virtual memory is configured.

Read performance also depends on many things. Is your drive defragmented? Is there enough RAM, so that nothing needs to be swapped to the pagefile? How much RAM have you given to the file cache? XP, for example, is bad at this: it can eat all of your RAM for caching. A long time ago I loaded a 1.4 GB MPG file into VirtualDub and my system was choking, because Windows ate all my RAM and used the pagefile to cache the rest, just because VirtualDub indexed the MPG. On Win9x you can add MinFileCache and MaxFileCache to system.ini to set the file cache minimum and maximum; since Windows 2000 that option no longer exists. The only solution is CacheBoost, a little program that runs a service in the background and takes care of the cache: it doesn't allow Windows to cache files beyond the maximum you set. You can also set the interval after which data is flushed to disk: on 9x it was 2 s after the last read/write operation, in CacheBoost it is 1 to 25 s.


AMD 3000+, XP sp2,   1G ram,  3G pagefile

41467 cycles
47149 cycles

Ok, when I run the program on the drive that actually has

5568281 cycles
68626 cycles



8112544 cycles
61510 cycles

My System:
CPU-Z version 1.30
Specification       AMD Athlon(tm) XP 2000+
Clock Speed         1667.2 MHz

Memory Size         256 MBytes
Memory Frequency    133.4 MHz (1:1)
Windows Version     Microsoft Windows XP Professional  Service Pack 2 (Build 2600)
DirectX Version     9.0c


That test isn't reliable because it doesn't access the data, so Windows doesn't read anything. It's the same with memory allocation today: we can allocate memory, but Windows only really provides it when we first access the data. Adding a mov eax,[eax] line after the mapping should help a little, or allocate a buffer the size of the file and copy into it with rep movsd.


Quote from: Human on December 17, 2005, 01:32:41 PM
That test isn't reliable because it doesn't access the data, so Windows doesn't read anything. It's the same with memory allocation today: we can allocate memory, but Windows only really provides it when we first access the data. Adding a mov eax,[eax] line after the mapping should help a little, or allocate a buffer the size of the file and copy into it with rep movsd.

Thanks, that explains the extreme speed difference. But when I copy the entire mapped file to a local buffer, CreateFile-CreateFileMapping-MapViewOfFile is still significantly faster (for a file of this size), even on my 500MHz P3.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\
    include timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
        hMem    dd 0
        hFile   dd 0
        hFMO    dd 0
        lpView  dd 0
        fSize   dd 0
        hBuffer dd 0
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    fn filesize,"\masm32\include\"
    mov   fSize, eax
    mov   hBuffer, alloc(fSize)
    print ustr$(fSize)," bytes",13,10
      mov   hMem, InputFile("\masm32\include\")
      free  hMem
    print ustr$(eax)
    print chr$(" cycles",13,10)

      invoke CreateFile,chr$("\masm32\include\"),GENERIC_READ,
      mov   hFile, eax
      invoke CreateFileMapping,hFile,NULL,PAGE_READONLY,0,0,NULL
      mov   hFMO, eax
      invoke MapViewOfFile,hFMO,FILE_MAP_READ,0,0,0
      mov   lpView, eax

      mov   ecx, fSize          ; file size in bytes
      shr   ecx, 2              ; convert to a dword count
    @@:
      mov   edx, [eax]          ; dummy read: touch each dword of the view
      add   eax, 4
      sub   ecx, 1
      jnz   @B

      ;invoke MemCopy, lpView, hBuffer, fSize

      invoke UnmapViewOfFile,lpView
      invoke CloseHandle,hFMO
      fclose hFile
    print ustr$(eax)
    print chr$(" cycles",13,10)

    free hBuffer

    mov   eax, input(13,10,"Press enter to exit...")
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

1140616 bytes
6182922 cycles
3096312 cycles



Hi Michael-

I still don't get this filemapping stuff.  How did referencing hMem access the file gotten from doing the MapViewOfFile?  And wouldn't it be better to just write a small loop in each test that just loaded each byte or dword into eax rather than writing it to another buffer to test the access time?


Quote
How did referencing hMem access the file gotten from doing the MapViewOfFile? And wouldn't it be better to just write a small loop in each test that just loaded each byte or dword into eax rather than writing it to another buffer to test the access time?
Duh...It didn't. Thanks for pointing that out. I apparently don't get it either. Correcting the code, the advantage was less, but CreateFile-CreateFileMapping-MapViewOfFile was still significantly faster.

1140616 bytes
6163511 cycles
5411255 cycles

I was not attempting to check the access time; I was trying to force Windows to actually read the file into the allocated memory, instead of just allocating the memory and doing the setup. I used MemCopy because it was convenient. It didn't seem likely to me that a dummy access would be significantly faster, given that MemCopy uses rep movsd, but I was apparently wrong on this too. Substituting a dummy access put the cycle count for CreateFile-CreateFileMapping-MapViewOfFile at less than half of that for CreateFile-GlobalAlloc-ReadFile. I have updated my second post.

If I do as Human seems to be suggesting, and just do a single access of the memory with a mov eax,[eax], the cycle counts return to ~20:1

1140616 bytes
6147532 cycles
317582 cycles

Which makes me doubt that this single access is sufficient to cause Windows to read the entire file into memory. The only way I can think of to verify this is to compare the access times for the two buffers.
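A middle ground between a single access and walking every dword is to read one byte from every page of the view, so that each page gets faulted in once. The sketch below assumes a 4096-byte page size (typical for x86 Windows, not queried here) and uses a made-up proc name, touch_pages.

```asm
; Sketch only. touch_pages is a hypothetical helper.
; pMem = base address of the mapped view, cb = number of bytes in it.
; Assumes 4096-byte pages.
touch_pages proc pMem:DWORD, cb:DWORD

    mov   eax, pMem
    mov   ecx, cb
    test  ecx, ecx
    jz    tp_done                   ; nothing to touch
  tp_next:
    movzx edx, byte ptr [eax]       ; one read per page is enough to fault it in
    add   eax, 4096                 ; step to the next page
    sub   ecx, 4096
    ja    tp_next                   ; continue while bytes remain (unsigned compare)
  tp_done:
    ret

touch_pages endp
```

With the proc defined ahead of the test code, it could be called as invoke touch_pages,lpView,fSize right after MapViewOfFile, and the same call on the ReadFile buffer would keep the comparison even.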



When I try your new one, I get-

1136561 bytes
5747439 cycles
3959038 cycles

Mapping is faster, even when it accesses every byte in a loop.  If I also add the access loop to the first one, which is only fair, I get-

1136561 bytes
8367308 cycles
4008251 cycles

So it's more than twice as fast.  Ugly though it is, it bears looking into ....

Trying smaller files, it seems that under about 20K, the old method is faster, above 20K, the mapping method is faster.