The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: Jimg on December 16, 2005, 03:15:52 PM

Title: GlobalAlloc
Post by: Jimg on December 16, 2005, 03:15:52 PM
I was looking through the masm32lib and macros to learn how best to read a file.  In the read_disk_file proc, it does-

    mov fl, FUNC(GetFileSize,hFile,NULL)            ; get the file length
    mov hMem, alloc(fl)                             ; allocate a buffer of that size
    invoke ReadFile,hFile,hMem,fl,ADDR bRead,NULL   ; read file into buffer
    invoke CloseHandle,hFile                        ; close the handle

The alloc macro does a -  invoke GlobalAlloc,GMEM_FIXED or GMEM_ZEROINIT,bytecount

Is it the case that GlobalAlloc virtually never fails so we don't have to test the return?
Title: Re: GlobalAlloc
Post by: QvasiModo on December 16, 2005, 04:06:18 PM
GlobalAlloc virtually never fails for small sizes, but it's bad coding practice not to check the return value. Besides, if the main heap is corrupted for some reason (a buffer overrun perhaps) then GlobalAlloc will fail no matter how much memory you request. Since the return value on failure is zero, an app using this macro will crash with a NULL pointer dereference.

I think adding an .IF is definitely worth the few extra bytes it will take. :)
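
A minimal sketch of that check, assuming a MASM32 install with masm32rt.inc and using windows.inc as the test file. It calls GlobalAlloc directly instead of going through the alloc macro, and the error messages and early exits are illustrative choices, not what read_disk_file actually does:

```
    include \masm32\include\masm32rt.inc

    .data?
        hFile   dd ?
        hMem    dd ?
        fl      dd ?
        bRead   dd ?

    .code
start:
    invoke CreateFile, chr$("\masm32\include\windows.inc"), GENERIC_READ,
           FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL
    .if eax == INVALID_HANDLE_VALUE
        print "CreateFile failed",13,10
        exit
    .endif
    mov   hFile, eax

    invoke GetFileSize, hFile, NULL             ; get the file length
    mov   fl, eax

    invoke GlobalAlloc, GMEM_FIXED or GMEM_ZEROINIT, fl
    .if eax == 0                                ; NULL means the allocation failed
        print "GlobalAlloc failed",13,10
        invoke CloseHandle, hFile               ; release the file before bailing out
        exit
    .endif
    mov   hMem, eax

    invoke ReadFile, hFile, hMem, fl, ADDR bRead, NULL
    invoke CloseHandle, hFile

    print ustr$(bRead)
    print chr$(" bytes read",13,10)

    invoke GlobalFree, hMem
    exit
end start
```

The test costs one compare and one conditional jump per allocation, and only the failure path does any extra work.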
Title: Re: GlobalAlloc
Post by: Human on December 16, 2005, 04:36:59 PM
GlobalAlloc or VirtualAlloc will always fail at some point: try to read a 4 GB file into memory when you have just 256 MB of RAM and a 128 MB page file set to a fixed size. If the page file is not fixed, Windows will expand it, but what if you run out of space on the hard disk? That fails too. It is better to use CreateFileMapping: you get an address at which the whole file appears, and Windows handles reading and paging in whichever part of the file your pointer is actually touching.
Title: Re: GlobalAlloc
Post by: Jimg on December 16, 2005, 05:49:26 PM
Does anyone have a feel for the relative performance of CreateFileMapping/MapViewOfFile vs. GlobalAlloc/ReadFile ?
Title: Re: GlobalAlloc
Post by: MichaelW on December 16, 2005, 08:31:44 PM
Unless I screwed the test up (or totally don't understand what I'm trying to do :toothy) CreateFile-CreateFileMapping-MapViewOfFile is roughly 20 times faster than CreateFile-GlobalAlloc-ReadFile.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
    include timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
        hMem    dd 0
        hFile   dd 0
        hFMO    dd 0
        lpView  dd 0
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    LOOP_COUNT EQU 1000
   
    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      mov   hMem, InputFile("\masm32\include\windows.inc")
      free  hMem
    counter_end
    print ustr$(eax)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      invoke CreateFile,chr$("\masm32\include\windows.inc"),GENERIC_READ,
        NULL,NULL,OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,NULL
      mov   hFile, eax
      ;print uhex$(eax),13,10
      invoke CreateFileMapping,hFile,NULL,PAGE_READONLY,0,0,NULL
      mov   hFMO, eax
      ;print uhex$(eax),13,10
      invoke MapViewOfFile,hFMO,FILE_MAP_READ,0,0,0
      mov   lpView, eax
      ;print uhex$(eax),13,10
      invoke UnmapViewOfFile,lpView
      invoke CloseHandle,hFMO
      fclose hFile
    counter_end
    print ustr$(eax)
    print chr$(" cycles",13,10)

    mov   eax, input(13,10,"Press enter to exit...")
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

Timings on a P3:

6137155 cycles
308841 cycles



Title: Re: GlobalAlloc
Post by: Vortex on December 16, 2005, 09:14:57 PM
6133502 cycles
92443 cycles

Tested on my P4 2.66 GHz
Title: Re: GlobalAlloc
Post by: hutch-- on December 16, 2005, 10:55:35 PM
I must admit I have never had a problem with GlobalAlloc(), and I have used it to allocate over a gigabyte many times. Some time ago I played with allocating memory using a 1 GB VOB file, and it happily chomped that much memory every time. The last time I looked, and this was some time ago, file mapping was reasonably slow in terms of memory allocation speed. Its main virtue is its global scope for passing data between applications with different memory address spaces.

GlobalAlloc() can easily be tested for a return value, but it's the thin edge of the wedge: appending a never-ending set of tests for every possible condition eventually makes big, slow code. I don't hold the view that you can successfully idiot-proof anything, as the idiot-level skill will always exceed your efforts. The reason I supply this stuff as source code is so that anyone can modify a copy of it to suit their needs.
Title: Re: GlobalAlloc
Post by: Human on December 17, 2005, 01:53:17 AM
For copying a big file, GlobalAlloc plus reading the whole file is better. Otherwise file mapping is always faster, because it doesn't read the whole file into memory, just the part you need. If you read some indexes from the beginning, then jump somewhere else, and use and update for example 10% of the file, which is faster: a 10% read or a 100% read? Another thing: Windows creates pages and virtual addresses, and you can allocate just 2 GB because the kernel sits above 7FFFFFFFh. With file mapping you can read a 4 GB file, and, I'm not sure, maybe even more if you fill in the additional structures. Only x64 really resolves this, where you have a 64-bit address space of 16 petabytes or more; on a 32-bit system with virtual memory the maximum address space is 16 terabytes.

Read performance also depends on many things: is your drive defragmented, is there enough RAM so that nothing needs to be swapped to the page file, and how much RAM have you given to the file cache? XP, for example, is bad at that: it can eat all of your RAM for caching. A long time ago I read a 1.4 GB MPG file into VirtualDub and my system was choking, because Windows ate all my RAM and was using the page file to cache the rest, just because VirtualDub indexed the MPG. On Win9x you can add MinFileCache and MaxFileCache to system.ini to set the file cache minimum and maximum. Since Windows 2000 that setting no longer exists; the only solution is CacheBoost, a little program that runs a background service that takes care of the cache and doesn't allow Windows to cache files beyond the maximum you set. You can also set the interval after which written data is flushed: on 9x it was 2 s after the last read/write operation, in CacheBoost it is 1 to 25 s.
Title: Re: GlobalAlloc
Post by: Jimg on December 17, 2005, 01:57:55 AM
AMD 3000+, XP sp2,   1G ram,  3G pagefile

41467 cycles
47149 cycles

???
later....
Ok, when I run the program on the drive that actually has windows.inc-

5568281 cycles
68626 cycles

duh.....
Title: Re: GlobalAlloc
Post by: Kestrel on December 17, 2005, 12:58:09 PM
8112544 cycles
61510 cycles

My System:
CPU-Z version 1.30
------------------------------------------------------
Specification       AMD Athlon(tm) XP 2000+
Clock Speed         1667.2 MHz

Memory Size         256 MBytes
Memory Frequency    133.4 MHz (1:1)
Software
------------------------------------------------------
Windows Version     Microsoft Windows XP Professional  Service Pack 2 (Build 2600)
DirectX Version     9.0c
Title: Re: GlobalAlloc
Post by: Human on December 17, 2005, 01:32:41 PM
That test isn't reliable because it doesn't access the data, so Windows doesn't read anything. It is the same with memory allocation today: we can allocate memory, but Windows only really allocates it when we first access the data. A line after the mapping with mov eax,[eax] should help a little, or allocate a buffer the size of windows.inc and copy into it with rep movsd.
Title: Re: GlobalAlloc
Post by: MichaelW on December 17, 2005, 02:11:45 PM
Quote from: Human on December 17, 2005, 01:32:41 PM
That test isn't reliable because it doesn't access the data, so Windows doesn't read anything. It is the same with memory allocation today: we can allocate memory, but Windows only really allocates it when we first access the data. A line after the mapping with mov eax,[eax] should help a little, or allocate a buffer the size of windows.inc and copy into it with rep movsd.

Thanks, that explains the extreme speed difference. But when I copy the entire mapped file to a local buffer, CreateFile-CreateFileMapping-MapViewOfFile is still significantly faster (for a file of this size), even on my 500MHz P3.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
    include timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
        hMem    dd 0
        hFile   dd 0
        hFMO    dd 0
        lpView  dd 0
        fSize   dd 0
        hBuffer dd 0
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    fn filesize,"\masm32\include\windows.inc"
    mov   fSize, eax
    mov   hBuffer, alloc(fSize)
    print ustr$(fSize)," bytes",13,10
   
    LOOP_COUNT EQU 1000
   
    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      mov   hMem, InputFile("\masm32\include\windows.inc")
      free  hMem
    counter_end
    print ustr$(eax)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      invoke CreateFile,chr$("\masm32\include\windows.inc"),GENERIC_READ,
        NULL,NULL,OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,NULL
      mov   hFile, eax
      invoke CreateFileMapping,hFile,NULL,PAGE_READONLY,0,0,NULL
      mov   hFMO, eax
      invoke MapViewOfFile,hFMO,FILE_MAP_READ,0,0,0
      mov   lpView, eax

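      mov   ecx, fSize              ; file size in bytes
      shr   ecx, 2                  ; dword count (up to 3 tail bytes are skipped)
    @@:
      mov   edx, [eax]              ; dummy read so Windows has to page the data in
      add   eax, 4
      sub   ecx, 1
      jnz   @B

      ;invoke MemCopy, lpView, hBuffer, fSize   ; earlier variant: copy to a local buffer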

      invoke UnmapViewOfFile,lpView
      invoke CloseHandle,hFMO
      fclose hFile
    counter_end
    print ustr$(eax)
    print chr$(" cycles",13,10)

    free hBuffer

    mov   eax, input(13,10,"Press enter to exit...")
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


1140616 bytes
6182922 cycles
3096312 cycles




Title: Re: GlobalAlloc
Post by: Jimg on December 17, 2005, 08:48:40 PM
Hi Michael-

I still don't get this filemapping stuff.  How did referencing hMem access the file gotten from doing the MapViewOfFile?  And wouldn't it be better to just write a small loop in each test that just loaded each byte or dword into eax rather than writing it to another buffer to test the access time?
Title: Re: GlobalAlloc
Post by: MichaelW on December 18, 2005, 01:52:09 AM
Quote: How did referencing hMem access the file gotten from doing the MapViewOfFile? And wouldn't it be better to just write a small loop in each test that just loaded each byte or dword into eax rather than writing it to another buffer to test the access time?
Duh...It didn't. Thanks for pointing that out. I apparently don't get it either. Correcting the code, the advantage was less, but CreateFile-CreateFileMapping-MapViewOfFile was still significantly faster.

1140616 bytes
6163511 cycles
5411255 cycles


I was not attempting to check the access time; I was trying to force Windows to actually read the file into the allocated memory, instead of just allocating the memory and doing the setup. I used MemCopy because it was convenient. It didn't seem likely to me that a dummy access would be significantly faster, given that MemCopy uses rep movsd, but I was apparently wrong on this too. Substituting a dummy access put the cycle count for CreateFile-CreateFileMapping-MapViewOfFile at less than half of that for CreateFile-GlobalAlloc-ReadFile. I have updated my second post.

If I do as Human seems to be suggesting, and just do a single access of the memory with a mov eax,[eax], the cycle counts return to ~20:1

1140616 bytes
6147532 cycles
317582 cycles

Which makes me doubt that this single access is sufficient to cause Windows to read the entire file into memory. The only way I can think of to verify this is to compare the access times for the two buffers.
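
One middle ground between a single access and reading every dword is to touch one byte in each page of the view, so that every page has to be faulted in. A rough sketch, assuming a MASM32 install with masm32rt.inc and a 4 KB page size (true for x86 Windows, but an assumption here); TOUCH_STEP is a made-up name for this example, and timing it with the counter_begin/counter_end macros is left out:

```
    include \masm32\include\masm32rt.inc

    TOUCH_STEP EQU 4096                 ; assumed page size (hypothetical constant)

    .data?
        hFile   dd ?
        hFMO    dd ?
        lpView  dd ?
        fSize   dd ?

    .code
start:
    invoke CreateFile, chr$("\masm32\include\windows.inc"), GENERIC_READ,
           FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL
    .if eax == INVALID_HANDLE_VALUE
        print "CreateFile failed",13,10
        exit
    .endif
    mov   hFile, eax

    invoke GetFileSize, hFile, NULL
    mov   fSize, eax

    invoke CreateFileMapping, hFile, NULL, PAGE_READONLY, 0, 0, NULL
    .if eax == 0                        ; also fails for a zero-length file
        print "CreateFileMapping failed",13,10
        invoke CloseHandle, hFile
        exit
    .endif
    mov   hFMO, eax

    invoke MapViewOfFile, hFMO, FILE_MAP_READ, 0, 0, 0
    .if eax == 0
        print "MapViewOfFile failed",13,10
        invoke CloseHandle, hFMO
        invoke CloseHandle, hFile
        exit
    .endif
    mov   lpView, eax

    mov   ecx, fSize
    add   ecx, TOUCH_STEP - 1
    shr   ecx, 12                       ; number of 4 KB pages, rounded up
  @@:
    movzx edx, byte ptr [eax]           ; one read per page forces a page fault
    add   eax, TOUCH_STEP
    sub   ecx, 1
    jnz   @B

    invoke UnmapViewOfFile, lpView
    invoke CloseHandle, hFMO
    invoke CloseHandle, hFile
    print "all pages touched",13,10
    exit
end start
```

This still does not prove that the data came from the disk rather than from the file cache, so it only narrows the question raised above.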

Title: Re: GlobalAlloc
Post by: Jimg on December 18, 2005, 02:48:37 AM
When I try your new one, I get-

1136561 bytes
5747439 cycles
3959038 cycles

map is faster, even when it accesses every byte in a loop.  If I also add the access loop to the first one, which is only fair, I get-

1136561 bytes
8367308 cycles
4008251 cycles

So it's more than twice as fast.  Ugly though it is, it bears looking into ....

Trying smaller files, it seems that under about 20K, the old method is faster, above 20K, the mapping method is faster.
Title: Re: GlobalAlloc
Post by: Human on December 18, 2005, 11:03:13 AM
Well, I think one simple explanation will help. For a memory-mapped file Windows doesn't read the whole file into memory, it just swaps 4 KB pages in and out, so in that mode the data fits the L1/L2 cache better. Also, on a second run the same test will be faster because Windows already has the file in its file cache. For better test results we should copy the whole mapped memory somewhere: with a write-back cache, maybe what we read comes from the cache and is not in memory yet, because L1/L2 have not written it back to the memory where the file is mapped. But that is just what I assume.
Title: Re: GlobalAlloc
Post by: QvasiModo on December 19, 2005, 04:02:03 PM
GlobalAlloc and VirtualAlloc shouldn't experience much of a difference for big alloc sizes, since the Windows heap implementation itself calls VirtualAlloc in such cases. I think the limit for determining when to call VirtualAlloc and when to use the heap is around 20k.
Title: Re: GlobalAlloc
Post by: hutch-- on December 19, 2005, 11:27:32 PM
One of our members did some benchmarking a while ago on a range of memory allocation techniques, and different strategies seem to have different advantages. HeapAlloc() was the fastest on repeated small allocations, whereas the old GlobalAlloc() with fixed memory was very fast on big allocations. OLE string memory has got slower, and VirtualAlloc seems to be somewhere in the middle. There is of course a very wide range of overlap between most of these methods, so the most useful one tends to be whichever is the most convenient to use.
Title: Re: GlobalAlloc
Post by: zooba on December 19, 2005, 11:34:17 PM
Quote from: QvasiModo on December 19, 2005, 04:02:03 PM
GlobalAlloc and VirtualAlloc shouldn't experience much of a difference for big alloc sizes

I assume you're responding to Human, since nobody else has mentioned VirtualAlloc.

By paging, he was not suggesting that the entire file is copied into virtual memory (as this is what would happen with the GlobalAlloc method). Instead, only 4 KB of the file is in memory at any one time. If another part is wanted, then the current 4 KB is disposed of and the new 4 KB is read. This should be much quicker for random-starting-point sequential access because it will only read/copy 8 KB.

Copying the entire file into memory (and then possibly back to the disk :eek) is the alternative and for big files this will obviously take time.
Title: Re: GlobalAlloc
Post by: DC on December 21, 2005, 04:34:38 AM
958869 bytes
10571368 cycles
12050214 cycles
does this mean I need a new computer?