Re: huge table access

Started by TNick, December 13, 2009, 12:04:16 AM

sinsi

>it is good to use what you need, when you need it, then release it for other processes to use
heh, you can VirtualAlloc memory and reserve it - it's not actually mapped into your address space, just backed up by the pagefile (i.e. non-existent).
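
A minimal sketch of the reserve-then-commit pattern mentioned above, written in Python with ctypes purely for illustration (the thread itself is about MASM). It assumes a Windows host; the helper name reserve_then_commit is hypothetical, and on any other platform it simply returns None. MEM_RESERVE only claims a range of addresses; the pages become usable after a second VirtualAlloc call with MEM_COMMIT.

```python
import ctypes
import sys

# Win32 constants from the VirtualAlloc / VirtualFree documentation
MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
MEM_RELEASE = 0x8000
PAGE_NOACCESS = 0x01
PAGE_READWRITE = 0x04


def reserve_then_commit(reserve_size, commit_size):
    """Reserve address space, commit the first part, write to it, release it.

    Hypothetical helper for illustration. Returns the first committed byte,
    or None when not running on Windows.
    """
    if sys.platform != "win32":
        return None
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.VirtualAlloc.restype = ctypes.c_void_p
    k32.VirtualAlloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                 ctypes.c_uint32, ctypes.c_uint32]
    k32.VirtualFree.restype = ctypes.c_int
    k32.VirtualFree.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                ctypes.c_uint32]

    # Step 1: reserve only. No page in this range can be touched yet.
    base = k32.VirtualAlloc(None, reserve_size, MEM_RESERVE, PAGE_NOACCESS)
    if not base:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        # Step 2: commit the part that is actually needed.
        ptr = k32.VirtualAlloc(base, commit_size, MEM_COMMIT, PAGE_READWRITE)
        if not ptr:
            raise ctypes.WinError(ctypes.get_last_error())
        ctypes.memset(ptr, 0xAB, commit_size)
        return ctypes.string_at(ptr, 1)
    finally:
        # Releasing the reservation also decommits anything inside it.
        k32.VirtualFree(base, 0, MEM_RELEASE)
```

Calling reserve_then_commit(1 << 20, 4096) reserves 1 MB of addresses but commits only the first 4 kB page.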

I don't see why windows doesn't do what dos did - allocate all memory.
Light travels faster than sound, that's why some people seem bright until you hear them.

jj2007

Quote from: sinsi on December 13, 2009, 11:58:26 AM
The trouble with an alloc/touch/free test regime is things that happen over and over tend to get cached and reused.
That is valid within the size of the cache.
22460992        cycles for HeapAlloc            01000000 bytes
22609943        cycles for VirtualAlloc         01000000 bytes
22480093        cycles for GlobalAlloc          01000000 bytes


1000000h = 16 MB
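
For comparison with per-kByte figures, the totals above can be divided by the block size. A small Python sketch of that arithmetic (the function name is just for illustration; the cycle counts are the ones posted above):

```python
SIZE_BYTES = 0x1000000  # 16 MB, as stated above

# Total cycles per alloc/touch/free pass, copied from the post
totals = {
    "HeapAlloc": 22460992,
    "VirtualAlloc": 22609943,
    "GlobalAlloc": 22480093,
}


def cycles_per_kbyte(total_cycles, size_bytes):
    """Convert a total cycle count into cycles per kByte (1 kByte = 1024 bytes)."""
    return total_cycles / (size_bytes / 1024)


per_kb = {name: round(cycles_per_kbyte(c, SIZE_BYTES)) for name, c in totals.items()}
print(per_kb)  # {'HeapAlloc': 1371, 'VirtualAlloc': 1380, 'GlobalAlloc': 1372}
```

All three land within about 1% of each other at this size.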

redskull

Quote from: hutch-- on December 13, 2009, 06:29:32 AM

The clock says it all, one function call to retrieve the pointer and one function call to deallocate it.

I still don't think I understand. Are the heap functions not also a single call to allocate (HeapAlloc) and one to free (HeapFree), with the pointer you receive from HeapAlloc being a direct pointer to a fixed memory area of that specific size?  I was under the impression that moveable memory itself (reallocation notwithstanding) was fundamentally deprecated, even if the GlobalAlloc function is not.  It was my understanding that on NT, memory, once allocated, was fixed, even if you used GlobalAlloc with the moveable option.  Is that not so?

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

hutch--

r,

You have hit the distinction on the head, old Win3 movable memory is finished (fortunately) and the only viable allocation techniques are those provided by ntoskrnl.exe and the layering chain above it. This is why once memory is allocated by whatever package, there is no realistic speed difference between them as they all come from the same place.

There are a couple of techniques specific to rapid allocation and deallocation of small amounts of memory. OLE string is one, and HeapAlloc() with the more recent low fragmentation heap is another, but they both pay a price in speed terms for this level of memory management. Serious memory fragmentation problems are usually addressed by custom code. A web server that must stay online for long periods is one task that cannot just rely on the OS; such servers tend to use a circular buffer with a large slot count and keep track of each slot as it's no longer used by one connection so it can be set up for the next.
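
A rough sketch of the circular slot idea described above, in Python for brevity. The class name SlotPool and its methods are hypothetical and only show the bookkeeping; a real server would do this over preallocated native memory rather than bytearrays.

```python
class SlotPool:
    """Fixed set of preallocated buffers handed out in circular order."""

    def __init__(self, count, size):
        # Allocate everything once, up front, so nothing fragments later.
        self.buffers = [bytearray(size) for _ in range(count)]
        self.in_use = [False] * count
        self.next = 0  # where the next search starts

    def acquire(self):
        """Return (index, buffer) for the next free slot, or None if all are busy."""
        n = len(self.buffers)
        for step in range(n):
            i = (self.next + step) % n
            if not self.in_use[i]:
                self.in_use[i] = True
                self.next = (i + 1) % n
                return i, self.buffers[i]
        return None

    def release(self, index):
        """Mark a slot as free so the next connection can reuse it."""
        self.in_use[index] = False
```

A connection handler would call acquire() when a client connects and release(index) when it disconnects, so the same buffers are recycled for the lifetime of the process.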
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

redskull

Then in what way is GlobalAlloc() "simpler" to use, if both calls use fixed memory, direct pointers, and the same underlying functions?  Is there something I'm missing?  It seems the only real difference would be that GlobalAlloc() essentially calls GetProcessHeap() for you, at the expense of some additional, yet mostly negligible, overhead.  That hardly seems like a good enough reason to stay with an (ostensibly) deprecated API call, especially considering it's only a single argument which stays constant through the life of the process.  Also, if that's the case, how do you back your claim of GlobalAlloc being fastest?  There's quite a divide, especially on this forum (and subforum), between "nothing is faster" and "there is no realistic speed difference".

-r

jj2007

Timings once more, for alloc/touch/free. From what I see, there are some important observations:
- HeapAlloc is a lot faster for small allocations (we all knew that)
- VirtualAlloc suffers from overhead for the small ones
- up to ca. 1M, i.e. the cache size, roughly 600 cycles per kByte are "normal"
- above that, 1320 cycles per kByte become standard, and there is virtually no difference between the three candidates.

There is a REPEAT 8 before the counter_begin. You may use REP 9 to get 256 MB (ok), and REP 10 for 1 GB. The latter is not really advisable unless you have a lot of RAM. On my notebook, with 1 GB of RAM, it started swapping, got really slow, and said byebye in the GlobalAlloc loop. Afterwards, the whole system remained very slow, and I had to reboot.
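
How the repeat count maps to block sizes can be sketched as follows. This assumes each pass quadruples the size starting at 4 kB, which is inferred from the sizes in the results (4 kB, 16 kB, 64 kB, ...) and not taken from the test source; the function name is hypothetical.

```python
def alloc_sizes(repeat_count, first=0x1000, factor=4):
    """Block sizes in bytes for each pass, assuming x4 growth from 4 kB."""
    return [first * factor ** i for i in range(repeat_count)]


for repeats in (8, 9, 10):
    largest = alloc_sizes(repeats)[-1]
    print(repeats, f"{largest:08X}h", largest // (1024 * 1024), "MB")
# 8 04000000h 64 MB
# 9 10000000h 256 MB
# 10 40000000h 1024 MB
```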

Now one last question: Why do these bloody functions need over a cycle per byte if they just earmark memory as usable? There is no zeroinit code involved as far as I can see...

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
104     cycles per kByte for HeapAlloc       00001000h bytes (4 kB)
1501    cycles per kByte for VirtualAlloc    00001000h bytes (4 kB)
154     cycles per kByte for GlobalAlloc     00001000h bytes (4 kB)

28      cycles per kByte for HeapAlloc       00004000h bytes (16 kB)
819     cycles per kByte for VirtualAlloc    00004000h bytes (16 kB)
38      cycles per kByte for GlobalAlloc     00004000h bytes (16 kB)

615     cycles per kByte for HeapAlloc       00010000h bytes (64 kB)
621     cycles per kByte for VirtualAlloc    00010000h bytes (64 kB)
191     cycles per kByte for GlobalAlloc     00010000h bytes (64 kB)

8       cycles per kByte for HeapAlloc       00040000h bytes (256 kB)
568     cycles per kByte for VirtualAlloc    00040000h bytes (256 kB)
7       cycles per kByte for GlobalAlloc     00040000h bytes (256 kB)

554     cycles per kByte for HeapAlloc       00100000h bytes (1 MB)
694     cycles per kByte for VirtualAlloc    00100000h bytes (1 MB)
1311    cycles per kByte for GlobalAlloc     00100000h bytes (1 MB)

1330    cycles per kByte for HeapAlloc       00400000h bytes (4 MB)
1324    cycles per kByte for VirtualAlloc    00400000h bytes (4 MB)
1322    cycles per kByte for GlobalAlloc     00400000h bytes (4 MB)

1312    cycles per kByte for HeapAlloc       01000000h bytes (16 MB)
1284    cycles per kByte for VirtualAlloc    01000000h bytes (16 MB)
1326    cycles per kByte for GlobalAlloc     01000000h bytes (16 MB)

1321    cycles per kByte for HeapAlloc       04000000h bytes (64 MB)
1330    cycles per kByte for VirtualAlloc    04000000h bytes (64 MB)
1328    cycles per kByte for GlobalAlloc     04000000h bytes (64 MB)

redskull

IIRC, *all* pages are initialized to zero as part of the security compliance; that way, one process can't accidentally get hold of another process's old data.  "Free pages" are zeroed out to become "zeroed pages" by a low-priority background thread that only runs while no other threads are active, but if there are no zeroed pages available, the zeroing has to be done right then and there.  Doing this many allocations would almost certainly deplete any zeroed pages existing before the test was run, which means later allocations would be stuck waiting; reversing the order of the tests could test that theory.  Also, allocating memory is an expensive operation anyway: for a 64MB allocation, that's roughly 16,000 page table entries (16,384 with 4 kB pages) that have to be constructed, which ironically requires 16 *more* pages allocated to hold them, and PDEs created to point to them, etc, etc.

-r
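
The page table arithmetic in the post above can be checked with a few lines of Python. This assumes 32-bit x86 paging without PAE: 4 kB pages and 4-byte page table entries, so one page table page holds 1024 entries. The function name is made up for the example.

```python
PAGE_SIZE = 4096  # bytes per page (assumption: 4 kB pages)
PTE_SIZE = 4      # bytes per page table entry (assumption: 32-bit, non-PAE)


def page_table_cost(alloc_bytes):
    """Return (number of PTEs, number of pages needed to hold those PTEs)."""
    ptes = -(-alloc_bytes // PAGE_SIZE)      # ceiling division
    ptes_per_table = PAGE_SIZE // PTE_SIZE   # 1024 entries per page table
    tables = -(-ptes // ptes_per_table)
    return ptes, tables


print(page_table_cost(64 * 1024 * 1024))  # (16384, 16)
```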

jj2007

Quote from: redskull on December 13, 2009, 10:36:00 PM
IIRC, *all* pages are initialized to zero as part of the security compliance
A dump for the pointer returned by HeapAlloc starts with 0D F0 AD BA, reversed BAAD F00D.
Same for GlobalAlloc, while VirtualAlloc yields zeroes only.
If you add the HEAP_ZERO_MEMORY flag, allocation slows down considerably. So zeroing is slower than writing bad food???
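
The byte order in that dump is ordinary little-endian storage of one 32-bit value, which a short Python check makes explicit (the four bytes are the ones quoted above):

```python
# First four bytes of the dump, in memory order
dump = bytes([0x0D, 0xF0, 0xAD, 0xBA])

# x86 stores the least significant byte first, so read it as little-endian
value = int.from_bytes(dump, "little")
print(f"{value:08X}")  # BAADF00D
```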

redskull

I remember seeing that BAADF00D code mentioned elsewhere before; I guess it's an internal check that the HeapManager does for itself.  It seems to only be on pages initialized by the heap manager at the start of the process (preexisting before the call).  When I did a HeapAlloc() for 4096, the page was "tacked on" to the end of the heap, and was BAADF00D; when I did one for 1024*1024*1024, I got back pages way up high in memory, and they were all zeroed.  Either way, I'm guessing they come back from the memory manager as zeroed.

-r

hutch--

If anyone is serious about testing memory ALLOCATION times, you do it by reallocating the memory size larger each time so that the test cannot keep using the same memory hole.
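
One reading of that suggestion, sketched in Python: every round asks for a larger block than the one before, so a just-freed block can never satisfy the next request. The function name and defaults are hypothetical, and the numbers it produces say nothing about HeapAlloc or VirtualAlloc, since bytearray goes through Python's own allocator and zero-fills the block.

```python
import time


def time_growing_allocs(start=4096, factor=2, rounds=8):
    """Allocate, touch and free blocks of strictly increasing size.

    Returns a list of (size_in_bytes, seconds) tuples, one per round.
    """
    results = []
    size = start
    for _ in range(rounds):
        t0 = time.perf_counter()
        buf = bytearray(size)            # allocate
        for off in range(0, size, 4096):
            buf[off] = 1                 # touch one byte in every 4 kB
        del buf                          # free
        results.append((size, time.perf_counter() - t0))
        size *= factor                   # next request is always larger
    return results


for size, seconds in time_growing_allocs():
    print(f"{size:>8} bytes  {seconds * 1e6:8.1f} us")
```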

ecube

None of you guys have mentioned memory-mapped files, which are the recommended method for working with large files in memory, instead of VirtualAlloc'ing the whole thing or reading it in smaller allocs with GlobalAlloc/HeapAlloc. Has anyone tested with memory-mapped files? Also, what about the SysAlloc function and the standard C malloc function?
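
For readers who have not used them, here is a small illustration of the memory-mapped file idea using Python's standard mmap module (not the Win32 CreateFileMapping/MapViewOfFile calls a MASM program would use). The helper name read_window and the temporary demo file are made up for the example.

```python
import mmap
import os
import tempfile


def read_window(path, offset, length):
    """Read a slice of a file through a read-only memory mapping.

    Only the pages that are actually touched get brought in by the OS;
    the file is never copied into a private buffer as a whole.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]


# Demo: a 1 MB file of zero bytes with a 4-byte marker at the end
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 1_000_000 + b"MASM")
    demo_path = tmp.name

window = read_window(demo_path, 1_000_000, 4)
print(window)  # b'MASM'
os.remove(demo_path)
```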

jj2007

Quote from: redskull on December 13, 2009, 11:40:54 PM
I remember seeing that BAADF00D code mentioned elsewhere before; I guess it's an internal check that the HeapManager does for itself.  It seems to only be on pages initialized by the heap manager at the start of the process (preexisting before the call).  When I did a HeapAlloc() for 4096, the page was "tacked on" to the end of the heap, and was BAADF00D; when I did one for 1024*1024*1024, I got back pages way up high in memory, and they were all zeroed.  Either way, I'm guessing they come back from the memory manager as zeroed.

Thanks, I could not test that yet but it seems plausible that on startup, the OS
- zeroes the .data? section
- writes BAADF00D to the entire heap
So when you allocate "ordinary" heap without the HEAP_ZERO_MEMORY flag, you get bad food; if, however, the allocation exceeds your available heap, the OS switches internally to VirtualAlloc and gives you zeroed memory.
Any experts around who could confirm that based on more or less official documentation...?

hutch--

It's not hard to tell you guys are driven by your stomach; in the days of YAW you made reference to 0xBADC0DE.

dedndave

ABAD1DEA

1DEADF15h

0ADEADF15h (lol - 4 words!)