I have just started to learn assembly. My program needs to make continual random access to a huge table in memory (approaching 12 Meg). Once the table is created the program does virtually nothing but access this table. On average, each input file will cause the table to be accessed more than a million times.
How do I set it up?
Its size means the Virtual functions are going to be used to create it. I need the whole thing locked into memory so I must use SetProcessWorkingSetSize before VirtualLock can be applied. But SetProcessWorkingSetSize requires a handle. VirtualAlloc does not create a handle. Passing the address returned by VirtualAlloc to SetProcessWorkingSetSize results in an error.
Do I have to set it up as a dummy file and use CreateFile to get a handle? Is there a neater way to do it?
Thank you in advance for your help.
Graham Lawrence
Graham,
You need a process handle. I would check MSDN for GetCurrentProcess and OpenProcess.
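Something along these lines should do it (untested sketch - pTable and the working-set numbers are placeholders of mine, not tested values; pick a min/max big enough for your 12 meg table plus the rest of the process):
invoke GetCurrentProcess                      ; pseudo-handle, no OpenProcess needed
invoke SetProcessWorkingSetSize, eax, 16*1024*1024, 24*1024*1024
invoke VirtualLock, pTable, 12*1024*1024      ; pTable = base address from VirtualAlloc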
Best,
Jochen
12 mb isn't all that large, really
i would use the heap allocation functions - a matter of preference, perhaps
it returns a handle - but the handle is the base address
In my (admittedly crude and possibly bad) test I cannot detect any significant difference between memory allocated with HeapAlloc and memory allocated with VirtualAlloc, with or without setting the process working set size to an appropriate value. Random accesses take far longer than they should, presumably due to caching effects.
Quote from: MichaelW on December 08, 2009, 06:18:35 PM
..any significant difference between memory allocated with HeapAlloc and memory allocated with VirtualAlloc...
Just for the OP's sake: there really *is* no difference. The heap manager should probably have been called the *allocation* manager, because all it does is check to see if there is enough room in a currently allocated page, and either return that address or allocate another page (or multiple pages) using VirtualAlloc. It's just a convenience so that lame high-level programmers don't commit a 4K page for every 4-byte integer. For example, the first time you HeapAlloc a 10-byte space, the heap manager will allocate an entire 4K page; the second time you HeapAlloc another 10-byte space, it will skip the allocation because there is still room in the first. After the page has been allocated, a HeapAlloced page and a VirtualAlloced page are completely the same, and subject to the same exact caching/translation/etc.
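To see it for yourself, something like this (untested; p1 and p2 are hypothetical DWORD variables) should hand back two pointers in the same committed 4K page:
invoke GetProcessHeap          ; handle to the default process heap
mov ebx, eax
invoke HeapAlloc, ebx, 0, 10
mov p1, eax                    ; first 10-byte block
invoke HeapAlloc, ebx, 0, 10
mov p2, eax                    ; usually just a few bytes past p1, same page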
-r
PS - as far as processing the table goes, x86 has a seldom-used (and probably slow) table-lookup instruction you can use, XLAT (XLATB in its no-operand form), to make life easier
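For the curious, XLATB replaces AL with the byte at [EBX+AL], so a 256-entry lookup is one instruction (sketch; xlatTable is a made-up label for a 256-byte table):
lea ebx, xlatTable    ; ebx -> 256-byte lookup table
mov al, 'a'
xlatb                 ; al = xlatTable['a']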
it seems that heap allocate is a little faster to me
although, i have not done a thorough test to verify that
(i always use it without initializing the memory to 0)
Thank you all for your help, I will go now and check out GetCurrentProcess and OpenProcess and XLAT/XLATB.
When I was reading up on the Heap vs Virtual functions, I believe it said HeapAlloc actually calls VirtualAlloc for any allocation above about 0.5 meg, so I figured I might as well use VirtualAlloc to begin with.
I may have misled you somewhat by using the term random access of a single table; it's actually 160 separate tables compacted together. I make an index of the start address of each table, then use a binary search to hunt for the item in the selected table. I don't know if that makes any difference.
Graham,
The factor that will limit an application with a large table is what is called page thrashing: when you have a memory block that is a lot larger than the processor cache, a memory read at an address outside what is loaded in cache means a memory page reload, and this really slows down memory accesses when they are randomly distributed across a large address range.
If it is possible it may be worth trying to reconstruct the data into a more compact format so that you can get the size down and avoid the cost of so many cache misses.
Now RE the 160 tables that you must index, putting their start addresses into a DWORD table would probably make that access a lot faster if you have some technique of identifying the table members.
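Something like this, for instance (sketch only; tblStarts and the table-number convention are my own invention):
.data?
tblStarts dd 160 dup(?)        ; start address of each of the 160 tables, filled in once at build time
.code
; ecx = table number (0..159), picked from word length / frequency class
mov esi, tblStarts[ecx*4]      ; esi -> first entry of the selected table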
Tell us a bit more about the size and type of data you are working on and there may be ways to get around some of the limitations you have found.
Quote from: dedndave on December 09, 2009, 12:16:19 AM
it seems that heap allocate is a little faster to me
although, i have not done a thorough test to verify that
(i always use it without initializing the memory to 0)
It would all depend on whether the page is already allocated; a call to VirtualAlloc will *always* require a new page be allocated, page table entries built, and VADs constructed, which is a slow operation. Most of the time, with HeapAlloc, a page already exists, so it simply returns a pointer within that page, and nothing gets done except a little math.
Don't let Hutch scare you with his page thrashing gloom-and-doom, 12Megs is still pretty small in the grand scheme of things, and Windows does a pretty decent job of keeping track of the working set size to keep things on an even keel. :bg But, as always, smaller is usually faster. It might be more efficient to divide things into fixed-size blocks, and then use an indexing system to determine what part to load. But really, my iTunes routinely uses more than 600M without batting an eye, so I wouldn't lose much sleep.
Also, just for completeness' sake: if the tables aren't changing once in memory, you are always better off building a hash function and searching them in constant time.
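For instance, any decent string hash will do; here is an FNV-1a sketch (my illustration, not a tested routine - collision handling and the bucket layout are still up to you):
; esi -> zero-terminated word
    mov eax, 0811C9DC5h        ; FNV-1a offset basis
@@: movzx ecx, byte ptr [esi]
    test ecx, ecx
    jz @F
    xor eax, ecx               ; fold in one character
    imul eax, eax, 16777619    ; FNV-1a prime
    inc esi
    jmp @B
@@: and eax, 0FFFFh            ; bucket index for a 65536-slot table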
-r
:bg
> Don't let Hutch scare you with his page thrashing gloom-and-doom
There is a reason why memory reads repeatedly outside the cache are a lot slower than those inside the cache: the time it takes the system to reload a memory page. Take this into account and you will know why it's virtuous to avoid the problem if you can.
The program is to do proofreading of texts, novels, histories etc. The lists are wordlists (words, names, placenames, abbreviations) by length of word within frequency of use. Words > 19 letters are grouped together with a space-filled fixed length of 64. List size goes from 26 bytes up to 0.5+ meg
For each word it starts in the list of appropriate length and most common usage, and works its way through ever larger lists with less common words until a match is found, or not. If not, it attempts to split the word in two, letter by letter (two words joined together is a very frequent error, and this also finds formations like ing, ed, 'll, 'd, n't, etc.), which accounts for the really heavy use of the table. Unrecognized words are written to an output file for later manual editing. And that's all it does.
I originally wrote this in script, which took 13 hours and 22 minutes to process a 1.4 meg text, hence assembly. The script took 2.5 hours merely to set up the word lists, a job which assembly does in < 1 second, about 9000 times as fast, if anyone is interested in comparative efficiency. It's so fast I could probably write each list to disk and read it back in whenever needed and still be way ahead of the game, except that the largest list already exceeds HeapAlloc's size limit, so it looks like I'm stuck with VirtualAlloc whatever I do.
A question for Jochen: Thank you, that did the trick. But I take it you meant GetCurrentProcessId and OpenProcess; GetCurrentProcess produces an error in OpenProcess?
If I was going to do this, I would set it up not as several lists sorted by frequency, but as one large list sorted alphabetically (you are just checking for the existence of a word in the list?). When sorted alphabetically, you can "divide and conquer" the entire thing; for example, if you have a list of 10 million words sorted by first letter, you can check the word directly in the middle to determine if your test word will be in the upper or lower half, effectively searching 5 million words in one shot. Similarly, two comparisons narrow it down even further, and so on. Of course, it takes time to initially sort the list, but c'est la vie. If the file gets too large for comfort, you can divide it into fixed-size blocks and index them by the last word in each block. Of course, if you are sorting 9000x as fast, perhaps it's not even a concern anymore.
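A sketch of the search loop, assuming fixed 64-byte records like your long-word format, plus a CompareWord helper you'd supply (returns eax <0/=0/>0 comparing the word at [esi] against the record at [edx], preserving ebx, ecx, edi and esi); bs_found/bs_notfound are where your own code continues:
    xor ebx, ebx              ; lo = 0
    mov edi, nRecs            ; hi = record count (hypothetical variable)
bs_loop:
    cmp ebx, edi
    jae bs_notfound           ; lo >= hi -> word not in this list
    lea ecx, [ebx+edi]
    shr ecx, 1                ; mid = (lo + hi) / 2
    mov edx, ecx
    shl edx, 6                ; 64-byte records -> offset = mid*64
    add edx, pTable           ; edx -> record[mid]
    call CompareWord
    test eax, eax
    jz bs_found               ; record ecx is the match
    js bs_lower
    lea ebx, [ecx+1]          ; search word sorts later: lo = mid + 1
    jmp bs_loop
bs_lower:
    mov edi, ecx              ; search word sorts earlier: hi = mid
    jmp bs_loop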
-ac
Quote from: redskull on December 10, 2009, 01:55:20 AM
If I was going to do this, I would set it up not as several lists sorted by frequency, but as one large list sorted alphabetically (you are just checking for the existence of a word in the list?). When sorted alphabetically, you can "divide and conquer" the entire thing ...
-ac
I agree. A jumbo string table, already alphabetically sorted, can be provided with an index consisting of three fields for each string:
field1: what to do if this string is a match for the string being sought
field2: either zero (if no match can exist) or the address where to look next, if this string is alphabetically earlier than the string being sought
field3: either zero or the address where to look next if this string is alphabetically later than the string being sought
With this sort of index, a program can look up a string in a table of 1,000,000 strings using at most 20 string comparisons, because 2^20 is more than 1,000,000.
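In MASM terms each index entry might look like this (field names are mine, purely to illustrate):
IDXNODE STRUCT
    pMatch   DWORD ?   ; field1: what to do if this string matches the target
    pEarlier DWORD ?   ; field2: next node if this string sorts earlier than the target (0 = no match possible)
    pLater   DWORD ?   ; field3: next node if this string sorts later than the target (0 = no match possible)
IDXNODE ENDS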
Once the big list is sorted, the index is not hard to generate. I will blow the dust off some old source code of mine, and post a demonstration in a day or two.
A couple of things: use GlobalAlloc() with the GMEM_FIXED flag; someone will have a bleed and regurgitate that it's not politically correct, but it will routinely allocate over a gigabyte without batting an eyelid.
The next thing is a character table of offsets to each table of alphabetical categories. Divides the lead time by either 26 or 52 if you have the tables set up case sensitive.
What is your total word count? From memory, English has about half a million base words, then you have modifiers like prefixes and suffixes. The total word count will affect the ultimate size of your table design.
Indexes can be thought of as trees with various nodes. A four-level tree with 200 keys per node can access a 16 gig database/table with ease; each node level costs one access (comparison), and at the last node there can be a potential of n - 2 key word searches. Trees can have more or fewer levels, with more or fewer keys to search for; create routines to analyze the various structures.
I use this technique for Power Ball. I have all combinations in a table with stats; a 4-level search takes less than a second to get the stats I need for a specific combination. Using a sequential search access method and something like BASIC, this might take a day or two to complete.
Learning how to do trees and nodes, and thereby database/table indexes, is a must for all.
Quote from: porphyry5 on December 10, 2009, 01:12:15 AM
A question for Jochen: Thank you, that did the trick. But I take it you meant GetCurrentProcessId and OpenProcess; GetCurrentProcess produces an error in OpenProcess?
I knew this would put you on the right track, and you would eventually find the GetCurrentProcessId call. I had also hoped you would volunteer to find out what GetCurrentProcess is good for. It is the most optimised API call that Microsoft has ever produced (http://www.masm32.com/board/index.php?topic=12652.msg97577#msg97577)
:wink
A long time ago I had to write a searchable dictionary for a uni project (scrabble game).
Even then, it was easy to fit the dictionary in a tree - each node has 26 branches, and a boolean to denote a complete word. As it was Scrabble, words were at most 15 characters long, and the tree took 40 meg or so. Your memory footprint would go up (I think it was 30 or so meg for me), and you may have to modify it if you want to include punctuation, but it was blindingly fast.
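A node for such a tree can be as simple as this (sketch; the field names are mine):
TRIENODE STRUCT
    pChild DWORD 26 dup(?)   ; one link per letter 'a'..'z' (0 = no branch)
    fWord  DWORD ?           ; non-zero if the path down to this node spells a complete word
TRIENODE ENDS                ; 108 bytes per node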
The original project was in C, but I rewrote it in asm - it was fairly simple. Unfortunately I am working abroad, otherwise I could have possibly found the code.
Mirno
Mirno, I would be interested in seeing your coding / app.
For that matter, I would also like to see what Larry H. has.
I have attached a file that has my feeble first attempt at multiway trees. Included is a tool for calculating level and node sizes for trees; just disregard the information relating to powerball.
whoa, this is getting off the point. asm itself I am finding to be quite straightforward, and very explicit, a welcome change. My problem is with winapi functions, and their unspoken limitations; and their documentation has to be the most turgid prose ever written. The code that builds the table and index is already written and tested, and works fine; but then the system overwrites my table, hence my first appeal to this forum.
The binary search techniques several of you refer to are already in hand, but thank you.
I did begin with a single huge file when I wrote the script version, but that created problems when separating concatenated words; eg the formation "abranch" would separate into "ab ranch" instead of "a branch" because "ab ranch" was the first valid combination found. A largely, but not entirely, successful solution was to use multiple lists, beginning with only the most common words, and progressing out to ever larger lists when a match was not found. Sometimes, of course, you need the less common combination, but I don't think the solution to that problem is programmable.
A question for Hutch: don't the Global functions simply call the Heap functions to do the job for them, which in turn call the Virtual functions when the size requested exceeds about 0.5 meg? Winapi32.hlp says so, but is that yet another of its generalizations that don't cover every situation? I have progressed now as far as VirtualLock, which fails with the system message "Insufficient quota to complete the requested service" which I take it means it does not have enough available to lock the amount of space requested.
Graham,
Most of the memory allocation API calls call the same function in NTDLL.DLL with minor variations. HeapAlloc() is basically designed for C programming but there are others. If it's fixed memory that you need, GlobalAlloc() set with the GMEM_FIXED flag performs well and can easily handle over 1 gigabyte if you have the memory available. VirtualAlloc tends to be slow, and while it can handle other situations like fragmented memory and other problems the OS may have, for large-block high performance code GlobalAlloc() is usually a lot faster.
Have a play with them, there are various strategies available depending on the performance you need.
The code that builds the table and index is already written and tested, and works fine; but then the system overwrites my table, hence my first appeal to this forum.
Are you certain it is the system overwriting your table and if so, where and why? Can you give an example of the code where it is occurring please?
HR,
Ghandi
From MSDN:
"The global and local functions are supported for porting from 16-bit code, or for maintaining source code compatibility with 16-bit Windows. Starting with 32-bit Windows, the global and local functions are implemented as wrapper functions that call the corresponding heap functions using a handle to the process's default heap. Therefore, the global and local functions have greater overhead than other memory management functions."
Since all MMU functions essentially do the same thing, the quickness probably depends on whether you commit or just reserve the memory, but one just trades speed now for slowness later (the pages will be committed as you use them, instead of all up front). But as always, whatever you measure as fastest is just that.
Also, re the page locking: you probably need to increase the size of your working set before trying to lock the pages (you can't lock 10 pages in memory if you're only allowed to have 8 in there at one time). But I renew my objection; Windows does an alright job of maintaining the working set sizes for you, and there's no guarantee that locking a page into memory keeps it there; Windows can still override your lock if you're in a wait state, which means every time your thread starts to run again it must reload all the pages, even if they are unused.
-ac
:bg
> A couple of things: use GlobalAlloc() with the GMEM_FIXED flag; someone will have a bleed and regurgitate that it's not politically correct, but it will routinely allocate over a gigabyte without batting an eyelid.
invoke GlobalAlloc,GMEM_FIXED,1024*1024*1024
mov gMem, eax ; direct pointer to allocated memory.
........
invoke GlobalFree,gMem
It's allocated from NTDLL.DLL so why lose sleep over it.
Quote from: redskull on December 11, 2009, 12:36:13 PM
Starting with 32-bit Windows, the global and local functions are implemented as wrapper functions that call the corresponding heap functions using a handle to the process's default heap. Therefore, the global and local functions have greater overhead than other memory management functions.
Have a walk with Olly, it's hilarious. Must be organically grown code (TM).
invoke GlobalAlloc, GMEM_FIXED, 8000000
7C80FE69 FF15 0C10807C call near [<&ntdll.RtlAllocate>; ntdll.RtlAllocateHeap
...
7C96D636 68 C0D8967C push 7C96D8C0 ; ASCII "RtlAllocateHeap"
...
7C94A2B1 E8 2832FCFF call ZwAllocateVirtualMemory
...
7C94A2EB E8 3B61FCFF call RtlGetNtGlobalFlags
...
Sorry, but most of you are going completely over my head, I don't understand your terminology.
I gather that there are 3 competing possibilities: Heap, Global and Virtual. Virtual gives a problem so I will try Heap and Global instead.
Quote from: porphyry5 on December 11, 2009, 04:18:44 PM
Sorry, but most of you are going completely over my head, I don't understand your terminology.
Apologies - we have a tendency to hijack threads for rants and philosophical reflections on the virtues of VirtualAlloc... ::)
Quote
I gather that there are 3 competing possibilities: Heap, Global and Virtual. Virtual gives a problem so I will try Heap and Global instead.
Just try what Hutch suggests. It will work fine.
invoke GlobalAlloc,GMEM_FIXED,1024*1024*1024
mov gMem, eax ; direct pointer to allocated memory.
........
invoke GlobalFree,gMem
Quotewe have a tendency to hijack threads for rants and philosophical reflections on the virtues of VirtualAlloc
this place would be boring if we didn't have "discussions" (aka "rants") - lol
If it's just a question of getting a big table into memory, well, LocalAlloc is fine unless maybe the loader has not given the program a big enough heap. But this works:
filemax = 4000000h ;64 megs
.data
bigbuffer dd ?
amtread dd ?
hFile dd ?
hHeap dd ?
filespec db "bigfile",0 ;for illustration
...
.code
invoke HeapCreate,0,0,0 ;growable private heap
test eax,eax
jz heapproblem
mov hHeap,eax
invoke HeapAlloc,hHeap,0,filemax ;the heap handle itself is not a usable buffer
test eax,eax
jz heapproblem
mov bigbuffer,eax
invoke CreateFileA,addr filespec,GENERIC_READ,0,0,OPEN_ALWAYS,FILE_ATTRIBUTE_NORMAL,0
inc eax
jz fileproblem
dec eax
jz fileproblem
mov hFile,eax
invoke ReadFile,eax,bigbuffer,filemax,addr amtread,0
test eax,eax
jz short readproblem
invoke CloseHandle,hFile
...
z941998, I'll put some code together to illustrate how to make and use indexes, and put in the laboratory in a day or two.
Sure can't argue with success: Global works and Heap works, Virtual does not. It's interesting how some of you rooted for Global, some for Heap, but absolutely no one rooted for Virtual. Should tell me something, eh!
Let's see, already I have: avoid Virtual, what's the point of GetCurrentProcess, and FormatMessage spurns eax. Winapi is shaping up just like Notetab's script, riddled with undocumented quirks.
Thank you all again for your help, I really appreciate it.
http://msdn.microsoft.com/en-us/library/aa366574(VS.85).aspx
QuoteNote The global functions have greater overhead and provide fewer features than other memory management functions. New applications should use the heap functions unless documentation states that a global function should be used. For more information, see Global and Local Functions.
now, you know why i like the heap functions
although, VirtualAlloc should work - i prefer not to use it unless, perhaps, i want memory initialized to 0 (with HeapAlloc that is optional)
http://msdn.microsoft.com/en-us/library/aa366887(VS.85).aspx
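for example, the zero-fill is just a flag on the call (untested):
invoke GetProcessHeap
invoke HeapAlloc, eax, HEAP_ZERO_MEMORY, 100000h   ; 1 mb block, zero-filled
; leave the flag out and you skip the cost of the memset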
as for GetCurrentProcess - it always returns -1
but - it would be bad coding practice to "hard-code" -1 into everything
what if it changes for some reason - or - what if i want to use the ensuing code in a thread
best to call the function - it is fast
Quote from: porphyry5 on December 12, 2009, 04:45:32 PM
... avoid Virtual, ...
Wait... what?
.486
.MODEL FLAT, STDCALL
OPTION CASEMAP:NONE
INCLUDE windows.inc
INCLUDE kernel32.inc
INCLUDELIB kernel32.lib
INCLUDE user32.inc
INCLUDELIB user32.lib
u64 STRUCT
l DWORD ?
h DWORD ?
u64 ENDS
.DATA
hFile DWORD INVALID_HANDLE_VALUE
pFContent DWORD 0
ReadEcho DWORD 0
F_Size u64 <>
strFileToRead BYTE "TstVirtual.asm",0
strErrCapt BYTE "Virtual Test has a problem. Please don't tell $M about it!!!",0
strERR_CantOpen BYTE "File could not be opened",0
strERR_Size BYTE "GetFileSizeEx failed on selected file",0
strERR_Sane BYTE "File too large to process",0
strERR_Virt BYTE "The system was unable to provide memory for this operation",0
strERR_Read BYTE "Could not read selected file",0
.CODE
AppEntryPoint:
INVOKE CreateFile, ADDR strFileToRead,GENERIC_READ,FILE_SHARE_READ,NULL,OPEN_EXISTING,0,NULL
cmp eax, INVALID_HANDLE_VALUE
mov hFile, eax
je ERR_CantOpen
INVOKE GetFileSizeEx, eax,ADDR F_Size
test eax, eax
jz ERR_Size
cmp F_Size.h, 0
jne ERR_Sane
cmp F_Size.l, 0FFFFFh
ja ERR_Sane
INVOKE VirtualAlloc, NULL,F_Size.l,MEM_COMMIT or MEM_RESERVE,PAGE_READWRITE
test eax, eax
mov pFContent, eax
jz ERR_Virt
INVOKE ReadFile, hFile,eax,F_Size.l,ADDR ReadEcho,NULL
test eax, eax
jz ERR_Read
; do what you need to do with the file content
mov edi, pFContent
mov ecx, F_Size.l
xor eax, eax
repne scasb
jmp CleanExit
ERR_CantOpen:
INVOKE MessageBox, NULL,ADDR strERR_CantOpen,ADDR strErrCapt,MB_OK or MB_ICONERROR
jmp CleanExit
ERR_Size:
INVOKE MessageBox, NULL,ADDR strERR_Size,ADDR strErrCapt,MB_OK or MB_ICONERROR
jmp CleanExit
ERR_Sane:
INVOKE MessageBox, NULL,ADDR strERR_Sane,ADDR strErrCapt,MB_OK or MB_ICONERROR
jmp CleanExit
ERR_Virt:
INVOKE MessageBox, NULL,ADDR strERR_Virt,ADDR strErrCapt,MB_OK or MB_ICONERROR
jmp CleanExit
ERR_Read:
INVOKE MessageBox, NULL,ADDR strERR_Read,ADDR strErrCapt,MB_OK or MB_ICONERROR
;jmp CleanExit
CleanExit:
cmp hFile, INVALID_HANDLE_VALUE
je @F
INVOKE CloseHandle, hFile
@@:
cmp pFContent, 0
je @F
INVOKE VirtualFree, pFContent,0,MEM_RELEASE
@@:
INVOKE ExitProcess, NULL
END AppEntryPoint
I would say heap functions are great for small chunks of memory. The downside with Virtual is that the bytes after F_Size.l till the end of the page have the potential to remain unused. However, when you process large files and use the heap that came with the process for current needs, you may find it useful to use Virtual...
Do not avoid Virtual!!! :bdg
Nick
QuoteI would say heap functions are great for small chunks of memory.
i use them for large chunks, as well - you failed to mention a downside in that
Hello, Dave!
A heap function needs to check whether that kind of memory exists in what the heap already has and, if it does not, it will call a Virtual function to add some more memory. It will get some full pages that it needs to manage (handle those extra bytes at the end that the user does not need, write some headers for those pages or, maybe, add a pointer to the allocated range to an array).
Using Virtual you may skip this overhead.
Nick
PS I have never done a study about how the Windows heap works. My thoughts are based on how I would implement a heap. $M people may be far smarter and they may have found a way to avoid this overhead for large blocks, in which case I retract everything I said and get back to my cold. :red
yah - if your request is under, let's say ~ 2 Gb, HeapAlloc shouldn't bark
if you poke around in the Laboratory sub-forum, i am sure you can find some timing results
Yes... or we can go here (http://msdn.microsoft.com/en-us/library/ms810466.aspx). (click it fast, before it becomes a dead link. :)
And since we're there, entire "Memory Management" category looks interesting, so I'm going back to read some more.
Nick
PS If you're too late and the link is broken, try this (http://www.google.ro/search?q=Heap:+Pleasures+and+Pains).
Quote from: dedndave on December 12, 2009, 07:11:29 PM
if you poke around in the Laboratory sub-forum, i am sure you can find some timing results
Here are some cycle counts from an old proggie of mine. VirtualAlloc is really slow for small allocations - the overhead is a lot higher.
HeapAlloc, 00010000h bytes, 103 per kByte
HeapAlloc, 00001000h bytes, 846 per kByte
HeapAlloc, 00000100h bytes, 2228 per kByte
VirtualAlloc, 00010000h bytes, 51 per kByte
VirtualAlloc, 00001000h bytes, 3102 per kByte
VirtualAlloc, 00000100h bytes, 86028 per kByte
Hello, jj!
Do you mind sharing this test program? These are some interesting results right there...
Nick
Those timings are somewhat misleading, which I say cautiously having not seen the code. There's no such thing as a VirtualAlloc call for x100 bytes; on an x86 system, you allocate memory in integer multiples of 4K pages, no discussion. So to say 'x cycles per Kbyte' really doesn't make any sense. A VirtualAlloc for x100 bytes will be (nearly) the same as a VirtualAlloc for x1000, but x1001 will actually be a call for x2000.
Also, to say that the Heap functions are faster is self-evident; most of the time when you call a Heap function (for a small value), no more memory is actually allocated. Also, the Heap will try to "defragment" those smaller chunks to free up room. It's like saying allocating memory is slower than NOT allocating memory; most of the time, for small values, a call to HeapAlloc is unnecessary; your process already "owns" that memory, and can read and write to it without needing to call the function. A quick look at the Olly memory map will show this: anytime a new page is allocated, it will be highlighted in red. If nothing changes after a call to the heap functions, then you flat wasted your time with the call: your process could have stored data there already.
Also, @dedndave, you can't ever allocate more than 2Gb with *any* of the functions; there's simply just not enough address space to go round. There's no real "downside" to using the Heap functions for large values; there's just no real upside to doing it either. If you *know* you need at least another page allocated, why waste the time having Windows check to make sure?
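To illustrate the rounding (both calls below commit exactly one 4K page; ask for x1001 and you commit two):
invoke VirtualAlloc, NULL, 100h, MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE
invoke VirtualAlloc, NULL, 1000h, MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE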
-r
Quote from: TNick on December 12, 2009, 08:41:39 PM
Hello, jj!
Do you mind sharing this test program? These are some interesting results right there...
Nick,
I didn't attach it because it's old and very messy. But you may use bits and pieces to roll your own. Here it is. No warranty whatsoever, all kinds of disclaimers apply :wink
@redskull: What you write is entirely correct. My timings just confirm common sense: use HeapAlloc for small to medium allocations, switch to VirtualAlloc for big ones. Be cautious with the statement "most of the time, for small values, a call to HeapAlloc is unnecessary; your process already "owns" that memory" - once in a while, this strategy will fail and lead to very unpredictable behaviour...
I can't say for certain, but in most of the processes I've looked at, the heap starts out 3 pages long and takes up about the first 2-1/2 for its data structures; while you could certainly read and write there, it would probably corrupt the heap and throw an exception. And while that's certainly not a strategy to use for normal programming, the point is that if your process has a VAD for a committed page with R/W access, that's your memory to do with what you please. I just did a quick experiment: for heap allocations up to ~x650, no new memory was allocated, and the pointer returned was in that "last half a page". For greater allocations, it expanded the 3 pages to 4, then 5, etc. For a full 1GB allocation, it presumably couldn't find the address space down low (~14000) and stuck it up high (~410000). I *am* interested in why the VirtualAlloc for x100 took so long in the test...
-r
Graham,
VirtualAlloc() does have its place, and this is why Windows has a range of memory allocation strategies. VirtualAlloc() apparently handles discontinuous blocks and presents them to the caller as a linear address range, where allocation functions that grab a single block of linear space don't do this. It depends very much on your application, but in your own case, with tables loaded for speed of operation, a single linear block of memory will give you what you are after.
Apart from people quoting out-of-date Microsoft data without keeping up on what the functions actually do (Microsoft have been changing their mind regularly for the last 30 years), most memory allocation functions reduce down through kernel32.dll to ntdll.dll to ntoskrnl.exe and hal.dll, and at the bottom, memory is memory: effectively, pick the packaging that best suits your purpose. I suggested using GlobalAlloc() but ONLY with the GMEM_FIXED flag because it's simpler to use, nothing is faster than it, and it can allocate as much memory as your computer has available. Other styles of GlobalAlloc() allocation are either for legacy special-purpose tasks like the clipboard or are leftovers from 16-bit Windows where the distinctions no longer exist.
I am sorry you have been fed such a confusing range of alternatives as we run the Campus to avoid that but the diversity of views here is probably useful to you in understanding some of the different strategies available for allocating memory in Windows.
Aha, so there are Virtual defenders, after all.
Hutch, thank you for the information about Global. I do know that VirtualAlloc works; I used it alone at first and the table got built, but it didn't stay in place. That may be due to the amount of time that elapsed before I discovered it. I was so disbelieving of assembly's speed that I installed tests throughout the table-build process, literally counting and checking every byte each time it was moved. I was single-stepping through these tests with Windbg, which took about 20 minutes. Everything checked until the last test, a scan of the completed table to see if there were still embedded zeros in it from the initialization. And there were, but when I viewed the area, it was not a mixture of zeros and text as expected; it looked like something else entirely, I think executable code had been written there. Hence my attempt to use VirtualLock.
Now my question, given assembly's speed, I anticipate the completed program will run about 7 seconds on average. Is it reasonable to expect a memory area to remain intact for that long, or is that completely unpredictable?
nah - lol
something else is putting data there - or, at least, you are not putting things in there like you think you are
i wondered what you wanted VirtualLock for - should be no need for that
you own that memory
Graham,
If your machine is very close to its limit of physical memory, you run the risk of the OS trying to page memory out to make more room for a later-loaded program, but usually it's the case with GlobalAlloc() as fixed memory that it does not normally get paged out, and if it's allocated when the machine is low on memory, the original allocation fails in the first place.
Now, it's simple enough to test if you get what you allocate, but the next trick is to make sure your app only writes to memory addresses that your app owns, otherwise it will go bang as soon as it tries to read or write outside that address range. This is roughly bounds-checking stuff, but once you are satisfied that it's safe in its reads and writes, as long as the machine has enough memory left over not to affect normal operations, you should be able to concentrate on tweaking the overall design to try and get it a bit faster.
Now another factor: if your memory usage is subject to reallocation to increase size, almost always the address will change, so if this is the situation, always calculate from the initial memory pointer and use offsets from that pointer for your targets. If you are ending up with rubbish in addresses that should have your own content, you are getting leakage somewhere, and it's probably from within your own application.
oh shit, I am so embarrassed now. I abjectly apologize for taking you all on a wild goose chase.
Instead of
mov al,0 ; look for any byte in compacted tables still set to binary zero
lea edi,alltbls ; start addr of compacted tables, ecx is preset to their total size
repne scasb ; scan all of compacted tables for binary zero
I should have done
mov al,0 ; look for any byte in compacted tables still set to binary zero
mov edi,alltbls ; start addr of compacted tables, ecx is preset to their total size
repne scasb ; scan all of compacted tables for binary zero
The scan works fine, everything is where it should be, even 30 minutes later. Again, I am totally sorry for creating this mare's-nest. :'(
No problems, we have all been there, I am pleased you have got it to work properly. Now you can play with it to get it faster. :bg
12MB is a pretty small table as sizes go.
I'm not sure why you say you need the Virtual functions, since malloc should work just dandy. Certainly the size has nothing to do with this, since 12MB isn't a whole whopping lot of memory. I don't think data structures are getting big until they can be measured in integer-hundreds-of-megabytes.
As already pointed out, TLB cache thrashing is going to be a factor, but also L1 and L2 cache thrashing can be an issue.
You can quite possibly set working set size, but using VirtualLock requires that someone with admin privileges grant you the right to do that, and set a quota. It is a last resort. Even working set size is questionable. You don't have any data to suggest that this is a problem, and there is nothing worse than pre-optimizing to solve a non-existent problem. It wastes time and gains nothing.
I'm working with someone right now who wants to VirtualLock 30GB in Win64. In a 32GB system. Probably not possible. But a paltry 12MB is about 1/2% of the virtual address space you have available, which is 2GB. Generally, cache misses and TLB updates are going to be a bigger bottleneck than paging because you have such a tiny table relative to the address space available. You should also mention the size of physical memory installed on the machine(s) you intend to run it on. While smaller physical memory increases the likelihood of page faults, it also means that VirtualLock is going to have a much more profound negative impact on overall system performance, and even the performance of your app (with data locked down, you are more likely to find other, unlocked, data pages have been moved out, and/or code pages moved out). Note also that increasing the working set requires administrative permissions be established for most users, so it is not clear that you can even use this as a technique.
You would need to say something about the nature of the data and its accesses (you said 12MB, but is that 12 1-MB records, or 1,000,000 12-byte records, or what?) In some cases, arrays are your worst possible choice for the representation, so you have to say something about the nature of the keys used to search it, the nature of the searches, the expected rate of inserts and deletions, etc. before a reasonable representation can be chosen. Back In The Day (as we say), we spent a lot of time "packing" our tables to minimize page faults and maximize cache hits, when we knew the nature of the table and the data. So there is no "one right answer" to your problem.