huge table access

Started by porphyry5, December 08, 2009, 04:20:43 PM

porphyry5

I have just started to learn assembly.  My program needs to make continual random access to a huge table in memory (approaching 12 Meg).  Once the table is created the program does virtually nothing but access this table.  On average, each input file will cause the table to be accessed more than a million times.

How do I set it up?

Its size means the Virtual functions are going to be used to create it.  I need the whole thing locked into memory so I must use SetProcessWorkingSetSize before VirtualLock can be applied.  But SetProcessWorkingSetSize requires a handle.  VirtualAlloc does not create a handle.  Passing the address returned by VirtualAlloc to SetProcessWorkingSetSize results in an error.

Do I have to set it up as a dummy file and use CreateFile to get a handle?  Is there a neater way to do it?

Thank you in advance for your help.

Graham Lawrence

jj2007

Graham,
You need a process handle. I would check MSDN for GetCurrentProcess and OpenProcess.
Best,
Jochen
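
A minimal MASM32-style sketch of the whole sequence, assuming the pseudo-handle from GetCurrentProcess is acceptable to SetProcessWorkingSetSize (per MSDN it is, so OpenProcess isn't strictly needed); sizes are illustrative and error checks are omitted:

```asm
.data?
pTable  dd ?                ; hypothetical: base address of the table

.code
; grow the working set, then commit and lock ~12 MB
invoke  GetCurrentProcess   ; returns a pseudo-handle for this process
invoke  SetProcessWorkingSetSize, eax, 14000000, 16000000
invoke  VirtualAlloc, NULL, 12000000, MEM_COMMIT, PAGE_READWRITE
mov     pTable, eax
invoke  VirtualLock, pTable, 12000000
```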

dedndave

12 mb isn't all that large, really
i would use the heap allocation functions - a matter of preference, perhaps
it returns a handle - but the handle is the base address

MichaelW

In my (admittedly crude and possibly bad) test I cannot detect any significant difference between memory allocated with HeapAlloc and memory allocated with VirtualAlloc, with or without setting the process working set size to an appropriate value. Random accesses take far longer than they should, presumably due to caching effects.
eschew obfuscation

redskull

Quote from: MichaelW on December 08, 2009, 06:18:35 PM
..any significant difference between memory allocated with HeapAlloc and memory allocated with VirtualAlloc...

Just for the OP's sake: there really *is* no difference.  The heap manager should probably have been called the *allocation* manager, because all it does is check to see if there is enough room in a currently allocated page, and either return that address or allocate another page (or multiple pages) using VirtualAlloc.  It's just a convenience so that lame high-level programmers don't commit a 4K page for every 4-byte integer.  For example, the first time you HeapAlloc a 10-byte space, the heap manager will allocate an entire 4K page; the second time you HeapAlloc another 10-byte space, it will eschew the allocation because there is still room in the first.  After the page has been allocated, a HeapAlloced page and a VirtualAlloced page are completely the same, and subject to the same exact caching/translation/etc.
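
In MASM32 terms the two routes look like this (a sketch; hHeap and pMem are hypothetical variables, error checks omitted):

```asm
; heap route: the heap manager sub-allocates from pages it commits
invoke  GetProcessHeap
mov     hHeap, eax
invoke  HeapAlloc, hHeap, 0, 12000000   ; returns a pointer, not a handle
mov     pMem, eax

; virtual route: commit whole pages directly yourself
invoke  VirtualAlloc, NULL, 12000000, MEM_COMMIT, PAGE_READWRITE
mov     pMem, eax
```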

-r

PS - as far as processing the table goes, x86 has a seldom-used (and probably slow) table-lookup instruction you can use, XLAT/XLATB, to make life easier
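
For reference, XLATB replaces AL with the byte at [EBX+AL], so it only suits 256-entry byte tables; a sketch (MyTable is a hypothetical name):

```asm
.data
MyTable db 256 dup(?)       ; hypothetical 256-entry translation table

.code
lea     ebx, MyTable        ; ebx -> base of the table
mov     al, 41h             ; index to translate
xlatb                       ; AL = MyTable[41h]
```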
Strange women, lying in ponds, distributing swords, is no basis for a system of government

dedndave

it seems that heap allocate is a little faster to me
although, i have not done a thorough test to verify that
(i always use it without initializing the memory to 0)

porphyry5

Thank you all for your help, I will go now and check out GetCurrentProcess, OpenProcess, and XLAT/XLATB.

When I was reading up on the Heap vs Virtual functions, I believe I read that HeapAlloc actually calls VirtualAlloc for any allocation above about 0.5 meg, so I figured I might as well use VirtualAlloc to begin with.

I may have misled you somewhat by using the term random access of a single table; it's actually 160 separate tables compacted together.  I make an index of the start address of each table, then use a binary search to hunt for the item in the selected table.  I don't know if that makes any difference.

hutch--

Graham,

The factor that will limit an application with a large table is what is called page thrashing: when you have a memory block that is a lot larger than the processor cache, a memory read at an address outside what is loaded in cache means a memory page reload, and this really slows down memory accesses when they are randomly distributed across a large address range.

If it is possible it may be worth trying to reconstruct the data into a more compact format so that you can get the size down and avoid the cost of so many cache misses.

Now RE the 160 tables that you must index: putting their start addresses into a DWORD table would probably make that access a lot faster, if you have some technique for identifying the table members.
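
Such an index is just 160 DWORDs; a sketch (names are illustrative):

```asm
.data?
TableIndex  dd 160 dup(?)           ; TableIndex[i] = start address of table i

.code
; fill TableIndex while building the big block, then fetch
; the base of table #i (i in ecx) with a single indexed read:
mov     eax, [TableIndex + ecx*4]   ; eax -> first record of table i
```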

Tell us a bit more about the size and type of data you are working on and there may be ways to get around some of the limitations you have found.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

redskull

Quote from: dedndave on December 09, 2009, 12:16:19 AM
it seems that heap allocate is a little faster to me
although, i have not done a thorough test to verify that
(i always use it without initializing the memory to 0)

It would all depend on whether the page is already allocated; a call to VirtualAlloc will *always* require a new page to be allocated, page table entries built, and VADs constructed, which is a slow operation.  Most of the time, with HeapAlloc, a page already exists, so it simply returns a pointer within that page, and nothing gets done except a little math.

  Don't let Hutch scare you with his page thrashing gloom-and-doom, 12Megs is still pretty small in the grand scheme of things, and Windows does a pretty decent job of keeping track of the working set size to keep things on an even keel.  :bg  But, as always, smaller is usually faster.  It might be more efficient to divide things into fixed-size blocks, and then use an indexing system to determine what part to load.  But really, my iTunes routinely uses more than 600M without batting an eye, so I wouldn't lose much sleep.

Also, just for completeness' sake: if the tables aren't changing once in memory, you are always better off building a hash function and searching them in constant time.
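
A constant-time lookup only needs a decent string hash; here is a sketch using the well-known FNV-1a constants (just one common choice, not the only one — the hash modulo the bucket count then picks a slot):

```asm
; FNV-1a hash of a zero-terminated string
; in:  esi -> string     out: eax = 32-bit hash (ecx, edx trashed)
HashWord proc
    mov     eax, 2166136261         ; FNV-1a offset basis
@@: movzx   ecx, byte ptr [esi]
    test    ecx, ecx
    jz      @F                      ; stop at the terminating zero
    xor     eax, ecx                ; hash ^= byte
    mov     edx, 16777619           ; FNV prime
    mul     edx                     ; hash *= prime (low 32 bits in eax)
    inc     esi
    jmp     @B
@@: ret
HashWord endp
```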

-r

hutch--

 :bg

> Don't let Hutch scare you with his page thrashing gloom-and-doom

There is a reason why memory reads repeatedly outside the cache are a lot slower than those inside the cache: the time it takes the system to reload a memory page.  Take this into account and you will know why it's virtuous to avoid the problem if you can.

porphyry5

The program is to do proofreading of texts, novels, histories etc.  The lists are wordlists (words, names, placenames, abbreviations) by length of word within frequency of use.  Words > 19 letters are grouped together with a space-filled fixed length of 64. List size goes from 26 bytes up to 0.5+ meg

For each word it starts in the list of appropriate length and most common usage, and works its way through ever larger lists with less common words until a match is found, or not.  If not, it attempts to split the word in two (two words joined together is a very frequent error, as are formations like ing, ed, 'll, 'd, n't, etc.) letter by letter, which accounts for the really heavy use of the table.  Unrecognized words are written to an output file for later manual editing.  And that's all it does.

I originally wrote this in script, which took 13 hours and 22 minutes to process a 1.4 meg text, hence assembly.  The script took 2.5 hours merely to set up the word lists, a job which assembly does in < 1 second, about 9000 times as fast, if anyone is interested in comparative efficiency.  It's so fast I could probably write each list to disk and read it back in whenever needed and still be way ahead of the game, except that the largest list already exceeds HeapAlloc's size limit, so it looks like I'm stuck with VirtualAlloc whatever I do.

A question for Jochen:  Thank you, that did the trick.  But I take it you meant GetCurrentProcessId and OpenProcess?  GetCurrentProcess produces an error in OpenProcess.

redskull

If I was going to do this, I would set it up not as several lists sorted by frequency, but as one large list sorted alphabetically (you are just checking for the existence of a word in the list?)  When sorted alphabetically, you can "divide and conquer" the entire thing; for example, if you have a list of 10 million words sorted alphabetically, you can check the word directly in the middle to determine whether your test word will be in the upper or lower half, effectively searching 5 million words in one shot.  Similarly, two comparisons narrow it down even further, and so on.  Of course, it takes time to initially sort the list, but c'est la vie.  If the file gets too large for comfort, you can divide it into fixed-sized blocks and index them by the last word in each block.  Of course, if you are sorting 9000x as fast, perhaps it's not even a concern anymore.
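
That divide-and-conquer is a plain binary search; a sketch over fixed-length records (RECLEN, the labels, and the calling convention are my assumptions, not redskull's code):

```asm
RECLEN equ 64                       ; assumed fixed record size

; in:  pTable -> sorted table, pKey -> key (RECLEN bytes), nRecs = count
; out: eax -> matching record, or 0 if absent
BinSearch proc uses ebx esi edi pTable:DWORD, pKey:DWORD, nRecs:DWORD
    xor     ebx, ebx                ; low  = 0
    mov     edx, nRecs
    dec     edx                     ; high = count - 1
top_:
    cmp     ebx, edx
    jg      notfound                ; low > high: exhausted
    lea     eax, [ebx+edx]
    shr     eax, 1                  ; mid = (low + high) / 2
    mov     esi, eax
    imul    esi, RECLEN
    add     esi, pTable             ; esi -> record[mid]
    mov     edi, pKey
    mov     ecx, RECLEN
    repe    cmpsb                   ; compare record to key, byte by byte
    je      found
    ja      record_high             ; record > key: search lower half
    lea     ebx, [eax+1]            ; low = mid + 1
    jmp     top_
record_high:
    lea     edx, [eax-1]            ; high = mid - 1
    jmp     top_
found:
    imul    eax, RECLEN
    add     eax, pTable             ; eax -> matching record
    ret
notfound:
    xor     eax, eax
    ret
BinSearch endp
```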

-ac

Larry Hammick

Quote from: redskull on December 10, 2009, 01:55:20 AM
If I was going to do this, I would set it up not as several lists sorted by frequency, but as one large list sorted alphabetically (you are just checking for the existence of a word in the list?)  When sorted alphabetically, you can "divide and conquer" the entire thing ...
-ac
I agree. A jumbo string table, already alphabetically sorted, can be provided with an index consisting of three fields for each string:
field1: what to do if this string is a match for the string being sought
field2: either zero (if no match can exist) or the address where to look next, if this string is alphabetically earlier than the string being sought
field3: either zero or the address where to look next if this string is alphabetically later than the string being sought
With this sort of index, a program can look up a string in a table of 1,000,000 strings using at most 20 string comparisons, because 2^20 is more than 1,000,000.
Once the big list is sorted, the index is not hard to generate. I will blow the dust off some old source code of mine, and post a demonstration in a day or two.
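
The three fields map naturally onto a structure; a sketch with my own names, pending Larry's actual demonstration code:

```asm
SNODE struct
    pMatch  dd ?    ; field1: what to do / data if this string matches
    pLower  dd ?    ; field2: next node if this string is earlier than
                    ;         the sought string (0 = no match can exist)
    pHigher dd ?    ; field3: next node if this string is later than
                    ;         the sought string (0 = no match can exist)
SNODE ends
```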

hutch--

A couple of things: use GlobalAlloc() with the GMEM_FIXED flag; someone will have a bleed and regurgitate that it's not politically correct, but it will routinely allocate over a gigabyte without batting an eyelid.

The next thing is a character table of offsets to each table of alphabetical categories.  It divides the lead time by either 26, or 52 if you have the tables set up case sensitive.

What is your total word count?  From memory, English has about half a million base words, then you have modifiers like prefixes and suffixes.  The total word count will affect the ultimate size of your table design.

z941998

Indexes can be thought of as trees with various nodes.  A four-level tree with 200 keys per node can access a 16-gig database/table with ease; each node level costs one access (comparison), and at the last node there can be a potential of n - 2 keyword searches.  Trees can have more or fewer levels, with more or fewer keys to search for; create routines to analyze the various structures.

I use this technique for Power Ball: I have all combinations in a table with stats, and a four-level search takes less than a second to get the stats I need for a specific combination.  Using a sequential search access method and something like BASIC, this might take a day or two to complete.

Learning how to do trees and nodes, and thereby database/table indexes, is a must for all.