huge table access

Started by porphyry5, December 08, 2009, 04:20:43 PM

hutch--

No problem, we have all been there. I am pleased you have got it working properly. Now you can play with it to make it faster.  :bg
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jmnewcomer

12MB is a pretty small table as sizes go.

I'm not sure why you say you need the Virtual* functions (VirtualAlloc, VirtualLock and so on), since malloc should work just dandy. The size certainly has nothing to do with this; 12MB isn't a whole whopping lot of memory. I don't think data structures start getting big until they are measured in hundreds of megabytes.

As already pointed out, TLB thrashing is going to be a factor, but L1 and L2 cache thrashing can also be an issue.
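To make the TLB point concrete, here is a sketch that counts pages rather than measuring time. It assumes a hypothetical layout (the thread doesn't give one): the 12MB table treated as 3072 rows of 4096 one-byte cells, with 4KB pages, so each row is exactly one page. Walking it row by row stays on one page for 4096 consecutive accesses; walking it column by column lands on a different page on every access:

```python
from itertools import islice

# Hypothetical layout: a 12MB table viewed as 3072 rows of 4096 one-byte
# cells. Assumes 4KB pages, so each row here occupies exactly one page.
PAGE_SIZE = 4096
ROWS = 3072
ROW_BYTES = 4096          # 3072 * 4096 bytes = 12 * 2**20 bytes

def offsets(row_major):
    """Yield the byte offset of every cell in row-major or column-major order."""
    if row_major:
        for r in range(ROWS):
            for c in range(ROW_BYTES):
                yield r * ROW_BYTES + c
    else:
        for c in range(ROW_BYTES):
            for r in range(ROWS):
                yield r * ROW_BYTES + c

def pages_touched(row_major, accesses):
    """Count distinct pages hit by the first `accesses` accesses: a rough
    proxy for TLB pressure, since each distinct page needs a translation."""
    return len({off // PAGE_SIZE for off in islice(offsets(row_major), accesses)})

print(pages_touched(True, 100_000))   # 25
print(pages_touched(False, 100_000))  # 3072
```

Actual costs depend on the TLB and cache sizes of the CPU in question, so this only shows why access order matters; it says nothing about how much it matters on a given machine.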

You can quite possibly set the working set size, but using VirtualLock requires that someone with admin privileges grant you the right to do that and set a quota. It is a last resort. Even adjusting the working set size is questionable. You don't have any data to suggest that this is a problem, and there is nothing worse than optimizing prematurely to solve a non-existent problem. It wastes time and gains nothing.

I'm working with someone right now who wants to VirtualLock 30GB in Win64, on a 32GB system. Probably not possible. But a paltry 12MB is a little over 1/2% of the 2GB of virtual address space you have available. Generally, cache misses and TLB updates are going to be a bigger bottleneck than paging, because your table is so tiny relative to the available address space.

You should also mention how much physical memory is installed on the machine(s) you intend to run it on. While less physical memory increases the likelihood of page faults, it also means that VirtualLock is going to have a much more profound negative impact on overall system performance, and even on the performance of your app (with data locked down, you are more likely to find that other, unlocked, data pages and/or code pages have been moved out). Note also that increasing the working set requires that administrative permissions be established for most users, so it is not clear that you can even use this as a technique.
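The address-space and page arithmetic can be checked quickly. This sketch assumes 4KB pages and takes "12MB" as 12 × 2^20 bytes; both are assumptions for illustration:

```python
MB = 2**20
table_bytes = 12 * MB          # the 12MB table under discussion
address_space = 2048 * MB      # 2GB of user virtual address space
PAGE_SIZE = 4096               # assumed 4KB pages

share = table_bytes / address_space   # fraction of the address space used
pages = table_bytes // PAGE_SIZE      # pages the table spans

print(f"{share:.2%} of the address space, {pages} pages")
# prints "0.59% of the address space, 3072 pages"
```

So the table uses well under 1% of the address space, yet it spans a few thousand pages, each of which needs a TLB entry while it is being accessed.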

You would need to say something about the nature of the data and how it is accessed (you said 12MB, but is that 12 1-MB records, or 1,000,000 12-byte records, or what?). In some cases, arrays are your worst possible choice of representation, so you have to say something about the keys used to search it, the kinds of searches, the expected rate of insertions and deletions, etc. before a reasonable representation can be chosen. Back In The Day (as we say), we spent a lot of time "packing" our tables to minimize page faults and maximize cache hits, when we knew the nature of the table and the data. So there is no "one right answer" to your problem.
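As one illustration of what "packing" can mean, here is a sketch for just one of the cases above: 1,000,000 12-byte records searched by exact key, with few inserts or deletions. The record layout (a 4-byte key plus two 4-byte values) and the choice of a sorted array with binary search are assumptions for illustration; if inserts and deletions are frequent, a different structure may well be better:

```python
import struct

# Hypothetical record layout: 4-byte key + two 4-byte values, little-endian,
# no padding, so every record is exactly 12 bytes.
RECORD = struct.Struct("<III")

def pack_table(records):
    """Sort (key, a, b) tuples by key and pack them into one contiguous buffer."""
    records = sorted(records)
    buf = bytearray(RECORD.size * len(records))
    for i, rec in enumerate(records):
        RECORD.pack_into(buf, i * RECORD.size, *rec)
    return buf

def lookup(buf, key):
    """Binary-search the packed table; return (a, b) for `key`, or None."""
    lo, hi = 0, len(buf) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        k, a, b = RECORD.unpack_from(buf, mid * RECORD.size)
        if k == key:
            return a, b
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return None

table = pack_table([(42, 1, 2), (7, 3, 4), (19, 5, 6)])
print(len(table), lookup(table, 19), lookup(table, 8))  # 36 (5, 6) None
```

Because the records sit back to back with no pointers, a lookup in a million-entry table probes about 20 records (log2 of 1,000,000 is just under 20), and the last few probes of each search fall close together in memory.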