News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Linked list and linked array

Started by gabor, November 27, 2006, 03:15:44 PM

Previous topic - Next topic

gabor

Hello!

I am finally ready with my implementation of linked list and linked array. :bg  But I am not sure whether it is satisfying or just crap.
I'd like to have this topic to analyse and to discuss this work of mine and the possiblities of improvements.
First of all, please test it!

Some words about my implementation:

  • The linked list is the classic linked list linked in both directions. The list can be sorted (when insertion is done in the sort order) can contain objects of different sizes.
  • The linked array is a linked list of arrays (subarrays) that togather can build up a huge array. The array is unsorted (the new element can only be appended to the array; removing an element usually corrupts the sorting order because the gap produced by the remove is filled with the last element).
  • The most important parts are the insert/remove/retrieve methods. Since insert and remove both uses retrieve this one is the top most important method. I believe it can be optimized though according to my observations the slowest part is the Heap usage...

I think the code is very simple, the symbol names speak for themself and there are many comments. There is also a small Dialog-application as a testbed for my lib.
I attached the zipped RadAsm project. (BTW. What is the .def file in a RadAsm project good for?)

Edit: 01.12.2006 I attached a newer zip. It contains all make files to create the executables (but to be 100% sure, I packed the exe too). There are 2 executables, one using the library as a dll, the other as a static lib.

Edit: 02.12.2006 I attached a newer zip again. The executable displays the amount of consumed memory now.

Edit: 19.12.2006 Another newer version. There is no display of amount of consumed memory any more, but the test app became more usable and has more tests.


Greets, Gábor

[attachment deleted by admin]

PBrennick

Gábor,
I am very new to RadASM but a def file as you know, is used to access exports. It is only used during the build process to tell the linker what functions are needing to be exported so they can be accessed by the project. Without the def file the DLL (I think) would be of no use. A static library does not need a def file and in a lot of cases is a better solution. When you build a project (the exe, not the DLL) using a static lib, only the functions called by the process is included in the final product. Using a DLL, the process gets ALL the function wheth they are used or not, so unless you are using ALL of the functions there will be less bloat in the final product if a static lib is used instead.

Paul
The GeneSys Project is available from:
The Repository or My crappy website

stanhebben

I ever programmed such data structure in Java, I'm sure such structure has it's uses.

As for the memory allocation, I wrote a few macros a fair while ago. They are extremely fast, but can only be used with fixed sizes. (see attachment)

[attachment deleted by admin]

zooba

Nice work gabor  :clap:

One thing I noticed skimming through your code was that you're mostly using heap functions but there are some GlobalAlloc/Free functions in there, it seemed for the list headers. Is there any reason for this, or did it slip through proofing :wink ?

Cheers,

Zooba :U

gabor

Zooba,

first of all thanks for the question! Using the GlobalAlloc was by intension. Of course my idea might be wrong: first I have to create a list object that will have its own heap to store the members. So, before creating the heap I though I create the list object that is why I allocate them via GlobalAlloc. I wanted to bind the memory usage clearly to the list object. Think of an application using several lists. I believe assigning a private heap to every list object can make a lot of things simple. For example when destroying the list I don't need to deallocate every list member, just destroy the heap.

I discovered some minor errors and I think there are some methods that would be nice to have. So, I am working on the next version...

Greets, Gábor

PBrennick

Gábor,
It would have been helpful if the def file was not empty. Currently, we have to figure out
what is supposed to be exported for ourselves. Personally, I am not going to bother. Why is the DLL not included, either.

Paul
The GeneSys Project is available from:
The Repository or My crappy website

zooba

gabor,

Very good answer :thumbu . It makes a lot of sense. Perhaps we should do some benchmarking to see which way is quicker and uses less memory. I would expect that for a lot of small lists, using a single heap would be easier on memory but as the lists get longer it probably makes less difference.

In any case, it's sure to be better than what a HLL can produce. I've been doing some C# this evening and have watched the memory usage incresing by 10MB a second when busy, then drop by 50-100MB every now and then when the garbage collector kicks in.  :eek Shouldn't be hard to beat that :lol

Cheers,

Zooba :U

gabor

Hello again!

Quote from: PBrennick on November 29, 2006, 10:48:22 AM
Currently, we have to figure out what is supposed to be exported for ourselves. Personally, I am not going to bother. Why is the DLL not included, either.
Paul
I haven't taken the time yet to learn how to fill in a .def file  :eek
Though, this library was intended to be a static lib and I haven't thought about creating a dll, I guess it might be reasonable to put them all into a dll. (First I have to figure out how to create a dll...) When I attach the newer version with those mentioned improvments, I'll try to create a dll too.  :bg
Of course, ideas and comment that could improve the methods are appreciated and implemented.

Quote from: zooba on November 29, 2006, 01:37:57 PM
Perhaps we should do some benchmarking to see which way is quicker and uses less memory....
Zooba :U
The little test dialog lets you make some bechmarking. I haven't implemented a memory usage benchmark, for that purpose I used windows' task manager performance tab.  :bg It is interesting indeed! The linked list consists of 7 dwords (3 are in LINKED_LIST struct, 4 are in the user data area). It is not much, but when 10 milliion members are inserted, the memory usage goes up by 360 MBs. Quick calculation: 10.000.000*7*4=280.000.000, that is 280 MB. This is not a correct calculation, so I'll build in a memory usage checking...
The linked array, allocating about the same size (Element size=28) consumes about 100MB less memory. Again this is a rough approximation... I think the reason can be in the number of heapalloc calls. The linked array uses far less memory blocks (displayed with "count of members" text). That is the main reason for the faster execution too.
Actually, for the linked array using the VirtualAlloc would could be the most optimal solution. It is far more faster, but allocates memory in pages.

Paul, Zooba! Thank you for your posts. I am looking forward for your further worthy comments!

Greets, Gábor

stanhebben

My attempt at creating linked arrays efficiently.

(note: in order to compile the files, you need to place kernel32.lib, windows.inc and kernel32.inc in that directory)

You can use lalist.inc for assembly programs, and lalist.h for C/C++ applications.

Timing results:
Add: 18
Iterate: 11
Iterate + remove: 36


I'll create a fast linked list and compare the results.

[attachment deleted by admin]

zooba

gabor,

Unfortunately I don't have RadASM so I can't compile your example. Could you maybe post an executable file?

Stan,

By themselves, those numbers aren't really useful. How many elements does your algorithm iterate in 11 cycles? Is it iterating and removing the first item in 36 or the last item in 36? Are you adding at the start or at the end? I'm sure I can dig through the code and find the answers, but I'd give your marketing team a kick and tell them to be more specific :wink :lol

Cheers,

Zooba :U

gabor

#10
Hello!

I modified the thread starter post because I replaced the attached zip. This message is to ensure that the thread gets the updated status. (Maybe this would be unnecessary...)
The features, I mentioned earlier will be implemented soon...
According to my measures, the heapalloc does not perform slow at all. It must be considered, that it handles memory block of different sizes. Another conclusion is, that Stan's fastalloc has a little "cheat" since it preallocates 1024 elements.
To "simulate" this with my app use the settings shown in this picture:http://ng210.freeweb.hu/linked.jpg

Greets, Gábor

stanhebben

Zooba,

The numbers are per call.

When iterating and removing, the code asks for the next element, then removes it, and repeats the procedure. (until the list is empty) Iterating through one element and removing it thus costs 36 cycles. One add operation takes 18 cycles.

sincerely,

Marketing & PR division :P

Gabor, as your code is so small, I wouldn't bother creating a dll. I rather find it useful if you have large pieces of code which you want to reuse in multiple applications.

zooba

Wow, and it doesn't even look like they're lying...  :P

Agree with Stan on the DLL - just stick with a static library. We all seem to prefer them that way, probably since everything will compile down into a single file and redistribution (if any of us get that far...) is much simpler.

I managed to get the EXE to compile (I had to remove the absolute paths from makeit.bat first) and did the testing you suggested:

Linked array
  insert time: 62
  count of members (with sentinels): 979
Size of consumed memory: 30MB
  remove time: 77


I'm guessing I have a slower computer than you... :(

Is that 'Size of consumed memory' field a calculation or did you ask the OS how much?

Cheers,

Zooba :U

PBrennick

I pretty much agree with Stan and zooba about the DLL thing. The only reason why I mentioned it all is because sice I found an empty DEF file it was apparent to me that you wanted to do that.

Personally, I put ANY reusable code into a lbrary AND a DLL as separate builds because some prefer a DLL and some prefer a library. I bumped into this fact with the GeneSys Project. Since I do not wish to impose my will upon others, the best solution was to provide both. If you download the lib and DLL portions of my project, you will see how to build a DLL and you will also see that the way I have designed them it is easily expanded upon. All you have to do is drop an ASM file in both directories and update the PROTO list in the INC file. For the DLL, you need to add any new functions to the EXPORT list in the DEF file. This may be overkill in terms of your project, but who knows what the future will hold for you and other others of your projects so portions of this paragraph are offtopic

Paul
The GeneSys Project is available from:
The Repository or My crappy website

Relvinian

This is probably getting a little off-topic for this post but still fits in line with what people are posting on...

As for .LIB vs .DLL compiling, I'm in the mindset of:  If you use the functions rarely; as in not many public applications for people to use; I see .LIB over .DLL because of compile-time function inclusion and any functions in the .LIB not used aren't included...But once you start using those functions a lot (as in the case of multiple EXEs) or many people start to use them, I see .DLLs over .LIBs because as more processes use a common code base, you actually have LESS code being loaded in memory of a machine.

And to expand on that a little, take a common C/C++ DLL that 90% of applications use:  MFC42.DLL and/or MFC71.DLL.  Same goes for any type of .DLL that is used frequently...If we as an ASM community can agree on common code, we could write a ASM DLL (a lot like MSVCRT.DLL, etc) which all ASM programmers could use (including updating old functions, etc).

The biggest drawback with .DLLs though are the different versions that could be out on the market...Something you won't have by including the code directly into your .EXE.

The two reasons I create .DLLs are:
1) Share code base between multiple EXE processes (so you only have one code base loaded --- saves memory).
2) True Dynamic Loading of a .DLL and using GetProcAddress to access the functions you need...Mostly seen as a plug-in .DLL to some application.

The hardest part about creating functions/variables, etc for use with .DLLs is the exporting what you need. You do not need a .DEF file if you dynamically load the .DLL (using GetProcAddress to get exported functions), but you do need a .DEF file if you want compile-time linking of a .DLL to your .EXE or .DLL file so it "auto-loads" during run time.  I think this is what gets most programmers who write code and favor .LIBs...They aren't really comfortable with how .DLLs are written and what is all involved.

Example function that is exported and will work for Dynamic Loading (LoadLibrary/GetProcAddress/FreeLibrary) but won't link with an EXE (if you don't define a .DEF file) when linking your .DLL file:

MyFunc proc stdcall export param1:dword
  ; code goes here
MyFunc endp


Most people by default don't put the 'export' on their function declarations that are really exported so GetProcAddress will fail on Dynamically Loading. To get around this, they use a .DEF file always which forces this function to be exported.

If the function isn't going to be exported from a .DLL, you can put the 'public' (which is the default, btw), and very few people use this but that's ok, which tells the compiler, that this function is available to all code within the .LIB or .EXE....If you put 'private', even less use this method, this function is available ONLY to the source file which the function is in. This also helps the compiler to know when generating symbol tables, which functions, etc will be exported, public, private, etc to fill in a "more correct" .LIB/.DLL, etc for future linking, etc.

Anyway, as far as using something like the link-list from a .DLL, as long as your functions are exported (and if you are using a .LIB, make sure you have a .DEF file with correct definitions), writing a .DLL isn't that complex. Instead of having a typical "start" "end start" for your EXE starting address, you define something else (usually a DllMain) as your entry point.  Just remember to always return a TRUE (1) when leaving that function as to signal the OS that the DLL is ok.

Those are my views and thoughts. :P

Relvinian