News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Graph algorithms

Started by gabor, March 21, 2006, 08:35:30 AM

Previous topic - Next topic

gabor

Hi everyone!

I've decided to create a library for graph handling. I am not sure whether I use the right term, so a really short description:
I call a structure a graph, that is built by verteces and edges between the verteces. A tree is a special graph.

I though before implementing anything I share my idea with you and ask for you opinions!

My biggest problem is the storing of the edges. I'd like to create generic, scalable methods, this means they should work fine for little and huge graph as well.
My concept is give the opportunity to add/remove verteces and edges dynamically and these operations affect the less data the smallest part of the graph as possible.

I designed 4 methods for storing:
1. Very lame and primitive array storage. I allocate new memory every time a new edge is inserted. The edge array of the connected verteces is extended. This means, those arrays must be copied over into the new, one element bigger array. This is very slow, but does not waste a sinlge byte of memory.

2. A bit smarter array storage. I allocate an array of a give size. When the number of edges are greater than the array could store the array is extended by the given number of entries. The main difference between the previous solution is the number of array entries handled at a time. There it was only 1, here it is a given value. This solution gives faster performance, because to allovate a new array, to copy over the old one is not needed so often. On the other hand, this solution wastes memory and leads to very difficult issues about memory defragmentation. Example: the given value n=3. The basic unit of allocating edge arrays is 3. When the 1st, the 4th, the 7th... edge is added to a vertex a new array is allocated, etc. Verteces with 1 or 2 edges store unnecessariely 2 or 1 array entries.

3. Linked list storage. I allocate every edge as a separete object and put them into a linked list of a vertex. This solution is slow, since instead a simple indexing I must follow the links of the linked lists. True, it does not use unnecessary memory, but the linking gives an overhead for both memory and time.

4. The last way is a hybrid of arrays and linked lists (solution 2 and 3 united). I allocate an array of given entries. If the array proves to be full and needs to be extended, I allocate another array of given size and link it to the first array. I guess it is easy to understand the benefits of this solution. If the average number of edges a vertex can have can be determined using this number as the array entry limit should give a pretty good overall perforamnce for huge graphs, since the average access of edges will be a simple array entry read. On the other hand if a vertex has lots of edges it won't waste that much memory.

Please give me feedback on these ideas or even better give me new ideas! My point is: the less memory access the better.
Thanks!

Greets, Gábor

zooba

I've used option 4 before. I almost finished a library implementation but haven't touched it in a while. I found it wasn't hard to implement and was quite effective.

However, if you will be adding items at places other than the end, it's a bit harder to deal with and I haven't attempted that  ::) :P

Cheers,

Zooba :U

gabor

Hello friends!

About the topic representing, storing a graph I made some progress. I'd like to share my final conclusion with you hoping that I get some feedback.
I investigated 3 different graph storing method: arrays, linked lists, link lists of array (linked array).
The linked array seem to be the best, taking the advantages of the other two. Using this data model makes it possible to add and remove verteces and edges as you like, it is fully dynamic. The drawback is the weaker efficiency, performance.
My opinion is, that there are tasks which demand dynamic methods, so this solution won't end up in the trash after all :)

Other though would be the static way, that is before building up the graph I collect information:
- number of verteces
- links at every vertex
This kind of approach needs first an information gathering phase (which might be fully dynamic...), after that the graph can be constructed vertex by vertex using exactly as many space as it needs.

Now the question is, which method is better?
1. Dynamic
Pro
- the graph can be modified freely at any time
- does not need any information before construction
Contra
- allocates more space than necessary
- slower (because the indirect access in the linked lists)

2. Static:
Pro
- fast (an item in an array can be accessed directly)
- allocates exactly as many space as necessary
Contra
- modifying the graph is problematic (a change can lead to total reconstruction)

I'd like to create a data model that is able to handle even a million verteces. With such huge quantity it is important how quick the routines are and how much space they eat. If a vertex is 64 byte small only, a graph of 1 million verteces would take 64MB of memory, and there are the edges too that can be at least as many as the verteces.

What do you think of these ideas and concerns?

Greets, Gábor

gabor

Hello!

I thought I share my results with you:
- I created a lib for basic graph functionality
- I included a small test application

Basic functionality is
- creating a graph object
- adding/removing vertices
- adding/removing edges
- method for determining whether 2 vertices are adjacent

Still to come:
- main graph algos, like DFS, BFS
- whatever comes into my mind

Please feel free to test and comment my work!

Greets, Gábor

[attachment deleted by admin]