News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Dynamic Memmory Allocation (ASM version of Malloc)

Started by Infro_X, September 23, 2005, 04:58:36 PM

Previous topic - Next topic

Infro_X

I was wondering if anyone had any better ideas for my solution. (I'm not currently very fond of my solution, but its the only one that I've come up with, within the asspect of my resources).

My Dll Currently works as follows.

DllEntry, ProcessDetach releases all the allocated memmory when the last process detachs.
Exported function i_alloc takes 1 variable, size.
Size is compared to a set of "predefined" sizes, once one that is big enough is found, size is changed to said predefined size.
Predefined sizes can vary depending on wether I decided to make the user who loads the dll capable of deciding the SHIFTSIZE, MAXSHIFTS, and MINSIZE. (All together Very dynamic).
hMem is the only data variable allocated by the windows loader currently.
hMem is a pointer to a peice of allocated memmory the size of MAXSHIFTS*4 (Min of 4k)
hMem is an array of pointers to linked lists, each virtually allocated When the current i_alloc is called and a bin is full/non-existent

Part I don't feel is very well done:
[hMem+shifts*4]=Pointer to a linked list, the size is BLOCKSIZE or a multipule of it (Depending on how big the size is.)
[[hMem+shifts*4]+counter] while 1<= counter <= BLOCKSIZE/(SIZE*8)      This is a table of bits. Each bit is a pointer a corrospondent of the state of that peice of memmory. EG. If that peice has been allocated or not.

Example:

;code, determining number of left shifts to get big enough size
mov ecx,[shifts] ;Determined based off size
mov edi,[hMem]
mov edi,[edi+ecx*4]

;more code, determining counter and BLOCKSIZE of this block
xor ecx,ecx

FIRSTLOOP: ;Find 0 Bit
add ecx,1
mov eax,[edi+ecx]
cmp ecx,[counter]
jz NEXTBLOCK:
xor eax,-1
jz FIRSTLOOP
xor ebx,ebx

SECONDLOOP: ;Find 0 Bit
shr eax,1
jc BLANKSPOTFOUND
add ebx,1
jmp SECONDLOOP

BLANKSPOTFOUND:
xor eax,-1
bts eax,ebx
mov [edi+ecx*4],eax
sub ecx,1 ;(Remeber, we started at +1)
shl ecx,5 ;ecx*32
shl ecx,[shifts]
shl ebx,[shifts]
add ecx,[counter]
shl ebx,[MINSIZE]
shl ecx,[MINSIZE]
add ecx,ebx
lea eax,[edi+ecx]
ret
;BTW this code is all untested, made all for this post

NEXTBLOCK:
cmp D[edi],0
jz NEWBLOCK:
mov edi,[edi]
jmp FIRSTLOOP

NEWBLOCK:
invoke VirtualAlloc,BLOCKSIZE,yada,yada,yada
mov [edi],eax
mov edi,[edi]
jmp FIRSTLOOP


returned with eax=a pointer to an allocated peice of memmory, with [size] size, at [ [hMem+shifts*4] + counter + number_of_bits_to_current_bit*[size] ]

and for i_Unalloc,pMem    well that just clears said bit, whereever it may be. ;), and if all the bits for 1<=counter<=BLOCKSIZE/(SIZE*8),, are 0, deallocate block and addjust all pointers. (adjust the pointers first)

I just don't like the way it works, it is efficient and accurate

Tedd

Research "the buddy system" - see if it suits you any better :wink
No snowflake in an avalanche feels responsible.

Codewarp

Infro_X,

It would be a good idea for you to fully evaluate the way your allocator will be used and why you are writing one.  Repeated allocation and release without coalesing will slowly fragment, degrade then fill up your allocator (read Knuth).  Also, if you are doing this for speed reasons, note that the multi-threaded malloc( ) wastes 80% of the cycles it uses :eek, just in its critical section.  The single-threaded malloc( ) doesn't have the critical section, so it runs 5x faster.  If you are using your allocator in a multi-threaded application, you must either restrict its use to a single-thread, or provide a fully synchronized implementation.

As a point of comparison, I wrote a buddy-system allocator in pure C++ (no assembler) that runs 3x faster than single-threaded malloc( ).  However when restricted to one thread in a multi-threaded environment, it runs 15x faster than multi-threaded malloc( ).  Be aware that writing memory allocators faster than the system, is possible, but you better be ready for the deep water...

Randall Hyde

Quote from: Infro_X on September 23, 2005, 04:58:36 PM
I was wondering if anyone had any better ideas for my solution. (I'm not currently very fond of my solution, but its the only one that I've come up with, within the asspect of my resources).

You might want to take a look at the scheme that the HLA mem.alloc code uses (Win32 and Linux). You can find the source code as part of the HLA stdlib sources download at
http://webster.cs.ucr.edu/AsmTools/HLA/dnld.html

It uses a linked list implementation that looks to be a bit more efficient than the bit-map scheme you're using.
Cheers,
Randy Hyde