Where is the memory function "Malloc", I can't find it.
there are a few different types of allocation
global/local
virtual
heap
http://msdn.microsoft.com/en-us/library/aa366781(VS.85).aspx
If you are looking for COM task memory implemented through IMalloc, you can use CoTaskMemAlloc.
If you are looking for the CRT malloc and related functions, they are accessible with the MASM32 libraries by prefixing the function names with "crt_".
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
invoke crt_malloc, 1000000
mov ebx, eax
print uhex$(ebx),"h",13,10
invoke crt__msize, ebx
print ustr$(eax),13,10
invoke crt_realloc, ebx, 2000000
mov ebx, eax
print uhex$(ebx),"h",13,10
invoke crt__msize, ebx
print ustr$(eax),13,10
invoke crt__expand, ebx, 1000000
print uhex$(eax),"h",13,10
print uhex$(ebx),"h",13,10
invoke crt__msize, ebx
print ustr$(eax),13,10,13,10
invoke crt_free, ebx
inkey "Press any key to exit..."
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
I am sorry, I meant that I can't find it in the "m32lib" folder, although it is listed in the "masmlib.chm" help file.
I think the actual procedure is Alloc defined in \masm32\m32lib\alloc.asm, and the mating procedure to free the memory is Free defined in \masm32\m32lib\free.asm.
I agree with you. But is that an error in the help file?
Thanks for everyone's help.
The Masm32 library offers two handy pairs of macros:
mov edi, alloc$(1024)
... do stuff ...
free$ edi
mov edi, halloc(1024)
... do stuff ...
hfree edi
Below is Olly's disassembly of both pairs. I would go for halloc/hfree rather than the OLE strings. By the way: does anybody know what happens to OLE strings if a program exits abnormally? Heap-allocated memory is no problem, it's cleared automatically...
0040101D |. 68 00040000 push 400
00401022 |. 6A 00 push 0
00401024 |. E8 1D010000 call <jmp.&oleaut32.SysAllocStringByteLen>
00401029 |. C600 00 mov byte ptr [eax], 0
0040102C |. 8BF8 mov edi, eax
0040102E |. 57 push edi
0040102F |. E8 18010000 call <jmp.&oleaut32.SysFreeString>
00401034 |. E8 E9000000 call <jmp.&kernel32.GetProcessHeap> ; [GetProcessHeap
00401039 |. 68 00040000 push 400 ; /HeapSize = 400 (1024.)
0040103E |. 6A 00 push 0 ; |Flags = 0
00401040 |. 50 push eax ; |hHeap
00401041 |. E8 E2000000 call <jmp.&kernel32.HeapAlloc> ; \HeapAlloc
00401046 |. 8BF8 mov edi, eax
00401048 |. E8 D5000000 call <jmp.&kernel32.GetProcessHeap> ; [GetProcessHeap
0040104D |. 57 push edi ; /pMemory
0040104E |. 6A 00 push 0 ; |Flags = 0
00401050 |. 50 push eax ; |hHeap
00401051 |. E8 D8000000 call <jmp.&kernel32.HeapFree> ; \HeapFree
There are also the alloc and free macros, which use GlobalAlloc and GlobalFree. This is a quick test of the alignment of the allocated memory. Of the 6 methods tested, only Alloc maintains an alignment > 8.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
alignment MACRO ptr
xor eax, eax
mov ecx, ptr
bsf ecx, ecx ;; starting at bit0, find first set bit
jz @F
mov eax, 1
shl eax, cl
@@:
EXITM <eax>
ENDM
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
print "alloc$:",13,10
REPEAT 10
mov ebx, alloc$(16384)
print ustr$(alignment(ebx)),9
ENDM
print chr$(13,10)
print "alloc:",13,10
REPEAT 10
mov ebx, alloc(16384)
print ustr$(alignment(ebx)),9
ENDM
print chr$(13,10)
print "halloc:",13,10
REPEAT 10
mov ebx, halloc(16384)
print ustr$(alignment(ebx)),9
ENDM
print chr$(13,10)
print "Alloc:",13,10
REPEAT 10
invoke Alloc, 16384
mov ebx, eax
print ustr$(alignment(ebx)),9
ENDM
print chr$(13,10)
print "CoTaskMemAlloc:",13,10
REPEAT 10
invoke CoTaskMemAlloc, 16384
mov ebx, eax
print ustr$(alignment(ebx)),9
ENDM
print chr$(13,10)
print "crt_malloc:",13,10
REPEAT 10
invoke crt_malloc, 16384
mov ebx, eax
print ustr$(alignment(ebx)),9
ENDM
print chr$(13,10)
inkey "Press any key to exit..."
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
Results under Windows 2000:
alloc$:
4 4 4 4 4 4 4 4 4 4
alloc:
16 8 32 8 16 8 128 8 16 8
halloc:
32 8 16 8 64 8 16 8 32 8
Alloc:
32 32 32 32 32 32 32 32 32 32
CoTaskMemAlloc:
16 8 256 8 16 8 32 8 16 8
crt_malloc:
32 8 8 16 8 32 8 16 8 128
Same under Win XP, SP2. Ever heard of "OLE Chicken (http://blogs.msdn.com/oldnewthing/archive/2004/07/05/173226.aspx)"?
And I would still like to know what happens if, during testing, my apps repeatedly crash with a hundred megs allocated... ::)
alloc$:
4 4 4 4 4 4 4 4 4 4
alloc:
8 16 8 32 8 16 8 64 8 16
halloc:
8 32 8 16 8 512 8 16 8 32
Alloc:
32 32 32 32 32 32 32 32 32 32
CoTaskMemAlloc:
8 16 8 64 8 16 8 32 8 16
crt_malloc:
32 8 8 16 8 32 8 16 8 128
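For readers who want to repeat the measurement outside MASM32, here is a minimal C sketch of the same idea as the alignment macro above: the alignment of a pointer is the value of its lowest set bit. The names alignment_of and report_malloc_alignment are made up for this illustration, and the numbers it prints depend on the C runtime and OS, so they will not necessarily match the tables above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Largest power of two that divides addr; 0 for a null address.
   Same result as the bsf/shl sequence in the alignment macro. */
static uintptr_t alignment_of(uintptr_t addr)
{
    return addr & (~addr + 1);   /* isolates the lowest set bit */
}

/* Allocate ten blocks with malloc, print each block's alignment,
   then release them. The blocks stay live while printing so that
   every call returns a different address. */
static void report_malloc_alignment(size_t size)
{
    void *blocks[10] = { NULL };

    for (int i = 0; i < 10; i++) {
        blocks[i] = malloc(size);
        assert(blocks[i] != NULL);
        printf("%zu\t", (size_t)alignment_of((uintptr_t)blocks[i]));
    }
    printf("\n");

    for (int i = 0; i < 10; i++)
        free(blocks[i]);
}
```

Calling report_malloc_alignment(16384) prints one row comparable to the crt_malloc row above.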
For the SSE2 fans, here is a wrapper that uses HeapAlloc and aligns on a 16-byte boundary:
mov ebx, Alloc16$(16384)
... do stuff ...
Free16 ebx
Alloc16$ MACRO ct
push ct
call jallocP
EXITM <eax>
ENDM
Free16 MACRO ptr
push ptr
call jallocP0
ENDM
.code
jallocP proc ; arg: bytecount
pop eax ; trash return address
xchg eax, [esp] ; exchange counter with return address
add eax, 16 ; we need 8 or 16 extra bytes
push eax ; the byte count
LET_IT_CRASH = 0
if LET_IT_CRASH
push HEAP_GENERATE_EXCEPTIONS
else
push 0
endif
invoke GetProcessHeap
push eax
call HeapAlloc ; mov eax, rv(HeapAlloc,rv(GetProcessHeap),0,bytecount)
push 8
pop ecx ; create some space for the alignment flag
test al, 15 ; original pointer aligned 16?
jne @F
add ecx, ecx ; yes: add 16, no: add 8
@@: add eax, ecx
mov [eax-4], ecx ; set the flag
ret 0 ; we popped one value, and there is no stack frame
jallocP endp
jallocP0 proc ; arg: pointer
pop eax ; trash return address
xchg eax, [esp] ; exchange ptr with return address
sub eax, [eax-4] ; subtract flag
push eax ; original pointer
push 0
invoke GetProcessHeap
push eax
call HeapFree ; invoke HeapFree,rv(GetProcessHeap),0,memory
ret 0 ; we popped one value, and there is no stack frame
jallocP0 endp
.const
EDIT: I shortened the proc a bit - now 64 bytes exactly :bg
EDIT(2): Down to 58 bytes for the proc, and 10+6 bytes for the alloc+free calls, as compared to 18+14 for a halloc/hfree pair. For comparison: BloatOS (XP, SP2) needs 1,500,000,000 bytes on disk, and apparently that was not enough to provide an API that takes account of the fact that SSE2 (on the market since 2001) needs 16-byte boundaries.
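The trick in jallocP/jallocP0 (over-allocate, step forward to a 16-byte boundary, remember how far you stepped just below the returned pointer) can be sketched in portable C as well. This is only an illustration, not a translation of the proc: alloc16 and free16 are hypothetical names, malloc stands in for HeapAlloc, and the offset is stored in a single byte rather than a dword, so the sketch does not rely on the raw pointer being 8-byte aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Return a 16-byte aligned block of at least `size` bytes, or NULL.
   Hypothetical helper; malloc stands in for HeapAlloc. */
static void *alloc16(size_t size)
{
    if (size > SIZE_MAX - 16)
        return NULL;                      /* size + 16 would overflow */

    unsigned char *raw = malloc(size + 16);
    if (raw == NULL)
        return NULL;

    /* Step forward 1..16 bytes to the next 16-byte boundary, so there
       is always at least one byte of room below the aligned pointer. */
    size_t offset = 16 - ((uintptr_t)raw & 15);
    unsigned char *aligned = raw + offset;

    aligned[-1] = (unsigned char)offset;  /* remember the step for free16 */
    return aligned;
}

/* Release a block obtained from alloc16. NULL is accepted and ignored. */
static void free16(void *ptr)
{
    if (ptr == NULL)
        return;

    unsigned char *aligned = ptr;
    free(aligned - aligned[-1]);          /* back to the original pointer */
}
```

Unlike the asm version, the sketch checks for a failed allocation before writing the offset.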
What is really surprising is that invoke crt_malloc, 16384 returns 8-byte alignment only. I checked with Olly, it does call malloc. The latter, according to MSDN, "is required to return memory on a 16-byte boundary". The full MSDN malloc page (http://msdn.microsoft.com/en-us/library/6ewkz86d%28VS.80%29.aspx) says cryptically "The storage space pointed to by the return value is guaranteed to be suitably aligned for storage of any type of object". Well, any except memory needed for SSE2... :tdown
I think the "suitably aligned for storage of any type of object" statement far predates SSE2.
i was thinking that requests should always be page-sized, as well
i.e. if you need 1009 bytes, request 1024
that way, issued block sizes are always on 16-byte boundaries
i don't know if that would cause them to start on 16-byte boundaries, as well
in DOS, each heap allocation was aligned to 16-bytes and was preceded in memory by a 16-byte "heap allocation header"
but i think that was done because of x86 real-mode segmented architecture
Quote from: dedndave on August 06, 2009, 01:42:10 PM
in DOS, each heap allocation was aligned to 16-bytes and was preceded in memory by a 16-byte "heap allocation header"
but i think that was done because of x86 real-mode segmented architecture
Correct, the segment register addressing scheme gave you
16 byte granularity in segment placement. One could have done
something else, but that would have bought almost nothing.
Cheers,
Steve
Quote from: dedndave on August 06, 2009, 01:42:10 PM
i was thinking that requests should always be page-sized, as well
i.e. if you need 1009 bytes, request 1024
that way, issued block sizes are always on 16-byte boundaries
movaps and company don't care about size, they crash on unaligned addresses
Quote
i don't know if that would cause them to start on 16-byte boundaries, as well
Apparently not - with the exception of Alloc aka CoGetMalloc, they all return 8-byte alignment or worse. That's why I wrote the wrapper for HeapAlloc.
i had started writing one
i figured the starting address would already be 16-aligned
but, i will make it that way
the PROC would perform a few tasks for you
in my (limited) experience using allocation, i found myself continually
1) getting and saving the heap handle
2) saving the address to a location for use later
3) i was going to write an error routine to attempt a compact - (this is what prompted me to put the whole thing in one proc)
so, the game-plan was to have a structure like this:
MEMCTRL STRUCT
lpMemAddr dd ?
dwMemSize dd ?
dwMemFlag dd ?
MEMCTRL ENDS
each allocated block has a separate structure, although the structure may be re-used once the block has been released
the routine gets the process heap handle for you, although, i was thinking of allowing that to be a value in the structure, as well
(-1 = use process heap - otherwise = created heap handle)
to allocate a block, set lpMemAddr to 0, set dwMemSize to the requested size, set the flags, if desired
invoke the function with a pointer to the structure
it returns the block address in lpMemAddr (so you don't have to save it)
it returns the actual allocated size in dwMemSize (16-byte justified)
the values are also returned in eax and edx for convenience (ecx preserved)
to release the allocated block, just invoke the function again with the pointer to the structure
because lpMemAddr is non-zero, the routine knows it is a "free" function, rather than an "allocate" function
it returns with both lpMemAddr and dwMemSize set to 0
the next time you want to use the structure, the lpMemAddr is already 0, the flags are already set (unless you want to change them)
you just need to set the desired size and invoke with the structure pointer
if the HeapAlloc function fails, the routine attempts a HeapCompact and tries again
if it fails the second time, the allocated addr and size are set to 0 (eax also) and the GetLastError output is returned in edx
the flags are masked with 0Dh for allocating a block and 1 for freeing a block
i guess i will add another variable to the structure to store the assigned block address (dwReserved)
then, the lpMemAddr variable can be 16-byte aligned
when it comes time to release the block, the routine can use the second address value to free it
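As a rough illustration of this plan, here is a C sketch of a single routine that allocates when the address field is zero and frees otherwise. It is not dedndave's code: MemCtrl and mem_toggle are invented names, malloc/free stand in for HeapAlloc/HeapFree, the flags field is carried but unused, and the HeapCompact retry and the GetLastError return are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical C counterpart of the MEMCTRL structure. */
typedef struct MemCtrl {
    void    *addr;   /* lpMemAddr: 16-byte aligned block, NULL when unused  */
    size_t   size;   /* dwMemSize: in = requested bytes, out = rounded size */
    unsigned flags;  /* dwMemFlag: carried along, not used in this sketch   */
    void    *raw;    /* dwReserved: pointer really returned by malloc       */
} MemCtrl;

/* Allocate if mc->addr is NULL, otherwise free. Returns mc->addr. */
static void *mem_toggle(MemCtrl *mc)
{
    if (mc->addr != NULL) {               /* block in use: this is a free */
        free(mc->raw);
        mc->addr = NULL;
        mc->raw  = NULL;
        mc->size = 0;
        return NULL;
    }

    if (mc->size > SIZE_MAX - 31) {       /* rounding below would overflow */
        mc->size = 0;
        return NULL;
    }

    size_t rounded = (mc->size + 15) & ~(size_t)15;   /* 1009 -> 1024 */
    unsigned char *raw = malloc(rounded + 15);        /* room to align */
    if (raw == NULL) {
        mc->size = 0;
        return NULL;
    }

    mc->raw  = raw;
    mc->addr = raw + ((16 - ((uintptr_t)raw & 15)) & 15);  /* step 0..15 */
    mc->size = rounded;
    return mc->addr;
}
```

Keeping the raw pointer in the structure is what makes it safe to hand out an aligned address: the free path never needs to work backwards from the aligned one.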
Is it insufficient to write align 4/8/16 etc.. before the HeapAlloc call?
Anyone any links to more information on this? I need to learn a lot more about memory.
Best regards,
Astro.
Quote from: Astro on August 09, 2009, 11:28:17 PM
Is it insufficient to write align 4/8/16 etc.. before the HeapAlloc call?
align is for positioning a pointer in the .data and .data? segments (by inserting nops etc), and has nothing to do with HeapAlloc.
Masm32 has memalign, but caution: this aligns the pointer but you must keep a copy of the original one for the free call.
You might check my algo above, it works fine.
no one ever likes my ideas :(
i must suck really bad at this stuff
all that typing for nothing
Quote from: dedndave on August 09, 2009, 11:36:32 PM
no one ever likes my ideas :(
i must suck really bad at this stuff
all that typing for nothing
I read it...I just didn't understand it...
Definitely have more research to do.
Quote
align is for positioning a pointer in the .data and .data? segments (by inserting nops etc), and has nothing to do with HeapAlloc.
OK! I know precisely NOTHING on this subject. :tdown
Best regards,
Astro.
.data
db 12
dw 3 ;the address for this word will be odd
;words are best aligned by 2
;dwords are best aligned by 4
;qwords are best aligned by 8
;in many cases, data caches more smoothly if it is aligned by 16
;with some mmx/sse instructions, the data MUST be 16-aligned
align 16
dd 128 dup (?) ;we know the address of this array will be evenly divisible by 16
OK...
All aligned correctly:
align 4
Item1 DWORD ?
Item2 DWORD ?
Item3 WORD ?
Item4 BYTE 2 DUP (?)
...or is the BYTE mis-aligned because it doesn't START aligned??
Mis-aligned:
align 4
Item1 DWORD ?
Item2 DWORD ?
Item3 WORD ?
Item4 BYTE 4 DUP (?) ; this is mis-aligned
Quote
;we know the address of this array will be evenly divisible by 16
Would that still be true if the *length* was NOT divisible by 16?
e.g.:
align 16
ByteArray BYTE 70 DUP (?) ; is this still aligned?
Best regards,
Astro.
byte arrays or strings don't usually care much about being aligned
you are going to access a byte on an odd address, no matter how they are set up
also, you can figure the first item declared in a segment is going to be 16-aligned (i think - lol)
i try to put dwords first, then words, then bytes last
then i don't use align at all
Thanks to something I read early on in this forum, I sort them in that order, too.
If you have the case:
.data?
;auto-aligned 16 here
Item1 DWORD ?
Item2 DWORD ?
Item3 BYTE 11 DUP (?) ;odd size
align 4 (16?) ;necessary?
Item4 BYTE 23 DUP (?) ;odd size
align 4 (16?) ;necessary?
Item5 BYTE 8 DUP (?)
is the align between each BYTE declaration optimal, or a waste of time? What would be best alignment assuming it was necessary at all?
Best regards,
Astro.
waste of time - they are byte declarations
there are exceptions to this, as you may want to search that array with a scasw instruction or something
but, generally, it won't matter
if you are worried, just place item3 and item4 after the even-length item5
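To put numbers on the layout questions above, here is a small C sketch that mirrors the Item1..Item5 declarations as a struct and exposes the resulting offsets. It assumes a mainstream compiler, which adds no padding in front of byte arrays; the names Items, byte_array and is_aligned are invented for the example, and _Alignas requires C11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the .data? example: two dwords, then odd-sized byte arrays. */
struct Items {
    uint32_t item1;       /* offset 0                                     */
    uint32_t item2;       /* offset 4                                     */
    uint8_t  item3[11];   /* offset 8                                     */
    uint8_t  item4[23];   /* offset 19: odd, harmless for byte access     */
    uint8_t  item5[8];    /* offset 42                                    */
};

/* The start address is aligned no matter what the length is:
   70 is not a multiple of 16, the array still begins on a boundary. */
static _Alignas(16) uint8_t byte_array[70];

/* Nonzero if p is a multiple of n, where n is a power of two. */
static int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}
```

Only the item that follows an odd-sized array can end up on an odd address, which is why sorting dwords first, then words, then bytes makes explicit align directives unnecessary for data like this.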
OK!
Best regards,
Astro.