arrfile$ macro causing my program to crash with files with odd numbers of lines

Started by bhy56, August 02, 2011, 06:06:32 PM


ToutEnMasm


A write outside the allocated memory is more common than a defect in OLE memory.
Why use it?
HeapAlloc is perfect here. There is no need for shared memory (shared between several processes) and no need for OLE.


hutch--

OLE string memory, in either the Unicode or ANSI form, is designed for high-count small allocations and also has its length stored below the start address. HeapAlloc() and most of the other strategies have serious fragmentation problems, while VirtualAlloc() is not well suited to large-count small allocations due to granularity problems. The pointer array is in GlobalAlloc() fixed memory, as you get a linear address, but where every new or changed string involves a de-allocation then re-allocation, OLE is the correct strategy to use.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php
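
To make the "length stored below the start address" point concrete, here is a minimal portable C sketch of a length-prefixed string. It only imitates the layout (a 4-byte length immediately before the text). It uses malloc rather than the OLE allocator, and the names lp_alloc, lp_len and lp_free are hypothetical helpers, not part of any Windows or MASM32 API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy len bytes of src into a new block whose
   byte length is stored in the 4 bytes just below the returned pointer. */
static char *lp_alloc(const char *src, uint32_t len)
{
    assert(src != NULL);
    unsigned char *block = malloc(sizeof(uint32_t) + (size_t)len + 1);
    if (block == NULL)
        return NULL;
    memcpy(block, &len, sizeof len);               /* length prefix */
    char *text = (char *)block + sizeof(uint32_t); /* caller sees this */
    memcpy(text, src, len);
    text[len] = '\0';                              /* terminator */
    return text;
}

/* Read the length back from below the start address; no scan needed. */
static uint32_t lp_len(const char *text)
{
    uint32_t len;
    memcpy(&len, text - sizeof(uint32_t), sizeof len);
    return len;
}

/* Free the whole block, which starts 4 bytes below the text pointer. */
static void lp_free(char *text)
{
    if (text != NULL)
        free(text - sizeof(uint32_t));
}
```

With this layout, getting the length of a string is a single memory read instead of a scan for the terminator, which matters when an array holds millions of strings.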

dedndave

very interesting, Hutch
it sounds ideal for something like a text editor, where individual lines or groups of lines grow, shrink and get moved around or deleted
just hafta get past the BaSTRd thing   :P

ToutEnMasm


Quote
HeapAlloc() and most of the other strategies have serious fragmentation problems
Only if the buffer sizes are arbitrary byte counts; rounding the size up to a multiple of 16 or 32 gives better performance.
Quote
but where every new or changed string involves a de-allocation then re-allocation, OLE is the correct strategy to use.
HeapReAlloc does this very well. Here it is just a question of time: which is faster?
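
The rounding mentioned above can be sketched in C as follows. This is only an illustration of the arithmetic; round_up is a hypothetical helper, and the alignment is assumed to be a power of two such as 16 or 32.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align.
   Assumption: align is a non-zero power of two (16, 32, ...). */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

For example, a 17-byte request rounded to 16 becomes 32 bytes, so 15 bytes are padding. Whether that padding is acceptable depends on how many strings the array holds.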

jj2007

HeapAlloc is always faster, ToutEnMasm. By the way, folks, you realise that this is all old stuff - see "Is there any reason why this code should fail?"  :bg

hutch--

HeapAlloc() may be faster but that will not help you with the fragmentation problems when you have 5 million strings of variable length. The cost of aligning large counts to an arbitrary alignment starts to become very high in terms of memory usage. Remember that with a dynamic array design of this type you are performing an allocation for every string that contains content in the array.

bhy56

Quote from: hutch-- on August 05, 2011, 05:33:17 AM
bhy56,

Just try the last algo I posted directly in your source code, you will have to manually invoke the procedure and MOV eax to the same return value but this is the easiest way to test it and see if it does the job. I might also say thanks for finding such an obscure bug, when I wrote this system about 4 years ago I tested the hell out of it but never found the bug you reported.

Let us know how it turns out.

Thanks for writing these and thanks for your help. I tried to set up a test with arrfile as a local function but do not seem to be getting it right, because now it will not work at all. I am not that good at assembly (yet) and think I must be calling the function wrong. Any help would be appreciated. Thanks

code attached



ToutEnMasm

Quote
HeapAlloc() may be faster but that will not help you with the fragmentation problems when you have 5 million strings of variable length. The cost of aligning large counts to an arbitrary alignment starts to become very high in terms of memory usage
I have studied the code in the arr*.asm files a little. EACH line of the text file is put in a separate buffer. This is a disaster in terms of time. As Microsoft says, allocating and deallocating memory is slow.
If you test your proc on a text of 5 million strings, only one thing is guaranteed: you will have found a good pastime.
That is why the sample I posted avoids too many memory accesses.


dedndave

bhy56,
i have looked at your "t01.asm" file

first, at the beginning of the file, you will want a prototype for arrfile2...
arrfile2 proto :DWORD
now, you may use INVOKE to call it

now, to call it, you have passed the address of a pointer to the filename
just pass the address of the filename string
.data
    m_inFile BYTE "test.txt",0
    m_inArray      DWORD   0       
.code

start:
    print chr$("Test output one.",13,10)
    lea EBX, m_inArray
    push EBX
    call arrfile2
    pop EBX
    mov m_inArray,EAX
    print chr$("Test output two.",13,10)
    exit

also, there is no need to balance the stack after the call
that is because the StdCall convention is used

so, try this...
include \masm32\include\masm32rt.inc

arrfile2 PROTO :DWORD
get_line_count PROTO :DWORD,:DWORD

.data
m_inFile db "test.txt",0
m_inArray dd 0

.code

start:
    print   chr$("Test output one.",13,10)
    INVOKE  arrfile2,offset m_inFile
    mov     m_inArray,eax
    print   uhex$(eax),13,10
    arrfree$ m_inArray
    print   chr$("Test output two.",13,10)
    exit


i have added the arrfree$ to deallocate the array before exit
if you don't free allocated memory, you may have a "memory leak"

hutch--

Yves,

With a design that handles variable length strings, you have no other choice than to perform an allocation for every string that contains data. Now this can be from a couple of bytes to many hundreds of megabytes. Fixed arrays are simple and much faster and you can allocate the pointers and the data slots in one allocation but it either must use massive amounts of memory where each slot is at least large enough to hold the longest string or you must limit the string length to preserve memory.

This is why you also have a variable length string array that will handle far larger string arrays than a fixed array can handle. The price is it is a lot slower when allocated and deallocated but it can do what a fixed array cannot do.
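
The memory trade-off described above can be put into numbers with a small C sketch. Both functions are hypothetical helpers that count string data only, with one terminator byte per string. The pointer array and any per-block allocator overhead are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed array: every slot must be large enough for the longest string. */
static size_t fixed_array_bytes(const size_t *lens, size_t count)
{
    assert(lens != NULL || count == 0);
    size_t longest = 0;
    for (size_t i = 0; i < count; i++)
        if (lens[i] > longest)
            longest = lens[i];
    return count * (longest + 1);       /* +1 for the terminator */
}

/* Variable-length array: each string gets exactly what it needs. */
static size_t variable_array_bytes(const size_t *lens, size_t count)
{
    assert(lens != NULL || count == 0);
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += lens[i] + 1;           /* +1 for the terminator */
    return total;
}
```

For three strings of 3, 10 and 1000 bytes, the fixed layout needs 3 * 1001 = 3003 bytes of data space while the variable layout needs 4 + 11 + 1001 = 1016. A single long line is enough to inflate every slot of the fixed array.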

ToutEnMasm


Quote
With a design that handles variable length strings, you have no other choice than to perform an allocation for every string that contains data.
There are two solutions for that.
*** Make an array with the pointers and the lengths of the variables (the example I posted does that).
       The data is not re-copied. This method is usable for text files. The translator uses it, and so do all my other applications.
*** Copy the variables into a single buffer (grown dynamically by increments) and build the same array as above at the same time.
       This method is useful when adding variables one by one.
You need just a QWORD to find the address and the size of the variable.
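
The first approach above (an array of pointers and lengths over the unmodified file buffer) might look like this in portable C. It is a sketch under assumptions, not the code posted in the thread: LineRef and index_lines are hypothetical names, lines are assumed to end in LF or CR LF, and the buffer is assumed to be already loaded in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One entry per line: a pointer into the original buffer plus a length.
   In a 32-bit process this pair occupies 8 bytes (one QWORD). */
typedef struct {
    const char *start;
    size_t      len;
} LineRef;

/* Index the lines in buf[0..size) without copying any text.
   Returns a malloc'd array (caller frees) and stores the line count
   in *count. A final line without a trailing newline is still counted. */
static LineRef *index_lines(const char *buf, size_t size, size_t *count)
{
    assert(buf != NULL && count != NULL);
    size_t cap = 16, n = 0;
    LineRef *lines = malloc(cap * sizeof *lines);
    if (lines == NULL)
        return NULL;

    size_t pos = 0;
    while (pos < size) {
        size_t end = pos;
        while (end < size && buf[end] != '\n')   /* find LF or end of data */
            end++;
        size_t len = end - pos;
        if (len > 0 && buf[pos + len - 1] == '\r')
            len--;                               /* drop the CR of CR LF */
        if (n == cap) {                          /* grow the index array */
            cap *= 2;
            LineRef *grown = realloc(lines, cap * sizeof *lines);
            if (grown == NULL) {
                free(lines);
                return NULL;
            }
            lines = grown;
        }
        lines[n].start = buf + pos;
        lines[n].len = len;
        n++;
        pos = end + 1;                           /* step past the LF */
    }
    *count = n;
    return lines;
}
```

The whole file costs one buffer plus one index array that grows by doubling, instead of one allocation per line. The price is that the lines are not zero-terminated and cannot grow in place, so this suits read-mostly text rather than an editor.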


jj2007

Just for fun, here is a little testbed comparing arrfile$ and Recall on a) 73*windows.inc and b) 7*a file composed of all includes, with about 9 MB, which implies it does not fit into the cache. Results are pretty similar for both cases. String #2 is printed in all cases to check if the code works.

Create the alltest.inc from the command line of \masm32\include with copy *.inc alltest.inc

Testing arrfile$ on Windows.inc
      WINDOWS.INC for 32 bit MASM (Version 1.4c RELEASE April 2008)

1984 milliseconds for arrfile$

Testing Recall on Windows.inc
      WINDOWS.INC for 32 bit MASM (Version 1.4c RELEASE April 2008)

235 milliseconds for Recall

Testing arrfile$ on alltest.inc
  ; ===========================================

2000 milliseconds for arrfile$

Testing Recall on alltest.inc
  ; ===========================================

296 milliseconds for Recall


I like HeapAlloc. It is fast :bg