Reading a file - Dynamic size

Started by Ic3D4ne, January 26, 2005, 06:06:07 PM

Ic3D4ne

If I were to read a file into a buffer, without knowing its file size before coding/assembling the application, how would I go about dynamically allocating space for the buffer?

And also, can someone explain the argument lpNumberOfBytesRead in the ReadFile function for me?

BOOL ReadFile(
    HANDLE hFile, // handle of file to read
    LPVOID lpBuffer, // address of buffer that receives data
    DWORD nNumberOfBytesToRead, // number of bytes to read
    LPDWORD lpNumberOfBytesRead, // address of number of bytes read
    LPOVERLAPPED lpOverlapped // address of structure for data
   );


Help greatly appreciated.

-Ic3D4ne

Relvinian

Here is a very simple function to open an existing file on your hard drive, find out its size, dynamically allocate a buffer, read the file contents into the buffer and close the file handle.

I won't provide a lot of error checking for this sample, but I will try to comment the code well. Also, I haven't included the include files needed for the defines used by CreateFile, etc., so you'll need to add those to use this function in your own application.

NOTE:
I always wrap [] around my variable names. Some people do and some don't. It is personal preference with MASM but some assemblers do care if you put [] around your variables.

The function returns one of two values:
    EAX = NULL    some sort of error occurred
    EAX = address of buffer otherwise
   

LoadFile PROC PUBLIC FileSpec:DWORD
   LOCAL fileHandle  : HANDLE   ; handle to opened file
   LOCAL fileSizeLow : DWORD   ; size of file (low dword value)
   LOCAL fileSizeHigh : DWORD  ; size of file (high dword value)
   LOCAL bytesRead : DWORD   ; need a variable to hold the amount of bytes read from file
   LOCAL fileBuffer   : DWORD   ; a pointer to a dynamically allocated buffer (memory location)
 
   ; try and open the file specified by FileSpec parameter
   invoke CreateFile, [FileSpec], GENERIC_READ, FILE_SHARE_READ or FILE_SHARE_WRITE, NULL, \
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL or FILE_FLAG_SEQUENTIAL_SCAN,  NULL
   
   ; make sure we at least opened the file
   cmp eax, INVALID_HANDLE_VALUE
   mov [fileHandle], eax
   je FailedOpen
   
   ; now that the file has been successfully opened, figure out the size of the file.
   ; GetFileSize takes two arguments: the file handle and a pointer that receives the high dword.
   invoke GetFileSize, [fileHandle], addr fileSizeHigh
   mov [fileSizeLow], eax

   ; NOTE: this example assumes the file size will be less than two gigabytes so we are only going to use
   ; the fileSizeLow value and ignore the fileSizeHigh value

   ; allocate a buffer to hold the file contents. There are *many* different allocation routines to use
   ; but two of the most common for dynamic memory and files are HeapAlloc and VirtualAlloc. In this
   ; example, we are using VirtualAlloc to create our buffer
   invoke VirtualAlloc, NULL, [fileSizeLow], MEM_COMMIT, PAGE_READWRITE
   mov [fileBuffer], eax

   ; at this point, we are ready to read the contents of the file into the newly allocated buffer
   ; NOTE: the 'addr' in front of [fileBuffer] is a bug; see the correction later in this thread
   invoke ReadFile, [fileHandle], addr [fileBuffer], [fileSizeLow], addr bytesRead, NULL

   ; EAX will be either zero (failure) or nonzero (success).
   ; you can see how many bytes were actually read from the file by checking the 'bytesRead' variable.
   ; unless something went wrong during the read, 'fileSizeLow' should equal 'bytesRead'. This
   ; is a way to check and make sure you read as much as you were expecting.

   ; another use for 'bytesRead' is reading a file in "chunks". Say you allocate a 64k
   ; memory block to read the file into and process that chunk before reading the next. In that
   ; case, you would check bytesRead to see how many bytes were read (in case it was less
   ; than the 64k block size you specified).

   ; close the file handle
   invoke CloseHandle, [fileHandle]
   
   ; at this point, you have successfully opened the file, read its contents and closed it.
   ; return the pointer to the buffer.
   mov eax, [fileBuffer]
   jmp Done

FailedOpen:
   ; return a NULL value in EAX
   xor eax, eax

Done:
   ret
LoadFile ENDP
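
The "chunks" idea mentioned in the comments above might look roughly like the following. This is only a sketch under a few assumptions: ReadInChunks, chunkBuf and CHUNK_SIZE are made-up names, the include paths assume a default MASM32 install, the file handle has already been opened with CreateFile, and the actual processing of each chunk is left as a comment.

```masm
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

CHUNK_SIZE equ 65536            ; 64k block size (illustrative)

.data?
chunkBuf db CHUNK_SIZE dup(?)   ; static buffer reused for every chunk

.code

; Hypothetical helper: reads an already opened file chunk by chunk.
; Returns the total number of bytes read in EAX.
ReadInChunks PROC hFile:DWORD
    LOCAL bytesRead:DWORD
    LOCAL totalRead:DWORD

    mov [totalRead], 0

NextChunk:
    invoke ReadFile, [hFile], addr chunkBuf, CHUNK_SIZE, addr bytesRead, NULL
    test eax, eax
    jz ChunksDone               ; zero return value: ReadFile failed
    mov eax, [bytesRead]
    test eax, eax
    jz ChunksDone               ; nothing read: end of file reached
    add [totalRead], eax        ; only 'bytesRead' bytes of chunkBuf are valid

    ; ... process the first 'bytesRead' bytes of chunkBuf here ...

    jmp NextChunk

ChunksDone:
    mov eax, [totalRead]
    ret
ReadInChunks ENDP

end
```

The last chunk of a file is usually shorter than CHUNK_SIZE, which is why the loop relies on bytesRead rather than on the block size.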


Hope this helps you.

Relvinian

Ic3D4ne

Wow, thanks a lot.

Heh, that really, really, really helped me.

Just a 3 line answer would have sufficed, but that's good too. ;)

Just one more thing, could you explain what the difference is between using [] wraps, and not using them?

-Ic3D4ne

Relvinian

Wrapping [] around a variable is a way to tell the assembler that you want the contents of what the name points to, rather than the address of the variable.

Example:

mov eax, [fileHandle]
mov eax, fileHandle


In MASM, the two statements listed above are the same thing.  MASM doesn't distinguish between the ADDRESS and the CONTENTS of a variable here. It always uses the CONTENTS unless you specify the ADDR keyword first. So both of the above lines are saying:

Move the contents of variable 'fileHandle' into the EAX register. fileHandle holds some value such as '0x00000fdc', while the address of fileHandle might be something like 0x0f23561f.


In some other assemblers, the above code could produce different results, depending on how that assembler defines [].
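
A small sketch of the MASM behaviour described above. The variable name someValue and its value are made up for illustration, and the include paths assume a default MASM32 install. Since ADDR is only usable inside an invoke statement, the sketch uses OFFSET and LEA to get the address instead.

```masm
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.data
someValue dd 00000fdch          ; illustrative variable and value

.code
start:
    mov eax, someValue          ; EAX = contents of someValue (00000fdch)
    mov ebx, [someValue]        ; EBX = the same contents; [] changes nothing in MASM
    mov ecx, offset someValue   ; ECX = address of someValue
    lea edx, someValue          ; EDX = address of someValue as well
    invoke ExitProcess, 0
end start
```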

Relvinian

Titan

Yes, thanks for posting that Relvinian.  So if I wanted to use "LoadFile" to check if a file exists... I simply run LoadFile, and if eax is NULL then the file doesn't exist?

Relvinian

You could do that but it would be *VERY* inefficient because if the file does exist, you are loading it into memory. You would also have to worry about freeing the buffer if it did load it.

A better way to check whether a file exists on the hard drive is the function below. It returns:

   EAX = 0  ( file doesn't exist )
   EAX = 1  ( file does exist )

DoesFileExist PROC PUBLIC FileSpec: dword
    LOCAL  findInfo : WIN32_FIND_DATA
   
    ; check to see if the file exists on the hard drive by simply using a single API call.
    invoke FindFirstFile, [FileSpec], addr [findInfo]
   
    ; check the return value of FindFirstFile.  If the value is INVALID_HANDLE_VALUE,
    ; the file didn't exist, otherwise it did.
    cmp eax, INVALID_HANDLE_VALUE
    je NotFound

    ; since the file was found, we need to close the search handle
    invoke FindClose, eax
    mov eax, 1
    jmp Done

NotFound:
    xor eax, eax

Done:
    ret
DoesFileExist ENDP


In my opinion, this is the best way to check whether a file exists.

Relvinian

Titan

Thank you Relvinian. :U  I had to do some tweaking for my assembler to like that code, but it works... and it was just what I've been looking for. ;)  Thanks again for sharing your work. :)

Relvinian

Titan,

No problem there. Glad the example function helped you out.

Relvinian

Nilrem

I adapted my code (ReadFile thread, recently posted), however instead of displaying what was there, it now displays what was in the text file plus some weird characters that look like webdings. Here is my code, might have missed something:

.data

OpenFileError DB " Could not be found.",0
CurrentFile DB ".\Resource\Read.txt",0
FileRead DB "File was read.",0
;dwNumRead DW ?
;lpstring DW 128
       
.data?

Numb DD ?
FileSize DD ?
hFile HANDLE ?

LOCAL lpstring[128]:DWORD;Used for input()
LOCAL buffer[128]:BYTE; Used for putting strings together
LOCAL buffer2:DWORD; A pointer to a dynamically allocated buffer (memory location)
LOCAL fileSizeLow:DWORD; size of file (low dword value)
    LOCAL fileSizeHigh:DWORD; size of file (high dword value)
   
invoke CreateFile, ADDR CurrentFile, GENERIC_READ, FILE_SHARE_READ,
        NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL
       
        mov [hFile], eax; Move handle value to handle variable
       
        invoke GetLastError; Get the error value
        xor ebx, ebx; Set ebx to 0
        mov ebx, eax; Move the value of GetLastError to ebx
        .if ebx != 0; If the file does not exist
        strcat ADDR [buffer], ADDR [CurrentFile], ADDR [OpenFileError]; Put the strings together
        invoke StdOut, addr [buffer]; Print the new combined string to the screen
        mov lpstring, input(); Wait for user input
        ret
    .endif
       
    ;Now that the file has been successfully opened, figure out the size of the file.
    invoke GetFileSize, [hFile], addr fileSizeHigh
    mov [fileSizeLow], eax
    ; allocate a buffer to hold the file contents into. There are *many* different allocation routines to use
    ; but two of the most common for dynamic memory and files are HeapAlloc and VirtualAlloc. In this
    ; example, we are using VirtualAlloc to create our buffer
    invoke VirtualAlloc, NULL, [fileSizeLow], MEM_COMMIT, PAGE_READWRITE
    mov [buffer2], eax
   
    invoke ReadFile, [hFile], ADDR [buffer2], [fileSizeLow], ADDR [Numb], NULL; Read the file
    invoke StdOut, ADDR [buffer2]
    invoke CloseHandle, ADDR [hFile]; Close the file
    mov lpstring, input (); Wait for user input
        ret

Relvinian

Nilrem,

After your ReadFile line and before your StdOut line, you need to make sure your buffer is NULL terminated before sending it to StdOut.

An easy way to do this in your code is to allocate at least ONE byte more than you read. Since you are using VirtualAlloc to allocate your buffer, the buffer gets zeroed out when created, so you won't have to worry about adding the NULL at the end as long as you allocate more than you read.

Also, since you are declaring a pointer to a buffer on your stack and allocating that buffer with VirtualAlloc, you need to make sure you free the buffer before your function ends or you'll have major memory leaks.  Just before your RET statement, add: invoke VirtualFree, [buffer2], 0, MEM_RELEASE.

Relvinian

PS - In my example of reading a file's contents, I had a bug in the ReadFile statement. The correct syntax should be:

invoke ReadFile, [fileHandle], [fileBuffer], [fileSizeLow], addr bytesRead, NULL


Note that I removed the 'addr' in front of the [fileBuffer] because we dynamically allocated it.
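
Putting the points of this post together, a rough sketch could look like the following. It is not a drop-in replacement for the code above: PrintTextFile and its local names are made up for illustration, the include paths assume a default MASM32 install, the file is assumed to be already open and smaller than 4 GB, and StdOut is the MASM32 library routine used earlier in the thread.

```masm
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib

.code

; Hypothetical helper: prints an already opened text file with StdOut.
; Returns 1 in EAX on success, 0 on failure.
PrintTextFile PROC hFile:DWORD
    LOCAL fileSize:DWORD
    LOCAL bytesRead:DWORD
    LOCAL textBuf:DWORD         ; holds the pointer returned by VirtualAlloc

    invoke GetFileSize, [hFile], NULL
    mov [fileSize], eax

    ; allocate one byte more than the file size; VirtualAlloc zero-fills
    ; the memory, so the extra byte acts as the terminating zero
    inc eax
    invoke VirtualAlloc, NULL, eax, MEM_COMMIT, PAGE_READWRITE
    test eax, eax
    jz PrintFailed
    mov [textBuf], eax

    ; textBuf already holds the buffer address, so it is passed without 'addr'
    invoke ReadFile, [hFile], [textBuf], [fileSize], addr bytesRead, NULL
    test eax, eax
    jz ReadFailed               ; zero return value: the read failed
    invoke StdOut, [textBuf]
    invoke VirtualFree, [textBuf], 0, MEM_RELEASE
    mov eax, 1
    ret

ReadFailed:
    invoke VirtualFree, [textBuf], 0, MEM_RELEASE
PrintFailed:
    xor eax, eax
    ret
PrintTextFile ENDP

end
```

Only bytesRead gets an 'addr', because ReadFile needs the address of that variable to write into; textBuf is passed by value because the value it holds is the buffer address.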

Nilrem

Are you sure? Once I do that I get a blank output screen. Also, when allocating +1 byte like you said, could I do this (asking because I still get a blank screen):
invoke ReadFile, [hFile], addr [buffer2], [fileSizeLow+1], ADDR Numb, NULL; Read the file
Thanks once again 8-)

Actually I thought that looked untidy so I did this:


invoke GetFileSize, [hFile], addr fileSizeHigh
    xor edx,edx
    mov edx,1
    add eax,edx
    mov [fileSizeLow], eax
....
invoke ReadFile, [hFile], [buffer2], [fileSizeLow], ADDR Numb, NULL; Read the file
    invoke StdOut, ADDR [buffer2]
    invoke CloseHandle, ADDR [hFile]; Close the file
    mov lpstring, input (); Wait for user input
    invoke VirtualFree, [buffer2], 0, MEM_RELEASE
        ret
but still a blank screen. If I use addr [buffer2] it works, but with extra characters.

Relvinian

Nilrem,

When you dynamically allocate a buffer and have a DWORD ptr to it, you don't use ADDR to reference it, because the variable already holds the buffer's address.  You only need ADDR when you need a ptr to the data itself, such as a buffer declared on the stack.

Example:

MyFunc proc public String:dword
   local   CopyOfString1[256] : byte
   local   CopyOfString2 : dword

   ; copy the string passed in into our local string buffer.
   invoke lstrcpy, addr [CopyOfString1], [String]

   ; allocate a buffer to hold the string and copy the string passed in
   invoke GetProcessHeap
   invoke HeapAlloc, eax, 0, 260
   mov [CopyOfString2], eax
   invoke lstrcpy, eax, [String]

   ; now, both CopyOfString1 and CopyOfString2 have identical copies of the string passed in.

   ; free the allocated string so we don't have a memory leak
   invoke GetProcessHeap
   invoke HeapFree, eax, 0, [CopyOfString2]

   ; return back to caller
   ret
MyFunc endp


In your example you had:

   invoke ReadFile, [hFile], addr [buffer2], [fileSizeLow+1], addr Numb, NULL


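This reads the incorrect amount because [fileSizeLow+1] adds one to the address of 'fileSizeLow', not to its value, so whatever DWORD sits at that shifted address is passed as the size. The addr [buffer2] also places the data in the wrong place (on top of the pointer variable itself), corrupting your stack.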
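
To make the difference concrete, here is a tiny sketch. The global fileSizeLow with the value 100 and the padding dword are made up so the example stands on its own, and the include paths assume a default MASM32 install.

```masm
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.data
fileSizeLow dd 100              ; illustrative value standing in for a real file size
padding     dd 0                ; keeps the misaligned read below inside our own data

.code
start:
    mov eax, [fileSizeLow+1]    ; WRONG: loads the dword stored one byte past the
                                ; start of fileSizeLow, not "size + 1"
    mov eax, [fileSizeLow]      ; load the value first ...
    inc eax                     ; ... then add one to it: EAX = 101
    invoke ExitProcess, 0
end start
```

The incremented value in EAX can then be stored back or passed to VirtualAlloc as the allocation size.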



invoke ReadFile, [hFile], [buffer2], [fileSizeLow], ADDR Numb, NULL; Read the file
    invoke StdOut, ADDR [buffer2]
    invoke CloseHandle, ADDR [hFile]; Close the file
    mov lpstring, input (); Wait for user input
    invoke VirtualFree, [buffer2], 0, MEM_RELEASE
    ret


In the section of code above, your ReadFile line is correct, but you can't put ADDR in front of 'buffer2' in your StdOut invoke statement.  The same goes for the CloseHandle call: with ADDR you are trying to close the address of 'hFile' and not its value.

Relvinian

Nilrem

Thank you, that worked perfectly, and I understood it. 8-) Now how would I manipulate these? What I want to do is read the lines of the file in. For example, it will output
file.txt
file1.txt
Now I want to know how to check if these exist. I know how to do it with CreateFile, but how do I manipulate the data read in to do this? I don't want the code written for me (not at first anyway), just some hints and clues to get me started; that's the best way to learn. Thanks a lot by the way, guys. Thank you. 8-)

Relvinian

Nilrem,

Once you have read in some data from the file into your buffer, you just pass the buffer pointer to some routine which will parse the information you are looking for. The MASM32.lib has some parsing routines which may be of use to you. Just browse through there and see if anything suits your fancy.

Let me know what kind of checking/parsing you are looking for and I'll give you more guidance and more details on what type of functions/API calls to use for your routines.

Relvinian

Nilrem

Thank you. Before I have a look at masm32.lib (I'm currently at school), I will tell you what parsing I need. I want to check if a file exists, and I have already done that in my code using CreateFile with GetLastError. However, the text file I read in contains names of files (which is what I want), and now I want to check whether those files exist. I am really stumped on this. Trying to use some logic: could I use FindFirstFile and FindNextFile, and then, when they have found all the files, compare the results against the text file that lists the filenames to see if they match? A bit of a long shot, but I want to show that I am trying myself and not just asking endless questions. Thanks again.