The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: jdoe on January 10, 2009, 05:29:38 AM

Title: Reading large file byte per byte
Post by: jdoe on January 10, 2009, 05:29:38 AM

Hi,

If I want to read a large file byte by byte, is it better to decrement the file size until it reaches zero, or can I increment the file pointer until it reaches the end pointer?

The reason I ask is that it came to my mind that, if the pointer reaches FFFFFFFFh and I check whether the start pointer is below the end pointer, the check would fail past that limit.

What should I know about addressing small versus large files?

Thanks

Title: Re: Reading large file byte per byte
Post by: donkey on January 10, 2009, 09:33:02 AM
With SetFilePointer, when lpDistanceToMoveHigh is NULL, lDistanceToMove is a 32-bit signed value; when lpDistanceToMoveHigh is not NULL, the two make up a 64-bit signed value. For files over 4 GB you use a 64-bit signed pointer. You would perform the compare on lpDistanceToMoveHigh and lDistanceToMove as one signed value.
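
For reference, a 64-bit seek along those lines could look roughly like the sketch below. It assumes MASM32 syntax and include files; the procedure name SeekTo64 and its parameter names are made up for illustration and do not come from this thread.

```asm
; Sketch only: assumes the MASM32 package is installed in \masm32.
; SeekTo64 and its parameter names are hypothetical.
.386
.model flat, stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.code
; Moves the file pointer of hFile to the absolute offset offHigh:offLow.
; Returns the new position in EDX:EAX, or EDX:EAX = -1 on failure.
SeekTo64 proc hFile:DWORD, offLow:DWORD, offHigh:DWORD
    LOCAL lowPart:DWORD
    ; offHigh is passed by address: it supplies the high dword on input
    ; and receives the high dword of the new position on output.
    invoke SetFilePointer, hFile, offLow, ADDR offHigh, FILE_BEGIN
    mov lowPart, eax
    .if eax == 0FFFFFFFFh           ; either an error or a valid low dword
        invoke GetLastError
        .if eax != 0                ; 0 is NO_ERROR, so anything else is a failure
            mov eax, -1
            mov edx, -1
            ret
        .endif
    .endif
    mov eax, lowPart
    mov edx, offHigh
    ret
SeekTo64 endp
end
```

The GetLastError check is there because 0FFFFFFFFh is also a legitimate low dword once the high dword is in use.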
Title: Re: Reading large file byte per byte
Post by: hutch-- on January 10, 2009, 10:13:00 AM
JD,

Edgar is right: if the file is over 4 GB you need to use the extra value to get its size. But there is another approach: work out a convenient buffer size, read that much data in a block, then read the next one, and so on. It's probably faster than reading one byte at a time through the file system, even though the file system uses a memory cache to store blocks of data.

The basics are that if you choose a 1 MB buffer, for example, you load the first megabyte into memory, scan it, read the next megabyte, and so on. This approach works if you are reading the file sequentially. If you need random access or reverse reads, you will need to use the file system.
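
A minimal sketch of that block-by-block loop, assuming MASM32 syntax and include files. The names SumFileBytes, CHUNK_SIZE and chunkBuf are hypothetical, and summing the bytes only stands in for whatever the real scan does.

```asm
; Sketch only: assumes the MASM32 package is installed in \masm32.
; SumFileBytes, CHUNK_SIZE and chunkBuf are hypothetical names.
.386
.model flat, stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

CHUNK_SIZE equ 100000h              ; 1 MB read buffer

.data?
chunkBuf db CHUNK_SIZE dup(?)

.code
; Reads hFile sequentially in CHUNK_SIZE blocks and adds up every byte.
; Returns the 32-bit sum (wrapping) in EAX.
SumFileBytes proc uses esi ebx hFile:DWORD
    LOCAL bytesRead:DWORD
    xor ebx, ebx                    ; running sum
  next_chunk:
    invoke ReadFile, hFile, ADDR chunkBuf, CHUNK_SIZE, ADDR bytesRead, NULL
    test eax, eax
    jz done                         ; read error: stop
    mov ecx, bytesRead
    test ecx, ecx
    jz done                         ; zero bytes read: end of file
    mov esi, OFFSET chunkBuf
  next_byte:
    movzx eax, BYTE PTR [esi]       ; the byte-by-byte scan happens in memory
    add ebx, eax
    inc esi
    dec ecx
    jnz next_byte
    jmp next_chunk
  done:
    mov eax, ebx
    ret
SumFileBytes endp
end
```

Only buffer offsets and a per-chunk byte count are ever compared here, so the total file size never has to fit in 32 bits.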
Title: Re: Reading large file byte per byte
Post by: donkey on January 10, 2009, 01:02:52 PM
Hutch is right, you should be using buffers of a predetermined size to read the file in chunks. However, to test a file pointer to see if it is less than a given value, you only need a few extra steps...

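// 64 bit pointer value in EDX:EAX
cmp edx,[PointerHigh] // Signed compare of the high dwords
jl >.ISSMALLER
jg >.ISLARGEROREQUAL // High dword already larger: skip the low dword test
cmp eax,[PointerLow] // High dwords are equal: unsigned compare of the low dwords
jb >.ISSMALLER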

.ISLARGEROREQUAL
...


.ISSMALLER
...
Title: Re: Reading large file byte per byte
Post by: jdoe on January 16, 2009, 01:23:41 AM

Thanks guys for this information, and I'm sorry for my late reply.

I realize that my question wasn't really precise. I know that getting a file pointer through MapViewOfFile for a large file could be slow, but to explain what I'm trying to understand, let's say that I do it like that. So I get a file pointer into EAX with MapViewOfFile and I want to read that file byte by byte (BYTE PTR [eax]). If the file is large, I'll get a situation where I reach the FFFFFFFFh pointer, and what happens after that point? Is it unreliable to get the end pointer and stop reading when the start pointer reaches the end pointer? Am I better off using the file size and decrementing it until it is zero? Sorry if I wasn't clear.

Title: Re: Reading large file byte per byte
Post by: sinsi on January 16, 2009, 01:55:45 AM
If you're using MapViewOfFile, you won't have to worry about the pointer overflowing since you can only map less than 4 GiB anyway - it's got to fit into your program's address space.
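
As a rough illustration, walking a mapped view with a byte count could look like the sketch below. It assumes MASM32 syntax; CountLineFeeds, pView and cbView are hypothetical names, with pView standing for the address returned by MapViewOfFile and cbView for the size of the view.

```asm
; Sketch only. CountLineFeeds, pView and cbView are hypothetical names.
.386
.model flat, stdcall
option casemap:none

.code
; Counts the 0Ah bytes in a mapped view. Returns the count in EAX.
CountLineFeeds proc uses esi pView:DWORD, cbView:DWORD
    xor eax, eax                ; match counter
    mov esi, pView              ; current byte
    mov ecx, cbView             ; bytes left to read
    test ecx, ecx
    jz finished                 ; empty view: nothing to scan
  scan:
    cmp BYTE PTR [esi], 0Ah
    jne skip
    inc eax
  skip:
    inc esi
    dec ecx                     ; count down instead of comparing pointers
    jnz scan
  finished:
    ret
CountLineFeeds endp
end
```

Comparing ESI against an end pointer (pView + cbView) should work just as well, provided the test is unsigned (JB rather than JL), which matters if the process can be handed addresses above 7FFFFFFFh.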

If you use ReadFile to read byte by byte, just look at the value returned through lpNumberOfBytesRead: at EOF, ReadFile reports no error but sets it to 0.
Using ReadFile to read single bytes is very slow; you're better off filling a buffer and having a proc of your own read from it a byte at a time.
Once the buffer is empty, fill it up again until you get to EOF. Of course, if you're not reading sequentially you don't have a lot of choice.
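
One way such a proc might look, as a sketch in MASM32 syntax. GetByte, BUF_SIZE, rdBuf, rdPos and rdAvail are hypothetical names, and a single global buffer is assumed, so it serves only one file at a time.

```asm
; Sketch only: assumes the MASM32 package is installed in \masm32.
; GetByte, BUF_SIZE, rdBuf, rdPos and rdAvail are hypothetical names.
.386
.model flat, stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

BUF_SIZE equ 10000h                 ; 64 KB buffer

.data?
rdBuf   db BUF_SIZE dup(?)
rdPos   dd ?                        ; index of the next unread byte in rdBuf
rdAvail dd ?                        ; number of valid bytes in rdBuf

.code
; Returns the next byte of hFile in EAX (0..255), or -1 at EOF or on error.
; rdPos and rdAvail start at zero, so the first call fills the buffer.
GetByte proc hFile:DWORD
    mov ecx, rdPos
    .if ecx >= rdAvail              ; buffer used up: refill it
        invoke ReadFile, hFile, ADDR rdBuf, BUF_SIZE, ADDR rdAvail, NULL
        .if eax == 0 || rdAvail == 0
            mov rdAvail, 0          ; error or EOF: leave the buffer empty
            mov rdPos, 0
            mov eax, -1
            ret
        .endif
        xor ecx, ecx                ; restart at the beginning of the buffer
    .endif
    movzx eax, BYTE PTR rdBuf[ecx]
    inc ecx
    mov rdPos, ecx
    ret
GetByte endp
end
```

The caller just keeps calling GetByte until it returns -1, and ReadFile is hit only once per 64 KB.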
Title: Re: Reading large file byte per byte
Post by: jdoe on January 17, 2009, 01:50:55 AM

Thanks sinsi. You just gave me a few clues about stuff I have to learn.

With basic knowledge I can do a lot of stuff, and with only one more piece of the puzzle in place, I'm gonna do things differently but more accurately. I love assembly for that... you never know when "that little something" will fall into place in your head.

If I had more time to read instead of learning by coding...