Memory-mapped files: Reading beyond the border

Started by jj2007, June 01, 2011, 11:21:20 AM

jj2007

mov pMap, rv(MapViewOfFile, eax, FILE_MAP_READ, 0, 0, 10000h) returns a pointer to mapped memory.
This loop...
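    mov esi, pMap                  ; start of the mapped view
    add esi, 10000h-10             ; 10 bytes before the end of the view
    xor ecx, ecx
    .Repeat
        movzx eax, byte ptr [esi]  ; faults once esi passes the end of the view
        inc esi
        inc ecx
    .Until ecx>20                  ; deliberately runs past the border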

... goes bang when esi reaches the end of the allocated memory. Olly can handle this, but my code can't. So a simple len(esi) would cause an access violation if esi happens to be near the border. How would you handle this problem? SEH?

dedndave

that's how C programmers would do it   :P
they have something called "try" - lol

isn't there a way to get the size of the file ?
another approach - extend the file beyond the data size a little bit

clive

Well the lazy route is to use an SEH to catch the page fault. But the real solution is to recognize you have a limit, and handle the spanning explicitly. You need to have a len(esi) function that is aware of where the end is, and act accordingly.

It could be a random act of randomness. Those happen a lot as well.

drizz

The truth cannot be learned ... it can only be recognized.

jj2007

Quote from: drizz on June 01, 2011, 02:01:06 PM
I see that some lessons are still not learned!

http://www.masm32.com/board/index.php?topic=1807.msg81031#msg81031 -- actively participated
http://www.masm32.com/board/index.php?topic=14353.0 -- participated
http://www.masm32.com/board/index.php?topic=10925.msg80337#msg80337 -- you are the thread starter

back to the drawing board.

drizz,

I am aware of my own threads, don't worry. I just had hoped that somebody had a more elegant solution than SEH, e.g. an API that makes the page behind the border writeable, and allows to poke a nullbyte at the border ::)

qWord

Quote from: jj2007 on June 01, 2011, 02:33:41 PM
I just had hoped that somebody had a more elegant solution than SEH, e.g. an API that makes the page behind the border writeable, and allows to poke a nullbyte at the border ::)
you know VirtualProtect(+VirtualAlloc)?
FPU in a trice: SmplMath
It's that simple!

drizz

Quote
So a simple len(esi) would cause an access violation if esi happens to be near the border
Is null byte inside the mapped region?

  • If yes, then as shown in the linked threads, your strlen function is faulty.
  • If not, then create or use strnlen ( crt_strnlen ? )
    http://msdn.microsoft.com/en-us/library/z50ty2zh(v=vs.80).aspx
    Quote
    These functions return the number of characters in the string (excluding the terminating null). If there is no null terminator within the first numberOfElements characters, then numberOfElements is returned to indicate the error condition; null-terminated strings have lengths strictly less than numberOfElements.

    room_left = map_file_length - (current_ptr - map_ptr)
    if strnlen(current_ptr, room_left) == room_left then "handle non-null-terminated string"
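The room_left check above can be sketched in portable C. This is only an illustration: bounded_len and its parameter names are hypothetical, and memchr is used here in place of strnlen because strnlen is not part of ISO C. The idea is the same, the scan never reads past the end of the mapped view.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the string at cur, never reading past the
   end of a view of map_len bytes that starts at map_base.
   *terminated is set to 1 if a null byte was found inside the view, and to 0
   if the string runs into the border. */
size_t bounded_len(const char *map_base, size_t map_len,
                   const char *cur, int *terminated)
{
    /* bytes left between cur and the end of the view */
    size_t room_left = map_len - (size_t)(cur - map_base);

    /* memchr stops after room_left bytes, so it cannot cross the border */
    const char *nul = memchr(cur, '\0', room_left);

    *terminated = (nul != NULL);
    return nul ? (size_t)(nul - cur) : room_left;
}
```

When *terminated comes back as 0, the caller knows the string spans the border and has to remap or advance the window before continuing.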


jj2007

Quote from: drizz on June 01, 2011, 02:49:20 PM
Quote
So a simple len(esi) would cause an access violation if esi happens to be near the border
Is null byte inside the mapped region?

Yeah, you are right. I'll have to redesign the whole thing. I have a 2.x gigabyte ASCII file from which I want to extract floats, and it needs to be fast, as you can imagine. So I thought of streaming it in with a 64k window; but the more I think about it, the more stupid the idea seems of testing for the end of the streaming window using a null byte... so I must rewrite the parser, which I wanted to avoid.

Thanks anyway :thumbu

clive

Quote from: jj2007
I just had hoped that somebody had a more elegant solution than SEH, e.g. an API that makes the page behind the border writeable, and allows to poke a nullbyte at the border ::)
You map 0x11000 from the file, and then advance the offset into the file 0x10000 at a time.

Each time your pointer into the mapping reaches or exceeds 0x10000, you move the mapping forward by 0x10000 and move the pointer back by 0x10000, so the new reference falls in the 0x000-0xFFF range at the front of the new mapping.

This will work for strings, packets, or anything else that might span the boundary, as long as it doesn't exceed 4 KB.

Adjust the mapping size, and overlap according to your needs.
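The window arithmetic described above can be sketched in C as follows. It is a sketch only: view_t, view_for_window and rebase_position are hypothetical names, and the actual mapping call is left out so that only the offsets are shown. The 0x10000 step also fits the Windows requirement that a view's file offset be a multiple of the allocation granularity, which is typically 64 KB.

```c
#include <stdint.h>

#define WINDOW_STEP 0x10000u  /* how far the view advances each time */
#define OVERLAP     0x1000u   /* extra bytes mapped beyond the step */

/* Hypothetical description of one view: where it starts, how much to map. */
typedef struct {
    uint64_t file_offset;  /* offset to hand to the mapping call */
    uint32_t view_size;    /* number of bytes to map */
} view_t;

/* Compute the view for window number n, clipped to the file size. */
view_t view_for_window(uint64_t file_size, uint64_t n)
{
    view_t v;
    uint64_t want = (uint64_t)WINDOW_STEP + OVERLAP;
    uint64_t remaining;

    v.file_offset = n * WINDOW_STEP;
    remaining = file_size > v.file_offset ? file_size - v.file_offset : 0;
    v.view_size = (uint32_t)(remaining < want ? remaining : want);
    return v;
}

/* A position that has reached or passed WINDOW_STEP in the old view
   lands in the first OVERLAP bytes of the next view. */
uint32_t rebase_position(uint32_t pos_in_view)
{
    return pos_in_view - WINDOW_STEP;
}
```

For a file of 0x25000 bytes, the first two views are 0x11000 bytes long and the third is clipped to the 0x5000 bytes that remain.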

jj2007

Quote from: clive on June 01, 2011, 11:30:52 PM
You map 0x11000 from the file, and then advance the offset into the file 0x10000 at a time.

Thanks, Clive - good idea :U

P.S.: Memory mapped file EOF - another tricky question :wink

hutch--

JJ,

You are safer using the file system file length rather than an EOF character. I recently wrote a scanner for files that uses the normal CreateFile() and ReadFile() calls, as this handles the read length easily, but the best approach depends on the type of data you are reading, text or binary. When I process the data myself I tend to read it in larger chunks, but depending on the data type you may need a strategy for handling target data that falls across the chunk boundary you set.
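One possible strategy for data that falls across a chunk boundary is to carry the incomplete tail over into the next read. The sketch below uses portable C stdio rather than CreateFile()/ReadFile(), and sum_floats_chunked is a hypothetical name. It assumes the file contains only whitespace-separated numbers and that no single number is longer than one chunk.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sum all whitespace-separated numbers in a stream,
   reading chunk_size bytes at a time. A number cut in half by a chunk
   boundary is carried over and completed by the next read. */
double sum_floats_chunked(FILE *f, size_t chunk_size)
{
    /* room for a carried-over tail, a fresh chunk and a terminating null */
    char *buf = malloc(2 * chunk_size + 1);
    size_t carried = 0;
    double sum = 0.0;

    if (!buf) return 0.0;

    for (;;) {
        size_t got = fread(buf + carried, 1, chunk_size, f);
        size_t len = carried + got;
        size_t complete = len;
        char saved, *p;

        if (len == 0) break;

        /* After a full read more data may follow, so back up to the last
           whitespace and leave the partial token for the next round. */
        if (got == chunk_size) {
            while (complete > 0 && !isspace((unsigned char)buf[complete - 1]))
                complete--;
            /* token longer than a chunk: give up carrying, parse as-is */
            if (len - complete > chunk_size) complete = len;
        }

        /* terminate the complete part temporarily so strtod stops there */
        saved = (complete < len) ? buf[complete] : '\0';
        buf[complete] = '\0';
        for (p = buf;;) {
            char *end;
            double v = strtod(p, &end);
            if (end == p) break;   /* no further number in this part */
            sum += v;
            p = end;
        }
        buf[complete] = saved;

        /* move the unparsed tail to the front for the next read */
        carried = len - complete;
        memmove(buf, buf + complete, carried);
    }

    free(buf);
    return sum;
}
```

The same carry-over idea applies to a sliding mapped view; only the source of the bytes changes.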

jj2007

Quote from: hutch-- on June 02, 2011, 07:30:05 AM
JJ,

You are safer using the file system file length rather than an EOF character.

Hutch,

I haven't seen an EOF character since the days of DOS, and I do not plan to rely on one, don't worry :bg

FORTRANS

Quote from: jj2007 on June 02, 2011, 08:12:22 AM
I haven't seen an EOF character since the days of DOS, and I do not plan to rely on one, don't worry :bg

Hi,

   And it is not really needed in DOS for anything except for
CP/M compatibility.  Its main use seemed to be truncating
binary files randomly when using the COPY command.

Cheers,

Steve N.

Rockoon

The EOF character actually comes from the pre-DOS days of CP/M, because that OS used a file system that only stored the number of sectors allocated to the file (it had no concept of byte length).
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

FlySky

Hey guys,

I am playing with the MapViewOfFile API and somehow it's not doing what I want.

I have a 50 MB file which I map into memory using:

     Invoke CreateFile, [FilePath], GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0
     Cmp Eax, INVALID_HANDLE_VALUE
     Jne > FileMapping
     Invoke MessageBox, [hWnd], 'File not created successfully', 'File not created successfully', MB_OK
     ; NOTE: no exit here, so execution continues at FileMapping with Eax = MessageBox's return value
FileMapping:
     Mov [hFile], Eax
     Invoke CreateFileMapping, [hFile], 0, PAGE_READONLY, 0, 0, 0   ; maximum size 0:0 = current file size
     Cmp Eax, NULL
     Jne > MapFile
     Invoke MessageBox, [hWnd], 'CreateFileMapping failed', 'CreateFileMapping failed', MB_OK
     Invoke CloseHandle, [hFile]
     ; NOTE: falls through to MapFile as well
MapFile:
     Mov [hMap], Eax
;     Invoke GetFileSize, [hFile], NULL
;     Mov [FileSize], Eax
     Invoke MapViewOfFile, [hMap], FILE_MAP_READ, 0, 0, 0           ; offset 0, length 0 = map to the end of the file
     Cmp Eax, NULL
     Jne > StartCheckingPE
     Invoke MessageBox, [hWnd], 'MapViewOfFile failed', 'MapViewOfFile failed', MB_OK
     Invoke CloseHandle, [hMap]

Here is the thing:

The MapViewOfFile returns a pointer to the mapped view for example: 05A70000

When looking at that view I can see that at offset 3C is the offset to the PEHeader. That seems to go perfectly fine.

The thing is, it somehow is not mapping the file completely or correctly.

For example:

The file on the disk has:

At offset: 1D86049

02186049 9C                                            pushf
0218604A 50                                            push    eax
0218604B 51                                            push    ecx
0218604C 52                                            push    edx
0218604D 53                                            push    ebx
0218604E 54                                            push    esp

When I look at the mapped view at the same offset, 1D86049 (so at address 5A70000 + 1D86049 = 77F6049), I get:

MZ_:077F6049 db  61h ; a
MZ_:077F604A db 0D6h ; Í
MZ_:077F604B db  70h ; p
MZ_:077F604C db  3Dh ; =
MZ_:077F604D db 0DFh ; ¯
MZ_:077F604E db  2Bh ; +
MZ_:077F604F db  59h ; Y

See, the byte pattern does not match. Now when I search for the pattern 9C 50 51 52 53 54, I find it here:

MZ_:06DEE249 db  9Ch ; £
MZ_:06DEE24A db  50h ; P
MZ_:06DEE24B db  51h ; Q
MZ_:06DEE24C db  52h ; R
MZ_:06DEE24D db  53h ; S
MZ_:06DEE24E db  54h ; T
MZ_:06DEE24F db  55h ; U
MZ_:06DEE250 db  56h ; V
MZ_:06DEE251 db  57h ; W

Now when I subtract: 06DEE249 - 5A70000 = 137E249

This is a completely different offset. Is there any limitation on the MapViewOfFile API? I don't understand why it doesn't map the file right.
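A possible explanation, offered as an assumption because this part of the thread contains no answer: the disassembly listing above starts at address 02186049, which is exactly 400000h above 1D86049. That suggests 1D86049 is a relative virtual address (RVA) counted from an image base of 400000h, not a raw file offset. A FILE_MAP_READ view of a PAGE_READONLY mapping exposes the raw bytes of the file, so an RVA would first have to be translated through the PE section table. Under that assumption, the difference 1D86049 - 137E249 = A07E00 would be the gap between the section's virtual address and its raw data pointer. A minimal C sketch of the translation follows; section_t is a simplified, hypothetical stand-in for the relevant fields of a PE section header.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the PE section header fields that matter here. */
typedef struct {
    uint32_t virtual_address;      /* RVA of the section when loaded */
    uint32_t pointer_to_raw_data;  /* offset of the section in the file */
    uint32_t size_of_raw_data;     /* bytes of the section stored in the file */
} section_t;

/* Translate an RVA into a raw file offset.
   Returns 1 and stores the offset on success, or 0 if no section
   holds file data for that RVA. */
int rva_to_file_offset(const section_t *sections, size_t count,
                       uint32_t rva, uint32_t *file_offset)
{
    size_t i;

    for (i = 0; i < count; i++) {
        const section_t *s = &sections[i];

        if (rva >= s->virtual_address &&
            rva - s->virtual_address < s->size_of_raw_data) {
            *file_offset = rva - s->virtual_address + s->pointer_to_raw_data;
            return 1;
        }
    }
    return 0;
}
```

The resulting file offset, added to the pointer returned by MapViewOfFile, gives the address of the bytes in the view.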