Real experts here

Started by rags, February 18, 2012, 04:30:29 AM


hutch--

JJ,

Timing the disk read is an unusual way to do it: the first try will always be slow, and the following passes will be reading the file from the cache. I don't like loop code timings for tasks of this type, so I tested it on a larger file, a 64 MB C header file with 1.9 million lines.

It keeps turning up at 125 ms for a two-pass operation: first a pass to count the LFs (ASCII 10), then the tokenise pass. You could cut that by nearly half by estimating the LF count from the file length, but in this context I don't have the luxury of using that much extra memory. I do the first pass to get the count, then allocate the correct amount of memory to hold the pointer array.
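For anyone following along, the two-pass scheme described above can be sketched in C roughly as follows. This is not the actual MASM code from the post; the function name `tokenize_two_pass` and the CR stripping are assumptions made for illustration, and the sketch assumes the file is already loaded into a writable buffer with one spare byte at the end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the code from the post.
   Pass 1 counts LF (10) bytes so the pointer array can be sized exactly.
   Pass 2 records the start of each line and overwrites each LF with 0.
   The caller must make sure buf[len] is a writable 0 byte so that a final
   line without a trailing LF is still terminated. Caller frees the result. */
static char **tokenize_two_pass(char *buf, size_t len, size_t *out_count)
{
    assert(buf != NULL && out_count != NULL);

    /* Pass 1: count the LFs. */
    size_t lf = 0;
    const char *p = buf;
    const char *end = buf + len;
    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lf++;
        p++;
    }

    /* A last line with no trailing LF still needs a slot. */
    size_t lines = lf + ((len > 0 && buf[len - 1] != '\n') ? 1 : 0);

    /* Allocate exactly what is needed (at least one slot). */
    char **ptrs = malloc((lines ? lines : 1) * sizeof *ptrs);
    if (ptrs == NULL)
        return NULL;

    /* Pass 2: tokenise in place. */
    size_t n = 0;
    char *start = buf;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            buf[i] = '\0';
            if (i > 0 && buf[i - 1] == '\r')   /* assumption: strip CR of CRLF */
                buf[i - 1] = '\0';
            ptrs[n++] = start;
            start = buf + i + 1;
        }
    }
    if (start < buf + len)                     /* final line without LF */
        ptrs[n++] = start;

    *out_count = n;
    return ptrs;
}
```

The pointer array never holds more entries than the file has lines, which is the memory saving the first pass pays for.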

jj2007

Quote from: hutch-- on February 19, 2012, 11:23:10 PM
JJ,

Timing the disk read is an unusual way to do it: the first try will always be slow, and the following passes will be reading the file from the cache. I don't like loop code timings for tasks of this type, so I tested it on a larger file, a 64 MB C header file with 1.9 million lines.

It keeps turning up at 125 ms for a two-pass operation: first a pass to count the LFs (ASCII 10), then the tokenise pass. You could cut that by nearly half by estimating the LF count from the file length, but in this context I don't have the luxury of using that much extra memory. I do the first pass to get the count, then allocate the correct amount of memory to hold the pointer array.

Hutch,

Which CPU?
WinBoth.inc is 2 MB, 52357 lines, and takes 2 milliseconds on my Celeron (without the disk timing - I know, I know...)
The C header is 64 MB with 1.9 million lines, so roughly a factor of 35, and 35 * 2 ms = 70 ms expected

The difference is probably that MasmBasic Recall uses SSE2, and that I use a single pass. There are limits to that technique, but those who work with 2 GB files surely have the right amount of RAM installed, so a temporarily too generous buffer shouldn't pose any practical problems. Two passes, on the other hand, mean twice the work, and it happens outside the data cache.
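To make the single-pass idea concrete, here is a minimal C sketch. It is not MasmBasic's Recall and uses no SSE2; the names `tokenize_single_pass` and `push_line`, the `EST_BYTES_PER_LINE` guess, and the doubling fallback are all assumptions for illustration. The pointer array is sized from the file length up front, filled in one pass, and trimmed afterwards.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical guess at the average line length used for the estimate. */
#define EST_BYTES_PER_LINE 16

/* Append one line pointer, doubling the array if the estimate was too small.
   Returns 1 on success, 0 if realloc fails (the old array stays valid). */
static int push_line(char ***ptrs, size_t *n, size_t *cap, char *line)
{
    if (*n == *cap) {
        size_t new_cap = *cap * 2;
        char **grown = realloc(*ptrs, new_cap * sizeof **ptrs);
        if (grown == NULL)
            return 0;
        *ptrs = grown;
        *cap = new_cap;
    }
    (*ptrs)[(*n)++] = line;
    return 1;
}

/* Single pass: estimate the line count from the length, tokenise in place,
   then shrink the array to the real count. The caller must make sure
   buf[len] is a writable 0 byte. Caller frees the result. */
static char **tokenize_single_pass(char *buf, size_t len, size_t *out_count)
{
    assert(buf != NULL && out_count != NULL);

    size_t cap = len / EST_BYTES_PER_LINE + 1;  /* temporarily generous */
    size_t n = 0;
    char **ptrs = malloc(cap * sizeof *ptrs);
    if (ptrs == NULL)
        return NULL;

    char *start = buf;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        buf[i] = '\0';
        if (i > 0 && buf[i - 1] == '\r')        /* assumption: strip CR of CRLF */
            buf[i - 1] = '\0';
        if (!push_line(&ptrs, &n, &cap, start)) {
            free(ptrs);
            return NULL;
        }
        start = buf + i + 1;
    }
    if (start < buf + len && !push_line(&ptrs, &n, &cap, start)) {
        free(ptrs);
        return NULL;
    }

    /* Give back the unused part of the estimate. */
    char **fit = realloc(ptrs, (n ? n : 1) * sizeof *ptrs);
    if (fit != NULL)
        ptrs = fit;

    *out_count = n;
    return ptrs;
}
```

The trade-off is the one under discussion here: the buffer is read only once, at the cost of an array that may briefly be larger than needed, or that has to grow when the lines are shorter than the guess.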

Can you upload the file somewhere, so that we can run a test?