
Asynchronous File I/O (Win32)

Started by Glenn9999, January 08, 2011, 02:51:41 AM


Glenn9999

I'm studying how to do asynchronous file I/O anyway, but I thought I would ask this.  Do any of you find any benefit in doing this (async I/O in one thread, processing in another)?  I know all the elementary tricks (make the file buffer a big multiple of the sector size; I usually use 64K), but I'm just wondering what kind of results people have gotten in trying other things?

For example, I thought to do some file I/O tests on the SHA-1 code I posted in another thread, and I ended up with an I/O-only time of 80% of the full run time for doing an SHA-1 on a file.  So I'm thinking it should be possible to keep the CPU busy during the I/O wait by processing the SHA-1 in another thread, and bring the run time closer to the I/O-only time?
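The arithmetic behind that last thought, as a small sketch (the 80% figure is from the measurement above; the units are made up):

```python
# Back-of-envelope: if reading the file takes 80% of the total run time,
# the remaining 20% is hashing. With the hash running in a second thread
# while the next block is read, total time approaches max(io, compute).
total = 100.0               # arbitrary units: current full run time
io_time = 0.80 * total      # measured I/O-only share
cpu_time = total - io_time  # SHA-1 compute share

sequential = io_time + cpu_time      # 100.0: read, then hash
overlapped = max(io_time, cpu_time)  # 80.0: hashing hides inside the I/O wait

print(sequential)  # 100.0
print(overlapped)  # 80.0
```

So in the best case the overlapped run time is simply the I/O-only time, a 20% saving here.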

hutch--

Glen,

The logic of separating the file IO from the processing of the data makes sense as long as you manage the data in large enough pieces. You will need some form of synchronisation, one thread finishing then waiting for the other to use the data; perhaps use alternate memory locations so that when one buffer is finished the next read writes to the alternate address.

There is one potential catch: the idea, if put together properly, will probably be faster on a multi-core processor but may be slower if it's run on a single core.
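A minimal sketch of the alternate-buffer handshake described above, using Python threads and queues as a stand-in for Win32 threads and events (all names here are illustrative, not anyone's actual code):

```python
# Two alternating buffers: the reader fills a free buffer while the worker
# processes the other. The queues provide the "one finishes, the other waits"
# synchronisation; on Win32 this would be events or similar primitives.
import threading
import queue

CHUNK = 64 * 1024  # 64K, the buffer size mentioned in the thread

def reader(src_chunks, filled, free):
    # Producer: grab a free buffer, "read" into it, hand it to the worker.
    for chunk in src_chunks:
        buf = free.get()          # wait until the worker releases a buffer
        buf[:] = chunk            # stands in for the disk read
        filled.put(buf)
    filled.put(None)              # end-of-file marker

def process_all(src_chunks):
    filled, free = queue.Queue(), queue.Queue()
    for _ in range(2):            # exactly two buffers: the ping-pong pair
        free.put(bytearray(CHUNK))
    t = threading.Thread(target=reader, args=(src_chunks, filled, free))
    t.start()
    total = 0
    while (buf := filled.get()) is not None:
        total += sum(buf)         # stands in for the SHA-1 / processing step
        free.put(buf)             # release the buffer back to the reader
    t.join()
    return total

chunks = [bytes([i]) * CHUNK for i in range(4)]
print(process_all(chunks))        # same result as processing sequentially
```

Because a buffer is only written after the worker has handed it back, the two threads never touch the same buffer at the same time.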
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

when it comes to file I/O, it seems that every application is vastly different
all the techniques employed by the OS are pretty damn good - lol
you have to be on your toes to improve on them
if you select a method that is appropriate for the app to begin with, you may not get much improvement out of a lot of coding

JW149

I have recently downloaded and had a little play with MASM, and it was specifically this problem of asynchronous I/O that I wanted to solve.
I have a mainframe database, coded in PL/I, that I wanted to port to PC AIX/Linux/Windows.
I want to be able to read/write any one of 2000 files, each with a unique structure (DCB in mainframe parlance), without defining 2000 files to PL/I, and I wanted the option of leaving the I/O to get on with it and checking later.
If you wanted a focus for your own work, or to help me, or to cooperate, then I would be very happy!

clive

Or just write a multi-threaded application, where the "work" is performed in a different thread from the "fetch". The whole file/driver stack is designed to be asynchronous and is not sitting around polling the hardware; whether you exploit that depends on how you choose to do file IO.

If you wanted to do SHA, use a pair of ping-pong buffers, light off the next overlapped read, wait for and then SHA the previous block, light off the next. Rinse and Repeat. This way the file system can fetch while you work.

There isn't much point in having a lot of in-flight IO transactions; they get serialized at the hardware driver. Try to localize your access so the head isn't thrashing over the entire media. This is less true of solid state media, but even there the internal blocking/banking is likely to be much larger than the sector size.
It could be a random act of randomness. Those happen a lot as well.
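A rough sketch of that ping-pong loop in Python, with a one-worker thread pool standing in for an overlapped ReadFile (the read-ahead happens on the pool thread while the main thread hashes the previous block; this is an illustration of the shape, not the Win32 code itself):

```python
# "Light off the next read, SHA the previous block, rinse and repeat."
# The Future returned by submit() plays the role of the OVERLAPPED handle:
# the next read is in flight while the current block is hashed.
import hashlib
import io
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024

def sha1_pingpong(f):
    sha = hashlib.sha1()
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(f.read, CHUNK)      # light off the first read
        while True:
            block = pending.result()              # wait for the in-flight read
            if not block:
                break
            pending = pool.submit(f.read, CHUNK)  # light off the next read...
            sha.update(block)                     # ...then SHA the previous
    return sha.hexdigest()

data = b"x" * (3 * CHUNK + 123)
assert sha1_pingpong(io.BytesIO(data)) == hashlib.sha1(data).hexdigest()
```

Only one read is ever in flight, which matches the point above about in-flight transactions getting serialized at the driver anyway.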

jj2007

Quote from: clive on January 16, 2011, 05:13:48 PM
... use a pair of ping-pong buffers, ... light off the next. Rinse and Repeat

Question from a hobby programmer: In which kind of courses did you learn this secret language?
::)  :bg

dedndave

 :bg "ping-pong buffers" - i got that one right away
"rinse and repeat" - i am thinking David Letterman, here   :lol
leave it to the Brits to mangle our language   :P

clive

My formal training is in electronics, and being English there is also an amount of sarcasm.

Ping-Pong is a term often used in video and audio processing circles: one buffer is prepared while the other is displayed/played. The goals are, first, not to have the updating be visible/audible or cause contention on dual-ported/DMA memory, and second, not to have to move things unnecessarily.
http://en.wikipedia.org/wiki/Multiple_buffering#Page_Flipping

Where the details/instructions/concepts should be apparent without explaining in grinding detail.
http://en.wikipedia.org/wiki/Lather,_rinse,_repeat

See also wax-on, wax-off

vanjast

Gee! it's years since I've looked at that... Ping Pong must be the new buzzword  :green2
I had this impression of a couple of Chinese guys jumping around a square table at great distances  :bg

dedndave

rinse and repeat still doesn't sound very cricket to me - lol

hutch--

It's more to do with your grasp of English as a language; ping-pong makes perfectly good sense if you can visualise the back and forth. Alternate buffering is a very good technique if you have the thread support, but it assumes that the post-disk-IO processing is slower than the disk IO, and that is not always the case. It is subject to sensible variations though: one thread doing the disk IO and another doing the processing of the data, performing waits until the disk IO has caught up. You at least get some multicore processing overlap that way.

dioxin

Quote from: hutch--
You at least get some multicore processing overlap that way

It's not just multicore processors that benefit. Single cores do too.
The non-overlapped method might do something like this:

DO
    READ next block of data
    Process block of data
    Write results back to disk
LOOP until all blocks done.

but read and write are not CPU intensive: they hang around waiting for the disk, and they wait until complete, so no processing can be done while reading and writing.

R=read data from disk W=write data to disk p=process data   i=CPU almost idle

RRRRRRRRRRppppppppppWWWWWWWWWWRRRRRRRRRRppppppppppWWWWWWWWWW
111111111111111111111111111111222222222222222222222222222222
iiiiiiiiii          iiiiiiiiiiiiiiiiiiii          iiiiiiiiii


Complete in 60 time units; CPU almost idle for 40.

If you thread the process (which is what overlapped i/o does) then you get this:

RRRRRRRRRRppppppppppWWWWWWWWWW
111111111111111111111111111111
          RRRRRRRRRRppppppppppWWWWWWWWWW
          222222222222222222222222222222
iiiiiiiiii                    iiiiiiiiii


Complete in 40 time units; CPU almost idle for 20 (in practice the processing may take a fraction longer because it gives up a small amount of time on a single core to handle the R/W).


What happens is that the read thread takes very little CPU time; it's mostly hardware/DMA with a little CPU overhead for control, so it can take place during the processing time and barely impact the time to process. It may need a higher thread priority than the processing thread to make sure it gets the small amount of time it needs. This allows the second block of data to be fetched in a second thread at the same time as the first block is processed, so when the first block has finished processing there is no need to wait for the second block; it has already been fetched.

You can make the data blocks smaller to reduce the idle time at each end, but this might cause disk thrashing, which can slow things down, unless you read data from one physical disk and write the results to another, which is the ideal solution.


The benefit you get depends on the ratio of disk i/o to processing time. The case demonstrated above is ideal because the i/o and processing times are equal.
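The timelines above can be reduced to a small formula, under the simplifying assumption that the stages pipeline perfectly as in the diagrams (a sketch of the arithmetic, not a measurement):

```python
# With n blocks and per-block read/process/write times R, P, W, the serial
# loop costs n*(R+P+W), while the threaded/overlapped version settles into
# one new block per max(R, P, W) once the pipeline is full.
def serial_time(n, r, p, w):
    return n * (r + p + w)

def overlapped_time(n, r, p, w):
    # The first block takes the full R+P+W; each further block adds only
    # the slowest stage, because the other stages hide under it.
    return (r + p + w) + (n - 1) * max(r, p, w)

# The two-block example above: R = P = W = 10 time units.
print(serial_time(2, 10, 10, 10))      # 60
print(overlapped_time(2, 10, 10, 10))  # 40
```

When the three stage times are equal, as in the ideal case described above, the pipeline is saturated and the saving per extra block is greatest.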

Paul.