How to get list of files on http site

Started by Jimg, October 07, 2009, 05:44:26 PM

Jimg

What is the magic wininet command to get a list of files from an http site?

For example, on http://www.physicsxxi.com/CurrentImages/ there are several image files.  The list shows up if I open this link in a browser.

I would like my program to be able to get a list of the files currently on the site so it can alert me if there are any new ones to look at.

I am currently connecting with InternetOpen and then anonymously using InternetConnect, but I can't seem to find the next command to get the listing.


NervGaz

InternetOpenUrl, IIRC. Check your trusty SDK documentation for usage...
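
A minimal sketch of that route, for anyone who wants to stay inside wininet: InternetOpen, then InternetOpenUrl, then InternetReadFile in a loop until it reports zero bytes. The agent string "listdemo", the 4096-byte buffer and the label names are arbitrary choices, and it assumes the INTERNET_* constants come from windows.inc as in a stock MASM32 install. It only dumps whatever the server sends to the console.

```asm
    include \masm32\include\masm32rt.inc
    include \masm32\include\wininet.inc
    includelib \masm32\lib\wininet.lib

    .data?
      hInet   dd ?
      hUrl    dd ?
      nRead   dd ?
      buffer  db 4096 dup(?)

    .code
start:
    ; "listdemo" is an arbitrary user-agent string
    invoke InternetOpen, chr$("listdemo"), INTERNET_OPEN_TYPE_PRECONFIG,
            NULL, NULL, 0
    mov hInet, eax
    test eax, eax
    jz done

    invoke InternetOpenUrl, hInet,
            chr$("http://www.physicsxxi.com/CurrentImages/"),
            NULL, 0, INTERNET_FLAG_RELOAD, 0
    mov hUrl, eax
    test eax, eax
    jz closeInet

readLoop:
    ; leave one byte free for the terminating zero
    invoke InternetReadFile, hUrl, addr buffer, sizeof buffer - 1, addr nRead
    test eax, eax
    jz closeUrl                     ; read failed
    mov ecx, nRead
    test ecx, ecx
    jz closeUrl                     ; zero bytes read = end of data
    mov byte ptr buffer[ecx], 0     ; zero-terminate this chunk
    print addr buffer
    jmp readLoop

closeUrl:
    invoke InternetCloseHandle, hUrl
closeInet:
    invoke InternetCloseHandle, hInet
done:
    inkey "Press any key to exit..."
    exit
end start
```

What comes back is the same HTML a browser would receive, so the file names still have to be picked out of it.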

MichaelW

I don't know much about any of this, but downloading the index can be fairly straightforward:

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    include \masm32\include\urlmon.inc
    includelib \masm32\lib\urlmon.lib
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    invoke URLDownloadToFile, NULL,
            chr$("http://www.physicsxxi.com/CurrentImages/"),
            chr$("index.htm"),
            0, 0

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

eschew obfuscation

dedndave

the file i found is "Index of _CurrentImages.htm"

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html><head><title>Index of /CurrentImages</title></head><body>
<h1>Index of /CurrentImages</h1>
<pre><img src="Index%20of%20_CurrentImages_files/blank.gif" alt="     "> <a href="http://www.physicsxxi.com/CurrentImages/?N=D">Name</a>                    <a href="http://www.physicsxxi.com/CurrentImages/?M=A">Last modified</a>       <a href="http://www.physicsxxi.com/CurrentImages/?S=A">Size</a>  <a href="http://www.physicsxxi.com/CurrentImages/?D=A">Description</a>
<hr>
<img src="Index%20of%20_CurrentImages_files/back.gif" alt="[DIR]"> <a href="http://www.physicsxxi.com/">Parent Directory</a>        06-Oct-2009 20:50      - 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/P1AME20081006IC0.png">P1AME20081006IC0.png</a>    06-Oct-2009 17:14   148k 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/P1AME20090906IC0.png">P1AME20090906IC0.png</a>    06-Oct-2009 17:14   142k 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/P1AME20090929IC0.png">P1AME20090929IC0.png</a>    06-Oct-2009 17:14   132k 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/P1AME20091005IC0.png">P1AME20091005IC0.png</a>    06-Oct-2009 17:14   143k 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/P1AME20091006IC0.png">P1AME20091006IC0.png</a>    06-Oct-2009 17:14   145k 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/sc200910051600.png">sc200910051600.png</a>      05-Oct-2009 14:09   1.2M 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/sm200910051729.png">sm200910051729.png</a>      05-Oct-2009 14:09   724k 
</pre><hr>

</body></html>

MichaelW

index.htm is the name I selected for the destination file. To download one of the image files you would append the name of the file to the original url, specify an appropriate destination file name, and call URLDownloadToFile to download it.
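
A sketch of that second step, using one of the file names from the listing posted above. It also checks the HRESULT that URLDownloadToFile returns (zero is S_OK) instead of ignoring it; the variable name hr is just for illustration.

```asm
    include \masm32\include\masm32rt.inc
    include \masm32\include\urlmon.inc
    includelib \masm32\lib\urlmon.lib

    .data?
      hr dd ?

    .code
start:
    ; original URL + file name, saved under the same name locally
    invoke URLDownloadToFile, NULL,
            chr$("http://www.physicsxxi.com/CurrentImages/sc200910051600.png"),
            chr$("sc200910051600.png"),
            0, 0
    mov hr, eax                     ; save the HRESULT before print clobbers eax

    .if eax == 0                    ; S_OK
        print "download complete",13,10
    .else
        print "download failed, HRESULT = "
        print hex$(hr),13,10
    .endif

    inkey "Press any key to exit..."
    exit
end start
```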

Jimg

Thanks Michael.  It's the list of files I wanted.  That's certainly simple, and even if the results are pretty ugly, I can parse out the file names.

I was just hoping there was something like  httpfindfirstfile (as there is for ftp and gopher protocols) that I couldn't find.

jj2007

On most sites you will get error 800401e4, download not allowed aka "Directory listing denied".

Tedd

There are no functions to list the files; any file list is sent entirely at the discretion of the web server.
When you connect to a site, you request "/" by default, and it's then up to the web server what to send you. Often this turns out to be the contents of "index.html", but it could equally be anything.
When you try to open some random URL on the site, it's the same situation (if it's not a specific file): you'll often get the index.html under that location, or a directory listing if access rights allow it, and an error message if they don't.
Anyway, short answer: the directory listing is generated entirely by the web server, as a web page, so anything you get will need to be extracted from the HTML listing to be useful.
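
One rough way to do that extraction, assuming the page is already in memory as a zero-terminated string: scan for each href=" and print everything up to the closing quote. The two-line html buffer below is shortened from the listing posted earlier and stands in for the downloaded page; the names html and hrefKey are made up for the example. It relies on crt_strstr and crt_strchr, which masm32rt.inc pulls in through msvcrt.

```asm
    include \masm32\include\masm32rt.inc

    .data
      ; stand-in for the downloaded page (two entries from the listing above)
      html    db '<a href="http://www.physicsxxi.com/CurrentImages/sc200910051600.png">sc200910051600.png</a>',13,10
              db '<a href="http://www.physicsxxi.com/CurrentImages/sm200910051729.png">sm200910051729.png</a>',13,10,0
      hrefKey db 'href="',0

    .code
start:
    push esi
    push edi
    mov esi, offset html            ; esi = current scan position

nextLink:
    invoke crt_strstr, esi, addr hrefKey    ; find the next href="
    test eax, eax
    jz finished
    lea esi, [eax+6]                ; skip the 6 characters of href="
    invoke crt_strchr, esi, 22h     ; 22h = closing double quote
    test eax, eax
    jz finished
    mov edi, eax
    mov byte ptr [edi], 0           ; terminate the URL in place
    print esi,13,10                 ; one link per line
    lea esi, [edi+1]                ; carry on after the closing quote
    jmp nextLink

finished:
    pop edi
    pop esi
    inkey "Press any key to exit..."
    exit
end start
```

On a real listing this also picks up the column-sort links (?N=D and friends) and the Parent Directory entry, so some filtering, for example on the file extension, would still be needed.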
No snowflake in an avalanche feels responsible.

PBrennick

Jimg,

I am a bit vague in this area but URLOpenPullStream and URLOpenStream look interesting. The list of APIs that I could find that are internet related is as follows.

AsyncInstallDistributionUnit
CoGetClassObjectFromURL
CoInternetCombineIUri
CoInternetCombineUrl
CoInternetCombineUrlEx
CoInternetCompareUrl
CoInternetGetSecurityUrlEx
CoInternetParseIUri
CoInternetQueryInfo
CompatFlagsFromClsid
CopyBindInfo
CopyStgMedium
CreateAsyncBindCtx
CreateAsyncBindCtxEx
CreateFormatEnumerator
CreateUri
CreateURLBinding (not currently implemented)
CreateURLMoniker
CreateURLMonikerEx
CreateURLMonikerEx2
FindMediaType
FindMediaTypeClass
FindMimeFromData
GetClassFileOrMime
GetClassURL (not currently implemented)
IsAsyncMoniker
IsValidURL
MkParseDisplayNameEx
ObtainUserAgentString
RegisterBindStatusCallback
RegisterFormatEnumerator
RegisterMediaTypeClass
RegisterMediaTypes
ReleaseBindInfo
RevokeBindStatusCallback
RevokeFormatEnumerator
URLDownloadToCacheFile
URLDownloadToFile
UrlMkGetSessionOption
UrlMkSetSessionOption
URLOpenBlockingStream
URLOpenPullStream
URLOpenStream

hth,
Paul

Jimg

Thanks Paul.  It makes my poor mind boggle.

nathanpc

O_o
It's a very big list!
I'm going to try! :D

elmo

I found an almost complete example at:
http://www.masm32.com/board/index.php?PHPSESSID=1fba624f0f2404ac83e75becf8fb6c7b&topic=3950.0

I want to list the files on my site.
I found these in wininet.inc:
-FtpFindFirstFile
-InternetFindNextFile

Maybe we can combine them, but I haven't found enough examples on the net.
Could someone out there share their knowledge about this?
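
A possible way to combine the two, as a sketch only: it works over FTP, not HTTP, so the site has to run an FTP server. The host ftp.example.com is a placeholder, the NULL user name and password ask for an anonymous login, and passive mode is an assumption that suits most firewalled setups. The INTERNET_* constants are assumed to come from windows.inc.

```asm
    include \masm32\include\masm32rt.inc
    include \masm32\include\wininet.inc
    includelib \masm32\lib\wininet.lib

    .data?
      hInet   dd ?
      hFtp    dd ?
      hFind   dd ?
      wfd     WIN32_FIND_DATA <>

    .code
start:
    invoke InternetOpen, chr$("listdemo"), INTERNET_OPEN_TYPE_PRECONFIG,
            NULL, NULL, 0
    mov hInet, eax
    test eax, eax
    jz done

    ; placeholder host; NULL user name and password = anonymous login
    invoke InternetConnect, hInet, chr$("ftp.example.com"),
            INTERNET_DEFAULT_FTP_PORT, NULL, NULL,
            INTERNET_SERVICE_FTP, INTERNET_FLAG_PASSIVE, 0
    mov hFtp, eax
    test eax, eax
    jz closeInet

    ; NULL search pattern = first entry of the current directory
    invoke FtpFindFirstFile, hFtp, NULL, addr wfd, INTERNET_FLAG_RELOAD, 0
    mov hFind, eax
    test eax, eax
    jz closeFtp

nextFile:
    print addr wfd.cFileName,13,10
    invoke InternetFindNextFile, hFind, addr wfd
    test eax, eax
    jnz nextFile                    ; FALSE means no more entries

    invoke InternetCloseHandle, hFind
closeFtp:
    invoke InternetCloseHandle, hFtp
closeInet:
    invoke InternetCloseHandle, hInet
done:
    inkey "Press any key to exit..."
    exit
end start
```

Each entry comes back in a WIN32_FIND_DATA structure, so the size and time fields are available as well as the name.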
be the king of accounting programmer world!