The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: Jimg on October 07, 2009, 05:44:26 PM

Title: How to get list of files on http site
Post by: Jimg on October 07, 2009, 05:44:26 PM
What is the magic wininet command to get a list of files from an http site?

For example, on http://www.physicsxxi.com/CurrentImages/ there are several image files.  The list shows up if I open this link in a browser.

I would like my program to be able to get a list of the files currently on the site so it can alert me if there are any new ones to look at.

I am currently connecting with InternetOpen and then anonymously using InternetConnect, but I can't seem to find the next command to get the listing.

Title: Re: How to get list of files on http site
Post by: NervGaz on October 07, 2009, 05:56:53 PM
InternetOpenUrl, iirc. Check your trusty SDK documentation for usage...
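
A minimal sketch of the InternetOpenUrl route, assuming the MASM32 SDK is installed in \masm32 and that windows.inc supplies the INTERNET_* constants. The agent string "ListFetch", the buffer size, and the label names are made up for illustration. It only prints whatever the server returns for the URL; it does not parse anything.

```asm
    include \masm32\include\masm32rt.inc
    include \masm32\include\wininet.inc
    includelib \masm32\lib\wininet.lib

    BUFSIZE equ 4096                ; arbitrary chunk size for this sketch

    .data?
    hInet   dd ?
    hUrl    dd ?
    nRead   dd ?
    buf     db BUFSIZE dup(?)

    .code
start:
    ; open a WinINet session using the system's configured proxy settings
    invoke InternetOpen, chr$("ListFetch"), INTERNET_OPEN_TYPE_PRECONFIG,
            NULL, NULL, 0
    mov hInet, eax
    test eax, eax
    jz finished

    ; request the URL directly; no separate InternetConnect is needed here
    invoke InternetOpenUrl, hInet,
            chr$("http://www.physicsxxi.com/CurrentImages/"),
            NULL, 0, INTERNET_FLAG_RELOAD, 0
    mov hUrl, eax
    test eax, eax
    jz closeInet

readLoop:
    ; read up to BUFSIZE-1 bytes so there is room for a terminating zero
    invoke InternetReadFile, hUrl, addr buf, BUFSIZE-1, addr nRead
    test eax, eax
    jz closeUrl                     ; read failed
    mov ecx, nRead
    test ecx, ecx
    jz closeUrl                     ; zero bytes read means end of data
    mov byte ptr buf[ecx], 0        ; terminate the chunk for StdOut
    invoke StdOut, addr buf
    jmp readLoop

closeUrl:
    invoke InternetCloseHandle, hUrl
closeInet:
    invoke InternetCloseHandle, hInet
finished:
    inkey "Press any key to exit..."
    exit
end start
```

The output is whatever page the server chooses to send for that URL, so it still has to be searched for the file names.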
Title: Re: How to get list of files on http site
Post by: MichaelW on October 07, 2009, 06:25:47 PM
I don't know much about any of this, but downloading the index can be fairly straightforward:

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    include \masm32\include\urlmon.inc
    includelib \masm32\lib\urlmon.lib
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    invoke URLDownloadToFile, NULL,
            chr$("http://www.physicsxxi.com/CurrentImages/"),
            chr$("index.htm"),
            0, 0

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

Title: Re: How to get list of files on http site
Post by: dedndave on October 07, 2009, 06:30:47 PM
the file i found is "Index of _CurrentImages.htm"

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html><head><title>Index of /CurrentImages</title></head><body>
<h1>Index of /CurrentImages</h1>
<pre><img src="Index%20of%20_CurrentImages_files/blank.gif" alt="     "> <a href="http://www.physicsxxi.com/CurrentImages/?N=D">Name</a>                    <a href="http://www.physicsxxi.com/CurrentImages/?M=A">Last modified</a>       <a href="http://www.physicsxxi.com/CurrentImages/?S=A">Size</a>  <a href="http://www.physicsxxi.com/CurrentImages/?D=A">Description</a>
<hr>
<img src="Index%20of%20_CurrentImages_files/back.gif" alt="[DIR]"> <a href="http://www.physicsxxi.com/">Parent Directory</a>        06-Oct-2009 20:50      - 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/P1AME20081006IC0.png">P1AME20081006IC0.png</a>    06-Oct-2009 17:14   148k 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/P1AME20090906IC0.png">P1AME20090906IC0.png</a>    06-Oct-2009 17:14   142k 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/P1AME20090929IC0.png">P1AME20090929IC0.png</a>    06-Oct-2009 17:14   132k 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/P1AME20091005IC0.png">P1AME20091005IC0.png</a>    06-Oct-2009 17:14   143k 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/P1AME20091006IC0.png">P1AME20091006IC0.png</a>    06-Oct-2009 17:14   145k 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/sc200910051600.png">sc200910051600.png</a>      05-Oct-2009 14:09   1.2M 
<img src="Index%20of%20_CurrentImages_files/image2.gif" alt="[IMG]"> <a href="http://www.physicsxxi.com/CurrentImages/sm200910051729.png">sm200910051729.png</a>      05-Oct-2009 14:09   724k 
</pre><hr>

</body></html>
Title: Re: How to get list of files on http site
Post by: MichaelW on October 07, 2009, 06:39:47 PM
index.htm is the name I selected for the destination file. To download one of the image files you would append the name of the file to the original url, specify an appropriate destination file name, and call URLDownloadToFile to download it.
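
A rough sketch of that step, assuming the same MASM32 setup as the example above. The file name is one taken from the listing posted earlier; the buffer size and the names baseUrl, imgName and fullUrl are just illustrative.

```asm
    include \masm32\include\masm32rt.inc
    include \masm32\include\urlmon.inc
    includelib \masm32\lib\urlmon.lib

    .data
    baseUrl  db "http://www.physicsxxi.com/CurrentImages/",0
    imgName  db "sc200910051600.png",0      ; one name from the listing

    .data?
    fullUrl  db 512 dup(?)                  ; arbitrary size for this sketch

    .code
start:
    ; build the full URL: base URL followed by the file name
    invoke lstrcpy, addr fullUrl, addr baseUrl
    invoke lstrcat, addr fullUrl, addr imgName

    ; save it locally under the same name
    invoke URLDownloadToFile, NULL, addr fullUrl, addr imgName, 0, 0
    test eax, eax                           ; 0 (S_OK) means success
    jnz failed
    print "Saved "
    invoke StdOut, addr imgName
    print chr$(13,10)
    jmp finished
failed:
    print "Download failed",13,10
finished:
    inkey "Press any key to exit..."
    exit
end start
```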
Title: Re: How to get list of files on http site
Post by: Jimg on October 07, 2009, 06:50:07 PM
Thanks Michael.  It's the list of files I wanted.  That's certainly simple.  Even if the results are pretty ugly, I can parse out the file names.

I was just hoping there was something like  httpfindfirstfile (as there is for ftp and gopher protocols) that I couldn't find.
Title: Re: How to get list of files on http site
Post by: jj2007 on October 07, 2009, 07:30:48 PM
On most sites you will get error 800401e4, download not allowed aka "Directory listing denied".
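
One way to see which error comes back is to keep the HRESULT from URLDownloadToFile and print it. This is only a sketch, assuming the same MASM32 setup as the earlier example and using crt_printf from msvcrt.inc (pulled in by masm32rt.inc) for the hex output.

```asm
    include \masm32\include\masm32rt.inc
    include \masm32\include\urlmon.inc
    includelib \masm32\lib\urlmon.lib

    .code
start:
    invoke URLDownloadToFile, NULL,
            chr$("http://www.physicsxxi.com/CurrentImages/"),
            chr$("index.htm"),
            0, 0
    mov ebx, eax                    ; keep the HRESULT; later calls overwrite eax

    .if ebx == 0                    ; 0 is S_OK
        invoke crt_printf, chr$("Listing saved to index.htm",13,10)
    .else
        ; show the failure code in hex so it can be looked up
        invoke crt_printf, chr$("URLDownloadToFile failed, HRESULT = %08X",13,10), ebx
    .endif

    inkey "Press any key to exit..."
    exit
end start
```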
Title: Re: How to get list of files on http site
Post by: Tedd on October 07, 2009, 10:27:04 PM
There are no functions to list the files; any file list is sent entirely at the discretion of the web server.
When you connect to a site, you request "/" by default, and it is then up to the web server what to send you. Often this turns out to be the contents of "index.html", but it could equally be anything.
When you try to open some other URL on the site that is not a specific file, it's the same situation: you'll often get the index.html under that location, or a directory listing if access rights allow it, or an error message if they don't.
Anyway, short answer: the directory listing is generated entirely by the web server, as a web page, so anything you get will need to be extracted from the HTML listing to be useful.
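
A sketch of one way to do that extraction, assuming the listing is already in memory as a zero-terminated string and looks like the Apache-style index posted earlier (absolute URLs in href="..."). The sample data is a cut-down copy of three links from that listing. The skip rules (ignore sort links beginning with '?' and links ending in '/') are assumptions about this particular page format, not a general HTML parser. It relies on crt_strstr, crt_strchr and crt_strrchr from msvcrt.inc.

```asm
    include \masm32\include\masm32rt.inc

    .data
    ; three links copied from the listing above, trimmed for the example
    listing db '<a href="http://www.physicsxxi.com/CurrentImages/?N=D">Name</a>',13,10
            db '<a href="http://www.physicsxxi.com/CurrentImages/sc200910051600.png">sc200910051600.png</a>',13,10
            db '<a href="http://www.physicsxxi.com/CurrentImages/sm200910051729.png">sm200910051729.png</a>',13,10,0
    pattern db 'href="',0
    lineEnd db 13,10,0

    .code
start:
    mov esi, offset listing
scan:
    invoke crt_strstr, esi, addr pattern    ; eax -> next href=" or NULL
    test eax, eax
    jz finished
    lea esi, [eax+6]                        ; esi -> first character of the URL
    invoke crt_strchr, esi, 22h             ; find the closing double quote
    test eax, eax
    jz finished
    mov edi, eax
    mov byte ptr [edi], 0                   ; cut the URL off at the quote
    invoke crt_strrchr, esi, 2Fh            ; last '/' in the URL
    test eax, eax
    jz skip                                 ; no slash: ignored in this sketch
    inc eax                                 ; eax -> text after the last slash
    mov cl, [eax]
    test cl, cl
    jz skip                                 ; URL ends in '/': a directory link
    cmp cl, '?'
    je skip                                 ; column sort link such as ?N=D
    invoke StdOut, eax                      ; print the file name
    invoke StdOut, addr lineEnd
skip:
    lea esi, [edi+1]                        ; continue after the closing quote
    jmp scan
finished:
    inkey "Press any key to exit..."
    exit
end start
```

With the sample data above this should print only the two .png names. Note that the scan overwrites each closing quote with a zero, so it works on a private copy of the listing rather than on data that is needed again.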
Title: Re: How to get list of files on http site
Post by: Jimg on October 07, 2009, 11:28:24 PM
Thanks everyone.  It'll work.
Title: Re: How to get list of files on http site
Post by: PBrennick on October 08, 2009, 07:54:57 PM
Jimg,

I am a bit vague in this area but URLOpenPullStream and URLOpenStream look interesting. The list of APIs that I could find that are internet related is as follows.

AsyncInstallDistributionUnit
CoGetClassObjectFromURL
CoInternetCombineIUri
CoInternetCombineUrl
CoInternetCombineUrlEx
CoInternetCompareUrl
CoInternetGetSecurityUrlEx
CoInternetParseIUri
CoInternetQueryInfo
CompatFlagsFromClsid
CopyBindInfo
CopyStgMedium
CreateAsyncBindCtx
CreateAsyncBindCtxEx
CreateFormatEnumerator
CreateUri
CreateURLBinding (not currently implemented)
CreateURLMoniker
CreateURLMonikerEx
CreateURLMonikerEx2
FindMediaType
FindMediaTypeClass
FindMimeFromData
GetClassFileOrMime
GetClassURL (not currently implemented)
IsAsyncMoniker
IsValidURL
MkParseDisplayNameEx
ObtainUserAgentString
RegisterBindStatusCallback
RegisterFormatEnumerator
RegisterMediaTypeClass
RegisterMediaTypes
ReleaseBindInfo
RevokeBindStatusCallback
RevokeFormatEnumerator
URLDownloadToCacheFile
URLDownloadToFile
UrlMkGetSessionOption
UrlMkSetSessionOption
URLOpenBlockingStream
URLOpenPullStream
URLOpenStream

hth,
Paul
Title: Re: How to get list of files on http site
Post by: Jimg on October 08, 2009, 08:32:36 PM
Thanks Paul.  It makes my poor mind boggle.
Title: Re: How to get list of files on http site
Post by: nathanpc on October 08, 2009, 11:08:06 PM
O_o
It's a very big list!
I'm going to try! :D
Title: Re: How to get list of files on http site
Post by: elmo on April 20, 2011, 07:05:09 AM
I found an almost complete example in:
http://www.masm32.com/board/index.php?PHPSESSID=1fba624f0f2404ac83e75becf8fb6c7b&topic=3950.0

I want to list the files on my site.
I found these in wininet.inc:
-FtpFindFirstFile
-InternetFindNextFile

Maybe we can combine them, but I haven't found enough examples on the net.
Could someone out there share their knowledge about this?
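
A minimal sketch of how the two calls are usually combined, assuming the site is reachable over FTP with an anonymous login. The host name ftp.example.com is a placeholder, the agent string and label names are made up, and the INTERNET_* constants and WIN32_FIND_DATA are assumed to come from windows.inc. This does not work against a plain HTTP site; as noted earlier in the thread, HTTP has no equivalent call.

```asm
    include \masm32\include\masm32rt.inc
    include \masm32\include\wininet.inc
    includelib \masm32\lib\wininet.lib

    .data
    lineEnd db 13,10,0

    .data?
    hInet   dd ?
    hFtp    dd ?
    hFind   dd ?
    wfd     WIN32_FIND_DATA <>

    .code
start:
    invoke InternetOpen, chr$("FtpList"), INTERNET_OPEN_TYPE_PRECONFIG,
            NULL, NULL, 0
    mov hInet, eax
    test eax, eax
    jz finished

    ; NULL user name and password request an anonymous FTP login;
    ; ftp.example.com is a placeholder host
    invoke InternetConnect, hInet, chr$("ftp.example.com"),
            INTERNET_DEFAULT_FTP_PORT, NULL, NULL,
            INTERNET_SERVICE_FTP, INTERNET_FLAG_PASSIVE, 0
    mov hFtp, eax
    test eax, eax
    jz closeInet

    ; a NULL search string starts with the first file in the current directory
    invoke FtpFindFirstFile, hFtp, NULL, addr wfd, 0, 0
    mov hFind, eax
    test eax, eax
    jz closeFtp                     ; nothing found, or the call failed

listLoop:
    invoke StdOut, addr wfd.cFileName
    invoke StdOut, addr lineEnd
    ; fetch the next entry; FALSE means no more files (or an error)
    invoke InternetFindNextFile, hFind, addr wfd
    test eax, eax
    jnz listLoop

    invoke InternetCloseHandle, hFind
closeFtp:
    invoke InternetCloseHandle, hFtp
closeInet:
    invoke InternetCloseHandle, hInet
finished:
    inkey "Press any key to exit..."
    exit
end start
```

When InternetFindNextFile returns FALSE, GetLastError can be checked for ERROR_NO_MORE_FILES to tell a normal end of the list from a real failure.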