The MASM Forum Archive 2004 to 2012

Miscellaneous Forums => The Orphanage => Topic started by: Magnum on October 13, 2011, 11:12:36 PM

Title: Go to selected websites and save the page to a file
Post by: Magnum on October 13, 2011, 11:12:36 PM
I am thinking of writing a program that would go to selected websites, wait until the pages are loaded, and then save each web page including its images.

Saving the web page is the part I don't know how to do.

Title: Re: Go to selected websites and save the page to a file
Post by: fearless on October 13, 2011, 11:27:17 PM
You could use COM to drive the IE web browser control and have it save the pages (I don't know if that's possible, but it probably is, and it might be a lot of work). Alternatively, use InternetOpenUrl and then InternetReadFile to get the content of the page and save it. You would then need to parse the page looking for links (<img> tags etc.), resubmit requests (InternetOpenUrl) for the images and other files that are found, and use InternetReadFile to fetch them, saving them in a subfolder or somewhere relative to the image URL path found in the main index page. I'm sure I've seen an example of a simple web browser in asm somewhere; it might be worth searching for, as it could be a good place to start.
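
As a rough illustration of the "parse for images and save them relative to the image URL path" step, here is a minimal sketch in Python rather than asm. It only works out which URLs to request and where each file would go locally; it does no downloading. The names ImageCollector and image_targets, and the choice to mirror paths under a folder named after the host, are assumptions made for this example and are not from the thread.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_targets(page_url, html):
    """Return (absolute_url, local_relative_path) pairs for each image."""
    collector = ImageCollector()
    collector.feed(html)
    targets = []
    for src in collector.sources:
        # Resolve relative references against the URL of the page itself.
        absolute = urljoin(page_url, src)
        # Assumption: mirror the URL path under a folder named after the host.
        parsed = urlparse(absolute)
        local = PurePosixPath(parsed.netloc) / parsed.path.lstrip("/")
        targets.append((absolute, str(local)))
    return targets
```

Each pair gives the URL to request next and a relative path under which the reply could be stored, so the saved images keep the layout they had on the server.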
Title: Re: Go to selected websites and save the page to a file
Post by: Magnum on October 14, 2011, 12:05:20 AM
Thanks fearless.

Title: Re: Go to selected websites and save the page to a file
Post by: Tedd on October 15, 2011, 06:13:34 PM
If you're doing it for the challenge, good luck :lol
Though it's not too much work to knock something together that will work as long as nothing goes wrong (send HTTP request, receive reply, save html to file, parse html to find links to css files and images, make requests for those and save them to files.)
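
The steps in parentheses above can be sketched roughly as follows, in Python rather than asm. This is a "works as long as nothing goes wrong" version under stated assumptions: the names AssetFinder, http_get and save_page are made up for the example, every file is written flat into one folder under its base name (so name clashes overwrite each other), the saved HTML is not rewritten to point at the local copies, and images referenced from inside CSS files are not followed.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AssetFinder(HTMLParser):
    """Records image sources and stylesheet links found in a page."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif (tag == "link" and attrs.get("href")
              and (attrs.get("rel") or "").lower() == "stylesheet"):
            self.assets.append(attrs["href"])


def http_get(url):
    """Send one HTTP GET request and return the reply body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as reply:
        return reply.read()


def save_page(url, out_dir, fetch=http_get):
    """Save the page at url plus its images and stylesheets into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    body = fetch(url)
    with open(os.path.join(out_dir, "index.html"), "wb") as f:
        f.write(body)

    # Parse the HTML to find links to CSS files and images.
    finder = AssetFinder()
    finder.feed(body.decode("utf-8", errors="replace"))

    saved = []
    for ref in finder.assets:
        asset_url = urljoin(url, ref)
        name = os.path.basename(urlparse(asset_url).path) or "asset"
        try:
            data = fetch(asset_url)
        except OSError:
            continue  # skip assets that fail to download
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(data)
        saved.append(name)
    return saved
```

A call such as save_page("http://example.com/", "saved_page") would write index.html plus whatever assets it could fetch into the saved_page folder. The fetch parameter exists only so the download step can be swapped out, for example when trying the parsing logic without a network connection.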

If you want a program that already does it for you: http://gnuwin32.sourceforge.net/packages/wget.htm
Title: Re: Go to selected websites and save the page to a file
Post by: jj2007 on October 15, 2011, 06:22:38 PM
Quote from: Magnum on October 13, 2011, 11:12:36 PM
I am thinking of writing a program that would go to selected websites, wait until the pages are loaded, and then save each web page including its images.

The good news is that such a program exists, see attachment ::)
Title: Re: Go to selected websites and save the page to a file
Post by: anunitu on October 15, 2011, 07:50:42 PM
There is this, and it seems it also has source available, though I think it is in C++.

http://www.httrack.com/

I have used this off and on. It is considered an offline viewer, and it can download a whole site, or you can pick and choose pages (including all graphics on the page).

The source might give you some idea about writing your own program in assembler.
Title: Re: Go to selected websites and save the page to a file
Post by: Magnum on October 15, 2011, 11:29:06 PM
Thanks Tedd, anunitu, and jj2007.

I will try out your suggestions, no need to re-invent the wheel.

Title: Re: Go to selected websites and save the page to a file
Post by: jj2007 on October 16, 2011, 05:41:16 AM
Quote from: Magnum on October 15, 2011, 11:29:06 PM
I will try out your suggestions, no need to re-invent the wheel.

It has always worked for MSIE. I remember vaguely that Mozilla developers refused to implement it in Firefox for "security reasons" ::)

MSIE 7 greeted me with a "run once" page, saying I could choose my search provider. Either the predefined Live Search, or another one. So I clicked "another one", and guess what? MSIE didn't find Google :green2