Forum Downloader 1.0

Started by Vektor, November 28, 2007, 08:13:19 AM

Vektor

Forum Downloader is a program that does what its name says: it downloads forums. I made it to download our own forum so I could import it for hosting with HeXHub (another project of mine, written in asm, that compiles with masm32). HeXHub supports HTML with scripting macros in it, so the next thing to be added will be support for HTML templates for the downloaded data. Forum Downloader can also be used to download several forums and search all of them at once, or to share archives of your own forums so your users can search them offline instead of using forum search scripts.
Currently supported forum software is phpBB and SMF, but it might work with others as well. For this version to download posts, they must be in a table with 2 columns, where the first column has a width of less than 20% or less than 200px, regardless of whether the columns themselves contain other tables.
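
The width test described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the program's actual code (which is written in asm); the function name `is_narrow_width` and the choice to treat a bare number as pixels are assumptions.

```python
import re

def is_narrow_width(width: str) -> bool:
    """Return True if an HTML width value is below 20% or below 200px."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(%|px)?\s*", width)
    if match is None:
        return False  # assumption: unparseable widths count as "not narrow"
    value = float(match.group(1))
    unit = match.group(2) or "px"  # assumption: a bare number means pixels
    limit = 20.0 if unit == "%" else 200.0
    return value < limit
```

A candidate post table would then be one whose rows have exactly two cells and whose first cell passes this check, for example `is_narrow_width("15%")` or `is_narrow_width("180")`.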
This being the first version, there are not many options in it. It will never be multithreaded; the idea is to download the forum, not to attack it. The program should "browse" the forum like a normal user with a normal browser, traversing all threads and thread pages.
Support for proxies and forum authentication is not yet added, nor does the program detect unusual forum setups. No main index file is generated.
Saved threads are simple and clean HTML files with no JavaScript, no styles, no banners, no iframes or other crap. File names are made of a unique incremental topic identifier and the topic's name. They are saved in directories named after the forum, which also get an incremental identifier, and sub-forums are saved in sub-directories. It should not download the same file twice unless the link has something different in it. The file size limit in this version is 1 MB for download and 10 MB when saving (all posts from all pages are merged into one big table and saved in a single file per topic).
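
As a rough illustration of the naming scheme, the Python sketch below builds a file name from an incremental identifier and a topic title. The exact format the program uses is not stated in the post, so the zero-padded prefix, the underscore separator, and the helper name `topic_filename` are all hypothetical.

```python
import re

def topic_filename(topic_id: int, title: str) -> str:
    """Build a file name from an incremental id and a topic title."""
    # Replace characters that Windows does not allow in file names.
    safe = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", title).strip(" .")
    # Hypothetical layout: 5-digit id, underscore, cleaned title.
    return f"{topic_id:05d}_{safe or 'untitled'}.html"
```

Because the identifier is unique, two topics with the same title still get different file names, which is presumably why the identifier is part of the name.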
On some forums it should auto-detect multi-page sub-forums, sub-sub-forums etc., topics, posts, authors, attachments, HTTP redirects, etc. It will also download external images and attachments, if it finds them, and link them so they can be viewed offline. This should allow easy and fast offline searching of the downloaded forum data by filename or content.
I think it would be a good idea to share a monthly archive of this forum on the http://masm32.com website for those who want to search it offline.

[attachment deleted by admin]

Shell

Thanks Vektor, this sounds like something I've been searching for. Does it download in simple-view or does it just do skinned view? Any Joomla / InvisionCube support planned?

PS. Your avatar reminds me a lot of Shub Niguraath  :eek

ramguru

This is very nice indeed. Some improvements that could be made: let the user choose which sub-forum to download, take tags into account, or maybe even reformat code with syntax highlighting (like http://tohtml.com/ does).

Vektor

Yes, I'm planning to add support for more forum software like IPB and vBulletin. I just tested it with a Joomla forum and it works fine.
All downloaded topics are saved as tables with 2 main columns that may have other tables included, but all styles and most of the formatting are removed, so they don't look that nice. Next I will add support for HTML templates and more options, and this colourless table style will be just an optional template.
Also, choosing sub-forums is a good idea and will be added.  :bg

Vektor

I made some improvements to Forum Downloader:

  • attachments with duplicate names are no longer saved as file.ext_N but file_N.ext
  • attachments and images can have any size
  • corrected an error in the timeout check: the timeout value was multiplied by 1000
  • if there is any error while downloading (404, 403, connection timeout or other), no file is saved
  • images that cannot be downloaded or have html code or scripts in them are no longer linked
  • new option: minimize to tray
  • new option: save debug messages to log
  • new option: select sub-forums
  • images from hosting companies like imageshack.us, postimage.org and others that are linked as thumbnails are also saved in their full-sized version, and the thumbnails link to the local copy
It should also detect a few more forum software packages, and I'll add detection for more. I haven't yet added options for output format or HTML templates.
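
The new naming rule for duplicate attachments (file_N.ext instead of file.ext_N) could look roughly like this in Python. It is a sketch of the rule from the list above, not the program's own code; the function name `unique_name` and the `taken` set of already-used names are assumptions.

```python
import os

def unique_name(name: str, taken: set) -> str:
    """Return name, or stem_N.ext if name is already in the taken set."""
    if name not in taken:
        taken.add(name)
        return name
    stem, ext = os.path.splitext(name)  # "shot.png" -> ("shot", ".png")
    n = 1
    while f"{stem}_{n}{ext}" in taken:
        n += 1
    candidate = f"{stem}_{n}{ext}"
    taken.add(candidate)
    return candidate
```

Keeping the extension last means the saved copies still open with the right application. This sketch compares names case-sensitively, which a Windows implementation would likely need to relax.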

[attachment deleted by admin]

Vektor

I made many improvements to Forum Downloader and added more options. Now it should be able to download from any forum software if it is configured right.
  • added: support for HTTP proxies
  • added: support for using accounts, the program will search for a form that has an 'input type="password"'
  • added: options to disable downloading external images or attachments
  • added: advanced options to improve forum detection with support for forum-specific wildcards
  • added: an option to set / change some link variables (like "daysprune=-1" to get everything)
  • added more detection methods, the old heuristic analysis of forum tables will be used only if everything else fails
  • added sid detection, the program removes sid when comparing links
Support for HTML templates will be added after I finish HeXHub's forum functions, so that it will be able to host forums.
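
To illustrate the sid handling mentioned in the list above, here is a minimal Python sketch that strips session parameters before two links are compared. The parameter names in `SESSION_KEYS` and the decision to ignore the URL fragment are assumptions; the post only says that the sid is removed when comparing links.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed session parameter names; the post only mentions "sid".
SESSION_KEYS = {"sid", "phpsessid"}

def normalize_link(url: str) -> str:
    """Drop session parameters and the fragment so links compare equal."""
    parts = urlsplit(url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key.lower() not in SESSION_KEYS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), ""))

def same_link(a: str, b: str) -> bool:
    """Compare two links while ignoring their session identifiers."""
    return normalize_link(a) == normalize_link(b)
```

Without this step, every page fetched with a fresh session id would look like a new link, and the same topic could be downloaded many times.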

And this is how I made a triple post on this forum :)

[attachment deleted by admin]

Vektor

I made some corrections in Forum Downloader:
  • corrected: problem with encoding on some forums
  • corrected: on some forums same attachment could be saved more than once
  • corrected: on some forums, pages numbered above 100 were not saved if a string was defined to match next pages
  • corrected some parsing errors in the "nearest match" searching procedures
  • memory usage for keeping the history of saved links was optimized (the program no longer allocates 8192 bytes for every new link)

    [attachment deleted by admin]