DonationCoder.com Forum

Main Area and Open Discussion => General Software Discussion => Topic started by: David.P on November 19, 2008, 11:27 AM

Title: How to make a local copy of an ancient Web forum?
Post by: David.P on November 19, 2008, 11:27 AM
Hi Forum,

There is this stone-age Web forum by the L&H company, which ceased to exist about a decade ago:

http://support.lhsl.com/databases/dragon/webdisc.nsf/

It is still online, but I fear that the current site owner is simply going to shut the forum down pretty soon, as he has done with other similar forums.

Therefore it would be great to find a way to make a local copy of all that forum's threads.

I have already tried Adobe Acrobat  :-[ and WebsitePacker (this is great, makes *.CHM files out of an entire site).

However, both take ages and download literally gigabytes of stuff, because these tools blindly follow each and every link (especially the "Collapse" and "Expand" links next to every thread :mad:). As a result, every single posting gets downloaded about a dozen times instead of only once, and it would take days, create hundreds of thousands of files and use up dozens of gigabytes to download everything.

Any more ideas how I could actually rip that entire forum to one compact file or directory structure?

Thanks heaps already,
David.P
Title: Re: How to make a local copy of an ancient Web forum?
Post by: mouser on November 19, 2008, 11:38 AM
This is actually a great question -- i look forward to hearing the replies about what to do.
Do i understand correctly that you do not have access to the backend database? ie you just have normal forum member rights to view posts?
Title: Re: How to make a local copy of an ancient Web forum?
Post by: David.P on November 19, 2008, 11:46 AM
Quote from: mouser on November 19, 2008, 11:38 AM
Do i understand correctly that you do not have access to the backend database? ie you just have normal forum member rights to view posts?

I have the same rights on that L&H forum as you, or as anyone else reading this thread.

It would probably need some sort of "intelligent" crawler that only follows links up to a certain depth (about depth 2 or so) while ignoring other links (especially those "Collapse" and "Expand" links).
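A depth-limited crawler of that kind could be sketched roughly as follows in Python. This is only an illustration, not a tested ripper for the L&H forum: the `get_links` callback (which would fetch a page and return its links), the `EXCLUDED` substrings and the default depth of 2 are all assumptions.

```python
from collections import deque
from typing import Callable, Iterable

# Hypothetical substrings marking links the crawler should never follow.
EXCLUDED = ("collapse", "expand")


def is_wanted(url: str) -> bool:
    """Return False for links containing an excluded substring (case-insensitive)."""
    lowered = url.lower()
    return not any(word in lowered for word in EXCLUDED)


def crawl(start: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: int = 2) -> list[str]:
    """Breadth-first crawl from `start`, following wanted links up to `max_depth`."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are kept, but their links are not followed
        for link in get_links(url):
            if link not in seen and is_wanted(link):
                seen.add(link)
                order.append(link)
                queue.append((link, depth + 1))
    return order
```

Because every URL is recorded in `seen`, each page is visited at most once, as long as the forum uses one stable URL per page.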

Cheers David.P
Title: Re: How to make a local copy of an ancient Web forum?
Post by: mouser on November 19, 2008, 12:26 PM
yeah for a forum, you might be better off using a web spider thing, but NOT in the mode that crawls a website; rather in a mode that grabs all pages of the form:
https://www.donationcoder.com/forum/index.php?topic=1
https://www.donationcoder.com/forum/index.php?topic=2
https://www.donationcoder.com/forum/index.php?topic=3
etc.

The only tricky thing is for topics that are multiple pages long.
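Generating such a list of numbered topic URLs takes only a few lines of Python. The sketch below assumes that topic IDs are consecutive integers and that the `first`/`last` bounds are known or guessed; it does nothing about topics that span multiple pages.

```python
def topic_urls(base: str, first: int, last: int) -> list[str]:
    """Build one URL per topic ID in the inclusive range first..last."""
    return [f"{base}?topic={n}" for n in range(first, last + 1)]


# Base URL taken from the example above; the range 1..3 is purely illustrative.
urls = topic_urls("https://www.donationcoder.com/forum/index.php", 1, 3)
for url in urls:
    print(url)
```

The resulting list could then be fed to any downloader that accepts a plain list of URLs.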
Title: Re: How to make a local copy of an ancient Web forum?
Post by: city_zen on November 21, 2008, 07:41 AM
Teleport Pro (http://www.tenmax.com/teleport/pro/features.htm) is supposedly the best of its class, but it's a bit expensive. It's probably worth taking a look at it, though, to see if it has the features you need for this job.
Title: Re: How to make a local copy of an ancient Web forum?
Post by: mouser on November 21, 2008, 07:46 AM
i still prefer Offline Explorer (http://www.metaproducts.com/mp/Offline_Explorer.htm) (Pro?) over teleport (last time i checked at least), but it is also expensive.
Title: Re: How to make a local copy of an ancient Web forum?
Post by: David.P on November 21, 2008, 08:09 AM
Thanks for the tips -- however, it seems that Httrack and BlackWidow 4.4 (both free) can each do most of what those commercial website rippers offer.

Otoh -- Httrack, BlackWidow and also WebsitePacker all seem to load tens of thousands of links from that forum, where there are actually only a few thousand useful entries. It seems as if that forum's software generates unique URLs every time something is accessed. This is why every single posting of that forum gets checked and downloaded like a dozen times instead of only once.
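One common way around the "unique URL" problem is to reduce every link to a canonical form before checking whether it was already downloaded. The Python sketch below is a guess at how that could look; the parameter names in `VOLATILE_PARAMS` are hypothetical, and the real forum's URLs would have to be inspected to find out which parts actually change between visits.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that change per visit without changing the content.
VOLATILE_PARAMS = {"sid", "expand", "collapse"}


def canonical(url: str) -> str:
    """Drop volatile parameters and the fragment, then sort what remains."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in VOLATILE_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


seen = set()
for link in ("http://example.com/forum/post?id=7&sid=abc#top",
             "http://example.com/forum/post?sid=xyz&id=7"):
    seen.add(canonical(link))
print(len(seen))  # both links collapse to the same canonical URL
```

A crawler that stores `canonical(url)` instead of the raw URL in its "already seen" set would then fetch each posting only once.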

So far, I think that BlackWidow is the best software since it can build an explorer-like tree representation of the web site without actually downloading everything first. Then in the end, you only pick the folder containing the stuff you want and download that folder.

It however still seems to take days for BlackWidow to build that folder tree (if it ever gets there at all), for the same "unique URL" reason above...  >:(

David.P
Title: Re: How to make a local copy of an ancient Web forum?
Post by: Paul Keith on November 21, 2008, 08:27 AM
I haven't tried these before. Can anyone provide an estimate for how much space it will take to download an entire forum/website on average? I'm wondering whether as a casual user, I'm better off with the Scrapbook Firefox extension though to be honest, I don't know much about capturing to a certain depth.
Title: Re: How to make a local copy of an ancient Web forum?
Post by: J-Mac on November 23, 2008, 01:20 AM
Quote from: Paul Keith on November 21, 2008, 08:27 AM
I haven't tried these before. Can anyone provide an estimate for how much space it will take to download an entire forum/website on average? I'm wondering whether as a casual user, I'm better off with the Scrapbook Firefox extension though to be honest, I don't know much about capturing to a certain depth.

That depends, Paul. What do you consider to be an average forum/website? "Average" is relative to each user, I've found.

However, even a relatively small forum has tens of thousands of links and thousands of separate page views. I would guess that for most forums it would be more than most users would want to store on their home computers. I'm talking here about the front-end views, of course. If you had access to a forum's back-end you could store just the database, which would be considerably smaller.

Jim
Title: Re: How to make a local copy of an ancient Web forum?
Post by: Paul Keith on November 23, 2008, 09:48 AM
Oh that's too bad. I would consider DonationCoder to be an average sized forum.
Title: Re: How to make a local copy of an ancient Web forum?
Post by: city_zen on November 23, 2008, 12:31 PM
Quote from: Paul Keith on November 23, 2008, 09:48 AM
I would consider DonationCoder to be an average sized forum

From the forum's frontpage:
Total Posts: 139,067
Total Topics: 14,933

IMHO, this forum is rather on the large-ish side ....

Anyway, maybe mouser can tell us the size of the forum's back end database in MBs, so we can have at least a ballpark approximation of how big a forum database can be. Yes, I do understand that it may vary GREATLY, depending on whether attachments and/or images are allowed or not, but still it'd be better than nothing

Title: Re: How to make a local copy of an ancient Web forum?
Post by: Paul Keith on November 23, 2008, 12:41 PM
city_zen, while that's true, I think it falls short compared to the really popular forums. My litmus test for a really active forum is that you can't read one topic and go back without finding that the entire first page has changed in a matter of minutes, where only topics that are constantly getting replies stay up and new topics have to be constantly bumped to the top.
Title: Re: How to make a local copy of an ancient Web forum?
Post by: Lashiec on November 26, 2008, 09:04 AM
Compared to some behemoths, DonationCoder is almost like a grain of sand. Just for reference, the IGN boards have exactly 189,401,031 messages at this exact moment. And they have a couple of users with more than 100,000 posts, so...
Title: Re: How to make a local copy of an ancient Web forum?
Post by: city_zen on November 26, 2008, 11:49 AM
Yup, you're right. Turns out Donation Coder is in fact an average sized forum, my "humble opinion" was wrong  :P
Since I don't usually participate in or even visit such huge forums, I wasn't aware of the gigantic size some of them have acquired.
After I realized it, it wasn't long before I thought "Hmmm, I wonder which is *the largest* forum in the WWW", which led me to this page: Big Boards Ranking (http://rankings.big-boards.com/)

 :o
What can I say? I can't believe there's actually a forum with over a billion posts and over 15 million members, but the info is there ...  :huh:
Title: Re: How to make a local copy of an ancient Web forum?
Post by: Paul Keith on November 26, 2008, 03:00 PM
Well Gaia Online is constantly fed by their in-game currency. You see, people there gain in-game money for every post they have which allows them to buy new stuff for their avatars.

To add to that, there are constantly poem/story contests there which promise big in-game money, which further feeds the post counts. Finally, before it jumped the shark, it had a decent Religion/Politics/etc. "Expanded Discussion" board that was on par with DonationCoder's friendliness except on popular topics.

Most of the discussions never reached an advanced stage, but many also managed to avoid flame fests. The fact that that section of the forum had no censorship made it one of the more popular boards for a while, even when the same topics were being started repeatedly. So it's something fundamentally different from DonationCoder's more traditional forum design.

The same can be said for 4chan, which allows anonymous posting and which, if it isn't already helped by the porn on it, constantly creates and popularizes memes, mainly due to its anonymity and ease of posting. It's kind of like the original, more bad-ass version of Twitter and Plurk (though the design isn't original and in fact has been copied from many Japanese forums).

That's why these two are incomparable. The rest pretty much have 24/7 internet netizens topic in mind.
Title: Re: How to make a local copy of an ancient Web forum?
Post by: MrCrispy on November 26, 2008, 06:45 PM
Some speculation without having used the ripping software or knowing how the forum works -

1. If the various ways to get to the same post (or thread) actually resolve to the same URL, then the ripper should only download it once. Do they do this?

2. In some forums, expand/collapse/quick reply etc are Ajax/Javascript actions and not a page load. Can these be ignored at the page level?

3. If #1 is not true, then I guess some form of content analysis (where the ripper would detect the page has been downloaded previously as it has the same html) could be used to detect duplicates. I'm sure no one does this since it won't be reliable and even if it was, would be slow

4. Can you tell the ripper to exclude certain links, like those matching "expand/collapse" etc?
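The content analysis from point 3 could, in its simplest form, hash each downloaded page and skip any page whose hash has been seen before. The Python sketch below is only an assumption about how a ripper might do it. It also shows why the approach is fragile: two copies of the same post that differ by a single byte (a timestamp, a session ID embedded in a link) would count as different pages.

```python
import hashlib


class DuplicateDetector:
    """Remember a SHA-256 digest of every page body seen so far."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, html: str) -> bool:
        """Return True the first time a given page body is seen, False afterwards."""
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


detector = DuplicateDetector()
first = detector.is_new("<html>post 42</html>")
again = detector.is_new("<html>post 42</html>")
other = detector.is_new("<html>post 43</html>")
```

Note that the page still has to be downloaded before it can be hashed, so this saves disk space rather than bandwidth.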
Title: Re: How to make a local copy of an ancient Web forum?
Post by: f0dder on November 27, 2008, 02:24 AM
Quote from: Lashiec on November 26, 2008, 09:04 AM
Compared to some behemoths, DonationCoder is almost like a grain in the sand. Just for reference, the IGN boards have exactly 189,401,031 messages in this exact moment. And they have a couple of users with more than 100,000 posts, so...

That's insane :-s
Title: Re: How to make a local copy of an ancient Web forum?
Post by: David.P on November 27, 2008, 02:28 AM
Wow, I can't believe that you simply could do that! Thanks, I'll look into this database and see what I can excerpt from it.

Otherwise, what I have found out is that regarding "spidering intelligence", the software Blackwidow is one of the best, since you can specify

a) which pages are scanned for your wanted links only; and
b) which pages are actually downloaded.

This way, Blackwidow actually manages to crawl from posting to posting in that forum, downloading only the posts and ignoring EVERYTHING else on the website. It also manages not to download every posting multiple times.
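The idea of separating "pages scanned for links" from "pages actually downloaded" can be illustrated with two independent filter functions. The Python sketch below is not how Blackwidow is implemented; the `get_links` callback, the two predicates and the toy URL names are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Iterable


def selective_crawl(start: str,
                    get_links: Callable[[str], Iterable[str]],
                    should_scan: Callable[[str], bool],
                    should_download: Callable[[str], bool]) -> list[str]:
    """Follow links only on 'scan' pages; keep only 'download' pages."""
    seen = {start}
    queue = deque([start])
    downloaded = []
    while queue:
        url = queue.popleft()
        if should_download(url):
            downloaded.append(url)  # a real tool would save the page here
        if should_scan(url):
            for link in get_links(url):
                # Queue a link only if at least one of the two filters accepts it.
                if link not in seen and (should_scan(link) or should_download(link)):
                    seen.add(link)
                    queue.append(link)
    return downloaded
```

With a scan filter matching only index and thread pages and a download filter matching only postings, everything else on the site is never even queued.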

Blackwidow however seems to have a hard time rewriting the html code so that it actually becomes browsable offline  :(

The latter however is what the program WinHttrack does marvelously. WinHttrack unfortunately has the drawback that it doesn't have such sophisticated filter settings as Blackwidow. With WinHttrack, you can't differentiate between pages that are only scanned for links and pages that are downloaded. Therefore, WinHttrack (that does a beautiful job in converting the pages for offline browsing) ends up downloading much more than what you actually are after.

Thanks everyone,
David.P

PS: Filter settings in Blackwidow:
(http://666kb.com/i/b46nq6k24ohovhhi7.jpg)
Title: Re: How to make a local copy of an ancient Web forum?
Post by: agentsteal on November 27, 2008, 11:28 AM
It wasn't simple.
Why did you want a copy of this forum?

--Edit:
A moderator deleted my post  :mad:

We had a number of comments that this content is not appropriate for donationcoder.com.

It is not ethical to hack into websites and it is not something we condone or want to encourage.

I don't know where you are based but in some countries admitting what you did could land you in a court case and possibly prison. (Certainly the UK and US are getting much tougher on people hacking). Just because a forum appears to be abandoned wouldn't give you any protection.

The forum belongs to a company that went bankrupt in 2001. I wasn't doing anything malicious; I just made a copy of the forum database.
Title: Re: How to make a local copy of an ancient Web forum?
Post by: Carol Haynes on November 27, 2008, 12:02 PM
Sorry, I didn't mean to upset anyone - even if the company went bust in 2001, someone owns the domain name and must be paying to host the site. I can't see how hacking into the backend of a website can really be condoned unless it is your site and you have lost the login credentials.
Title: Re: How to make a local copy of an ancient Web forum?
Post by: f0dder on November 27, 2008, 05:53 PM
agentsteal: what you did might not be unethical, but please keep in mind that mouser could probably be held responsible if a post like that was allowed on this forum. Laws are crazy like that.
Title: Re: How to make a local copy of an ancient Web forum?
Post by: city_zen on November 27, 2008, 08:49 PM
Quote from: f0dder on November 27, 2008, 05:53 PM
Laws are crazy like that.

Yup, f0dder, you're right. Laws are indeed crazy (http://www.theregister.co.uk/2007/11/05/fresno_uni_database_hack_charges/)  :o :(
Title: Re: How to make a local copy of an ancient Web forum?
Post by: gorinw13 on November 28, 2008, 01:25 PM


I suggest that you can use the power siphon software --- It is no longer available but it is a nice one.......

It even can compile the whole site as a single exe file if you like --- no files or directories -- a self viewing archive...

I can send the software to you if you do wish so --- the web site of the creator no longer exists...
Title: Re: Software to copy a part of a website - Power Siphon at the Archive.org.
Post by: IainB on March 16, 2017, 12:00 PM
I was wanting to copy and parcel up part of an old website, and re-found this thread.
I have posted this comment just to update this thread with some potentially useful new information that I discovered today.
After reading about the Power Siphon software mentioned by @gorinw13, I promptly did a duckgo search for "siphon software for web copy", and the archive.org link below was the 3rd result in the list.    :Thmbsup:
The relevant archived page is at:
https://archive.org/details/tucows_344044_Power_Siphon
The text of the web page is copied below sans embedded hyperlinks/images, but notice the bit I have highlighted at the bottom.
Power Siphon
by http://www.powersiphon.com

Published December 20, 2003
Topics Power Siphon, Internet, Web browsers and tools, Offline browsing, Power Siphon

This Web spider downloads Web sites and Web content that you specify and saves the information to your hard drive for offline use. You provide the URL of the home page or any other starting page and watch the progress of the download in real time. You can also compress downloaded content into an EXE file.
The program includes a built-in viewer with a slideshow mode, and you can use the wizard interface to define tasks. Other features includes Microsoft Access compatibility, database support, spell checking, indexing and the ability to create your own search engine.


Identifier: tucows_344044_Power_Siphon
Date: 2003-12-20
Creator: http://www.powersiphon.com
Tucows_rating: 4
Rights: Shareware
Publisher: Tucows Inc.
Mediatype: software
Addeddate: 2004-11-02 13:25:00
Publicdate: 2004-11-08 17:51:00
Backup_location: ia903600_6
Notes:

Tucows, Inc has graciously donated a copy of this software to the Internet Archive's Tucows Software Archive for long term preservation and access. Please check the Tucows website for all current versions of the software.

So the Power Siphon software mentioned by @gorinw13 can be downloaded via that Archive, which has captured a copy of the website Power Siphon - http://www.powersiphon.com
The link for downloading the software is (and it works):
https://archive.org/download/tucows_344044_Power_Siphon/power_siphon_tucows_setup.exe
- it's a 10.5 MB executable installer file.

I think that's ruddy useful of them to make that available.
Title: Re: How to make a local copy of an ancient Web forum?
Post by: gamezntoyz on March 16, 2017, 04:23 PM
Nice find, man. This is highly useful.
Title: Re: How to make a local copy of an ancient Web forum?
Post by: app103 on March 18, 2017, 09:44 PM
Quote from: IainB on March 16, 2017, 12:00 PM
I think that's ruddy useful of them to make that available.

Only partially useful in the case of shareware, since for any apps that are no longer available on the current software market (in any version), you probably wouldn't be able to purchase a license to keep using them. And in some cases, for software that phones home as part of the installation process, you can't even install it to get the free 30-day trial. I know this from experience, since I have a folder containing about 5 GB of setup files, downloaded directly from Tucows back in 1999, a gift from my father, included on a computer that he gave me in 2000 (I still don't even know what most of them do!)
Title: Re: How to make a local copy of an ancient Web forum?
Post by: David.P on May 01, 2017, 12:47 PM
Hi all,

I got that same problem again.

This is the forum to be downloaded this time:
http://www.dasgelbeforum.net/forum.php?page=1041

The link above is actually the oldest/last page. So it would be sufficient to download those 1041 pages including all linked pages (only one level deep).
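The "1041 index pages plus everything linked from them, one level deep" plan could be sketched in Python as below, using only the standard library. This is a sketch under assumptions: the `page=` numbering is taken from the link above, the `forum_entry.php` name in the usage lines is made up, and the actual downloading is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

BASE = "http://www.dasgelbeforum.net/forum.php"


def index_page_urls(last_page: int) -> list[str]:
    """One URL per index page, from page 1 up to and including last_page."""
    return [f"{BASE}?page={n}" for n in range(1, last_page + 1)]


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url: str, html: str) -> list[str]:
    """Absolute, de-duplicated links on the page that stay on the same host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlsplit(page_url).netloc
    result = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        if urlsplit(absolute).netloc == host and absolute not in result:
            result.append(absolute)
    return result


# Usage with a made-up HTML snippet; "forum_entry.php" is a hypothetical name.
sample = ('<a href="forum_entry.php?id=1">A</a>'
          '<a href="http://other.example/x">B</a>'
          '<a href="forum_entry.php?id=1">A again</a>')
links = same_site_links(index_page_urls(1041)[-1], sample)
```

Fetching each index page, passing its HTML to `same_site_links` and downloading the returned URLs without following their links any further would give exactly one level of depth.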

Any experiences which tool (http://alternativeto.net/software/teleport-pro/) would be the simplest to do this?

Thanks for opinions!

Title: Re: How to make a local copy of an ancient Web forum?
Post by: 40hz on May 01, 2017, 02:18 PM
Had very good luck with the HTTrack spidering website copier.

Free for download under GPL/Libre. Website here (https://www.httrack.com/page/1/en/index.html).
Title: Re: How to make a local copy of an ancient Web forum?
Post by: David.P on May 02, 2017, 04:27 AM
OK thanks!

At the moment, I am trying BlackWidow (https://www.softbytelabs.com/en/BlackWidow/). BlackWidow simply creates an Explorer-like view of the entire website, and after the scan is finished, you download (only) the parts you want.

(http://i.imgur.com/VIlk9QL.jpg)

Does HTTrack allow a similar approach?