Topic: How to make a local copy of an ancient Web forum?  (Read 30155 times)

David.P

  • Supporting Member
  • Joined in 2008
  • Posts: 208
  • Ergonomics Junkie
How to make a local copy of an ancient Web forum?
« on: November 19, 2008, 11:27 AM »
Hi Forum,

There is this stone-age Web forum from the L&H company, which ceased to exist something like a decade ago:

http://support.lhsl..../dragon/webdisc.nsf/

It is still up and running, but I fear the current site owner is simply going to shut the forum down pretty soon, as he has done with other similar forums.

So it would be great to find a way to make a local copy of all of that forum's threads.

I have already tried Adobe Acrobat  :-[ and WebsitePacker (this is great, makes *.CHM files out of an entire site).

However, both take ages and download literally gigabytes of stuff, because these tools blindly follow each and every link (especially the "Collapse" and "Expand" links next to every thread :mad:). As a result, every single posting gets downloaded a dozen times or so instead of only once, and it would take days, create hundreds of thousands of files and use up dozens of gigabytes to download everything.

Any other ideas on how I could actually rip that entire forum into one compact file or directory structure?

Thanks heaps already,
David.P

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • Posts: 40,914
« Reply #1 on: November 19, 2008, 11:38 AM »
This is actually a great question -- i look forward to hearing the replies about what to do.
Do i understand correctly that you do not have access to the backend database? ie you just have normal forum member rights to view posts?

David.P

  • Supporting Member
  • Joined in 2008
  • Posts: 208
  • Ergonomics Junkie
« Reply #2 on: November 19, 2008, 11:46 AM »
Quote from: mouser on November 19, 2008, 11:38 AM
Do i understand correctly that you do not have access to the backend database? ie you just have normal forum member rights to view posts?

I have the same rights to that L&H forum as you do, or as anyone else reading this thread.

Probably it would need some sort of "intelligent" crawler that only downloads links up to a certain depth (say, depth 2 or so) while ignoring other links (especially those "Collapse" and "Expand" links).
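A minimal sketch of that idea, assuming a hypothetical `get_links(url)` function that returns a page's outgoing links (an in-memory dictionary stands in for the real forum so the example runs offline). It crawls breadth-first, stops expanding pages at `max_depth`, and skips any URL matching an exclude pattern. The `Expand=`/`Collapse=` query-parameter pattern is an assumption about what those links look like, not something taken from the actual forum.

```python
import re
from collections import deque
from typing import Callable, Iterable


def crawl(start: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: int,
          exclude: Iterable[str]) -> list[str]:
    """Breadth-first crawl: follow links at most `max_depth` hops from `start`,
    skipping any URL that matches one of the `exclude` regexes."""
    patterns = [re.compile(p) for p in exclude]
    seen = {start}
    visited = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)            # a real ripper would download the page here
        if depth >= max_depth:
            continue                   # depth limit reached: don't expand this page
        for link in get_links(url):
            if link in seen or any(p.search(link) for p in patterns):
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return visited


# Hypothetical in-memory "forum": page -> links found on that page.
SITE = {
    "index": ["thread1", "thread2", "index?Expand=1"],
    "thread1": ["post1a", "post1b", "thread1?Collapse=1"],
    "thread2": ["post2a"],
    "post1a": ["index"],
    "index?Expand=1": ["thread1", "thread2"],
}

pages = crawl("index", lambda u: SITE.get(u, []), max_depth=2,
              exclude=[r"[?&](Expand|Collapse)="])
print(pages)  # ['index', 'thread1', 'thread2', 'post1a', 'post1b', 'post2a']
```

The `seen` set is what prevents a page reached by two different paths from being visited twice, as long as both paths produce the identical URL string.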

Cheers David.P

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • Posts: 40,914
« Reply #3 on: November 19, 2008, 12:26 PM »
yeah, for a forum what you might be better off with is using a web spider thing, but NOT in the mode that crawls a website; rather, in a mode that grabs all pages of the form:
https://www.donation...um/index.php?topic=1
https://www.donation...um/index.php?topic=2
https://www.donation...um/index.php?topic=3
etc.

The only tricky thing is for topics that are multiple pages long.
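A sketch of this enumerate-the-IDs approach, including the multi-page case. It assumes an SMF-style `topic=ID.OFFSET` scheme, where the number after the dot is the index of the first post shown on the page; the base URL is a placeholder because the real ones are elided above, and 20 posts per page is an assumed default. In practice `post_count` would come from the topic's first page, or you would keep requesting pages until one shows no new posts.

```python
import math

BASE = "https://forum.example.com/index.php"  # placeholder; the real URLs are elided


def topic_page_urls(topic_id: int, post_count: int, per_page: int = 20) -> list[str]:
    """One URL per page of a topic, assuming an SMF-style `topic=ID.OFFSET`
    scheme where OFFSET is the index of the first post shown on that page."""
    pages = max(1, math.ceil(post_count / per_page))
    return [f"{BASE}?topic={topic_id}.{page * per_page}" for page in range(pages)]


def all_page_urls(post_counts: dict[int, int], per_page: int = 20) -> list[str]:
    """Enumerate every page of every topic instead of crawling links."""
    return [url
            for topic_id, count in sorted(post_counts.items())
            for url in topic_page_urls(topic_id, count, per_page)]


# Hypothetical post counts for three topics.
for url in all_page_urls({1: 5, 2: 20, 3: 45}):
    print(url)
```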

city_zen

  • Supporting Member
  • Joined in 2008
  • Posts: 134
« Reply #4 on: November 21, 2008, 07:41 AM »
Teleport Pro is supposedly the best of its class, but it's a bit expensive. It's probably worth taking a look at it, though, to see if it has the features you need for this job.
I'll have what she's having

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • Posts: 40,914
« Reply #5 on: November 21, 2008, 07:46 AM »
i still prefer Offline Explorer (Pro?) over Teleport (last time i checked, at least), but it is also expensive.

David.P

  • Supporting Member
  • Joined in 2008
  • Posts: 208
  • Ergonomics Junkie
« Reply #6 on: November 21, 2008, 08:09 AM »
Thanks for the tips. However, it seems that HTTrack and BlackWidow 4.4 (both free) can each do most of what those commercial website rippers offer.

On the other hand, HTTrack, BlackWidow and WebsitePacker all seem to load tens of thousands of links from that forum, where there are actually only a few thousand useful entries. It seems as if the forum's software generates unique URLs every time something is accessed. This is why every single posting of that forum gets checked and downloaded about a dozen times instead of only once.
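One way to stop the same posting being fetched a dozen times is to reduce each URL to a canonical key before deciding whether it has been seen. The sketch below drops a hypothetical set of "volatile" query parameters (the names are assumptions for illustration, not taken from the real forum), lowercases the host, sorts the remaining parameters and discards the fragment. The key is only used for de-duplication; the original URL is still what gets fetched.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that change the URL but not the content.
VOLATILE_PARAMS = {"expand", "collapse", "sessionid"}


def canonical(url: str) -> str:
    """Reduce a URL to a de-duplication key: lowercase host, drop volatile
    parameters and the fragment, and sort what remains."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in VOLATILE_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(kept), ""))


seen: set[str] = set()
to_fetch = []
for url in [
    "http://forum.example.com/webdisc.nsf/abc?OpenDocument&Expand=3",
    "http://forum.example.com/webdisc.nsf/abc?OpenDocument&Collapse=3",
    "http://FORUM.example.com/webdisc.nsf/abc?OpenDocument#top",
    "http://forum.example.com/webdisc.nsf/def?OpenDocument",
]:
    key = canonical(url)
    if key not in seen:
        seen.add(key)
        to_fetch.append(url)   # fetch the original URL, not the key

print(to_fetch)  # only the first abc URL and the def URL remain
```

Note that a bare flag such as `OpenDocument` comes back as `OpenDocument=` in the key, which is harmless for comparison but is one reason not to fetch the key itself.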

So far, I think that BlackWidow is the best software since it can build an explorer-like tree representation of the web site without actually downloading everything first. Then in the end, you only pick the folder containing the stuff you want and download that folder.

However, it still seems to take BlackWidow days to build that folder tree (if it ever gets there at all), for the same "unique URL" reason as above...  >:(

David.P

Paul Keith

  • Member
  • Joined in 2008
  • Posts: 1,989
« Reply #7 on: November 21, 2008, 08:27 AM »
I haven't tried these before. Can anyone provide an estimate of how much space it takes, on average, to download an entire forum/website? I'm wondering whether, as a casual user, I'm better off with the Scrapbook Firefox extension, though to be honest, I don't know much about capturing to a certain depth.

J-Mac

  • Supporting Member
  • Joined in 2007
  • Posts: 2,918
« Reply #8 on: November 23, 2008, 01:20 AM »
Quote from: Paul Keith on November 21, 2008, 08:27 AM
I haven't tried these before. Can anyone provide an estimate of how much space it takes, on average, to download an entire forum/website? I'm wondering whether, as a casual user, I'm better off with the Scrapbook Firefox extension, though to be honest, I don't know much about capturing to a certain depth.

That depends, Paul. What do you consider to be an average forum/website? "Average" is relative to each user, I've found.

However, even a relatively small forum has tens of thousands of links and thousands of separate page views. I would guess that for most forums it would be more than most users would want to store on their home computers. I'm talking here about the front-end views, of course. If you had access to a forum's back end, you could store just the database, which would be considerably smaller.

Jim

Paul Keith

  • Member
  • Joined in 2008
  • Posts: 1,989
« Reply #9 on: November 23, 2008, 09:48 AM »
Oh that's too bad. I would consider DonationCoder to be an average sized forum.

city_zen

  • Supporting Member
  • Joined in 2008
  • Posts: 134
« Reply #10 on: November 23, 2008, 12:31 PM »
Quote from: Paul Keith on November 23, 2008, 09:48 AM
I would consider DonationCoder to be an average sized forum

From the forum's frontpage:
Total Posts: 139,067
Total Topics: 14,933

IMHO, this forum is rather on the large-ish side ....

Anyway, maybe mouser can tell us the size of the forum's back-end database in MB, so we can have at least a ballpark figure for how big a forum database can be. Yes, I do understand that it may vary GREATLY depending on whether attachments and/or images are allowed, but it'd still be better than nothing.
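Short of the real database size, a back-of-envelope estimate for a front-end rip of the topic pages alone can be made from the two totals above. If topic i has n_i ≥ 1 posts and a page shows p posts, that topic needs ceil(n_i / p) pages, so with N posts and T topics the page count lies between max(T, ceil(N / p)) and T + floor(N / p). The posts-per-page and average page size below are assumptions, not measured values, and the figure ignores images, attachments, board index pages and the duplicate downloads described earlier in the thread.

```python
import math


def page_bounds(total_posts: int, total_topics: int, per_page: int) -> tuple[int, int]:
    """Lower and upper bounds on the number of topic pages, given only totals.
    Each topic needs ceil(n_i / per_page) pages and every topic has >= 1 post."""
    lower = max(total_topics, math.ceil(total_posts / per_page))
    upper = total_topics + total_posts // per_page
    return lower, upper


POSTS, TOPICS = 139_067, 14_933  # front-page totals quoted above
PER_PAGE = 20                    # assumption: posts shown per topic page
AVG_PAGE_KB = 40                 # assumption: average saved HTML page, no images

low, high = page_bounds(POSTS, TOPICS, PER_PAGE)
print(f"{low:,} to {high:,} topic pages")  # 14,933 to 21,886 topic pages
print(f"~{low * AVG_PAGE_KB / 1000:,.0f} to {high * AVG_PAGE_KB / 1000:,.0f} MB")  # ~597 to 875 MB
```

Under those assumptions the HTML alone lands in the high hundreds of megabytes, which is consistent with the point above that a back-end database copy would be considerably smaller.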


Paul Keith

  • Member
  • Joined in 2008
  • Posts: 1,989
« Reply #11 on: November 23, 2008, 12:41 PM »
city_zen, while that's true, I think it falls short compared to the really popular forums. My litmus test for a really active forum is that you can't read one topic and go back without finding the entire first page changed within minutes: only topics that constantly churn out replies stay near the top, and new topics have to be bumped constantly to stay there.

Lashiec

  • Member
  • Joined in 2006
  • Posts: 2,374
« Reply #12 on: November 26, 2008, 09:04 AM »
Compared to some behemoths, DonationCoder is like a grain of sand. Just for reference, the IGN boards have exactly 189,401,031 messages at this exact moment. And they have a couple of users with more than 100,000 posts, so...

city_zen

  • Supporting Member
  • Joined in 2008
  • Posts: 134
« Reply #13 on: November 26, 2008, 11:49 AM »
Yup, you're right. Turns out DonationCoder is in fact an average-sized forum; my "humble opinion" was wrong  :P
Since I don't usually participate in or even visit such huge forums, I wasn't aware of the gigantic size some of them have acquired.
After I realized it, it wasn't long before I thought "Hmmm, I wonder which is *the largest* forum in the WWW", which led me to this page: Big Boards Ranking

 :o
What can I say? I can't believe there's actually a forum with over a billion posts and over 15 million members, but the info is there ...  :huh:

Paul Keith

  • Member
  • Joined in 2008
  • Posts: 1,989
« Reply #14 on: November 26, 2008, 03:00 PM »
Well, Gaia Online is constantly fed by its in-game currency. You see, people there earn in-game money for every post they make, which lets them buy new stuff for their avatars.

On top of that, there are constantly poem/story contests there that promise huge in-game rewards, which further feed the post counts. Finally, before it jumped the shark, it had a decent Religion/Politics/etc. "Expanded Discussion" board that was on par with DonationCoder's friendliness, except on popular topics.

Most of the discussions never got very far, but many also managed to avoid flame fests, and the fact that that section of the forum had no censorship made it one of the more popular boards for a while, even though the same topics kept being started over and over. So it's fundamentally different from DonationCoder's more traditional forum design.

The same can be said for 4chan, which allows anonymous posting and, besides being helped along by the porn on it, constantly creates and popularizes memes, mainly due to its anonymity and ease of posting. It's kind of like the original, more bad-ass version of Twitter and Plurk (though its design isn't original either; it was copied from many Japanese forums).

That's why these two can't really be compared. The rest pretty much have the 24/7 internet crowd's topics in mind.

MrCrispy

  • Participant
  • Joined in 2006
  • Posts: 332
« Reply #15 on: November 26, 2008, 06:45 PM »
Some speculation, without having used the ripping software or knowing how the forum works:

1. If the various ways to get to the same post (or thread) actually resolve to the same URL, then the ripper should only download it once. Do they do this?

2. In some forums, expand/collapse/quick reply etc. are Ajax/JavaScript actions and not page loads. Can these be ignored at the page level?

3. If #1 is not true, then I guess some form of content analysis (where the ripper detects that a page has already been downloaded because it has the same HTML) could be used to detect duplicates. I'm sure no one does this, since it wouldn't be reliable, and even if it were, it would be slow.

4. Can you tell the ripper to exclude certain links, like those matching "expand/collapse" etc.?
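Point 3 can be sketched as a content fingerprint check: collapse whitespace, hash the page, and skip saving any page whose hash has already been seen. This is a minimal illustration, not how any particular ripper works. It also bears out the reliability concern: a page that embeds a timestamp, session token or expand/collapse state differs byte for byte from its twin and will not be caught, so in practice you would hash only the extracted post body. Hashing itself is cheap; the real cost is that a duplicate still has to be downloaded before it can be compared.

```python
import hashlib
import re


def fingerprint(page_html: str) -> str:
    """Hash of the page with runs of whitespace collapsed, so trivial
    formatting differences do not defeat the duplicate check."""
    normalized = re.sub(r"\s+", " ", page_html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class Deduper:
    def __init__(self) -> None:
        self.first_url: dict[str, str] = {}  # fingerprint -> first URL seen with it

    def is_duplicate(self, url: str, page_html: str) -> bool:
        fp = fingerprint(page_html)
        if fp in self.first_url:
            return True
        self.first_url[fp] = url
        return False


d = Deduper()
print(d.is_duplicate("view?Expand=1", "<p>Post 42</p>"))     # False: first copy
print(d.is_duplicate("view?Collapse=1", "<p>Post 42</p>\n"))  # True: same content
print(d.is_duplicate("view?Start=30", "<p>Post 43</p>"))      # False: new content
```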

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • Posts: 9,153
  • [Well, THAT escalated quickly!]
« Reply #16 on: November 27, 2008, 02:24 AM »
Quote from: Lashiec on November 26, 2008, 09:04 AM
Compared to some behemoths, DonationCoder is like a grain of sand. Just for reference, the IGN boards have exactly 189,401,031 messages at this exact moment. And they have a couple of users with more than 100,000 posts, so...
That's insane :-s
- carpe noctem

David.P

  • Supporting Member
  • Joined in 2008
  • Posts: 208
  • Ergonomics Junkie
« Reply #17 on: November 27, 2008, 02:28 AM »
Wow, I can't believe you could simply do that! Thanks, I'll look into this database and see what I can extract from it.

Otherwise, what I have found out is that regarding "spidering intelligence", BlackWidow is one of the best programs, since you can specify

a) which pages are scanned for your wanted links only; and
b) which pages are actually downloaded.

This way, BlackWidow actually manages to crawl from posting to posting in that forum, downloading only the posts and ignoring EVERYTHING else on the website. It also manages not to download any posting more than once.
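The distinction between pages that are scanned for links and pages that are downloaded can be expressed as two separate filters. In this sketch the `should_scan`/`should_save` predicates, the URL shapes and the in-memory site are all hypothetical, not BlackWidow's actual settings: links are only extracted from scan pages, only save pages are kept, and a link accepted by neither filter is never fetched at all.

```python
from collections import deque
from typing import Callable


def mirror(start: str,
           get_page: Callable[[str], tuple[str, list[str]]],
           should_scan: Callable[[str], bool],
           should_save: Callable[[str], bool]) -> dict[str, str]:
    """Crawl with two separate filters: links are only extracted from pages
    accepted by `should_scan`, and only pages accepted by `should_save` are kept.
    A link accepted by neither filter is never fetched."""
    saved: dict[str, str] = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        html, links = get_page(url)
        if should_save(url):
            saved[url] = html
        if not should_scan(url):
            continue                       # save-only page: don't follow its links
        for link in links:
            if link not in seen and (should_scan(link) or should_save(link)):
                seen.add(link)
                queue.append(link)
    return saved


# Hypothetical site: url -> (html, links on that page).
SITE = {
    "/board": ("index p1", ["/board?page=2", "/post/1", "/post/2", "/board?Expand=1"]),
    "/board?page=2": ("index p2", ["/post/3"]),
    "/board?Expand=1": ("expanded index", ["/post/1"]),
    "/post/1": ("post one", ["/board", "/post/2"]),
    "/post/2": ("post two", []),
    "/post/3": ("post three", []),
}

saved = mirror(
    "/board",
    get_page=lambda u: SITE[u],
    should_scan=lambda u: u.startswith("/board") and "Expand=" not in u,
    should_save=lambda u: u.startswith("/post/"),
)
print(sorted(saved))  # ['/post/1', '/post/2', '/post/3']
```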

However, BlackWidow seems to have a hard time rewriting the HTML so that it actually becomes browsable offline  :(

That is exactly what WinHTTrack does marvelously. Unfortunately, WinHTTrack doesn't have filter settings as sophisticated as BlackWidow's: you can't distinguish between pages that are only scanned for links and pages that are downloaded. Therefore WinHTTrack, which does a beautiful job of converting pages for offline browsing, ends up downloading much more than what you are actually after.
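The offline-browsing step boils down to rewriting each saved page so that links to other saved pages point at local files, while every other link is left alone. A minimal sketch, assuming a flat, hash-based file naming scheme (hypothetical, not what WinHTTrack uses) and handling only double-quoted `href` attributes; a real tool also deals with single quotes, `src` attributes, fragments and relative paths between subdirectories.

```python
import hashlib
import html
import re
from urllib.parse import urljoin

HREF = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def local_name(url: str) -> str:
    """Hypothetical flat naming scheme: one file per URL, named by hash."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:16] + ".html"


def rewrite_links(page: str, page_url: str, saved: set[str]) -> str:
    """Point links to pages we saved at their local files; leave others alone."""
    def replace(match: re.Match) -> str:
        # Attribute values are HTML-escaped (&amp;), so unescape before resolving.
        target = urljoin(page_url, html.unescape(match.group(1)))
        if target in saved:
            return f'href="{local_name(target)}"'
        return match.group(0)
    return HREF.sub(replace, page)


saved = {"http://forum.example.com/post?id=2&view=flat"}
page = ('<a href="/post?id=2&amp;view=flat">next</a> '
        '<a href="/board?Expand=1">expand</a>')
print(rewrite_links(page, "http://forum.example.com/post?id=1", saved))
```

Each saved page would then be written to disk under `local_name(url)`, so every rewritten link resolves within the same folder.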

Thanks everyone,
David.P

PS: Filter settings in BlackWidow: (screenshot not preserved)


agentsteal

  • Honorary Member
  • Joined in 2007
  • Posts: 75
« Reply #18 on: November 27, 2008, 11:28 AM »
It wasn't simple.
Why did you want a copy of this forum?

--Edit:
A moderator deleted my post  :mad:

Quote:
We had a number of comments that this content is not appropriate for donationcoder.com.

It is not ethical to hack into websites and it is not something we condone or want to encourage.

I don't know where you are based but in some countries admitting what you did could land you in a court case and possibly prison. (Certainly the UK and US are getting much tougher on people hacking). Just because a forum appears to be abandoned wouldn't give you any protection.

The forum belongs to a company that went bankrupt in 2001. I wasn't doing anything malicious; I just made a copy of the forum database.

Carol Haynes

  • Waffles for England (patent pending)
  • Global Moderator
  • Joined in 2005
  • Posts: 8,069
« Reply #19 on: November 27, 2008, 12:02 PM »
Sorry, I didn't mean to upset anyone. Even if the company went bust in 2001, someone owns the domain name and must be paying to host the site. I can't see how hacking into the back end of a website can really be condoned, unless it is your own site and you have lost the login credentials.

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • Posts: 9,153
  • [Well, THAT escalated quickly!]
« Reply #20 on: November 27, 2008, 05:53 PM »
agentsteal: what you did might not be unethical, but please keep in mind that mouser could probably be held responsible if a post like that was allowed on this forum. Laws are crazy like that.

city_zen

  • Supporting Member
  • Joined in 2008
  • Posts: 134
« Reply #21 on: November 27, 2008, 08:49 PM »
Quote from: f0dder on November 27, 2008, 05:53 PM
Laws are crazy like that.

Yup, f0dder, you're right. Laws are indeed crazy  :o :(

gorinw13

  • Member
  • Joined in 2006
  • Posts: 63
  • Hi There !!!!
« Reply #22 on: November 28, 2008, 01:25 PM »


I suggest you use the Power Siphon software. It is no longer available, but it is a nice one...

It can even compile the whole site into a single EXE file if you like: no files or directories, just a self-viewing archive...
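Power Siphon's self-viewing EXE is proprietary, but if the goal is simply to end up with one compact file, a swapped-in alternative is to bundle an already-mirrored directory into a single ZIP using Python's standard library. This does not make the archive self-viewing; the demo directory is a throwaway stand-in for a real mirror.

```python
import tempfile
import zipfile
from pathlib import Path


def pack_mirror(mirror_dir: str, archive_path: str) -> int:
    """Bundle every file under `mirror_dir` into one compressed ZIP archive,
    keeping relative paths so relative links between pages still resolve
    after extraction. Returns the number of files packed."""
    root = Path(mirror_dir)
    count = 0
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(root).as_posix())
                count += 1
    return count


# Demo on a throwaway directory standing in for a real mirror.
with tempfile.TemporaryDirectory() as tmp:
    mirror = Path(tmp, "mirror")
    (mirror / "posts").mkdir(parents=True)
    (mirror / "index.html").write_text('<a href="posts/1.html">1</a>')
    (mirror / "posts" / "1.html").write_text("post one")
    archive = Path(tmp, "forum.zip")
    packed = pack_mirror(str(mirror), str(archive))
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(packed, names)  # 2 ['index.html', 'posts/1.html']
```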

I can send you the software if you wish; the creator's website no longer exists...

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,544
  • @Slartibartfarst
I was wanting to copy and parcel up part of an old website, and re-found this thread.
I have posted this comment just to update this thread with some potentially useful new information that I discovered today.
After reading about the Power Siphon software mentioned by @gorinw13, I promptly did a DuckDuckGo search for "siphon software for web copy", and the archive.org link below was the 3rd result in the list.    :Thmbsup:
The relevant archived page is at:
https://archive.org/details/tucows_344044_Power_Siphon
The text of the web page is copied below sans embedded hyperlinks/images, but notice the bit I have highlighted at the bottom.
Power Siphon
by http://www.powersiphon.com

Published December 20, 2003
Topics Power Siphon, Internet, Web browsers and tools, Offline browsing, Power Siphon

This Web spider downloads Web sites and Web content that you specify and saves the information to your hard drive for offline use. You provide the URL of the home page or any other starting page and watch the progress of the download in real time. You can also compress downloaded content into an EXE file.
The program includes a built-in viewer with a slideshow mode, and you can use the wizard interface to define tasks. Other features includes Microsoft Access compatibility, database support, spell checking, indexing and the ability to create your own search engine.


Identifier: tucows_344044_Power_Siphon
Date: 2003-12-20
Creator: http://www.powersiphon.com
Tucows_rating: 4
Rights: Shareware
Publisher: Tucows Inc.
Mediatype: software
Addeddate: 2004-11-02 13:25:00
Publicdate: 2004-11-08 17:51:00
Backup_location: ia903600_6
Notes

Tucows, Inc has graciously donated a copy of this software to the Internet Archive's Tucows Software Archive for long term preservation and access. Please check the Tucows website for all current versions of the software.

So the Power Siphon software mentioned by @gorinw13 can be downloaded via that Archive, which has captured a copy of the website Power Siphon - http://www.powersiphon.com
The link for downloading the software is (and it works):
https://archive.org/download/tucows_344044_Power_Siphon/power_siphon_tucows_setup.exe
- it's a 10.5 MB executable installer file.

I think that's ruddy useful of them to make that available.

gamezntoyz

  • Participant
  • Joined in 2016
  • Posts: 1
« Reply #24 on: March 16, 2017, 04:23 PM »
Nice find, man. This is highly useful.