
Reocities: the GeoCities one-man rescue project

40hz:
As most of you know, GeoCities is now a part of Internet history.

Apparently Yahoo! was also intent on having it fade into obscurity, despite numerous outside offers to help back it up for the historical record.

What follows is the tale of one guy who decided to go a little further than just offer.

From our friends over at Download Squad comes this story:

Link: http://www.downloadsquad.com/2009/10/29/reocities-because-geocities-is-gone-but-not-forgotten/

Reocities: because Geocities is gone, but not forgotten
by Jay Hathaway, Oct 29th 2009 at 1:00PM


When Yahoo! decided to close down GeoCities, a lot of us shed a single tear for our first home on the Internet and moved on. For one man called Jacques, though, that wasn't good enough.

He took it upon himself to save as much of GeoCities as possible by writing scripts that pinged the site to find active pages and then downloaded them to his personal storage space. The one-man project, called Reocities, rescued an estimated 600,000 GeoCities sites before the big shutdown.
--- End quote ---
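The article doesn't show Jacques' scripts, so here is only a minimal Python sketch of the "ping, then keep what answers" idea, under my own assumptions: a candidate counts as active if a plain GET returns 200 OK, anything else (404, timeout, refused connection) is skipped, and pages are kept in memory instead of being written to disk. The names `urllib_fetch` and `rescue` are hypothetical, and the fetcher is passed in so the loop can be tried without touching the network.

```python
# Hypothetical sketch of a probe-and-download loop, NOT the actual Reocities scripts.
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Tuple

# A fetcher takes a URL and returns (HTTP status code, response body).
Fetcher = Callable[[str], Tuple[int, bytes]]


def urllib_fetch(url: str, timeout: float = 10.0) -> Tuple[int, bytes]:
    """Fetch a URL with the standard library, mapping HTTP errors to their status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 404 and friends: the account slot exists in the tree but holds no page.
        return err.code, b""


def rescue(candidates: Iterable[str], fetch: Fetcher = urllib_fetch) -> Dict[str, bytes]:
    """Probe each candidate URL and keep the body of every page that answers 200 OK."""
    saved: Dict[str, bytes] = {}
    for url in candidates:
        try:
            status, body = fetch(url)
        except OSError:
            # Timeouts, resets and refused connections: skip this URL and move on.
            continue
        if status == 200:
            saved[url] = body
    return saved
```

In a real run you would write each body to disk as soon as it arrives rather than collecting everything in a dict.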

The above article has links to the Reocities homepage (http://reocities.com):

Welcome To ReoCities...

Here lies what we could salvage from the ashes of GeoCities.

Yahoo! has done an amazing thing by keeping GeoCities alive for as long as they did, but we feel that it is a waste to leave the Internet with a hole of this magnitude. At a minimum, Yahoo! could have simply left GeoCities as a monument to the early days.
Maybe close it off from editing and simply make it static after getting rid of the spam pages once and for all.
Behind this minimalistic page stretches a wealth of Internet history. If any of it was yours and we have successfully recovered it, then we hope it makes you happy to see it restored.

We've rebuilt the walls of the Cities and the streets where a large part of the early settlers of the World Wide Web used to live. You can still find them where they were before, but not all of the houses have been rebuilt yet.

As time passes, we will try to recover more and more of what was lost, at least as much as is technically possible. If you wish to help with this effort, and you have your old GeoCities content backed up, then please email us at [email protected], but *not* before we've stopped importing the data that we have right now.
--- End quote ---

- and the link to a "making of" page which gives some insight into what's involved in snagging a copy of something the size of GeoCities when the clock is running out:

Link: http://www.reocities.com/newhome/makingof.html

Size

GeoCities is large. Very, very large. Not when compared to, say, the likes of MySpace or Facebook, but compared to your average garden-variety website, it is huge. Given that the average hard drive was somewhere around 500 MB when GeoCities first launched in 1994, storing multiple hundreds of gigabytes must have been a complicated technological feat.

RAID was already around, but those 'inexpensive' disks were, for the most part, not that inexpensive. Storage technology was several orders of magnitude slower and had a smaller capacity than today. Even with today's hardware, you can't just go and make a copy the way you could with any other set of pages. Yesterday's giants are still pretty big.

Number Of Files

GeoCities comprises hundreds of millions of files in all kinds of formats, and the .html and .htm files that make up the most important part of the link structure were made in an age when FrontPage was considered hot stuff.

To avoid overflowing the directory structure on the machines that GeoCities was using, they opted for a tree-based format. This meant that each of the Cities was subdivided into Neighborhoods, and each Neighborhood held at most 10 000 accounts.
--- End quote ---
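To make that tree structure concrete, here is a hypothetical Python sketch that turns it into a list of candidate URLs to probe. The base URL, the Neighborhood path format and the account numbering (`first` up to `first + count - 1`) are my own assumptions, not details from the quoted page; only the limit of 10 000 accounts per Neighborhood comes from the text.

```python
# Hypothetical sketch: enumerate every possible account slot in the GeoCities tree.
from typing import Iterator, List

# Assumed values: the quoted page gives neither the URL format nor the numbering scheme.
BASE_URL = "http://www.geocities.com"
MAX_ACCOUNTS_PER_NEIGHBORHOOD = 10_000  # from the text: 10 000 accounts maximum


def candidate_urls(neighborhood: str, first: int = 0,
                   count: int = MAX_ACCOUNTS_PER_NEIGHBORHOOD) -> Iterator[str]:
    """Yield one candidate account URL per numbered slot in a Neighborhood directory."""
    for number in range(first, first + count):
        yield f"{BASE_URL}/{neighborhood}/{number}/"


def all_candidates(neighborhoods: List[str]) -> Iterator[str]:
    """Walk the whole tree: every Neighborhood, every possible account slot."""
    for hood in neighborhoods:
        yield from candidate_urls(hood)
```

If each of the 721 shelves mentioned further down is one such Neighborhood, that caps the search at 721 × 10 000 = 7 210 000 candidate URLs, which is why splitting the work by shelf is a natural fit.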

How's this for a website backup toolkit?

The ingredients:

    
* 1 iconic website about to be erased
* 21 pots of strong tea
* more sugar than is probably healthy
* very little sleep
* some computing gear
* one solid Internet connection
* 6 days in October 2009
* Some very good help (Thanks Abi!)


--- End quote ---

And if you think this project was nothing more than a raw download job, check this out:

21:00, Friday, 23 October 2009 - The Secret Weapon

At this point in time there are only 44 hours to go until it is permanently curtains for GeoCities. We're talking Friday to Saturday night, and I realize that if I don't do something drastic, then this effort is going to fail.

So, enter the secret weapon. A couple of years ago I wrote a small (about 1 billion pages) search engine. For that purpose I bought a cluster of 5 machines, which already had fairly beefy CPUs and have since been upgraded to 4 TB of storage each. They're also connected to the net with some good uplinks, and they talk to each other at 1 Gb/s through a dedicated switch. Time to get those guys involved.

Now that we know the structure of GeoCities, it is possible to farm out the fetching of pieces to each of the cluster nodes. A small program figures out who is busy with what, and each node can concentrate on one of the 721 shelves and the 10 000 possible accounts on that shelf. In the past 4 days some of those shelves have already seen extensive coverage, so we mark those as done, leaving about half still to be processed. After a few more hours of setup, the cluster was humming along at 150 Mb/s inbound. That's a CD every 30 seconds or so!
--- End quote ---
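The quote only says that "a small program figures out who is busy with what", so the following is a minimal single-process Python sketch of that bookkeeping, assuming shelves are identified by name and a node asks for a new shelf whenever it goes idle. `ShelfScheduler`, `next_shelf`, `finish` and `seconds_per_disc` are hypothetical names, not Jacques' code. The last function checks the "CD every 30 seconds" figure: 150 Mb/s is 18.75 MB/s, so a 700 MB disc takes about 37 seconds and a 650 MB one about 35, which is indeed "30 seconds or so".

```python
# Hypothetical sketch of a shelf scheduler, NOT the actual Reocities coordinator.
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Set


class ShelfScheduler:
    """Hand out shelves to idle cluster nodes, skipping shelves already marked done."""

    def __init__(self, shelves: Iterable[str], done: Iterable[str] = ()) -> None:
        self.done: Set[str] = set(done)
        # Shelves already covered in earlier days are never queued.
        self.pending: Deque[str] = deque(s for s in shelves if s not in self.done)
        self.busy: Dict[str, str] = {}  # node name -> shelf it is working on

    def next_shelf(self, node: str) -> Optional[str]:
        """Give an idle node the next pending shelf, or None if nothing is left."""
        if node in self.busy:
            raise ValueError(f"{node} is still working on {self.busy[node]}")
        if not self.pending:
            return None
        shelf = self.pending.popleft()
        self.busy[node] = shelf
        return shelf

    def finish(self, node: str) -> str:
        """Mark the node's current shelf as done and free the node for new work."""
        shelf = self.busy.pop(node)
        self.done.add(shelf)
        return shelf


def seconds_per_disc(link_mbit_per_s: float, disc_mb: float = 700.0) -> float:
    """Seconds to pull one CD's worth of data (decimal MB, 8 bits per byte)."""
    return disc_mb * 8 / link_mbit_per_s
```

A real coordinator across five machines would also need to notice a node that dies mid-shelf and put its shelf back in the queue; this sketch leaves that out.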

Did he say some computing gear? A 5-machine cluster and a self-written search engine? He calls that some computing gear? Talk about a certified propeller-head!


Very cool stuff! 8)

----
P.S. Congratulations Jacques - whoever you are. Because of your efforts to salvage a piece of Internet history, you've made a place in its history books for yourself. Not too shabby for a one week project, hey?  ;D:Thmbsup:





superboyac:
Ah...Geocities.  I remember it.  I think I had a website on it.  The only thing I had on the site was that stupid animated gif of the hand coming out and grabbing something.  I thought that was so cool at the time.

[edit]
HAHA.  I found it...so stupid!

f0dder:
Oh, GeoCities... last update time on my page there is ~12 years ago. The memories :)

JavaJones:
What about archive.org? Was Geocities as a rule not spidered? Note: I didn't read any of the articles, so my apologies if it's explained within. ;)

- Oshyan

app103:
Archive Team and textfiles.com managed to save the rest of it and crammed it all into a single page. It sure is a sight to behold.  :D
