Apparently Yahoo! was also intent on having it fade into obscurity despite numerous outside offers to assist in backing it up for the historic record.
What follows is the tale of one guy who decided to go a little further than just offer.
From our friends over at Download Squad comes this story:
Link: http://www.downloadsquad....s-gone-but-not-forgotten/
Quote
Reocities: because Geocities is gone, but not forgotten
by Jay Hathaway (RSS feed) Oct 29th 2009 at 1:00PM
When Yahoo! decided to close down GeoCities, a lot of us shed a single tear for our first home on the Internet and moved on. For one man called Jacques, though, that wasn't good enough.
He took it upon himself to save as much of GeoCities as possible, by writing scripts that pinged the site to find active pages, and then downloaded them to his personal storage space. The one-man project, called Reocities, rescued an estimated 600,000 GeoCities sites before the big shutdown.
The above article has links to the Reocities homepage (http://reocities.com):
Quote
Welcome To ReoCities...
Here lies what we could salvage from the ashes of GeoCities.
Yahoo! has done an amazing thing by keeping GeoCities alive for as long as they did, but we feel that it is a waste to leave the Internet with a hole of this magnitude. At a minimum, Yahoo! could have simply left GeoCities as a monument to the early days.
Maybe close it off from editing and simply make it static after getting rid of the spam pages once and for all.
Behind this minimalistic page stretches a wealth of Internet history. If any of it was yours and we have successfully recovered it, then we hope it makes you happy to see it restored.
We've rebuilt the walls to the Cities and the streets where a large part of the early settlers of the World Wide Web used to live in. You can still find them where they were before, but not all of the houses have been rebuilt yet.
As time passes, we will try to recover more and more of what was lost, at least as much as is technically possible. If you wish to help with this effort, and you have your old GeoCities content backed up, then please email us at j@ww.com, but *not* before we've stopped importing the data that we have right now.
- and the link to a "making of" page which gives some insight into what's involved in snagging a copy of something the size of GeoCities when the clock is running out:
Link: http://www.reocities.com/newhome/makingof.html
Quote
#
Size
GeoCities is large. Very, very large. Not when compared to, say, the likes of MySpace or Facebook. But compared to your average garden variety website, it is huge. Given that, when GeoCities first launched in 1994, the average hard drive was somewhere around 500 MB, to store multiple hundreds of gigabytes must have been a complicated technological feat to achieve.
RAID was already around, but those 'inexpensive' disks were, for the most part, not that inexpensive. Storage technology was several orders of magnitude slower and had a smaller capacity than today. In spite of all that, you can't just go and make a copy like you could do with any other set of pages. Yesterday's giants are still pretty big.
#
Number Of Files
GeoCities comprises hundreds of millions of files in all kinds of formats, and the most important part of the link structure, the .html and .htm files, were made in an age when FrontPage was considered hot stuff.
To avoid overflowing the directory structure on the machines that GeoCities was using, they opted for a tree based format. This meant that any one of the Cities was subdivided into Neighborhoods, and each one had 10 000 accounts, maximum.
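To make that tree layout a bit more concrete, here is a rough Python sketch of how a fetcher might enumerate every possible account slot in one neighborhood. This is my own illustration, not Jacques's actual code: the path pattern, the 1-to-10 000 numbering, and the names ExampleCity and ExampleNeighborhood are all assumptions. Only the cap of 10 000 accounts per neighborhood comes from the quote.

```python
from typing import Iterator

# Cap quoted from the "making of" page: 10 000 accounts per neighborhood.
ACCOUNTS_PER_NEIGHBORHOOD = 10_000


def candidate_paths(city: str, neighborhood: str,
                    limit: int = ACCOUNTS_PER_NEIGHBORHOOD) -> Iterator[str]:
    """Yield one candidate path per possible account slot.

    ASSUMPTION: accounts are numbered 1..limit and live at
    /<city>/<neighborhood>/<number>/ ; the real layout may have differed.
    """
    for account in range(1, limit + 1):
        yield f"/{city}/{neighborhood}/{account}/"


if __name__ == "__main__":
    # Hypothetical names, used only to show the shape of the output.
    paths = list(candidate_paths("ExampleCity", "ExampleNeighborhood"))
    print(len(paths))
    print(paths[0])
```

The useful property is that a fixed, bounded tree lets a downloader walk the whole address space directly instead of having to discover pages by following links first.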
How's this for a website backup toolkit?
Quote
The ingredients:
- 1 iconic website about to be erased
- 21 pots of strong tea
- more sugar than is probably healthy
- very little sleep
- some computing gear
- one solid Internet connection
- 6 days in October 2009
- Some very good help (Thanks Abi!)
And if you think this project was nothing more than a raw download job, check this out:
Quote
21:00 PM, Friday, 23 October 2009 - The Secret Weapon
At this point in time there are only 44 hours to go until it is permanently curtains for GeoCities. We're talking Friday to Saturday night, and I realize that if I don't do something drastic, then this effort is going to fail.
So, enter the secret weapon. A couple of years ago I wrote a small (about 1 billion pages) search engine. For that purpose I bought a cluster of 5 machines, which have since been upgraded with 4 TB storage each, and already had a fairly beefy CPU. They're also connected to the net with some good uplinks and have a 1Gb/s connection between them to a dedicated switch. Time to get those guys involved.
Now that we know the structure of GeoCities, it is possible to farm out the fetching of pieces to each of the cluster nodes. A small program figures out who is busy with what, and each cluster can concentrate on one of the 721 shelves, and the 10 000 possible accounts on that shelf. In the past 4 days some of those shelves have already seen extensive coverage, so we mark those as done, leaving about half to be processed still. After a few more hours to get this all set up the cluster was humming along at 150 Mb/s inbound. That's a CD every 30 seconds or so!
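For the curious, the "small program figures out who is busy with what" idea might look something like the Python sketch below. To be clear, this is a guess at the general technique (a shared work queue of shelves), not the program Jacques actually wrote. The class name, the node names, and the choice of which shelves count as already done are all made up for illustration; the 721 shelves, the 5 machines, and the 150 Mb/s figure come from the quote. The last function checks the "CD every 30 seconds" remark, assuming a 700 MB CD.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set

TOTAL_SHELVES = 721  # figure from the quoted text


class ShelfQueue:
    """Hands out shelves to cluster nodes, one shelf per node at a time."""

    def __init__(self, shelves: Iterable[int], done: Set[int]) -> None:
        self.done: Set[int] = set(done)
        # Shelves that already saw extensive coverage are skipped.
        self.pending = deque(s for s in shelves if s not in self.done)
        self.in_progress: Dict[str, int] = {}  # node name -> shelf

    def next_for(self, node: str) -> Optional[int]:
        """Return the node's current shelf, or give it the next pending one."""
        if node in self.in_progress:
            return self.in_progress[node]
        if not self.pending:
            return None  # nothing left to hand out
        shelf = self.pending.popleft()
        self.in_progress[node] = shelf
        return shelf

    def finish(self, node: str) -> None:
        """Mark the node's current shelf as done and free the node."""
        self.done.add(self.in_progress.pop(node))


def seconds_per_cd(mbit_per_s: float, cd_megabytes: float = 700.0) -> float:
    """Time to pull one CD's worth of data (ASSUMPTION: 700 MB per CD)."""
    return cd_megabytes * 8 / mbit_per_s


if __name__ == "__main__":
    # Hypothetical: pretend every even-numbered shelf is already covered.
    queue = ShelfQueue(range(TOTAL_SHELVES), done=set(range(0, TOTAL_SHELVES, 2)))
    for node in ["node1", "node2", "node3", "node4", "node5"]:
        print(node, "takes shelf", queue.next_for(node))
    print(round(seconds_per_cd(150), 1), "seconds per CD at 150 Mb/s")
```

Under that 700 MB assumption, 150 Mb/s works out to a bit under 40 seconds per CD, so "every 30 seconds or so" is in the right ballpark.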
Did he say some computing gear? A 5-machine cluster and a self-written search engine? He calls that some computing gear? Talk about a certified propeller-head!
Very cool stuff!
----
P.S. Congratulations, Jacques - whoever you are. Because of your efforts to salvage a piece of Internet history, you've made a place in its history books for yourself. Not too shabby for a one-week project, hey?
:Thmbsup:







