Author Topic: Cody got into the electrical cables again .. Server went down on 12/8/11  (Read 6748 times)
mouser
First Author
Administrator
*****
Posts: 33,353



« on: December 07, 2011, 10:02:06 PM »

Actually it was the entire hosting company (Softlayer) network where the DC server lives.. They lost power for a while.
Sorry for the downtime, and a huge thanks to Gothic for bringing everything back up cleanly. (thumbs up)
« Last Edit: December 07, 2011, 10:24:27 PM by mouser » Logged
tomos
Charter Member
***
Posts: 8,518



« Reply #1 on: December 07, 2011, 10:17:25 PM »

Thought there was some funny business going on,
welcome back ;-)


EDIT/ do you mean 1st December by 12/1?
(Site was down an hour or so ago - don't know for how long - but this often happens around this time due to the backup - we usually get a message though)
« Last Edit: December 07, 2011, 10:26:24 PM by tomos » Logged

Tom
tomos
Charter Member
***
Posts: 8,518



« Reply #2 on: December 07, 2011, 10:28:13 PM »


you got your edit in there just before mine...
Logged

Tom
rgdot
Supporting Member
**
Posts: 1,611


« Reply #3 on: December 07, 2011, 11:17:27 PM »

Cody did a bit more damage it seems. Why is this post, for example, being time stamped 11 something PM? It's 6:09AM Eastern right now.
Logged
mouser
First Author
Administrator
*****
Posts: 33,353



« Reply #4 on: December 08, 2011, 05:25:52 AM »

thanks for pointing that out rgdot.. that seems to happen every time the server is rebooted.  should be fixed now.
Logged
Renegade
Charter Member
***
Posts: 11,358



Tell me something you don't know...

« Reply #5 on: December 08, 2011, 05:31:30 AM »

It took a long time.

I don't know WTF is up with the Seattle data center... I'm in Texas, and never have any problems there. None.

Isn't this like the second time they've had that same problem?
Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
rgdot
Supporting Member
**
Posts: 1,611


« Reply #6 on: December 08, 2011, 05:32:08 AM »

Thanks mouser and Gothic
Logged
mahesh2k
Supporting Member
**
Posts: 1,408



« Reply #7 on: December 08, 2011, 02:32:43 PM »

I thought Cody took my Alien vs Cody image seriously (tongue)
Logged
IainB
Supporting Member
**
Posts: 4,704


Slartibartfarst

« Reply #8 on: December 08, 2011, 03:55:47 PM »

Quote
Actually it was the entire hosting company (Softlayer) network where the DC server lives.. They lost power for a while.

Is it normal for hosting companies to have power outages?
If so, then maybe I am missing something here, because I don't understand that.
By training, whenever I have been involved in setting up data centres, I have had to ensure that the risk mitigation plan always insists on there being at least four basic built-in redundancies (i.e., quite apart from computer system redundancies):
  • dual/backup air conditioning systems.
  • dual telecomms links (using two different telco supplier networks).
  • onsite single or dual backup diesel power generators - which automatically kick in when the power dies/fluctuates.
  • interim UPS (batteries) for server systems. (This supply allows sufficient time for the generators to get up to full capacity after they automatically kick in.)

If your hosting contract expressly excludes these things, then presumably you would receive service at a significant discount.
Logged
mouser
First Author
Administrator
*****
Posts: 33,353



« Reply #9 on: December 08, 2011, 04:19:51 PM »

the crazy thing is we are paying for an expensive hosting company (softlayer) specifically because they are supposed to be one of the most reliable companies with the best redundancies, etc.
Logged
IainB
Supporting Member
**
Posts: 4,704


Slartibartfarst

« Reply #10 on: December 08, 2011, 07:14:56 PM »

Quote
the crazy thing is we are paying for an expensive hosting company (softlayer) specifically because they are supposed to be one of the most reliable companies with the best redundancies, etc.

Well then, I would recommend (if you don't mind, and I'm not trying to teach you to suck eggs), from l-o-n-g experience of IT service contracts on both sides (customer/supplier), that you carefully scrutinise your contract and/or SLA (Service Level Agreement) for conditions and in particular penalty clauses in the event of deteriorated service or loss of service.

If there are any penalty clause provisions, then I think (from memory) you could legally claim either of:
(a) actual or reasonable notional consequential costs, or loss of revenue/profit arising from the outage.
or:
(b) punitive damages ("And don't do it again!" type damages)
- but not both.

If there isn't any penalty clause, then you may have unwittingly signed a contract with no teeth for the customer in the event of an outage such as this. A contract for "All care and no responsibility".

If you get no financial recompense for the outage, and if you believe that you are:
Quote
... paying for an expensive hosting company (softlayer) specifically because they are supposed to be one of the most reliable companies with the best redundancies, etc.
- then I'd suggest that you may have been paying out money under false pretences for as long as you have been using that supplier, and should swap suppliers because of that fact alone, and ask for a full/partial refund.

If you told your account manager/rep. that you were considering this, then it might be interesting to see what sort of response that gets.
  • A favourable (to you) response would probably indicate that they are interested in holding onto your business.
  • An unfavourable response would probably give the lie to any notions or expectations of "customer care and QOS" that you might have held regarding this supplier.

As to the outage itself, if it really shouldn't have happened because the supplier - to your knowledge - had all the appropriate redundancies/backups in place, then - by definition - that could mean that there was a process failure somewhere.
In my experience the main thing that usually gets in the way of a good service manager providing his services to meet an SLA is a pencil-head (usually an accountant). So be on the lookout for that as a possibility. They may have been cost-cutting and hoping to get away with it.
Logged
IainB
Supporting Member
**
Posts: 4,704


Slartibartfarst

« Reply #11 on: December 08, 2011, 07:49:51 PM »

I was very curious as to what Softlayer had by way of the things I listed above:
Quote
  • dual/backup air conditioning systems.
  • dual telecomms links (using two different telco supplier networks).
  • onsite single or dual backup diesel power generators - which automatically kick in when the power dies/fluctuates.
  • interim UPS (batteries) for server systems. (This supply allows sufficient time for the generators to get up to full capacity after they automatically kick in.)
So I got onto the main website and clicked on "Chat with a real person" and type-chatted with one "Austin P".
He pointed me to: Data Centres

From that, it looks like a really well set-up and professional outfit. They seem to have all the usual power and battery backups sorted.

So, what was their explanation for the outage that hit you?
And did it hit all their customers similarly, from that data centre?
Enquiring minds need to know.
Customers hit by the outage would expect to be told.
Logged
mouser
First Author
Administrator
*****
Posts: 33,353



« Reply #12 on: December 08, 2011, 07:55:49 PM »

Quote
And did it hit all their customers similarly, from that data centre?

yeah it took down the whole data center in seattle as far as i know.

here's what they said to us.. though i would take these things with a grain of salt, as my experience is that hosting companies stretch the truth a bit to paint things rosier than they were:

Quote
Around 02:54 UTC on 08-DEC-2011, there was a disruption to utility power to our SEA01 datacenter facility. The backup generator / UPS system subsequently experienced trouble backing up the critical load, and at 03:16 UTC on 08-DEC-2011 a section of our datacenter went offline due to power loss. This power outage took our site core routers offline, which caused a total network disruption to servers in the SEA01 facility. Additionally, some servers lost power.

Power was restored for the network equipment at 03:32 UTC on 08-DEC-2011, at which time servers in pod 02 in SEA01 (fcr02.sea01 & bcr02.sea01) came back online, as well as back-end for pod 01 (bcr01.sea01). The front-end router for pod 01 (fcr01.sea01) did not recover properly after power was restored, and finally at 03:50 UTC on 08-DEC-2011 all network services were back online.

Our facilities and system administration team are currently working on restoring service to any servers or CCIs (cloud instances) that did not automatically recover after power resumed.
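
(End of the quoted notice.) For anyone who wants the durations implied by those timestamps, here is a minimal sketch that just does the arithmetic. The variable names are my own shorthand and not Softlayer's terminology, and the figures only cover the network events listed in the notice; individual servers that lost power may have taken longer to come back.

```python
from datetime import datetime, timezone


def utc(hour, minute):
    """Build a UTC timestamp on 8 December 2011, the date given in the notice."""
    return datetime(2011, 12, 8, hour, minute, tzinfo=timezone.utc)


# Times copied from the quoted notice; labels are assumptions for readability.
utility_power_lost = utc(2, 54)
section_offline = utc(3, 16)
network_power_restored = utc(3, 32)
all_network_back = utc(3, 50)


def minutes_between(start, end):
    """Whole minutes elapsed between two timestamps."""
    return int((end - start).total_seconds() // 60)


# How long the backup generator / UPS carried the load before failing.
backup_held = minutes_between(utility_power_lost, section_offline)
# Total network disruption, from going offline to all services back.
network_outage = minutes_between(section_offline, all_network_back)
# Extra time the pod 01 front-end router needed after power returned.
pod01_extra = minutes_between(network_power_restored, all_network_back)

print(f"Backup held for {backup_held} min")
print(f"Network outage lasted {network_outage} min")
print(f"Pod 01 front-end lagged by {pod01_extra} min")
```

So by their own account the backup power lasted about 22 minutes, and the network was out for about 34 minutes in total.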
Logged
IainB
Supporting Member
**
Posts: 4,704


Slartibartfarst

« Reply #13 on: December 08, 2011, 08:36:31 PM »

Looks like someone probably screwed up big time - and it will probably be human error, because automated power UPS/generators work just fine otherwise.
Quote
The backup generator / UPS system subsequently experienced trouble backing up the critical load, and at 03:16 UTC on 08-DEC-2011 a section of our datacenter went offline due to power loss.

This tells you what happened, but not why.
Would be interesting to know what they determine the cause to be.
Logged
anandcoral
Honorary Member
**
Posts: 230



« Reply #14 on: December 09, 2011, 01:52:37 AM »

I had sweat on my forehead when I could not get through in the morning session. Though I do not remember how long I kept refreshing the "downforall.." page with the DC link, it did look like more than an hour.

I can not say how I felt relieved when it said DC was running.

One suggestion, since at the time of DC down, we all were asking just one question "what happened ?". Now can we have a mirror/ separate site, with just the official information of current activity of DC ? Mouser or a volunteer can keep it updated once a day in normal cases and frequently in down time (hope it does not happen again). No more than 1 or 2 pages and no comment features, just read only.

Since both will reside in separate places/ servers etc., one can check the other if he/she has a problem getting through to the first.

What do you think about it ?
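
To make the idea concrete, here is a rough sketch of the "check the other one" logic from a visitor's side. It is only an illustration under assumptions: the mirror address is a made-up placeholder (no such status site exists), and the function names are hypothetical.

```python
from urllib.request import urlopen

MAIN_SITE = "https://www.donationcoder.com/"
# Hypothetical placeholder: a read-only status mirror hosted elsewhere.
STATUS_MIRROR = "https://status.example.org/"


def is_up(url, timeout=5):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # Covers URLError, HTTPError, timeouts and refused connections.
        return False


def where_to_look(main=MAIN_SITE, mirror=STATUS_MIRROR, check=is_up):
    """Pick the main site if reachable, else the mirror, else None."""
    if check(main):
        return main
    if check(mirror):
        return mirror
    return None
```

Calling where_to_look() would return the main forum address in normal times and the mirror address during an outage, which only helps if the two really are hosted in separate places.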

Regards,

Anand
Logged
mouser
First Author
Administrator
*****
Posts: 33,353



« Reply #15 on: December 09, 2011, 02:39:09 AM »

It's a very good idea.  We need to have another small forum somewhere so we can meet when donationcoder main site goes offline, to provide information, etc.

Meanwhile if you install an irc chat program (lots of free ones), you can always find us chatting any hour of the night on the efnet network, on channel #donationcoder (that's where you go if you hit the chat button at the top of the page).
Logged
Renegade
Charter Member
***
Posts: 11,358



Tell me something you don't know...

« Reply #16 on: December 09, 2011, 02:41:59 AM »

mouser, isn't this the second time that has happened?

(I'm with Softlayer as well, but in the Texas data centers, and I've NEVER had any problems.)
Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
Renegade
Charter Member
***
Posts: 11,358



Tell me something you don't know...

« Reply #17 on: December 09, 2011, 02:44:59 AM »

Quote
It's a very good idea.  We need to have another small forum somewhere so we can meet when donationcoder main site goes offline, to provide information, etc.

Meanwhile if you install an irc chat program (lots of free ones), you can always find us chatting any hour of the night on the efnet network, on channel #donationcoder (that's where you go if you hit the chat button at the top of the page).

Is the cloud actually at the point where you can run a real application in the cloud? Like a forum? A data driven application? I mean like a decentralized solution that you can still run off of traditional DNS, and not a uber-massive DDNS redundant server beast. Just a simple little solution that's decentralized and will let small sites (e.g. 1 server) run?

Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
IainB
Supporting Member
**
Posts: 4,704


Slartibartfarst

« Reply #18 on: December 09, 2011, 04:33:09 AM »

Quote
Is the cloud actually at the point where you can run a real application in the cloud? Like a forum? A data driven application? I mean like a decentralized solution that you can still run off of traditional DNS, and not a uber-massive DDNS redundant server beast. Just a simple little solution that's decentralized and will let small sites (e.g. 1 server) run?

Well, you could migrate DCF to Google groups, I suppose...
That's distributed and backed up all over the place, I gather. Not sure if that means that it is in the "Cloud" though.
Logged
JavaJones
Review 2.0 Designer
Charter Member
***
Posts: 2,537



« Reply #19 on: December 11, 2011, 03:50:08 PM »

Seriously, is down time enough of an issue that we need a *second forum* to "discuss DC stuff while the main forum is down"? Woah. That's... crazy in my opinion. Just get a reliable hosting service! The IRC channel is enough for those "in the know" (which is the only people who would know about and use an alternate forum when the main one is down anyway). I seriously don't think resources and time should be wasted on a secondary forum.

Anyway, has anyone ever heard of a data center power outage that *went well*, i.e. according to plan? Something like "At 3PM Pacific Time on December 11th, our San Jose data center lost power. Our UPS units kept all systems online while our backup generators kicked in and there was no interruption of service. Our backup generators powered all systems for 9 hours until the utility company could restore power. Thank you for your patronage." End of story. I think I've seen that maybe *once*, yet 10s of times I've seen "Our backup systems got overloaded and failed, then things went down for x amount of time. Sorry!" What good are backup systems that fail themselves?

What about switching to a different Softlayer data center?

- Oshyan
Logged

The New Adventures of Oshyan Greene - A life in pictures...
Renegade
Charter Member
***
Posts: 11,358



Tell me something you don't know...

« Reply #20 on: December 12, 2011, 02:52:52 PM »

Did it go down again? I noticed more downtime.
Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
Ath
Supporting Member
**
Posts: 2,212



« Reply #21 on: December 12, 2011, 03:00:35 PM »

All I saw was that today the forum-backup had shifted forward (as in: later than usual) 1 hour, but that could be on purpose, or have to do with the server time setting issue there was after getting re-powered, as rgdot pointed out earlier in this thread.
Logged

mouser
First Author
Administrator
*****
Posts: 33,353



« Reply #22 on: December 12, 2011, 03:25:27 PM »

It did go down again last night.. not softlayer but our server.. it started the backup process and cpu usage climbed to the point where the server was unreachable and never came down until the server had to be hard rebooted..
Logged
40hz
Supporting Member
**
Posts: 10,670



« Reply #23 on: December 12, 2011, 04:13:19 PM »

Was that a virtual machine, or the hardware server itself that maxed out?
Logged

Don't you see? It's turtles all the way down!
mouser
First Author
Administrator
*****
Posts: 33,353



« Reply #24 on: December 12, 2011, 04:15:42 PM »

the vmware virtual machine running donationcoder.com; we could see the cpu load go off the chart right as it started performing backups and it just never came down on its own and we couldn't get into the vmware console for it.  very strange.
Logged