Author Topic: Availability/outages of the DCF website.  (Read 1248 times)

IainB

  • Supporting Member
  • Joined in 2008
  • **
  • Posts: 7,446
  • Slartibartfarst
Availability/outages of the DCF website.
« on: December 22, 2018, 11:24 AM »
Is there a log of service level incidents/outages of the website?
I got an Error 522 today, reported by Cloudflare, which I have copied in the attached file (just text in an .mhtml file, inside a .zip file).
I wondered whether it was a known outage/incident or an unknown intermittent error of some kind.
Thought I should report it.
 There was a diagram that showed the connections between:
  Browser (me) <---> Cloudflare (Tokyo) <---> Host (donationcoder.com)

 - with the Browser and Cloudflare shown as "working" and their link OK, but the link between Cloudflare and Host was X'd out (not working).

I did a Ctrl-R (refresh) and after a rather longish wait, the DCF site came up OK.

Shades

  • Member
  • Joined in 2006
  • **
  • Posts: 2,584
Re: Availability/outages of the DCF website.
« Reply #1 on: December 22, 2018, 04:18 PM »
After the DDoS attack not too long ago, I understood that the DC website now uses Cloudflare's facilities to handle internet traffic to the DC web server.

Cloudflare indicates that error 522 can be caused by:
 -   Overloaded web server
 -   Offline origin web server
 -   Blocked Cloudflare requests
 -   Faulty network routing
 -   Disabled keepalives
 -   Incorrect IP address in the Cloudflare DNS settings (i.e. the request from us was sent to the wrong place)
 -   Dropped packets on the host network

The first 2 possible causes are directly DC's responsibility; the other possible causes are more on Cloudflare's turf. Therefore Cloudflare can be just as much a solution as it can be a problem.
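
For anyone who wants to tell these errors apart in a script, here is a rough sketch in Python. It only maps the Cloudflare-specific 52x status codes to their published names and guesses which link in the Browser <---> Cloudflare <---> Host chain failed. The helper names `failing_link` and `describe` are made up for illustration, and the "which link failed" guess is a simple heuristic, not something Cloudflare provides.

```python
# Cloudflare-specific 52x status codes and their published names.
CLOUDFLARE_52X = {
    520: "Web server returned an unknown error",
    521: "Web server is down",
    522: "Connection timed out",
    523: "Origin is unreachable",
    524: "A timeout occurred",
    525: "SSL handshake failed",
    526: "Invalid SSL certificate",
}


def failing_link(status_code: int) -> str:
    """Hypothetical helper: guess which hop failed, based only on the status code.

    Assumption: a 52x code means Cloudflare answered the browser but could not
    get a proper answer from the host behind it.
    """
    if status_code in CLOUDFLARE_52X:
        return "cloudflare<->host"
    if 200 <= status_code < 400:
        return "none"
    return "unknown"


def describe(status_code: int) -> str:
    """Hypothetical helper: one-line summary suitable for a report or log entry."""
    name = CLOUDFLARE_52X.get(status_code, "not a Cloudflare 52x code")
    return f"HTTP {status_code}: {name} (suspected failing link: {failing_link(status_code)})"


if __name__ == "__main__":
    print(describe(522))
```

A 522 seen this way matches the diagram described in the first post: browser and Cloudflare fine, Cloudflare to host not responding in time. It does not say which of the causes listed above was behind it.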

Stephen66515

  • Animated Giffer in Chief
  • Honorary Member
  • Joined in 2010
  • **
  • Posts: 3,552
Re: Availability/outages of the DCF website.
« Reply #2 on: December 22, 2018, 04:25 PM »
Generally, when the site goes down like that, it's cause mouser has pressed the wrong button on something ;)

Sometimes though it can just be a small hiccough at the datacenter, causing the network to lose connection for a few moments.

IainB

  • Supporting Member
  • Joined in 2008
  • **
  • Posts: 7,446
  • Slartibartfarst
Re: Availability/outages of the DCF website.
« Reply #3 on: December 22, 2018, 04:51 PM »
Well, I wasn't so much concerned with an analysis of that specific incident and its causes/responsibilities per se as with simply identifying the correct reporting path, e.g., does DCF maintain an Incident Log (per ITIL good/best practice)? In any case, I have no real idea what the process for reporting such incidents for DCF might be.
Is there a log of service level incidents/outages of the website?

Anyway, I've reported it now.    :D

rgdot

  • Supporting Member
  • Joined in 2009
  • **
  • Posts: 2,147
Re: Availability/outages of the DCF website.
« Reply #4 on: December 22, 2018, 04:57 PM »
Generally, when the site goes down like that, it's cause mouser has pressed the wrong button on something ;)

computerfix.gif
« Last Edit: December 24, 2018, 01:41 AM by Deozaan, Reason: fixed formatting »

IainB

  • Supporting Member
  • Joined in 2008
  • **
  • Posts: 7,446
  • Slartibartfarst
Re: Availability/outages of the DCF website.
« Reply #5 on: December 23, 2018, 12:02 PM »
@rgdot: Yes, the Captain might typically press the wrong button, as @Stephen66515 suggests:
Generally, when the site goes down like that, it's cause mouser has pressed the wrong button on something ;)
Reminds me of an auditor I knew who worked at a banking data processing organisation. He was being proudly shown around the operations-room of one of the several new distributed national data centres when he suddenly and inadvertently became notorious for being "that guy who curiously pressed an unlabelled big red button on the side of an IBM mainframe box". I'm not sure what the purpose of the big red button was, or why it was prominent in an area where people could touch it or knock against it if it was such a risk, but his pressing it apparently resulted in the shutdown of the whole data centre for a couple of hours.

He kept his job because the button was unlabelled (came with no warnings), was easily accessible in a "safe" (read, "safe for monkeys to roam") area in the first place, and his action had clearly highlighted a serious potential operations-room process risk - which was subsequently rectified.
Seemed fair to me. (True story.)
« Last Edit: December 23, 2018, 12:19 PM by IainB »