
E-mail & Forum 14/15 Feb outage


Wazoo

As seen at the SpamCop Email System News web-page ....

Feb 15, 2009

[15:40 EST] Our data center had a major power outage on Saturday morning around 8:00 a.m. EST. We were able to resolve most issues by around 12:00 p.m. but some of our equipment was damaged in the power problems. We are working on restoring 100% service today. We will post more information on the forum and update this news with a link to that post. Thanks to Wazoo and Don for working with users and notification during this time.

Feb 14, 2009

[04:23 EST] We are aware that there is a problem POPping external servers. We are working on the problem now and expect it to be fixed within the next hour. After that, it may take a couple of hours for it to catch up POPping all of the mail waiting.

Equipment damage was something I'd hoped wasn't part of the issue, but ....

Not sure I want to imagine just how much work has been going on for the last couple of days .... I'm offering what thanks I can for that effort .. apologies for all the phone calls, voice-mails, IMs, etc.

Moderator edit:

Some of the systems that were/are affected include:

SpamCop Forum

SpamCop Wiki

SpamCop Reporting

SpamCop Email

SpamCop SMTP service

As well as other services

Edited by dbiel

Not sure I want to imagine just how much work has been going on for the last couple of days

If that's indeed the case. However, since the News page was "live" and could probably have been updated during any such "work," I'm left thinking about other possible explanations for the extremely long gap between those two news items you quoted. If work was actually taking place during much of Saturday, a simple note to that effect on the News page would have been more than welcome.

I know this sounds a bit grumpy, and that during server outages restoring the services is a higher priority than communicating with anyone, but a little bit of information is usually better than leaving people to wonder and speculate. For example, knowing for sure that someone is actually working on the problem, rather than blissfully unaware, means that those of us who have mail forwarded to our SC email accounts wouldn't need to take emergency actions ourselves, such as re-routing the forwards.

DT

Edited by DavidT


I heartily agree with both sentiments -- As a sysadmin, I often 1) have no way of notifying everyone of system problems and 2) am constantly being bombarded by phone calls from people wanting to know if we're having problems. On the other hand, a simple post on the news site saying "yes, we know the system is down and we're working on it" would have been sufficient. As it is, the news server was working, but as far as I'm aware, no one bothered to post there that there was a problem and that it was being worked on.

I realize that this forum is the more official of the two methods and that the SpamCop news server is somewhat deprecated; however, if the forum is down.... PLEASE, J.T. et al -- just put a note on the news page that there is a known issue next time something happens. Don't leave us wondering.


E-mail containing 'thanks for the voice-mails' received with some additional data. To be honest, I was laughing a bit at the data offered, the thought apparently being that I probably couldn't understand the magnitude of the data-center situation ... but in reality, I certainly do ... perhaps a Lounge post coming up .. Anyway ...

Power failure to the building occurred ... however, the backup system failed (my words here .. to come up fully, the most likely cause for the next words ...) .... it was at that point that some systems, gear, etc. let out the magic smoke. (I am sure that you all know that it's the magic smoke that allows electronic things to do their work. When something breaks and one sees the magic smoke escaping into the air, you know that the electronic box is now dead. Some say that this is just regular smoke from an overheated component, but .... technicians know better <g>) Generators eventually came fully on-line, but ... this also helped to fry a few more pieces of already partially damaged hardware.

Take a look at the graphic of a shot of some of the data-center ... those parts under the 'yellow' lines are basically wire cages, entrance/doors locked by the enclosed system owners. JT has several of these cages, located all over the building. Although networked throughout the building, not all are configured to directly reach outside. JT states that when he arrived on-site, there were another 100 or so individuals there, also working on their systems. This caused issues in that the data-center has some tools & diagnostic equipment available for use, but .... not hundreds of them. So there was some 'competition' for getting one's hands on some of that gear.

At present, the data-center is still running on generator power ... the vendor has flown techs in to troubleshoot the backup system that didn't work. (So all is not totally well at this point.)

He states that the Forum server was just one of the servers sitting just behind a networking box that failed. He returned to the site today with some more equipment (and a plan) and that's how this thing came back on-line. The rest of the system, if not on-line now, should return shortly.


Feb 16, 2009

[16:34 EST] Regarding the data center power problems on Saturday morning, our provider says they have located the problem and replaced faulty parts this morning. They are testing the backup system this afternoon to ensure it is functional again. In the meantime, the facility continues to run on 100% generator power until they're confident the backups are working and ready to go into production.

