QUOTE(DavidT @ Nov 22 2008, 11:41 AM)

Thanks. As for why my experience was different here...that's hard to say. If it were only a DNS issue, then my browser shouldn't have been able to resolve the addresses, and the error messages should have reflected that. During the outage, I also attempted command-line email deliveries to the MX servers, but I did so from a web-hosting box on the East Coast, and those attempts also timed out, indicating not DNS issues, but actual connectivity issues.
In fact, the problem is much stranger and more obscure than that.
We have very good DNS. spamcop.net DNS is handled by Akamai globally. cesmail.net DNS is handled by a set of four different nameservers, in four datacenters, in both North America and Europe. There was no problem with the internet at large reaching our DNS.
The real problem was with our servers internally reaching these nameservers. We use them ourselves to resolve names used internally within the system. We have resolving cache nameservers but they ultimately look up our names on the same servers you guys do. For a time Friday evening, our caches couldn't get to our own nameservers and couldn't look up IP addresses, like the IP address of the database server.
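To make the distinction concrete: a service that refers to its database by hostname fails at the lookup stage before a single packet is sent to the database itself. Below is a minimal Python sketch, assuming a hypothetical helper called `probe` (this is not our actual tooling), that reports whether a connection attempt failed during name resolution or during the TCP connect.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Hypothetical helper: report which stage of a TCP connection failed."""
    try:
        # Stage 1: name resolution via the system resolver
        # (in our case, the internal caching nameservers).
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns-failure: {exc}"

    last = "no-addresses"
    # Stage 2: try a TCP connect to each resolved address in turn.
    for family, socktype, proto, _canonname, sockaddr in addrs:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
                return "ok"
        except socket.timeout:
            # Packets went out but nothing came back.
            last = "connect-timeout"
        except OSError as exc:
            # Refused, unreachable, etc.
            last = f"connect-error: {exc}"
    return last
```

During an outage like Friday's, a probe run from inside the system against an internal hostname would be expected to come back as `dns-failure`, even though the database host itself was up and reachable by IP address.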
We've been informed now that upstream from our data center, one provider had problems with a broken connection to Level3. Apparently they were accepting traffic but it was disappearing. The upstreams went back and forth for a while before our data center did an emergency failover and took all traffic away from those guys. At that point, the problem instantly went away.
My guess is that inbound traffic came in over a different route and that reply packets went back out the same way. So, we were mostly reachable. However, transactions that we initiated (like DNS queries) mostly went out over the broken route.
Anyway, this all happened several layers upstream of us so I don't have any more visibility into the problem. This is probably about all the info I'll ever get and I wasn't able to do any real network debugging while it was going on.
JT