MASSIVE storms in the VA area, where us-east-1 is. 326,000 customers without power already; worst lightning I have seen in my 20 years of life. The sky is an intense blue/green/purple. This is most likely what the issue is.
I'm completely ignorant here. But aren't these outages usually solved by having backup servers in different locations? As many datacenters do, and as I imagine something as huge as heroku would?
Underground cables have more expensive setup costs, a shorter lifetime, and higher maintenance costs. The price you pay for electricity doesn't even come close to justifying burying power lines. There's also the ecological stuff, if you find that a reasonable argument. Bottom line: burying power cables just so you don't have to light a candle for a night isn't worth it.
Depending on where you are in the world, earthquakes are much rarer than insane storms. I'm speaking as a Floridian. I'm fairly ignorant on this issue, but would it be that difficult to use one or the other depending on which natural occurrence is more likely? Or is this also a cost issue?
"The North Carolina Utilities Commission
studied the cost of placing Duke Power’s distribution facilities underground and found it would
cost more than $41 billion, resulting in a 125 percent increase in customer rates."
Do they? Here in Germany the entire cabling within cities is underground; only the high-voltage long-distance lines are above ground. I've never heard a story about people stealing underground cables (they do steal, e.g., above-ground train-track cabling). That also wouldn't make sense: digging up those cables is much more effort than taking them down from a post.
I've also never heard stories about issues with rats.
Power outages still happen, but they are quite rare - in 30 years I can only remember twoish.
I don't know, but I've heard stories. Stealing underground cables is not common, but it has happened. And rats and groundwater are quite a problem for underground (copper) cables.
Well, until we figure out plausible ways to control weather reliably on a large enough scale, at least. Without killing the atmosphere or our species or anything like that.
I have a feeling the electromagnetic conditions are much more stable on Earth than in space. The magnetosphere and the atmosphere deflect a great deal of energy.
The Sun is far away, so you'd still get the data back before a burst hits. But if a burst made one satellite hit another, or knocked one down to Earth, it would be a worse situation.
Which is why I brought it up; it's hilarious. I thought people were just trolling at first, but man, the first time I saw it, it made my day. Relating something like "God" to natural disasters. I love how people come up with that kind of stuff.
Was watching a movie in a big 20-screen theater in Richmond, and they told everybody to just leave (and not through the emergency exits, incidentally; they funneled hundreds of people into the lobby all at once :/)
Saw this post here on HN, pulled up www.chart.state.md.us to watch the live traffic cams in the area. Clicked through a couple, some of which showed heavy rain, wind & lightning. Then the stream froze and now the site is completely unresponsive.
8:21 PM PDT We are investigating connectivity issues for a number of instances in the US-EAST-1 Region.
8:31 PM PDT We are investigating elevated error rates for APIs in the US-EAST-1 (Northern Virginia) region, as well as connectivity issues to instances in a single availability zone.
8:40 PM PDT We can confirm that a large number of instances in a single Availability Zone have lost power due to electrical storms in the area. We are actively working to restore power.
9:20 PM PDT We are continuing to work to bring the instances and volumes back online. In addition, EC2 and EBS APIs are currently experiencing elevated error rates.
It's a hard thing to engineer, especially after the fact, and especially when you are trying to hide it behind an abstraction layer. (Which is to say: You can't expect your customers to engineer their apps with multiregion in mind, or to take it kindly when you raise rates to support additional redundant hardware and bandwidth.)
e.g. because region-to-region data transfer is not free, and trans-region latency is ugly, you can't just relaunch half your instance farm in another region and expect happiness. There are also routing issues: Internal IPs don't work across regions, elastic IPs don't transfer across regions...
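To make those routing pains concrete, here's a minimal sketch of a cross-region relaunch using boto3 (a later library; boto was the 2012-era equivalent). The AMI ID, instance type, and instance count are hypothetical placeholders, and real failover would also need data replication, which this ignores:

```python
# Sketch: relaunching capacity in a fallback region with boto3.
# Region names are real AWS regions; the AMI ID and counts are
# hypothetical. Note the two pain points from the comment above:
# AMIs and Elastic IPs are region-scoped, so the image must be
# copied over and new addresses allocated on the other side.
import boto3

PRIMARY = "us-east-1"
FALLBACK = "us-west-2"
SOURCE_AMI = "ami-0123456789abcdef0"  # hypothetical image in us-east-1

def fail_over(instance_count=4):
    dst = boto3.client("ec2", region_name=FALLBACK)

    # 1. AMIs don't exist across regions: copy the image first.
    copied = dst.copy_image(
        SourceRegion=PRIMARY,
        SourceImageId=SOURCE_AMI,
        Name="failover-copy",
    )
    ami = copied["ImageId"]
    dst.get_waiter("image_available").wait(ImageIds=[ami])

    # 2. Launch replacements in the fallback region.
    run = dst.run_instances(
        ImageId=ami,
        InstanceType="m1.large",
        MinCount=instance_count,
        MaxCount=instance_count,
    )
    ids = [i["InstanceId"] for i in run["Instances"]]

    # 3. Elastic IPs don't transfer across regions either:
    #    allocate fresh ones and repoint DNS at them instead.
    for iid in ids:
        addr = dst.allocate_address(Domain="vpc")
        dst.associate_address(InstanceId=iid,
                              AllocationId=addr["AllocationId"])
    return ids
```

Even with all of that scripted, internal IPs baked into configs and the cost of re-replicating data are still on you, which is the commenter's point.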
Even if they can't do it right away, they should communicate a plan for how they are going to tackle this recurring issue. That's the whole point behind status.heroku.com / trust.salesforce.com. They are part of a publicly traded corporation with a lot of resources.
Extremely nerve-wracking for new startups like ours.
But anyone deploying a critical application to AWS makes a point of cross-region data replication. Heroku have long known that they lose potential customers to, say, Engine Yard as a result of only hosting at US-East.
One can only conclude that this is a clear business decision on their part. I can hardly believe that Heroku's engineers are incapable of it. Indeed I would be very surprised to learn that they haven't brought up an instance of their platform at, say, US-West, for testing or proof-of-concept purposes.
Of course, productising that is a different matter. Extending the control plane, front end, and pricing/billing systems might have considerable associated project cost. Perhaps they have concluded that the costs outweigh the additional revenue. Or, just haven't got around to it yet.
An appropriate title might be "Heroku is down due to an AWS outage, which is due to a power failure, which happened due to storms caused by moist winds colliding with hot air heated over the continent by the sun that....". It really doesn't matter. Heroku is down. Customers don't care.
Even now that it's updated, it has yellow triangles for "performance issues" instead of a red circle for "service disruption". Seems like they're in denial.
This was the disappointing thing for me as well. Our connectivity died around 8PM EST-ish, and I immediately went to status.aws and it said everything was normal. I then proceeded to waste half my night looking at our internal infrastructure trusting that page was accurate.
At 11:25 Eastern, https://status.heroku.com/incidents/386 was posted: "We're currently experiencing a widespread application outage. We've disabled API access while engineers work on resolving the issues."
Slightly different scenario, however: the power was shut off by the fire marshal, if I recall correctly.
Rackspace (and many, many other co's) tend to have functional UPS units & generators. Amazon tends to choose the cheapest datacenter facility imaginable and then these sort of failures occur.
Given their size they'll inevitably fix the power issues, though -- they've got the finances and they're capable of adding a few more levels of redundancy.
I found the reports about the outage - it was 2007 (so obviously much more than a year ago) but very similar to one of Amazon's recent outages - a truck took out a transformer, Rackspace fired up backup power, but the cooling failed to start, so Rackspace had to shut it all down to avoid melting everything.
Looks like Amazon wasn't the only one with inadequate testing of their continuity plan. And I don't think Rackspace offered alternate Availability Zones at that point.
I think Netflix are expecting another cloud to offer the same model and API as Amazon though, which isn't likely to happen - everyone else is learning from AWS's mistakes!
Even if it did, many of the features they're waiting for (like auto-scaling groups) probably wouldn't be as useful in a multi-cloud environment, and would therefore have to be built into Asgard.
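For context, here's a minimal sketch of what the feature in question looks like against the actual AWS API (via boto3; all the names below are hypothetical). It's this AWS-specific surface that a tool like Asgard would have to re-implement to work against another cloud:

```python
# Sketch of an Auto Scaling group spanning several AZs in one region.
# Group, launch-config, and AMI names are hypothetical placeholders.
import boto3

asg = boto3.client("autoscaling", region_name="us-east-1")

asg.create_launch_configuration(
    LaunchConfigurationName="web-lc",         # hypothetical
    ImageId="ami-0123456789abcdef0",          # hypothetical AMI
    InstanceType="m1.small",
)

asg.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",           # hypothetical
    LaunchConfigurationName="web-lc",
    MinSize=2,
    MaxSize=8,
    # Spreading across AZs survives a single-AZ power loss like
    # tonight's -- but only within the region; it does nothing for
    # a region-wide or provider-wide failure.
    AvailabilityZones=["us-east-1a", "us-east-1b", "us-east-1c"],
)
```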
- One AZ is down
- API commands are spotty and may return incorrect results (see the retry sketch after this list)
- ELB looks screwed
- IP reassignments don't seem to be working
- Who knows what the fuck else is broken
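When the control-plane APIs are flaky like this, about the only defense is retrying with backoff so transient errors don't get mistaken for real ones. A minimal sketch, assuming boto3 and treating the EC2 call as just an example operation:

```python
# Sketch: retry wrapper with exponential backoff for a flaky API.
import time

import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

def with_backoff(call, attempts=5, base_delay=1.0):
    """Retry `call` on transient errors, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return call()
        except (ClientError, EndpointConnectionError) as exc:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller see the error
            delay = base_delay * (2 ** attempt)
            print(f"API error ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)

ec2 = boto3.client("ec2", region_name="us-east-1")
reservations = with_backoff(lambda: ec2.describe_instances())
```

Of course, retries don't help with the "may return incorrect results" part; nothing client-side does.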
Just tried using filepicker.io and it seems to be down too.
Beginning to feel pretty lucky though -- this is at least the 4th AWS-East outage that has made enough of a splash to notice but missed my instances. Upgrading to multiple availability zones was scheduled for Monday anyway.
A simple solution to this is to have a backup or failover to a non-AWS datacenter too; basically, don't be dependent on just one datacenter. E.g. MS Azure/Google/Rackspace.
This not only spreads your risk but keeps your customers happy.
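The simplest version of that failover is a health check that flips DNS to a standby at another provider. A rough sketch: the URL and IP below are placeholders, and `update_dns_record` is a hypothetical hook for whatever DNS provider's API you actually use.

```python
# Sketch: poll the primary deployment, flip DNS to a standby host
# at another provider when it stops answering.
import time
import urllib.request

PRIMARY_URL = "https://app.example.com/health"  # hypothetical
STANDBY_IP = "203.0.113.10"                     # hypothetical non-AWS host

def primary_healthy(timeout=5):
    try:
        with urllib.request.urlopen(PRIMARY_URL, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, timeouts, refused connections
        return False

def update_dns_record(name, ip):
    # Hypothetical: call your DNS provider's API here.
    print(f"Would point {name} at {ip}")

while True:
    if not primary_healthy():
        # Keep DNS TTLs short, or the switch takes too long to matter.
        update_dns_record("app.example.com", STANDBY_IP)
        break
    time.sleep(30)
```

The catch is that the standby has to actually work, which means replicating data to a second provider continuously, not just pointing DNS at it.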
Google App Engine's just fine. Don't know how many AZs I'm on and don't want to know :))
The more I see of Amazon failures, the more I think VMs are just not high enough in the abstraction layer for me...
I just lost a potential hire because of this. I was demoing my app to her and it wasn't working, and she thought it was because of the product. Damn you, Heroku!
The little red ribbon that you pull to get the AA batteries out is stuck underneath - they are looking for a pen to flick the battery out, but since everyone switched over to Fire tablets there aren't any pens.
Idea for Heroku: allow customers to host a "my app is down for blah blah reason" page somewhere else (Rackspace, I guess?). Who thinks this would be useful? My users see a blank page right now when they go to ZeTrip; I'd rather show them a static page saying: "our site is down due to Amazon's lack of redundancy."
Cloudflare lets you do this afaik. I'm not sure I'd trust a service to show a proper 'this site is temporarily down' page when something very bad has happened.
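If you'd rather not trust a third party for it, a static down page is small enough to run yourself on any box outside the failed platform. A minimal stdlib sketch (the message text is a placeholder):

```python
# Sketch: a tiny standalone "we're down" page, meant to run
# somewhere *other* than the failed hosting platform.
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"""<html><body>
<h1>We'll be right back</h1>
<p>Our site is down due to an upstream hosting outage.</p>
</body></html>"""

class DownPage(BaseHTTPRequestHandler):
    def do_GET(self):
        # 503 tells crawlers and clients this is temporary.
        self.send_response(503)
        self.send_header("Content-Type", "text/html")
        self.send_header("Retry-After", "3600")
        self.end_headers()
        self.wfile.write(PAGE)

if __name__ == "__main__":
    HTTPServer(("", 8080), DownPage).serve_forever()
```

Point DNS at it (or have your failover flip to it) and at least users see an explanation instead of a blank page.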
I can't believe the guys at Heroku are not ready for situations like this!
They rely ONLY on Virginia instances because it's the cheapest option, without caring about customers... or thinking about replicating their services across multiple locations for exactly these kinds of issues!