Hacker News

Wow, I so strongly disagree with this. I'm currently working on a system backed by DocumentDB, and had a solution for the core of our application written out in a 13-line javascript stored procedure. The edict came down from above: no stored procedures because scalability. No analysis, no real-world data (we're currently doing a few transactions per day; my understanding is that stackoverflow uses SQL server and scales just fine), just idealism. So we set off to push our stored proc logic into a lazily-consistent background job. I hated the idea initially, but the result turned me off of eventual consistency permanently.

This is because not only did we develop an edict against stored procedures, but we then developed an edict that everything must be done using the lowest consistency model possible. We ended up settling on session consistency. TBH we discovered an algorithm that didn't even require session consistency, but the latency would have been too great. Again, no particular analysis or rationale for the edict, just idealism.

The end result was a system that, for each update, required six background jobs to be launched, which each launched their own cascading jobs that patched up the data. And our records all had to be peppered with version fields and updated-by-job fields and version-of-foreign-key fields that these fixup jobs had to test against to make sure they're not operating against stale data or running over each other, such that this crap actually outweighs the actual data in the records. And these background jobs manage state in Table Storage and transfer messages via Service Bus, so that's two more SPOFs, not to mention all the extra compute resource cost. It's a huge Jenga tower of a system that's taken months to implement and code review and test, and even now, six months later, we're still finding edge cases that aren't handled correctly (or are they? you have to trace through from the start to really understand it).

In addition, since things are never guaranteed to be in a consistent state, at any moment in time we could have foreign keys that don't reference anything, or cycles in graphs where cycles should be disallowed, or whatever else, so we have to fudge this stuff when presenting data to the user, and it reduces our ability to do any sort of meaningful statistics on our data. All to remove 13 lines of javascript. It's just a mess.
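To make the bookkeeping overhead concrete, here's a minimal sketch (the field names and `applyFixup` are illustrative, not from our actual system) of what a record and a version-checked fixup job end up looking like:

```javascript
// A record where the consistency bookkeeping rivals the business data.
const record = {
  // actual business data
  name: "order-42",
  parentId: "cust-7",
  // bookkeeping the fixup jobs need so they don't run over each other
  version: 3,
  updatedByJob: "fixup-parent-link",
  parentVersion: 5, // version of the foreign-key target we last observed
};

// A fixup job may only apply its patch if the record hasn't moved on
// since the job was launched; otherwise its view of the data is stale.
function applyFixup(rec, patch, expectedVersion, jobName) {
  if (rec.version !== expectedVersion) {
    return { applied: false, reason: "stale" }; // another job got there first
  }
  Object.assign(rec, patch, {
    version: rec.version + 1,
    updatedByJob: jobName,
  });
  return { applied: true };
}

console.log(applyFixup(record, { parentId: "cust-8" }, 3, "fixup-a").applied); // true
console.log(applyFixup(record, { parentId: "cust-9" }, 3, "fixup-b").applied); // false (stale)
```

And that's only one record and one job; in the real system every cascading job carries its own copy of these checks.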

And then what happened is a couple new requirements came in that couldn't be reasonably handled in a lazy fashion, so we ended up using stored procs for that logic anyway.

Now, to be clear, the resulting system is strictly more scalable than the stored-proc-based one. One could in theory just throw more machines at any load and it'll continue to work. So there's that. But a) the load the stored proc system could have scaled to is about 3 orders of magnitude higher than even our optimistic usage estimates for the foreseeable future, and b) the lazy algorithm requires so much additional compute overhead that, while it's infinitely scalable, we might not even be able to afford to run it anymore.

So for my money, give me a strongly consistent system by default. I don't even particularly understand the argument given above. With a strongly consistent system, you can still choose to do things like join in the client if there's a reason for it. But going the opposite way and forcing consistency to be unassumed everywhere produces the mess we're dwelling in now.

Now, certainly there are cases where eventual consistency is better: high-write-volume on not-very-relational data where point-in-time accuracy isn't super business-critical is where it typically shines. At the "edges" of your system, essentially. But I do want to point out some of the pain that can be encountered if it's chosen for the wrong use case, and make sure that anyone who does use it in their core application logic knows what they're in for, and puts some more thought into whether it's what they really want.



> The end result was a system that, for each update, required six background jobs to be launched, which each launched their own cascading jobs that patched up the data. And our records all had to be peppered with version fields and updated-by-job fields and version-of-foreign-key fields that these fixup jobs had to test against to make sure they're not operating against stale data or running over each other, such that this crap actually outweighs the actual data in the records.

That sounds bad. The way you're supposed to do it is to keep the initial data immutable and compute the downstream data separately, into its own store.
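A tiny sketch of that shape (the names are mine, just to illustrate the pattern): the source events are append-only and frozen once written, and derived data lives in its own structure that can always be rebuilt from scratch, so fixup jobs never race over the source:

```javascript
const events = []; // append-only source of truth; never mutated

function recordEvent(e) {
  events.push(Object.freeze({ ...e })); // immutable once written
}

// Derived data is computed separately, into its own place, and is
// disposable: throw it away and recompute whenever you like.
function computeBalances(evts) {
  const balances = {};
  for (const { account, amount } of evts) {
    balances[account] = (balances[account] ?? 0) + amount;
  }
  return balances;
}

recordEvent({ account: "a", amount: 100 });
recordEvent({ account: "a", amount: -30 });
recordEvent({ account: "b", amount: 50 });

console.log(computeBalances(events)); // { a: 70, b: 50 }
```

Since the downstream data is a pure function of the immutable input, there's nothing for cascading patch jobs to corrupt.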

> So for my money, give me a strongly consistent system by default. I don't even particularly understand the argument given above. With a strongly consistent system, you can still choose to do things like join in the client if there's a reason for it. But going the opposite way and forcing consistency to be unassumed everywhere produces the mess we're dwelling in now.

I'd say transactions should never be the default, because a database transaction that isn't a semantic transaction is worse than useless. If the application programmer hasn't thought about transactionality and synchronization, it's much better for that to show up as not having any transactions - you can fix that, by adding transactions where you need them - than to have transactions that don't mean anything, and the data ending up inconsistent even though no transactions ever overlapped.
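To illustrate with a toy in-memory "database" (all names hypothetical): every statement auto-commits as its own transaction, so no two transactions ever overlap, yet a failure between the two statements of a transfer still leaves the data semantically inconsistent. The fix is adding one transaction exactly where the semantics demand it:

```javascript
// Toy in-memory "database"; every individual write auto-commits.
const db = { a: 100, b: 0 };

// Each statement is individually "transactional", but a transfer is a
// semantic transaction spanning both statements.
function transferNaive(from, to, amount, crashBetween) {
  db[from] -= amount;       // statement 1 commits
  if (crashBetween) return; // simulated failure mid-transfer
  db[to] += amount;         // statement 2 commits
}

transferNaive("a", "b", 10, true);
console.log(db); // { a: 90, b: 0 } -- money vanished, no transactions overlapped

// Adding the transaction where the semantics need it; BEGIN/ROLLBACK
// are stood in for by snapshot and restore.
function transferAtomic(from, to, amount, crashBetween) {
  const snapshot = { ...db };      // BEGIN
  db[from] -= amount;
  if (crashBetween) {
    Object.assign(db, snapshot);   // ROLLBACK
    return;
  }
  db[to] += amount;                // COMMIT
}

transferAtomic("a", "b", 10, true);
console.log(db); // still { a: 90, b: 0 }: the failed transfer left no trace
```

The naive version never had overlapping transactions, and every single write "committed" fine; the inconsistency comes purely from the missing semantic boundary.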


It sounds like your root problem was dogmatic thinking, not any sort of consistency model.

There is a time and place for eventual consistency and multi-master, and there is a time for strongly consistent SQL servers.



