Hacker News
Manhattan: Real-time, multi-tenant distributed database for Twitter scale (blog.twitter.com)
96 points by jaboutboul on April 2, 2014 | hide | past | favorite | 42 comments


From what I understand, Manhattan is based on ideas from ElephantDB. Unfortunately, development on ElephantDB has pretty much stopped, despite the fact that a book Nathan Marz is writing about big data depends on it: http://www.manning.com/marz/

Summingbird (bear with me, I'll tie this in) is also Twitter's answer to writing code once and seeing it run on a variety of execution platforms such as Hadoop, Storm, Spark, Akka, etc. Not all of these have been built out, but the platform was designed as a generic framework to support write once, execute everywhere.

Summingbird is written to support Manhattan's model as well. The high-level idea is to use versioning to determine whether a request is precomputed (batch), computed (realtime), or a hybrid (precomputed + computed). These are expressed as monoids, with the basic functionality present in Algebird. One way to bring support for this model to the open source world would be to implement Storehaus bindings for ElephantDB, and to resurrect ElephantDB or build a similar service to provide storage along the lines of Manhattan.
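To make the hybrid read path concrete, here is a minimal Python sketch of merging a precomputed batch view with realtime deltas by version. All names are illustrative assumptions; Summingbird's actual API is Scala, and it uses a generic monoid "plus" where this sketch uses integer addition for counts.

```python
# Sketch of a hybrid (precomputed + computed) read: combine a batch view,
# aggregated through some version, with realtime deltas written after it.

def read_hybrid(key, batch_view, batch_version, realtime_deltas):
    """batch_view: dict key -> value aggregated through `batch_version`
    realtime_deltas: dict (key, version) -> value written by the realtime layer
    """
    total = batch_view.get(key, 0)  # precomputed portion
    # Fold in only the deltas the batch job has not yet absorbed.
    for (k, version), value in realtime_deltas.items():
        if k == key and version > batch_version:
            total += value  # monoid "plus"; integer addition for counts
    return total

batch = {"tweets:user42": 100}        # computed by Hadoop through version 3
realtime = {("tweets:user42", 4): 2,  # written by Storm after the batch run
            ("tweets:user42", 5): 1,
            ("tweets:user42", 3): 7}  # already folded into the batch view
print(read_hybrid("tweets:user42", batch, 3, realtime))  # 103
```

The point of the versioning is visible in the last entry: a delta the batch job has already absorbed is skipped, so the two layers never double-count.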

Overall, very early yet promising work in the open source community.

[edit: the book is not about ElephantDB, but ElephantDB is a critical component of it. Modified wording; also added a link.]


When Manhattan first started, this was our first use case: supporting batch (something we talk about in our blog post). It was similar to ElephantDB; the biggest difference was that we wrote our own storage engine (SeaDB) instead of using BDB-JE. We built the Hadoop pipeline service so developers only needed to supply us with sequence files, and we did the rest of the work for them by watching for new data.

Over time, Manhattan evolved into a fully fledged read/write database that supports batch and read/write in the same system. Batch is great for some use cases, but sometimes the cost is too high when you factor in how much processing power, storage, etc. the model requires. Of course we still support both; we want developers to have the freedom to increase their productivity.


Corresponding Twitter Engineering blog post: https://blog.twitter.com/2014/manhattan-our-real-time-multi-...


Yeah, that's much more interesting. The Wired article just repeats the overview and loses the interesting technical details. I'm not sure why it was posted instead of the original content.


Thanks. I changed it.


I don't think it's an exaggeration at all. Twitter literally pulls data from thousands of servers, but we're missing the point of Manhattan. As some of us know, Twitter services scale dynamically according to the load they are serving, and some engineers at Twitter decided to put that idea on a whole truckload of steroids.

Here is the trick: the container agents (storage services) are made "clients" of the Manhattan database (the core). They are Mesos processes that scale dynamically with the needs of the service (i.e. 1 container = 10,000 reads/writes per second, 2 containers = 20,000 reads/writes per second, and so on), which allows requests per second and writes per second to scale dynamically. The core handles finding the actual machines that hold the data, replicating it, and so on. There might be realtime storage service containers that need fast data access, the batch importer and timeseries services they mention, and so on. This requires a lot of gymnastics but offers a lot of nice features. The Manhattan database acts as a virtual layer over thousands of machines, and the storage services allow for customized data manipulation. Cool, huh?

Given the importance and scale (multi-DC) this operates at, I think it's almost impossible to open source. But who knows; miracles happen now and then.


Or at least I think something along those lines is happening.


And their alerting system is called Koalabird (see the Manhattan DB picture in the Alerting section :))


Interesting. The database sounds almost too good to be true. I wonder if they'll open source this. They've done so in the past with projects like Storm, so I'm hopeful.


Consistency, that's the usual performance killer.

They have got the Hadoop watcher right, as well as the BTree/SSTable update mechanisms; heavy import with light update is a use case that is not usually catered to.

Reading up on its architecture, the rest of it feels a lot like memcache (the membase/zbase implementations), plus SSDs, which are always going to beat the pants off anything with spinning disks.

Looks good overall, but this is probably not for you if you want to do exact counters or match up different counters (i.e. +1, +1, +1 for a counter funnel).

If you don't need updates to be consistent, e.g. if you imported Hadoop data without modification, this starts to look like a really good model.


When I was using it, it felt too good to be true! :) It has a lot of moving parts and internal integrations, however, so it would be a ton of work to open source. Still hopeful, though.


Can someone enlighten me as to why 6,000 tweets a second is something to make a big deal about? At 140 characters per message, that comes out to 840,000 bytes/s, less than 1 megabyte per second. In 2014, is a service that can handle 1 MB/s impressive?


A single tweet is closer to 2,500 bytes on average in most of the data feeds. However, your point still stands: this is a "human-scale" data model that is relatively small and has a low data rate. It is pretty easy to engineer systems that can keep up with this even if you are not an expert in designing real-time database systems.

By comparison, many complex machine-generated data sources (e.g. real-time entity tracking) that are sometimes fused with the Twitter firehose operate at millions of complex records every second (often tens of gigabytes per second) that need to be processed, indexed, and analyzed in real time. You can't deal with that kind of data model using something like Twitter's current architecture, because the several-orders-of-magnitude difference in velocity and volume exposes the limitations of most of the database platform designs people typically use.


I think the issue is less the number of tweets per second (not very impressive, as you suggest) and more the requirement that those same 6,000 tweets fan out, in a timely manner, to all of the various feeds/locations where they need to show up. If it were a single page with those 6,000 tweets/sec, it would, as you say, be entirely unimpressive.


It's not. Figuring out which of the 240 million accounts they should be sent to is.


I spent a little while trying to implement a Twitter back end for fun during their fail-whale period, and it's really just a data structure problem. (Edit: i.e. figuring out which of the 240 million accounts each tweet should be sent to.) You can literally get the basic feed process running at 10k tweets per second with 100 million accounts on a single PC and a 1 Gbit Ethernet connection, where the Ethernet connection is the bottleneck. Fanning that out to the various places that listen in is a fairly basic scaling issue.

Not that I took it very far, but my first stab at seeing what the scaling issues would be was actually plenty fast to run their feed process at the time.

PS: The 'trick' is to keep two lists: one of everything a user follows, and another of everything that follows each account. For showing messages to someone when they log in, you keep each account's last 10 message IDs with a timestamp or sequence ID, so if someone follows 5k accounts you can avoid looking at the vast majority of those messages; then you sign the device up to listen for new messages. (Sure, sometimes you will need to look past that top 10, but it's rather effective.)
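The two-list scheme above can be sketched in a few lines of Python. This is my own toy reconstruction, not the parent's actual code; all names are made up, and the per-account "last 10 IDs" list is the key trick.

```python
# Toy sketch of the two-list timeline scheme: follow graph both ways,
# plus a capped list of recent (seq_id, msg_id) pairs per account.
from collections import defaultdict

FEED_DEPTH = 10  # keep only the last 10 message IDs per account

follows = defaultdict(set)    # user -> accounts they follow
followers = defaultdict(set)  # account -> users following it
recent = defaultdict(list)    # account -> newest FEED_DEPTH (seq_id, msg_id)

def follow(user, account):
    follows[user].add(account)
    followers[account].add(user)

def post(account, seq_id, msg_id):
    recent[account].append((seq_id, msg_id))
    # Trim so each account carries only its newest FEED_DEPTH entries.
    recent[account] = sorted(recent[account])[-FEED_DEPTH:]

def timeline(user, limit=10):
    # Merge the small per-account lists instead of scanning full histories.
    candidates = []
    for account in follows[user]:
        candidates.extend(recent[account])
    return [msg for _, msg in sorted(candidates)[-limit:]]

follow("alice", "bob")
follow("alice", "carol")
post("bob", 1, "m1")
post("carol", 2, "m2")
post("bob", 3, "m3")
print(timeline("alice"))  # ['m1', 'm2', 'm3']
```

A login then merges a few thousand 10-element lists instead of scanning message histories, and the `followers` index is what drives push delivery to connected devices.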

(And for the downvoters I can upload some code if you want to see it.)


Ah, hubris. Try simulating some celebrities talking to one another. 15M followers (Ashton Kutcher) / 10K fanouts per second = 25 minutes to deliver one tweet.

That's the difference between "getting the basic process running" and operating at scale.


Ehh, first off, the 2009 Twitter was very different from the 2014 beast. For some context: http://computer.howstuffworks.com/internet/social-networking...

Back when they were growing from 500,000 users to 7 million total users, they were having major issues, and that's when I was looking into things.

Anyway, I was not suggesting 1 Gbit would be fine today. Still, I was saturating a 1 Gbit connection, so 15M (over twice the total users back then) * ~200 bytes * 8 / ~1000^3 = ~24 seconds. Did you have an extra 60x multiplier in there somewhere?
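For reference, the two estimates in this exchange work out as follows (assuming, per the thread, ~200-byte messages and a saturated 1 Gbit/s link):

```python
followers = 15_000_000  # Ashton Kutcher's follower count from upthread

# Parent's estimate: 10k fanout writes per second.
seconds_at_10k_per_s = followers / 10_000            # 1500 s = 25 minutes

# This comment's estimate: ~200-byte messages over a 1 Gbit/s link.
seconds_at_1gbit = followers * 200 * 8 / 1_000**3    # 24 s

print(seconds_at_10k_per_s / 60)                     # 25.0 (minutes)
print(seconds_at_1gbit)                              # 24.0 (seconds)
print(seconds_at_10k_per_s / seconds_at_1gbit)       # 62.5 -- the ~60x gap
```

So the two numbers differ by a factor of 62.5, which is exactly the "extra 60x multiplier" being asked about: 10k deliveries/s versus roughly 625k 200-byte messages/s at line rate.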


The basic twitter system is easy enough. Now store all that data forever, run ads, collect ad stats, collect server stats, collect service stats, do authentication for API calls, etc. etc.

Things add up eventually. Suddenly your 6000 tweets/s has turned into a million op/s fanout.

Problems become a lot less straightforward over network links too.


Thanks for the offer. I would enjoy looking at the code.


How did you simulate 100 million interconnected accounts? Did you just randomize it? How many people does the average twitter account follow?


I don't know what it really looked like, but I randomly had 99% of the accounts follow 100 people and 1% follow 5,000. I don't know how many accounts the average user followed, but apparently when a single user followed a few thousand, it was giving them issues if the user rapidly refreshed the page.
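That distribution is easy to reproduce. A hypothetical sketch (not the original simulation code, and scaled down from 100M accounts for a quick run):

```python
# Draw follow counts from the 99%/1% mixture described above and check
# the average fanout it implies.
import random
random.seed(0)

N = 1_000_000  # scaled down from 100M accounts

def follow_count():
    # 99% of accounts follow 100 people, 1% follow 5,000.
    return 5_000 if random.random() < 0.01 else 100

counts = [follow_count() for _ in range(N)]
avg = sum(counts) / N
print(round(avg))  # close to the expected mean 0.99*100 + 0.01*5000 = 149
```

The interesting property of the mixture is that the 1% of heavy followers contribute about a third of all follow edges, which is exactly the kind of skew that makes a naive per-login scan fall over.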

PS: I did not keep the old messages, just their IDs, because the messages were not going to fit in RAM. My assumption was that using Redis or another key-value store would be fine; what they needed was an internal index so you would only have to look up the messages that would actually be displayed.

Note: their current setup, once they worked the bugs out, handled a peak of over 100,000 tweets per second in 2013: https://blog.twitter.com/2013/new-tweets-per-second-record-a... That's well beyond the target I was shooting for.


Google "scaling twitter" to see the actual requirements of their system.


I'd love to see the code.


+1 for seeing this code!


The word "handle" is doing a lot of work there. It's a collation engine.


We store more than just tweets :)


I don't even look at databases before Aphyr verifies that they keep their promises...


You'd be better served by using his Jepsen series as an opportunity to learn how to evaluate and break databases yourself instead of waiting to be spoonfed.

A $DATABASE_COMPANY was recently looking to hire somebody to be a dedicated database breaker. The Jepsen series was mentioned in the job post.

Worshipping celebrity is unhealthy in what should be an engineering profession and is a sign of a deeply embedded pop culture.

Don't be lazy.


I was kidding. I've never needed to use anything but Postgres.


The article calls Gizzard "strongly consistent"; Gizzard's GitHub page says "eventually consistent" [1]. What?

[1] https://github.com/twitter/gizzard


Yes, I think the article got that wrong. You are right that Gizzard is eventually consistent. The storage engine under Gizzard is MySQL, so if you want to do some node-local indexing, you at least have MySQL's strong consistency, but you won't achieve it across keys in the overall system.


"Real-time" is a bit of a misnomer as far as databases are concerned, especially if you're talking about a system that defaults to eventual consistency. I think they would have been better off saying "highly available".


High availability isn't the same thing as having consistently low latencies. Some databases have spiky latency graphs without it being an availability issue.


Does anyone else find the opening statement a little misleading? Yes, they originally come from many places, but they are sent from Twitter to the app of your choosing via JSON or similar. Sure, there's going to be more than one request for the icon sprite and user avatars, but they're all from Twitter.

"When you open the Twitter app on your smartphone and all those tweets, links, icons, photos, and videos materialize in front of you, they’re not coming from one place. They’re coming from thousands of places."


It sounds like you're misinterpreting. The way I'm reading it, by "thousands of places" they mean "thousands of machines". Which may be an exaggeration, but it should be trivially obvious that data is sharded and must be collated for display.


So this is an internal system, right? What is the point of telling the world about it if it's not available for anyone to look at or use? Perhaps a recruiting exercise?


Google, for one, has been doing this kind of thing for years. Dremel (http://research.google.com/pubs/pub36632.html), for instance, was the subject of a paper they published after they'd been developing and using it internally for years. It's never been open sourced, but there's now Apache Drill (http://incubator.apache.org/drill/) which is based on the same paper. In short, it's part marketing, part recruiting, and part ego. I'm hopeful that Twitter will at least publish a research paper that digs into some of the nuances of Manhattan and compares/contrasts against other technologies.


There is also Parquet from Twitter, which is inspired by the Dremel paper: https://blog.twitter.com/2013/dremel-made-simple-with-parque...


Although I think the article is interesting, I'm missing some details; I'd like more engineering substance rather than high-level description.



That's pretty typical of a Wired article - it is a 'popular science' magazine, after all.



