TerarkDB is a RocksDB replacement at ByteDance, optimized for tail latency, throughput and compression.
RocksDB [1] is a fork of Google's LevelDB, developed at Facebook. It is optimized to exploit many CPU cores and to make efficient use of fast storage, such as solid-state drives (SSD), for input/output (I/O) bound workloads.
LevelDB [2] is an open-source on-disk key-value store written by Google fellows Jeffrey Dean and Sanjay Ghemawat, inspired by Bigtable.
Bigtable [3] is a compressed, high-performance, proprietary data storage system built on Google File System, Chubby Lock Service, SSTable (log-structured storage like LevelDB) and a few other Google technologies.
I hope the info above is useful for others, as I don't have the slightest idea what RocksDB is, but I do know LevelDB.
I remember reading about this here a few years ago. If I remember correctly, back then the main selling point was that it used succinct data structures, and only the compression algorithm was not open source; everything else was.
But now when I look at the new repo and the online docs, there is no mention of succinct data structures anywhere.
Also, the benchmarks back then claimed TerarkDB was 10x or more faster than RocksDB. Now the performance claim is much more modest.
Does that mean TerarkDB no longer uses succinct data structures? Or are you just open-sourcing a lower-end version of the software without the secret sauce?
Can you talk about what makes TerarkDB faster than RocksDB?
Thanks for your attention; glad someone here still remembers our history. TerarkDB is now FULLY open source, including the `succinct data structures`.
Here are the reasons:
1. Our `all-in-one` docs are still being written; we will cover that part later.
2. For performance, we are now showing real-world cases rather than a carefully designed benchmark. (A few years ago we selected the best results to show our work; we don't want to do that anymore.)
3. Why TerarkDB is faster than RocksDB will be explained in our `all-in-one docs` within a week. Most of the reasons are not magic, just engineering effort.
I think the performance images would be a lot clearer if they were on the same scale; as it stands, it was unclear what was happening with, e.g., the disk write image until I zoomed into the axes.
Your all-in-one docs [1] refuse to render at all in Firefox? That seems like a strange restriction, and it's disappointing that it doesn't even let you read the document in non-WebKit browsers. It feels like the IE days all over again.
"An error occurred. This browser is not supported, click here to learn more."
Works perfectly for me in Firefox (macOS). It might be a misleadingly worded "something went wrong, we don't know what" message. You probably want to check your console; it could be a network problem.
Yeah, there are some unreadable minified error stack traces in the console. Weirdly, it seems to load fine and I can scroll the content for a second or two while it's still loading. Then it puts up an error dialog and I can't access the content anymore. Once that's happened, if I reload the page, the error dialog comes up immediately and the page doesn't bother loading behind it.
I'm on Firefox on macOS too, and it happens with my ad blocker on or off. Chrome works fine. The support document linked in the error explicitly says only Chrome and Safari are supported on macOS. I'm confused why my grandparent comment has been downvoted; this is a real bug report stopping me (and maybe others) from reading documentation that looks to have a lot of thought put into it. And given that the content seemed to be loading fine before the error message came up, well, it feels forced.
1. We changed the source code so much that we can't easily merge it back into RocksDB (this project started in 2016 as a closed-source project).
2. We have a different roadmap from RocksDB (e.g., in the future we will remove a lot of unused code to make TerarkDB much more lightweight than the current version).
3. We have lots of third-party partners who may participate in this project (e.g., Intel on Optane SSD/memory, and others on ZNS...).
So we want to handle all commits ourselves to make sure everything is under control.
It's open source now, right? Outside of 2 and 3, could someone incorporate (some of) the improvements from TerarkDB into RocksDB? Or does it truly require a major rewrite to achieve the tail-latency benefits?
The comparison figures presented looked really impressive; thanks for sharing them.
First, it’s reeaallllyyyy expensive to invest enough in an open source project that you have a reasonable chance of steering it.
Second, even if you do the first, the whole thing gets screwed up again when you start trying to introduce vendor code into the mix. Generally, no one upstream gives a crap that you have super compelling business reasons to compromise on code quality (or even trivial things like how code is committed: tarballs vs good git hygiene), and vendors sometimes compromise a lot.
So it’s not surprising that sometimes groups choose to do the expedient thing to get something to market instead of doing things “the right way.” In a lot of respects, the original Android did this with Linux.
Imagine if there were multiple incompatible and competing Linux kernels. What we have now is AMD/MS/Apple etc. contributing to the kernel through "vendor code". Imagine if AMD released an AMDLinux and Nvidia had an NvidiaLinux.
This already happens, because most (?) people aren’t running vanilla kernels. Many (most?) distros compile their kernels with config options and patches that “make sense to them.” In the most egregious cases, you end up with things like bpf being intentionally broken by default.
It is perfectly in line with open source philosophy to be able to fork a project and have control over my fork. Especially given 2 where they have different goals from upstream.
Amazon did not fork MongoDB; they won't touch AGPL code. As far as I can see, they reimplemented the server-side protocol and built a backend implementation on top of PostgreSQL.
It feels healthy, organic and very much in line with the ethos of open source to see a project take this path and arrive back in open source. If the RocksDB team wanted to cherry-pick some compatible advancements from this project, they are now free to do so.
There are much more egregious and fundamentally different violations of open source, namely those you mention in your comment.
Wasn't the driver for nvim specifically disagreements with the direction/priorities/steer of the project? Is progress in a different direction necessarily a bad thing, especially if that effort couldn't be directly applied to the original anyway?
Someone please feel free to correct me, but if I recall correctly, weren't a lot of the improvements in Vim 8 a result of the popularity of that functionality in Neovim?
You're correct -- which is why I've used it as an example of forking for project-control reasons to be perfectly in line with an open-source philosophy.
I didn't know this. How do I contribute to Oracle's Unbreakable Linux or Red Hat's RHEL? I know I can fork them, but I'm not sure how I can push my commits into their code, and I didn't realize that was required!
Leadership or a steering committee is a key factor for open-source projects run by companies. A pull request closed with the comment "We won't accept the pull request because ..." should not be on the trajectory of an infrastructure project that is, or is meant to be, widely used by giant vendors.
So RocksDB came from LevelDB and here we go again.
We are working on our `all-in-one docs` which will explain everything.
I want to emphasize that we are not trying to "get rid of" RocksDB (which lots of KV engines have claimed to do). What we want is to provide another option for storage engine users, with a different roadmap (focusing on new hardware and write-heavy workloads).
For simple use cases, there will be no difference no matter what engine you use.
And in most cases, upgrading your hardware (e.g., SATA SSD to NVMe SSD) or tuning your RocksDB parameters would save you lots of time; just make sure you understand what you are doing.
There's no cure-all for every workload; try TerarkDB if RocksDB happens not to fit your scenario.
The reasons we did a better job than RocksDB (from our own perspective) are:
1. We moved lots of code outside db_mutex (the DB mutex is convenient but costs too much).
2. We introduced a new KV separation implementation that we believe is better than RocksDB's implementation (we haven't heard of any production users of RocksDB's KV separation yet).
3. We introduced a lazy compaction strategy that can delay compaction tasks while online services are handling short bursts of heavy writes.
4. Other optimizations, such as time-histogram-based TTL and pipelined WAL sync.
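To illustrate item 1 in general terms: a common way to cut the time spent under a single global mutex is to hold the lock only long enough to snapshot the shared state, do the expensive work (building files, I/O, checksums) without it, and then re-take the lock briefly to publish the result, re-checking state in case another thread changed it. The Python below is a hypothetical sketch of that generic pattern; `VersionSet`, `add_file_slow` and `add_file_fast` are invented names and do not reflect TerarkDB's or RocksDB's actual code.

```python
import threading


class VersionSet:
    """Hypothetical stand-in for a DB's shared metadata guarded by one mutex."""

    def __init__(self):
        self.db_mutex = threading.Lock()
        self.files = []  # shared state: list of (file name, size) tuples

    def add_file_slow(self, name, build):
        # Anti-pattern: the expensive build step runs while db_mutex is held,
        # so every other thread that needs db_mutex waits for it.
        with self.db_mutex:
            data = build(name)
            self.files.append((name, len(data)))

    def add_file_fast(self, name, build):
        # 1) Snapshot what we need while holding the lock.
        with self.db_mutex:
            known = {n for n, _ in self.files}
        if name in known:
            return False
        # 2) Do the expensive work without holding the lock.
        data = build(name)
        # 3) Re-take the lock briefly to publish, re-checking the state
        #    because another thread may have changed it in the meantime.
        with self.db_mutex:
            if any(n == name for n, _ in self.files):
                return False
            self.files.append((name, len(data)))
            return True


def build(name):
    return b"x" * 1024  # pretend this is an expensive SST build


vs = VersionSet()
threads = [
    threading.Thread(target=vs.add_file_fast, args=(f"{i:06d}.sst", build))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(vs.files))  # 8
```

The trade-off is extra bookkeeping: because the lock is dropped in the middle, the publish step must validate that its earlier snapshot is still valid.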
I see "#include <terark/fsa/cspptrie.inl>" in the "memtable/terark_zip_entry_index.cc" but I can't find "cspptrie.inl" in the repo.
Is the code auto-generated or not open source now?
Sorry for the unclear response. 1) We use TerarkDB under a distributed SQL database, where TerarkDB stores its pages (16KB each); it's one of the most widely used SQL databases inside ByteDance. 2) We use TerarkDB under a Redis-compatible distributed cache system to store raw key-value pairs.
We see almost every kind of workload, since TerarkDB runs under a great many database clusters (each cluster serves only a single application).
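For a page store like the one in 1), one plausible key layout (a hypothetical sketch, not ByteDance's actual schema) is a fixed-width big-endian encoding of the page identifiers. LevelDB-family engines order keys byte-wise by default, and big-endian fixed-width integers sort byte-wise in numeric order, so a table's pages stay contiguous for range scans. The `space_id`/`page_no` fields and their widths are assumptions for illustration.

```python
import struct

# Hypothetical layout: big-endian uint32 tablespace id + big-endian uint64 page number.
KEY_FORMAT = ">IQ"


def page_key(space_id: int, page_no: int) -> bytes:
    # Fixed-width big-endian integers compare byte-wise in numeric order.
    return struct.pack(KEY_FORMAT, space_id, page_no)


def decode_page_key(key: bytes) -> tuple:
    return struct.unpack(KEY_FORMAT, key)


keys = [page_key(1, n) for n in (256, 1, 2, 65536)]
print([decode_page_key(k)[1] for k in sorted(keys)])  # [1, 2, 256, 65536]

# With little-endian encoding, byte-wise order no longer matches numeric order:
le = sorted(struct.pack("<Q", n) for n in (1, 256))
print([struct.unpack("<Q", k)[0] for k in le])  # [256, 1]
```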
Since TerarkDB is latency-optimized, it would be quite interesting to test it with Kafka Streams or Flink, which currently use RocksDB for stateful stream processing.
Hopefully, that would also bring better Java test coverage and integration.
I have helped backport RocksDB Java API changes for Kafka in the past. The main issue in the integration seems to be organisational rather than technical. Kafka has a more conservative release approach, whereas RocksDB has lots of releases and the API changes frequently.
I would be happy to be contacted about specific Java API issues with RocksDB; maybe I can help.
We haven't tested the Java binding for quite a long time, so I am not sure it still compiles. Please file an issue on GitHub if you find it doesn't work anymore, thanks!
I remember that we tried it with Flink in early versions and the results were pretty good.
Some of the benefits of this come from separating values from keys. This is an increasingly widely used technique: it is described in the WiscKey paper [1] and is also used in the PingCAP fork of RocksDB. It seems Chinese companies like forking RocksDB. I am not sure why; perhaps the combination of firewall and language barrier just makes it easier to fork and move fast than to try to work regularly with upstream.
By separating out large values, there's less write amplification, and things get faster because more of the SSTs fit in the RAM cache. RocksDB wasn't historically a great choice for holding things like file uploads; you'd use the traditional filesystem for that. But that's quite constraining. When large values work better, it's not only a performance increase; it enables new software designs too.
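The write-amplification argument can be made concrete with a toy model. In a plain LSM tree, compaction rewrites keys and values together; with WiscKey-style separation, large values are appended once to a value log and the LSM tree only carries small (key, pointer) entries, so compactions rewrite far fewer bytes and the SSTs are much smaller. This is a simplified sketch under stated assumptions (one rewrite per level, a made-up 1 KiB threshold and 16-byte pointer, no value-log garbage collection); it is not TerarkDB's, Titan's, or RocksDB's actual implementation.

```python
from dataclasses import dataclass

POINTER_SIZE = 16          # hypothetical size of a (file_id, offset, length) pointer
SEPARATE_THRESHOLD = 1024  # hypothetical: values >= 1 KiB go to the value log


@dataclass
class Record:
    key: bytes
    value: bytes


def separated(rec: Record, kv_separation: bool) -> bool:
    return kv_separation and len(rec.value) >= SEPARATE_THRESHOLD


def lsm_entry_size(rec: Record, kv_separation: bool) -> int:
    """Bytes this record occupies inside an SST."""
    if separated(rec, kv_separation):
        return len(rec.key) + POINTER_SIZE
    return len(rec.key) + len(rec.value)


def bytes_written(records, levels: int, kv_separation: bool) -> int:
    """Toy model: each SST entry is written once on flush and rewritten once per
    level below. Separated values are appended once to the value log and are
    never rewritten (value-log GC is ignored)."""
    total = 0
    for rec in records:
        total += lsm_entry_size(rec, kv_separation) * (1 + levels)
        if separated(rec, kv_separation):
            total += len(rec.value)
    return total


records = [Record(b"user:%08d" % i, b"v" * 4096) for i in range(1000)]
user_bytes = sum(len(r.key) + len(r.value) for r in records)
for sep in (False, True):
    wa = bytes_written(records, levels=4, kv_separation=sep) / user_bytes
    print(f"kv_separation={sep}: write amplification ~ {wa:.2f}")
# kv_separation=False: write amplification ~ 5.00
# kv_separation=True: write amplification ~ 1.03
```

In this model each SST entry also shrinks from 4,109 to 29 bytes, which is the effect behind "more of the SSTs fit in RAM cache". The cost, not modeled here, is an extra lookup into the value log on reads and garbage collection of overwritten values.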
What I don't understand on many pages (e.g., the TerarkDB GitHub README, or all the release pages linked from HN):
Why not explain your project to me first? Assume I know nothing about the project and followed a link from HN.
"TerarkDB is a RocksDB replacement"
doesn't help me. What is RocksDB?
This is a huge missed chance for projects to get new users. Start every release note with a sentence explaining your project. Assume people reading your release notes are non-users.
But what if RocksDB or TerarkDB would solve a specific problem I'm facing and I just don't know of this solution? Lots of us have problems but don't know the exact tech stack to solve them; this is true at nearly every tech company I've worked for. My favourite anecdote for this is a guy who basically reinvented Hadoop-style map/reduce as hacky scripts, right around the time Hadoop and map/reduce were starting to get traction.
There are two kinds of mindsets when it comes to learning: a consumer mindset and an autodidact mindset.
In an organization, it's easy to recognize consumers -- they typically say things like: "I don't understand this. Is there a training course for this that I can sign up for?" and expect to be assigned to an internal training session or to some external course.
An autodidact on the other hand goes: "I don't understand this. Let me do some research on my own and try to teach myself."
I've been both at various junctures in my life, but I've learned that in order to progress to higher levels, it's better to be an autodidact than a consumer. When it comes to new knowledge, there's rarely someone who will feed it to me; I have to take the initiative to learn it myself.
There's nothing wrong in asking for a clarifying blurb (good marketing aims to make things frictionless for potential customers). But RocksDB is its own universe and it's actually pretty well known. I don't work in this space, and even I know what RocksDB is because it has come up a lot in technical discussions about storage engines. When I first encountered it, I had no idea what it was, but I gathered from comments that people were excited about it, so I googled "wiki rocksdb". It took 2 seconds.
Truly curious people are autodidacts, not consumers.
P.S. The HN comment section is a great venue to "overhear" what the community is talking about and what they find exciting. It provides a good signal for which topics to dive into. Knowledge acquisition is as much a sociological exercise as it is an individual one.
> An autodidact on the other hand goes: "I don't understand this. Let me do some research on my own and try to teach myself."
As someone who is very much an autodidact, I can tell you that just because I don't understand something doesn't mean I go and learn it. There are far too many things to do and to learn; I go and learn things when I have a reason to, just like consumers ask for training sessions when they have a reason to.
This is marketing 101: you have a product and you want people to use it. Even in open source, you still want people to use it; you want there to be value in the thing you built. "Build it and they will come" doesn't work.
> There's nothing wrong in asking for a clarifying blurb (good marketing aims to make things frictionless for potential customers). But RocksDB is its own universe and it's actually pretty well known. I don't work in this space, and even I know what RocksDB is because it has come up a lot in technical discussions about storage engines. When I first encountered it, I had no idea what it was, but I gathered from comments that people were excited about it, so I googled "wiki rocksdb". It took 2 seconds.
I have heard of RocksDB before, but I still don't know its use case. Why? Because there are so many other database systems. And even if I were using RocksDB, looking at that paper I don't know why, as a company, I would invest in a rewrite to switch over to this new one, since the performance benefits don't seem massively clear.
> Truly curious people are autodidacts, not consumers.
What I think you think autodidacts are is people with no focus who spend their time researching every new thing that pops up: the sort of people the phrase "Jack of all trades, master of none" was made up to describe.
You've made that point much better than I did. I taught myself coding in a department store as a kid in 80 or 81 and have learned 20+ languages on my own over the last 40 years, and I agree exactly with your point about autodidacts.
I'm interested in many different things; like you, I have read about RocksDB before, but my time is split between my family, hobbies and work.
I'm not saying don't learn RocksDB - quite the opposite. It's a great tool to have in your toolbelt.
I'm saying that unless you've used and hit the limits of RocksDB - and it's already absurdly fast - there's zero reason to utilize this project.
Maybe it'll mature one day, have multiplatform support and a wide array of client libraries, and be to RocksDB what RocksDB was to LevelDB. But today is not that day.
For now, developers that don't immediately understand what this project is for would best be served with a simple link to RocksDB.
And how should I know this from the README? How should I decide between a project that is interesting and not explained from a project that is not intended for me?
> How should I decide between a project that is interesting and not explained from a project that is not intended for me?
You first try RocksDB/LMDB, learn it all, break things, hit its limitations, and lower your standards enough to search for other things that don't have comprehensive documentation, just a small README/paper and the code to check.
Don't expect to build a better Postgresql on the first try.
I was rephrasing my point because some people didn't understand it the first time. It appears you are cherry-picking and misrepresenting my post again.
It made several points. The first was about me, as someone who reads dozens of HN posts a day and follows links, needing to go on a googling spree to find out about a product I might or might not be interested in.
The other point was about the missed opportunity to attract users.
And even the second point used "projects" and talked about release pages, which made it clear that it's about the problem in general (the linked page wasn't even a release page). By picking out the specifics, you either haven't read the post or are misrepresenting it intentionally.
You gave advice to the project on how to attract users that don't know what RocksDB is. I'm saying that they probably don't want users that don't know what RocksDB is.
[1] https://en.wikipedia.org/wiki/RocksDB
[2] https://en.wikipedia.org/wiki/LevelDB
[3] https://en.wikipedia.org/wiki/Bigtable