TerarkDB, ByteDance's RocksDB replacement

ksec · on Dec 23, 2020

TerarkDB is a RocksDB replacement at ByteDance, optimized for tail latency, throughput and compression.

RocksDB is a fork of Google's LevelDB, developed at Facebook [1]. It is optimized to exploit many CPU cores and to make efficient use of fast storage, such as solid-state drives (SSD), for input/output (I/O) bound workloads.

LevelDB [2] is an open-source on-disk key-value store written by Google fellows Jeffrey Dean and Sanjay Ghemawat, inspired by Bigtable.

Bigtable [3] is a compressed, high performance, proprietary data storage system built on Google File System, Chubby Lock Service, SSTable (log-structured storage like LevelDB) and a few other Google technologies.

I hope the info above is useful for others, as I didn't have the slightest idea what RocksDB is, but I do know LevelDB.

[1] https://en.wikipedia.org/wiki/RocksDB

[2] https://en.wikipedia.org/wiki/LevelDB

[3] https://en.wikipedia.org/wiki/Bigtable

karlding · on Dec 23, 2020

CockroachDB also has Pebble [0], which is mentioned in their blog post [1] as a RocksDB replacement for their use-case.

[0] https://github.com/cockroachdb/pebble

[1] https://www.cockroachlabs.com/blog/pebble-rocksdb-kv-store/

dboreham · on Dec 23, 2020

Forgot the reference to BerkeleyDB ;)

joshspankit · on Dec 23, 2020

Excellent, thank you. I wish more announcements had this type of breakdown.

anticensor · on Dec 23, 2020

LevelDB and Hadoop are cousins according to that family tree.

royguo1988 · on Dec 23, 2020

TerarkDB was acquired by ByteDance two years ago and is now widely used in ByteDance's database services.

I am one of the maintainers of this project; you can ask any question here.

continuations · on Dec 23, 2020

I remember reading about this a few years ago here. If I remember correctly, back then the main selling point was that it used succinct data structures, and only the compression algo was not open source - everything else was.

But now when I look at the new repo and the online doc there is no mention of succinct data struct anywhere.

Also, the benchmarks back then claimed 10x or more faster than RocksDB. Now the performance claim is much more modest.

Does that mean TerarkDB no longer uses succinct data struct? Or are you just open sourcing a lower-end version of the software without the secret sauce?

Can you talk about what makes TerarkDB faster than RocksDB?

royguo1988 · on Dec 23, 2020

Thanks for your attention; glad someone here still remembers our history. TerarkDB is now FULLY open source, with `succinct data structures`.

Here are the reasons:

1. Our `all-in-one` docs are still being written; we will cover that part later.
2. For the performance part, we are now showing real-world cases, not a carefully designed benchmark. (A few years ago we selected our best results to show off our work; we don't want to do that anymore.)
3. Why TerarkDB is faster than RocksDB will be explained in our `all-in-one docs` within a week, and most of the reasons are not magic, just engineering effort.

Thanks again for remembering us.

e12e · on Dec 23, 2020

> most of the reasons are not magic

So... You're saying there is magic? :)

continuations · on Dec 23, 2020

Great. Looking forward to learning more about this.

why_only_15 · on Dec 23, 2020

I think the performance images would be a lot clearer if they were on the same scale. As it stands, it was unclear what was happening with, e.g., the disk write image until I zoomed into the axes.

royguo1988 · on Dec 23, 2020

Thanks for your suggestion, I will update the image soon.

nikhilsimha · on Dec 23, 2020

Always love it when I see a maintainer offer clarifications in a HN comment section! <3

What are the reasons for the perf improvements we see here?

royguo1988 · on Dec 23, 2020

We are working on our `all-in-one docs` right now, please watch our repo, thanks! I gave some of the reasons in a previous comment.

josephg · on Dec 23, 2020

Your all-in-one docs[1] refuse to render at all in firefox? Seems like a strange restriction, and disappointing that it doesn't even let you read the document in non-webkit browsers. Feels like the IE days all over again.

"An error occurred. This browser is not supported, click here to learn more."

[1] https://bytedance.feishu.cn/docs/doccnZmYFqHBm06BbvYgjsHHcKc

ddorian43 · on Dec 23, 2020

It renders on ubuntu 18.04 Firefox but also displays "This browser not supported with https://www.feishu.cn/hc/en-us/articles/360038713913". Probably doesn't support linux.

royguo1988 · on Dec 24, 2020

This is weird, I will call our internal Lark team to deal with it.

erk__ · on Dec 23, 2020

I get the same on Windows 10 so it is probably the browser.

oefrha · on Dec 23, 2020

Works perfectly for me in Firefox (macOS). Might be a misleadingly worded "something went wrong, we don't know what" message. Probably want to check your console, could be a network problem.

josephg · on Dec 23, 2020

Yeah, there are some unreadable minified error stack traces in the console. Weirdly, it seems to load fine and I can scroll the content for a second or two while it's still loading in. Then it puts up an error dialog and I can't access the content any more. Once that's happened, if I reload the page the error dialog comes up immediately and the page doesn't bother loading behind it.

I'm on firefox on MacOS too, and it happens with my ad blocker on or off. Chrome works fine. The support document linked in the error explicitly says only chrome and safari are supported on macos. I'm confused why my grandparent comment has been downvoted - this is a real bug report stopping me (and maybe others) from reading documentation that looks to have a lot of thought put into it. And given the content seemed to be loading fine before the error message came up, well, it feels forced.

sudeepj · on Dec 23, 2020

Why not merge the improvements into RocksDB itself?

royguo1988 · on Dec 23, 2020

There are mainly three reasons here:

1. We changed the source code so much that we cannot easily merge it back into RocksDB. (This project started in 2016 as a closed-source project.)
2. We have a different roadmap from RocksDB (e.g. in the future we will remove a lot of unused code to make TerarkDB much more lightweight than the current version).
3. We have lots of third-party partners (e.g. Intel on Optane SSD/memory, and others on ZNS...) who may participate in this project, so we want to handle all commits ourselves to make sure everything is under control.

loeg · on Dec 23, 2020

It's open source now, right? Outside of 2 and 3, could someone incorporate (some) of the improvements from TerarkDB into RocksDB? Or does it truly require some major rewrite to achieve the tail-latency benefits?

The comparison figures presented looked really impressive, thanks for sharing it.

ssakamoto · on Dec 23, 2020

3) is not in line with an open source philosophy.

EDIT: Detrimental to the original. Eg. Amazon forking and selling MongoDB.

alexgartrell · on Dec 23, 2020

First, it’s reeaallllyyyy expensive to invest enough in an open source project that you have a reasonable chance of steering it.

Second, even if you do the first, the whole thing gets screwed up again when you start trying to introduce vendor code into the mix. Generally, no one upstream gives a crap that you have super compelling business reasons to compromise on code quality (or even trivial things like how code is committed: tarballs vs good git hygiene), and vendors sometimes compromise a lot.

So it’s not surprising that sometimes groups choose to do the expedient thing to get something to market instead of doing things “the right way.” In a lot of respects, the original Android did this with Linux.

Competition is good.

klodolph · on Dec 23, 2020

> In a lot of respects, the original Android did this with Linux.

Android vendors keep doing this over and over again with Linux, which explains why so many phones are stuck on old versions of Android.

ssakamoto · on Dec 23, 2020

Imagine if there were multiple incompatible and competing Linux kernels. What we have now is AMD/MS/Apple etc. contributing to the kernel through "vendor code". Imagine if AMD released an AMDLinux and Nvidia had NvidiaLinux.

alexgartrell · on Dec 24, 2020

This already happens, because most (?) people aren’t running vanilla kernels. Many (most?) distros compile their kernels with config options and patches that “make sense to them.” In the most egregious cases, you end up with things like bpf being intentionally broken by default.

cowsandmilk · on Dec 23, 2020

It is perfectly in line with open source philosophy to be able to fork a project and have control over my fork. Especially given 2 where they have different goals from upstream.

ssakamoto · on Dec 23, 2020

Not necessarily true in this case. They are compatible, and it's merely performance improvements in the code.

kapilvt · on Dec 23, 2020

Amazon did not fork mongodb, they won’t touch AGPL code, they reimplemented the server side protocol and a backend implementation on top of postgresql afaics.

rishav_sharan · on Dec 23, 2020

How so? Unless they are stopping normal users from committing code as well?

tinco · on Dec 23, 2020

Even if they stopped normal users from committing it would still be adhering to open source philosophy.

tgtweak · on Dec 23, 2020

It feels like this is healthy, organic and very much in line with the ethos of open source to see a project take this path and arrive back in open source. If the rocks team wanted to cherry pick some compatible advancements from this project they are now free to do so.

There are much more egregious and fundamentally different violations to open source namely those you mention in your comment.

setr · on Dec 23, 2020

Sure it is; it’s exactly equivalent to something like forking Linux with the reasoning “I want to be the BDFL now” — eg the nvim fork

haar · on Dec 23, 2020

Wasn't the driver for nvim specifically disagreements with the direction/priorities/steer of the project? Is progress in a different direction necessarily a bad thing, especially if that effort couldn't be directly applied to the original anyway?

Please someone feel free to correct me, but if I recall correctly a lot of the improvements in Vim 8 were a result of the popularity of functionality in NeoVim?

setr · on Dec 23, 2020

You're correct -- which is why I've used it as an example of forking for project-control reasons to be perfectly in line with an open-source philosophy.

ssakamoto · on Dec 23, 2020

No disagreements there. Contention is it's not good for the original.

random5634 · on Dec 23, 2020

I didn't know this. How do I contribute to Oracle's Unbreakable Linux or Redhat's RHEL? I know I can fork them, but not sure how I can push my commits into their code and didn't realize that was required!

ssakamoto · on Dec 23, 2020

I did not say it was required. But you can always contribute.

smarx007 · on Dec 23, 2020

(3) is exactly how SQLite is developed

nextaccountic · on Dec 23, 2020

> Eg. Amazon forking and selling MongoDB.

Are they giving back the source? And letting Mongo merge their changes if they wish?

Because that's what open source is all about.

ivzhh · on Dec 23, 2020

Leadership or steering committee is a key factor for open source projects operated by companies. A closed pull request with comment "We won't accept the pull request because ..." should not be on the trajectory of an infrastructure project, which is to be/being widely used by any giant vendor.

So RocksDB came from LevelDB and here we go again.

yomly · on Dec 23, 2020

Do you have a write-up on why you got rid of RocksDB?

royguo1988 · on Dec 23, 2020

We are working on our `all-in-one docs` which will explain everything.

I want to stress that we do not mean to "get rid of" RocksDB (which lots of KV engines have claimed to do). What we want to do is provide another solution for storage engine users, with a different roadmap (focusing on new hardware and write-heavy workloads).

For simple use cases, there will be no difference no matter what engine you use.

And for most cases, upgrading your hardware (e.g. SATA SSD to NVMe SSD) or tuning your RocksDB parameters would save you lots of time; just make sure you understand what you are doing.

There's no cure for every workload; try TerarkDB if RocksDB happens not to fit your scenario.

royguo1988 · on Dec 23, 2020

The reasons we did a better job (from our own perspective) than RocksDB are:

1. We moved lots of code outside db_mutex (the DB mutex is convenient but costs too much).
2. We introduced a new KV separation implementation that we believe is better than RocksDB's implementation (we haven't heard of any production users using RocksDB's KV separation yet).
3. We introduced a lazy compaction strategy that can delay compaction tasks while online services are dealing with short bursts of heavy writes.
4. Other optimizations, like time-histogram-based TTL and pipelined WAL sync.
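To illustrate point 3, here is a minimal sketch of what a lazy compaction policy could look like. It is a toy model, not TerarkDB's implementation: the class name, the `burst_threshold` and `max_pending` knobs, and the per-tick write trace are all assumptions invented for the example. The idea shown is only that compaction is postponed while a write burst is in progress, with a hard cap so the backlog cannot grow without bound.

```python
from dataclasses import dataclass


@dataclass
class LazyCompactionScheduler:
    """Toy policy: postpone compaction while writes are bursting."""

    burst_threshold: int  # writes per tick treated as a burst (assumed knob)
    max_pending: int      # deferred files allowed before compaction is forced
    pending_files: int = 0

    def tick(self, writes_this_tick: int, new_files: int) -> str:
        """Decide what to do for one time slice and return the action taken."""
        self.pending_files += new_files
        bursting = writes_this_tick >= self.burst_threshold
        if self.pending_files == 0:
            return "idle"
        if bursting and self.pending_files < self.max_pending:
            # Leave disk bandwidth to foreground writes for now.
            return "defer"
        # Either the burst is over or the backlog hit the cap.
        self.pending_files = 0
        return "compact"


def simulate(trace, burst_threshold=1000, max_pending=4):
    """Run the policy over a list of (writes, new_files) pairs."""
    scheduler = LazyCompactionScheduler(burst_threshold, max_pending)
    return [scheduler.tick(writes, files) for writes, files in trace]


# Three bursty ticks followed by two quiet ones.
trace = [(5000, 1), (6000, 1), (7000, 1), (200, 0), (100, 0)]
actions = simulate(trace)
print(actions)
```

In this model the three files produced during the burst are compacted together once the write rate drops, which is the trade-off a lazy strategy makes: lower interference with foreground writes in exchange for a temporarily larger backlog.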

ddorian43 · on Dec 24, 2020

There is https://pingcap.com/blog/titan-storage-engine-design-and-imp... that splits keys from values.

ssakamoto · on Dec 23, 2020

Why not submit these improvements to rocksdb ?

meta2meta · on Dec 23, 2020

I see "#include <terark/fsa/cspptrie.inl>" in the "memtable/terark_zip_entry_index.cc" but I can't find "cspptrie.inl" in the repo. Is the code auto-generated or not open source now?

wanghenshui · on Dec 23, 2020

submodule, https://github.com/bytedance/terark-zip

royguo1988 · on Dec 23, 2020

All source code is open source now. You can find it in `third-party/terark-zip`; terark-zip is a standalone repo that contains only core algorithms.

polskibus · on Dec 23, 2020

You may want to update the TerarkDB entry on dbdb.io.

royguo1988 · on Dec 23, 2020

Thanks, I tried to log in and reset my password but didn't receive the reset email.

apavlo · on Dec 23, 2020

Email me (pavlo@cs.cmu.edu). I don't think you ever had an account.

gurkanoluc · on Dec 23, 2020

Is TerarkDB used as a storage engine for MySQL, like FB does with RocksDB?

loeg · on Dec 23, 2020

That glue layer is called MyRocks (MySQL -> MyRocks -> RocksDB). It may be possible to slot this in to replace RocksDB in that stack. (I don't know.)

supergirl · on Dec 23, 2020

where is it used? what kind of data is stored in it?

royguo1988 · on Dec 23, 2020

At ByteDance, a few database services are using TerarkDB.

supergirl · on Dec 23, 2020

yes, I got that :) but can you say more? what kind of database services? what data is stored, what is the scale, what are the requirements, etc.

royguo1988 · on Dec 23, 2020

Sorry for the unclear response.

1) We use TerarkDB under a distributed SQL database, where TerarkDB stores its pages (16KB pages); it's one of the most widely used SQL databases inside ByteDance.
2) We use TerarkDB under a Redis-compatible distributed cache system to store raw key-value pairs.

Almost all kinds of workloads are here, since TerarkDB runs under so many database clusters (each cluster only serves a single application).

amrx431 · on Dec 23, 2020

[flagged]

siggen · on Dec 23, 2020

Given that it is open source, you can answer this question yourself?

dikei · on Dec 23, 2020

Since TerarkDB is latency-optimized, it would be quite interesting to test it with Kafka Streaming or Flink which are currently using RocksDB for stateful stream processing.

Hopefully, that will also bring better Java test coverage and integration.

adamretter · on Dec 23, 2020

I have helped backport RocksDB Java API changes for Kafka in the past. The main issue in the integration seems to be organisational rather than technical. Kafka has a more conservative release approach, whereas RocksDB has lots of releases and the API changes frequently.

I would be happy to be contacted about specific Java API issues with RocksDB; Maybe I can help.

royguo1988 · on Dec 23, 2020

We haven't tested the Java binding for quite a long time, and I am not sure if it still compiles. Please file an issue on GitHub if you find it doesn't work anymore, thanks!

I remember that we tried it on Flink in early versions and the result was pretty good.

thu2111 · on Dec 23, 2020

Some of the benefits of this come from separation of values from the keys. This is an increasingly widely used technique: it is described in the WiscKey paper [1] and is also used in the PingCAP fork of RocksDB. It seems Chinese companies like forking RocksDB, I am not sure why, perhaps the combination of firewall+language barrier just makes it easier to fork and move fast than try to work regularly with upstream.

By separating out large values there's less write amplification and things get faster because more of the SSTs fit in RAM cache. RocksDB wasn't historically a great choice to hold things like file uploads - you'd use the traditional filesystem for that. But that's quite constraining. When large values work better, it not only is a performance increase, but it enables new software designs too.

[1] https://www.usenix.org/system/files/conference/fast16/fast16...
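The write-amplification argument above can be made concrete with a toy model. This is a sketch under stated assumptions, not code from WiscKey, TerarkDB, or RocksDB: the class names, the 12-byte pointer size, and the `compaction_bytes` metric are hypothetical, and a real design also needs garbage collection for the value log, which is omitted here. It only shows that when values live in a separate append-only log, a compaction pass over the key index rewrites keys and small pointers instead of the values themselves.

```python
class InlineStore:
    """Values are stored next to keys, so compaction rewrites both."""

    def __init__(self):
        self.table = {}

    def put(self, key: bytes, value: bytes) -> None:
        self.table[key] = value

    def get(self, key: bytes) -> bytes:
        return self.table[key]

    def compaction_bytes(self) -> int:
        # One full rewrite of the sorted data: every key and every value.
        return sum(len(k) + len(v) for k, v in self.table.items())


class SeparatedStore:
    """Keys map to (offset, length) pointers into an append-only value log."""

    POINTER_SIZE = 12  # assumed: 8-byte offset plus 4-byte length

    def __init__(self):
        self.value_log = bytearray()
        self.index = {}

    def put(self, key: bytes, value: bytes) -> None:
        offset = len(self.value_log)
        self.value_log += value  # the value is written exactly once
        self.index[key] = (offset, len(value))

    def get(self, key: bytes) -> bytes:
        offset, length = self.index[key]
        return bytes(self.value_log[offset:offset + length])

    def compaction_bytes(self) -> int:
        # Only keys and pointers are rewritten; the value log is untouched.
        return sum(len(k) + self.POINTER_SIZE for k in self.index)


inline, separated = InlineStore(), SeparatedStore()
for i in range(100):
    key = b"key%04d" % i            # 7-byte keys
    value = bytes([i % 256]) * 4096  # 4 KiB values
    inline.put(key, value)
    separated.put(key, value)

print(inline.compaction_bytes(), separated.compaction_bytes())
```

With 4 KiB values the index is a small fraction of the data, which is also why more of it can stay cached in RAM. The gap narrows as values shrink toward the pointer size, so separation mainly pays off for large values.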

KingOfCoders · on Dec 23, 2020

What I don't understand on many pages (e.g. the TerarkDB Github README, or all the release pages linked from HN):

Why not explain your project to me first? Assume I know nothing about the project and followed a link from HN.

"TerarkDB is a RocksDB replacement"

doesn't help me, what is RocksDB?

This is a huge missed chance for projects to get new users. Start every release note with a sentence explaining your project. Assume people reading your release notes are non-users.

reitzensteinm · on Dec 23, 2020

Generally I agree, but if you don't know exactly what RocksDB is and why you'd want to fork it, this just isn't for you.

There's probably a mutually beneficial filter being applied here by not letting beginners stumble in.

that_guy_iain · on Dec 23, 2020

But what if RocksDB or TerarkDB would solve a specific problem I'm facing and I just don't know of this solution? Lots of us have problems but don't know the exact tech stack to solve our problems; this is true at nearly every tech company I've worked for. My favourite anecdote for this was a guy who basically reinvented map/reduce in the form of hacky scripts from Hadoop round about the time Hadoop and map/reduce were starting to get traction.

wenc · on Dec 23, 2020

There are two kinds of mindsets when it comes to learning: a consumer mindset and an autodidact mindset.

In an organization, it's easy to recognize consumers -- they typically say things like: "I don't understand this. Is there a training course for this that I can sign up for?" and expect to be assigned to an internal training session or to some external course.

An autodidact on the other hand goes: "I don't understand this. Let me do some research on my own and try to teach myself."

I've been both at various junctures in my life but I've learned that in order to progress to higher levels, it's better to be an autodidact instead of a consumer. When it comes to new knowledge, there's rarely someone who will feed it to me -- I have to take the initiative to learn it myself.

There's nothing wrong in asking for a clarifying blurb (good marketing aims to make things frictionless for potential customers). But RocksDB is its own universe and it's actually pretty well known. I don't work in this space, and even I know what RocksDB is because it has come up a lot in technical discussions about storage engines. When I first encountered it, I had no idea what it was, but I gathered from comments that people were excited about it, so I googled "wiki rocksdb". It took 2 seconds.

Truly curious people are autodidacts, not consumers.

p.s. the HN comment section is a great venue to "overhear" what the community is talking about and what they find exciting. It provides a good signal to dive into certain topics. Knowledge acquisition is very much a sociological exercise as much as it is an individual one.

that_guy_iain · on Dec 24, 2020

> An autodidact on the other hand goes: "I don't understand this. Let me do some research on my own and try to teach myself."

As someone who is very much an autodidact, I can tell you that just because I don't understand something doesn't mean I go and learn it. There are far too many things to do and to learn, so I only go and learn things when I have a reason to. Just like consumers will ask for training sessions when they have a reason to.

This is marketing 101, you have a product and you want people to use it. Even in open source, you still want people to use it, you want there to be value in the thing that you built. Build it and they will come doesn't work.

> There's nothing wrong in asking for a clarifying blurb (good marketing aims to make things frictionless for potential customers). But RocksDB is its own universe and it's actually pretty well known. I don't work in this space, and even I know what RocksDB is because it has come up a lot in technical discussions about storage engines. When I first encountered it, I had no idea what it was, but I gathered from comments that people were excited about it, so I googled "wiki rocksdb". It took 2 seconds.

I have heard of RocksDB before, but I still don't know the use-case, why? Because there are so many other database systems. And even if I was using RocksDB looking at that paper I don't know why as a company I would invest in a rewrite to switch over to this new one, since the performance benefits don't seem massively clear.

> Truly curious people are autodidacts, not consumers.

What I think you think autodidacts are, are people with no focus and spend time researching every new thing that pops up. The sort of people that "Jack of all trades, master of none" is made up to describe.

KingOfCoders · on Dec 24, 2020

You've made that point much better than me. As someone who taught myself coding in a department store as a kid in 80 or 81 and learned 20+ languages in the last 40 years on my own, I exactly agree with your point about autodidacts.

I'm interested in many different things; like you, I have read about RocksDB before, but my time is limited between my family, hobbies and work.

reitzensteinm · on Dec 23, 2020

I'm not saying don't learn RocksDB - quite the opposite. It's a great tool to have in your toolbelt.

I'm saying that unless you've used and hit the limits of RocksDB - and it's already absurdly fast - there's zero reason to utilize this project.

Maybe it'll mature one day, have multiplatform support and a wide array of client libraries, and be to RocksDB what RocksDB was to LevelDB. But today is not that day.

For now, developers that don't immediately understand what this project is for would best be served with a simple link to RocksDB.

KingOfCoders · on Dec 23, 2020

And how should I know this from the README? How should I decide between a project that is interesting and not explained from a project that is not intended for me?

ddorian43 · on Dec 23, 2020

> How should I decide between a project that is interesting and not explained from a project that is not intended for me?

You first try rocksdb/lmdb, learn it all, break things, hit limitations, and lower your standards enough to search for other things, that don't have comprehensive documentation but just a small readme/paper and checking the code.

Don't expect to build a better Postgresql on the first try.

reitzensteinm · on Dec 24, 2020

You're shifting the goal posts.

You wrote the initial post saying they're hurting themselves by not being more clear. I don't think that's true.

Now you're saying they should be more clear to help beginners in the field. I completely agree.

KingOfCoders · on Dec 24, 2020

I was rephrasing my point because some didn't understand it the first time. It appears you are cherry-picking and misrepresenting my post again.

It had several points where the first point was about me as someone reading dozens of HN posts a day following links and needing to go on a googling spree to find out about a product I might or might not be interested in.

The other point was about the missed opportunity to attract users.

And even the second point used "projects" and talked about release pages, which made it clear that it's about the generality of the problem (the linked page wasn't even a release page). Picking out the specifics, you either haven't read the post or are misrepresenting it intentionally.

KingOfCoders · on Dec 23, 2020

"but if you don't know exactly what RocksDB"

So things I don't know exactly about are not for me? That seriously hinders my personal development.

As a two decade CTO I think I should be interested in things I don't know about.

reitzensteinm · on Dec 23, 2020

You gave advice to the project on how to attract users that don't know what RocksDB is. I'm saying that they probably don't want users that don't know what RocksDB is.