The description sounds very complicated. I doubt this will ever fit into my brai...

throwthere · on July 11, 2021

I agree with you that I'll probably never use this, at least not in the next few years. But I think it's worth putting the research in context: it has only been published for 2 days now. Expecting a production-ready library is... unreasonable? The tooling, the documentation, the textbooks, the wiki page, etc. aren't going to compare to what exists for Bloom filters, which have been used for over 40 years.

ithkuil · on July 11, 2021

"However, we do not implement efficiency gains at all engineering costs, so it’s also important to have a user-friendly data structure. This issue stalled implementation of other Bloom alternatives offering some space savings."

TFA article shows that they do share your worries, but they believe this approach is indeed simple enough to be worth it.

bradleyjg · on July 11, 2021

That’s a separate issue. The paper is concerned with the data structure not having gotchas (i.e. it is performant across a wide, continuous range of configuration and input values) whereas nn3 is concerned about personally understanding the design of the data structure.

ithkuil · on July 11, 2021

The two aspects are intimately tied. A library that implements an algorithm that has few gotchas and can be used intuitively requires less troubleshooting/debugging and understanding.

Sure, you need to trust the authors. But we take that kind of leap of faith all the time when we use software components we don't review ourselves, so I'm not sure why this particular tool should be judged by a different standard (provided that it is indeed straightforward to operate).

nn3 · on July 11, 2021

I suspect they just refer to "black box practicality" here. As in they have a nice library for FB developers and the API is simple enough. And if something goes wrong the FB person can contact the author and they will fix it for them. I guess that's practicable enough for them.

I was looking more for "I can understand/implement/debug it myself" practicability, avoiding black boxes.

Even in the FB case there is the bus factor of course. When the author at some point moves on to greener pastures they can only hope that it still fits into the brain of whoever replaces him.

rurban · on July 12, 2021

Careful. They don't even mention the original Bloom filter problem: the constant overhead of calculating 3-5 hashes per key, which is problematic with slow hash functions (like SipHash) or very long keys. In those cases it's faster to go without Bloom, xor, or ribbon filters at all, since all they do is cut short the search for keys that aren't in the set.

Or if you know that your key is always included in the dataset, it is pure overhead.
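
To make that overhead concrete, here is a rough sketch of a naive Bloom filter that counts its own hash evaluations. Everything in it is an assumption for illustration: the `TinyBloom` name, the bit count, the choice of 4 hashes, and the use of salted BLAKE2b as a stand-in for a slow hash. It is not how the paper or any particular library does it, and real implementations often derive all positions from one or two hash values instead.

```python
import hashlib


class TinyBloom:
    """Toy Bloom filter; sizes and hash choice are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)
        self.hash_calls = 0  # counts hash evaluations to expose the per-key cost

    def _positions(self, key: bytes):
        # Naive scheme: one full hash of the key per probe position.
        for i in range(self.num_hashes):
            self.hash_calls += 1
            salt = i.to_bytes(8, "little")
            digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # all() stops at the first unset bit, so a miss can be cheaper
        # than a hit; a key that is present always pays for every hash.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bf = TinyBloom()
bf.add(b"present-key")
calls_before = bf.hash_calls
found = bf.might_contain(b"present-key")
print(found, bf.hash_calls - calls_before)
```

A lookup for a key that is in the set pays for every hash and still has to go to the real data afterwards, which is the "pure overhead" case.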

_eojb · on July 11, 2021

> TFA article

??

ithkuil · on July 11, 2021

Yes, A stands for Article, of course :-)