Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The description sounds very complicated. I doubt this will ever fit into my brain, which would make me wary to ever use this. It seems more like a "voodoo data structure" like Judy tables, where you have to trust some library and likely there's no practical way to debug it when it ever goes wrong. Perhaps it has nice properties, but it just seems risky to use something like this. In this case I don't even see a link to a library.


I agree with you that I'll probably never use this, at least not in the next few years. But I think it's worth putting the research in context-- they've only been published for 2 days now. Expecting a production-ready library is... unreasonable? The tooling, the documentation, the textbooks, the wiki page, etc aren't going to compare to Bloom filters, which have been used for over 40 years.


"However, we do not implement efficiency gains at all engineering costs, so it’s also important to have a user-friendly data structure. This issue stalled implementation of other Bloom alternatives offering some space savings. "

TFA article shows that they do share your worries, but they believe this approach is indeed simple enough to be worth it.


That’s a separate issue. The paper is concerned with the data structure not having gotchas (i.e. it is performant across a wide, continuous range of configuration and input values) whereas nn3 is concerned about personally understanding the design of the data structure.


The two aspects are intimately tied. A library that implements an algorithm that has few gotchas and can be used intuitively requires less troubleshooting/debugging and understanding.

Sure you need to trust the authors. I'm sure we regularly do such leap of faith all the time we use various software components we surely don't review on a daily basis, not sure why this particular tool should be judged a different standard (provided that is indeed straightforward to operate)


I suspect they just refer to "black box practicality" here. As in they have a nice library for FB developers and the API is simple enough. And if something goes wrong the FB person can contact the author and they will fix it for them. I guess that's practicable enough for them.

I was looking more for "I can understand/implement/debug it myself" practicability, avoiding black boxes.

Even in the FB case there is the bus factor of course. When the author at some point moves on to greener pastures they can only hope that it still fits into the brain of whoever replaces him.


Obacht. They don't even mention the original Bloom filter problem, the constant overhead of calculating 3-5 hashes per key, which is problematic with overly slow hash functions (like siphash) or overlong keys. Then it's faster to go without bloom, xor or ribbon, which just cancels searching for not-included keys.

Or if you know that your key is always included in the dataset, it is pure overhead.


> TFA article

??


Yes A stands for Article, of course :-)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: