Made a PR to switch to runes for iteration. Runes are the canonical way to handle Unicode code points in Go, and ranging over a string yields them without any extra allocation, so they're wicked fast! More importantly, they make the code compatible with Unicode names.
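For anyone unfamiliar with the distinction: indexing a Go string gives you bytes, while ranging over it decodes one rune (code point) per step, with no intermediate allocation. A quick illustration:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countRunes ranges over the string, which decodes one rune
// (Unicode code point) per iteration without allocating.
func countRunes(s string) int {
	n := 0
	for range s {
		n++
	}
	return n
}

func main() {
	s := "héllo"
	fmt.Println(len(s))                    // len counts bytes (6 here: é is 2 bytes)
	fmt.Println(countRunes(s))             // ranging counts runes (5)
	fmt.Println(utf8.RuneCountInString(s)) // stdlib equivalent (5)
}
```

Converting with `[]rune(s)` up front also works but allocates a new slice, which is exactly what the range form avoids.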
You can also save a ton of allocation if you reuse the position slices across matches (as long as they don't leak to callers). It might also be nice to have a maxMatches argument that lets users set a limit, which would save on unnecessary allocation.
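To sketch what I mean by reuse (names here are illustrative, not this library's actual API): keep a scratch slice on the matcher and truncate it with `[:0]` on each call, so the backing array's capacity is reused instead of reallocated.

```go
package main

import "fmt"

// Matcher is a hypothetical fuzzy matcher that keeps a scratch
// positions slice and reuses it across calls to Match.
type Matcher struct {
	positions []int
}

// Match returns the byte indices in s whose rune appears in pattern.
// The returned slice is only valid until the next call; a caller that
// needs to keep it must copy it (that's the "doesn't leak" requirement).
func (m *Matcher) Match(pattern, s string) []int {
	m.positions = m.positions[:0] // reset length, keep capacity: no realloc
	for i, r := range s {
		for _, p := range pattern {
			if r == p {
				m.positions = append(m.positions, i)
				break
			}
		}
	}
	return m.positions
}

func main() {
	var m Matcher
	fmt.Println(m.Match("ab", "cab")) // byte indices of 'a' and 'b' in "cab"
}
```

After a few calls the slice reaches a steady-state capacity and subsequent matches allocate nothing.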
Wow! mind = blown. Please let me digest this code. I will merge later today. I'm probably going to come back and ask a few questions. I wouldn't want to pass up the opportunity to learn from you.
If you want to make it really fast you could steal some ideas from: https://wincent.com/blog/optimization (tales of many years of optimizing a fuzzy search implementation).
Have you tried one of the standard 'distance' metrics, like Hamming distance or Levenshtein distance? That would at least give you an objective measure of success. Whether either actually works for what you're trying to achieve is of course an open question, but both Hamming distance matching and Levenshtein distance matching (the latter is harder to implement but probably better for your purposes) are very well understood.
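Levenshtein isn't much code, either. Here's the standard dynamic-programming version (two rolling rows, operating on runes so Unicode names are handled), independent of whatever scoring this library uses:

```go
package main

import "fmt"

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ar, br := []rune(a), []rune(b)
	prev := make([]int, len(br)+1)
	cur := make([]int, len(br)+1)
	for j := range prev {
		prev[j] = j // distance from empty prefix of a: j insertions
	}
	for i := 1; i <= len(ar); i++ {
		cur[0] = i // distance to empty prefix of b: i deletions
		for j := 1; j <= len(br); j++ {
			cost := 1
			if ar[i-1] == br[j-1] {
				cost = 0
			}
			cur[j] = min(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(br)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // classic example: 3
}
```

(The `min` builtin needs Go 1.21+; on older versions you'd write a small helper.)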
I made a similar tool, and if the author is interested in this route he may prefer the Jaro-Winkler algorithm, as it is better tailored to names.
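For reference, here's a textbook Jaro-Winkler sketch (not this library's code): Jaro similarity counts matching runes within a sliding window plus transpositions, and the Winkler adjustment boosts strings that share a prefix, which is why it does well on names.

```go
package main

import "fmt"

// jaro computes the Jaro similarity of two rune slices in [0, 1].
func jaro(a, b []rune) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 1
	}
	window := max(len(a), len(b))/2 - 1
	if window < 0 {
		window = 0
	}
	aMatched := make([]bool, len(a))
	bMatched := make([]bool, len(b))
	matches := 0
	for i := range a {
		lo, hi := max(i-window, 0), min(i+window, len(b)-1)
		for j := lo; j <= hi; j++ {
			if !bMatched[j] && a[i] == b[j] {
				aMatched[i], bMatched[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count transpositions: matched runes that line up out of order.
	t, j := 0, 0
	for i := range a {
		if !aMatched[i] {
			continue
		}
		for !bMatched[j] {
			j++
		}
		if a[i] != b[j] {
			t++
		}
		j++
	}
	m := float64(matches)
	return (m/float64(len(a)) + m/float64(len(b)) + (m-float64(t)/2)/m) / 3
}

// jaroWinkler boosts the Jaro score for a shared prefix (up to 4 runes),
// using the standard scaling factor of 0.1.
func jaroWinkler(a, b string) float64 {
	ar, br := []rune(a), []rune(b)
	j := jaro(ar, br)
	l := 0
	for l < len(ar) && l < len(br) && l < 4 && ar[l] == br[l] {
		l++
	}
	return j + float64(l)*0.1*(1-j)
}

func main() {
	fmt.Println(jaroWinkler("MARTHA", "MARHTA")) // the classic worked example, ≈0.961
}
```

(Uses the `min`/`max` builtins from Go 1.21+.)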
Likely he rolled his own solution for the same reason I did: he's targeting a specific kind of name (file names, in his case), which comes with its own subset of rules that can be incorporated for better matching.
Nice! How do you rank the file names when returning search results? It'd be cool if you could factor in git commit frequency, since more frequently modified files are more likely to be the ones being searched for. Cool project!
In this case this is a library, and the information that it is written in Go is indeed useful: you can easily use it from Go code, not so much from other languages.
Novelty and notoriety. I'll throw in kindred spirits too. Go is still trying to get traction. It's not Java, nor C#, nor PHP. It's still trying to prove itself outside of well-known but niche projects like Docker.
/shrug, language matters to me. I like knowing what code I can interface with. Generally speaking, "written in Go" is a boon for me (as a Go dev, obviously), while "written in Python/Node/Ruby" is a negative because of the runtime requirements.
This library came out of a project I started. The project never saw the light of day. If I do see real world use, it'll be easier to find bugs and fix them :)