That name sounds very familiar, as does the feature set. Managing Gigabytes[1], or "mg", was the output of University of Melbourne and RMIT research in the 1990s. It went on to be commercialized as SIM and later TeraText[2], and has largely disappeared into the government intelligence indexing and consulting-heavy systems space (where it is now presumably being trounced by Palantir).
That's exactly what I thought - I worked on index construction for MG back in 1994. (Note, although my name is Tim Bell, I'm not Timothy C. Bell, the coauthor of "Managing Gigabytes".)
I don't know how this project ended up here at this moment, but as one of the authors, let me answer the main questions.
1) The name is just a coincidence. I originally learned about indexing from the "Managing Gigabytes" book, and that's the reason for the name, but the book is now completely obsolete and, even at the time, it contained a significant number of red herrings. There's no connection, and no sharing of code or ideas of any kind.
2) MG4J is our playground for doing research in information retrieval. This means, for example, that we have designed new data structures, such as Elias-Fano indexing, which give MG4J ridiculously fast times in benchmarks (see https://github.com/lintool/IR-Reproducibility). Elias-Fano is now the main indexing algorithm at Facebook, and it is slowly percolating into Lucene (look in the sources).
3) You can define your queries using a very rich interval language with a very fast implementation based on new algorithms. You can easily create parallel indices for text and tags and ask, for example, whether a phrase falls within a region tagged as "location".
4) MG4J is a project of two people, and at this time I'm the only maintainer. You cannot expect it to be as polished as Lucene or Solr. But you can very easily hack into it (even without modifying the sources), which is why it has been popular with people experimenting with indexing. For example, there are many tools to manipulate indices: splitting them with a specified strategy, combining them, etc.
5) So if you want an out-of-the-box solution for indexing, forget about it. If you want a fun playground for doing research or a very efficient backbone on which to build your infrastructure, MG4J might be useful to you. We recently used it for http://wikirank.di.unimi.it/
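For anyone curious about the Elias-Fano representation mentioned above, here is a minimal Python sketch of the textbook encoding of a non-decreasing sequence, such as a posting list of document IDs. It is not MG4J's code: the function names are made up, and a real implementation packs the bits into machine words and answers select queries in constant time instead of scanning. Each value keeps its l = floor(log2(u/n)) low bits verbatim and writes its high part in unary, for roughly 2 + log2(u/n) bits per element.

```python
def ef_encode(values, universe):
    """Elias-Fano encode a non-decreasing list of ints in [0, universe).

    Returns (low_bits, lows, upper). For readability the upper bit vector is a
    bytearray holding one 0/1 per bit and the lows are a plain list; a real
    implementation packs both into machine words."""
    n = len(values)
    # l = floor(log2(universe / n)) low bits per element (0 if universe < 2n)
    low_bits = max(0, (universe // n).bit_length() - 1) if n else 0
    low_mask = (1 << low_bits) - 1
    lows = [v & low_mask for v in values]
    # high part of element i, written in unary: set bit (v >> l) + i
    upper = bytearray(n + (universe >> low_bits) + 1)
    for i, v in enumerate(values):
        upper[(v >> low_bits) + i] = 1
    return low_bits, lows, upper


def ef_get(encoded, i):
    """Return the i-th value: high part = (position of the i-th set bit) - i."""
    low_bits, lows, upper = encoded
    seen = -1
    # linear scan standing in for a constant-time select structure
    for pos, bit in enumerate(upper):
        if bit:
            seen += 1
            if seen == i:
                return ((pos - i) << low_bits) | lows[i]
    raise IndexError(i)


def ef_size_bits(encoded):
    """Space if packed: n * l low bits plus the upper bit vector."""
    low_bits, lows, upper = encoded
    return len(lows) * low_bits + len(upper)


docs = [2, 3, 5, 7, 11, 13, 24]  # e.g. a posting list of document IDs
enc = ef_encode(docs, universe=25)
print([ef_get(enc, i) for i in range(len(docs))])  # [2, 3, 5, 7, 11, 13, 24]
print(ef_size_bits(enc), "bits")  # 27 bits
```

Duplicates are fine too: equal high parts still land on distinct bits because each element adds its own index i.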
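To make the "phrase inside a tagged region" idea above concrete, here is a toy Python sketch of interval containment over a text field and a parallel tag field. It is a hypothetical illustration, not MG4J's query language or its algorithms: phrase occurrences become position intervals, tagged regions are assumed sorted and non-overlapping, and containment is checked with a single forward scan.

```python
def term_positions(tokens):
    """Map each token to the sorted list of positions where it occurs."""
    positions = {}
    for pos, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(pos)
    return positions


def phrase_intervals(positions, phrase):
    """Intervals [p, p + k - 1] where phrase[j] occurs at position p + j for all j."""
    first = positions.get(phrase[0], [])
    later = [set(positions.get(t, [])) for t in phrase[1:]]
    return [(p, p + len(phrase) - 1) for p in first
            if all(p + j + 1 in s for j, s in enumerate(later))]


def contained_in(inner, outer):
    """Inner intervals lying entirely inside some outer interval.

    Both lists are sorted by start; outer intervals are assumed
    non-overlapping, so one forward pass over outer suffices."""
    result, j = [], 0
    for start, end in inner:
        # skip tagged regions that end before this occurrence starts
        while j < len(outer) and outer[j][1] < start:
            j += 1
        if j < len(outer) and outer[j][0] <= start and end <= outer[j][1]:
            result.append((start, end))
    return result


tokens = "we flew from new york to new delhi".split()
# hypothetical parallel tag field: token ranges annotated as "location"
locations = [(3, 4), (6, 7)]
hits = phrase_intervals(term_positions(tokens), ["new", "york"])
print(contained_in(hits, locations))  # [(3, 4)]
```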
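As an illustration of what splitting an index "with a specified strategy" can mean, here is a hypothetical Python sketch of one simple documental strategy: each sub-index owns a contiguous range of document IDs and renumbers them from zero, and combining just shifts them back. This is not MG4J's tooling; the index is modeled as a dict from term to sorted document IDs, ignoring positions, counts and compression.

```python
from bisect import bisect_right


def split_documental(index, num_docs, k):
    """Split {term: sorted global doc IDs} into k sub-indices, each owning a
    contiguous range of doc IDs renumbered from 0 (hypothetical strategy).
    Assumes every doc ID is in [0, num_docs)."""
    # boundaries[i] is the first global doc ID owned by sub-index i
    boundaries = [num_docs * i // k for i in range(k + 1)]
    parts = [{} for _ in range(k)]
    for term, postings in index.items():
        for doc in postings:
            i = bisect_right(boundaries, doc) - 1  # the range holding doc
            parts[i].setdefault(term, []).append(doc - boundaries[i])
    return boundaries, parts


def merge_documental(boundaries, parts):
    """Inverse of split_documental: shift local IDs back by each range's offset.
    Parts are visited in range order, so merged postings stay sorted."""
    merged = {}
    for offset, part in zip(boundaries, parts):
        for term, postings in part.items():
            merged.setdefault(term, []).extend(d + offset for d in postings)
    return merged


index = {"elias": [0, 2, 5, 9], "fano": [1, 5, 6]}
bounds, parts = split_documental(index, num_docs=10, k=3)
print(bounds)  # [0, 3, 6, 10]
print(parts)   # [{'elias': [0, 2], 'fano': [1]}, {'elias': [2], 'fano': [2]}, {'elias': [3], 'fano': [0]}]
```

A lexical strategy would instead assign whole terms to sub-indices, so each query term is answered by exactly one of them.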
Blast from the past! "Distributed" is a bit of a stretch; I think you need to coordinate all of that yourself. It is no more distributed than Lucene (I think).
Their fastutil stuff is pretty interesting, though, for creating highly optimized algorithms. Lots of primitive-based data structures that are fast and memory-efficient.
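The memory argument behind primitive collections such as fastutil's IntArrayList (backed by a plain int[]) versus ArrayList<Integer> (an array of references to boxed Integer objects) can be sketched outside Java too. This hypothetical Python comparison, assuming CPython, contrasts a list, which stores pointers to separate int objects, with array("q"), which stores raw 64-bit values contiguously. It measures footprint only, not speed.

```python
import sys
from array import array


def boxed_footprint(values):
    """Bytes for a list of ints: the list's pointer array plus every int object.
    Assumes the ints are distinct objects (CPython caches small ints)."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


def primitive_footprint(values):
    """Bytes for the same numbers packed as raw signed 64-bit integers."""
    return sys.getsizeof(array("q", values))


n = 100_000
# start above CPython's small-int cache so each element is its own object
values = list(range(10**6, 10**6 + n))
print("boxed:", boxed_footprint(values), "bytes")
print("primitive:", primitive_footprint(values), "bytes")
```

The gap comes from the per-object header each boxed number carries plus the extra pointer to reach it, which is the same overhead fastutil avoids on the JVM.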
There was a C port at one point - https://github.com/dbalmain/ferret, maybe others. No idea if it's current or what the feature set comparison might look like.
[1] https://www.amazon.com/Managing-Gigabytes-Compressing-Indexi... - Note review from Peter Norvig!
[2] http://www.teratext.com/