Launch HN: Metal (YC W23) – Embeddings as a Service

kaycebasques · on March 29, 2023

Congrats on the launch. I started exploring applications of generative AI towards technical documentation this week. I quickly realized that embeddings are a key piece of the puzzle and I can give you some initial validation: I definitely don't want to manage this stuff myself and it really seems like I shouldn't need to. Also I am comfortable with services like Firebase so your product immediately makes sense to me because I basically think of it like Firebase for embeddings.

This is probably more feature creep than you can or want to sign up for at the moment, but I also don't really want to deal with manually transforming my Markdown or HTML into the sections of text that you use as input for embeddings. It would be nice if I could just provide URLs to my live documentation or Markdown source code, and your service takes a best guess at how to split it up into sections and then generate embeddings for each of those sections.
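
A rough sketch of what a "best guess" split could look like for Markdown sources, assuming one section per heading is good enough as a first pass. The function name `split_markdown_sections` and the output shape are hypothetical; nothing here reflects how Metal actually ingests documents.

```python
import re

FENCE = "`" * 3  # code-fence marker, built this way to avoid nesting fences here
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_sections(markdown):
    """Split Markdown text into sections, one per ATX heading (#, ##, ...)."""
    sections = []
    heading, lines = "", []
    in_fence = False

    def flush():
        # Emit the section collected so far, skipping empty leading content.
        text = "\n".join(lines).strip()
        if heading or text:
            sections.append({"heading": heading, "text": text})

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # ignore '#' lines inside code fences
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return sections
```

Each returned `text` would then be the input for one embedding. Long sections would probably still need a size cap or overlap-based splitting on top of this.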

Last, I would be happy to talk to you all about docs strategy for your own docs sometime (I'm not looking for work at the moment; I just enjoy helping people with this stuff). You can contact me and learn more about my background via the social links on https://technicalwriting.tools (a blog about technical writing tooling topics that I just spun up).

Good luck!

tlowe11 · on March 29, 2023

Great to hear! Humbled by the Firebase comparison as well :) We've talked about the markdown/HTML feature, definitely something we want to build. And I'll ping you!

billybones · on March 28, 2023

Such an important problem!

I get the benefit over Pinecone (which wasn't built with LLMs, etc in mind)

How does this compare to Chroma? Feels like it has most of what you're talking about, and already has an open source product live.

https://www.trychroma.com/

gk1 · on March 28, 2023

> I get the benefit over Pinecone (which wasn't built with LLMs, etc in mind)

What do you mean?

Pinecone was specifically made to be used alongside LLMs and other embedding models. That’s how anyone uses Pinecone.

jxodwyer1 · on March 28, 2023

Chroma is awesome <3 - We have some overlap with them as we store the embeddings. But, we provide additional operations on top of the data, such as clustering/fine-tuning. We're also looking into open-sourcing some tools in the near future!

swalsh · on March 28, 2023

Postgres has an extension as well (pgvector). I've been using it, great performance, great scaling options (though I'm not even close to testing the limits) and gives you the full flexibility of Postgres.

It's easy enough to define a docker compose file, and deploy it to my environments.

sroussey · on March 28, 2023

That’s what I’m setting up now. What do you use to create the embedding? OpenAI? Which model?

computerex · on March 28, 2023

Try one of the free models on huggingface: https://huggingface.co/sentence-transformers/multi-qa-mpnet-...

You can run it on your laptop and it's free.

sroussey · on March 29, 2023

Will that be equivalent to or better than text-davinci-003?

abyesilyurt · on March 28, 2023

How does it scale with the number of rows?

meekaaku · on March 28, 2023

Hi, regarding a product catalog use case: say I embed our product catalog consisting of 1000 SKUs. Is there a way to update a specific field in a product? A product has name, description, sku etc. that doesn't change much. But it also has frequently changing info like price, quantity_available, special_offer etc. How do I update only these fields and still be able to answer a question that customers send to our bot, like:

Do you have product A, and what's the price?

which means the bot needs to get the latest price and quantity_available fields.

Is this possible to do with Metal?

jxodwyer1 · on March 28, 2023

We don’t support this use case yet, but we could by exposing an API to update the non-filterable metadata of the records. This is a cool use case; we would love to learn more about it. Would you want to create embeddings from the product name + description and then have the other attributes returned from the search results? We are very close to supporting this; just a matter of exposing a way to update those attributes

meekaaku · on March 28, 2023

Yes, the static info is mainly product name/code/description/keywords etc. The dynamic fields are price, quantity_available or similar feeds.
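
A minimal in-memory sketch of the split discussed above: embed only the static text, keep the dynamic fields as metadata next to the vector, and update that metadata without re-embedding. `CatalogIndex`, `update_metadata`, and the tiny two-dimensional vectors are all hypothetical stand-ins, not Metal's API.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Record:
    vector: list                                  # embedding of name + description
    metadata: dict = field(default_factory=dict)  # price, quantity_available, ...


class CatalogIndex:
    def __init__(self):
        self._records = {}

    def upsert(self, sku, vector, metadata):
        self._records[sku] = Record(list(vector), dict(metadata))

    def update_metadata(self, sku, **changes):
        """Change dynamic fields only; the stored vector is left untouched."""
        self._records[sku].metadata.update(changes)

    def search(self, query_vector, k=3):
        """Return up to k (score, sku, metadata) tuples, best match first."""
        scored = [
            (cosine(query_vector, rec.vector), sku, rec.metadata)
            for sku, rec in self._records.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]
```

With this shape, a price feed would call `update_metadata("A", price=12)` as often as needed, and the bot would read the latest price from the search result rather than from the embedded text.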

esafak · on March 29, 2023

The value proposition is not clear to me. You don't generate the embeddings and there are already numerous vector databases. Maybe the versioning part?

AmazingTurtle · on March 29, 2023

Yeah I also don't understand why there are tons of (YC backed) startups providing little to no value. They basically man-in-the-middle the OpenAI GPT Platform. So... Yeah I've been running a pg_vector database with OpenAI Embeddings for 6 months now and I'm a solo hobby dev who experiments with it. Guess I could've built a startup with that knowledge L M A O

PaulHoule · on March 28, 2023

I dunno. It took like one line in conda to bring in GPU PyTorch, one for sentence-transformers, one line of Python to initialize it, one line to encode. No worries about somebody else getting data breached, acqui-hired, or struggling to find a sensible, fair, and profitable pricing model.

Clustering with scikit-learn is… easy. Indexing in FAISS is… easy. Maybe it’s hard if you use Rust and it was hard to do this in Python 5 years ago. Dilbert’s Boss probably thinks it is hard but he got fired…

jxodwyer1 · on March 28, 2023

You’re right! If you want to do that in a notebook, it’s pretty straightforward. But if you want to have it running in production, it’s a bit more complicated. Also, providing users with a gui to run these operations without a notebook has resonated with many less ml savvy users. Dilbert's boss probably didn't know much about ml... :)

PaulHoule · on March 28, 2023

I don’t use a notebook. I write plain Python scripts for batch jobs (run every day) and the UI is backed by aiohttp and HTMX. I had no fear when I demoed my app in public for the first time since I’ve used it every day since the beginning of the year and it spins like a top.

teaearlgraycold · on March 28, 2023

Keep in mind that people out there pay a monthly fee for feature flags as a service. There’s definitely a market for OP’s product.

swyx · on March 28, 2023

> feature flags as a service

and it's a >$100mm/yr business :) things always get messy when you scale things up beyond a demo on one laptop

teaearlgraycold · on March 29, 2023

To be fair I set up a home grown solution in a “real” environment that worked fine. 10k sign ups per day, 200 requests per second. If you already have a separate analytics platform also paying for feature flags seems hard to justify in most cases.

jxodwyer1 · on March 28, 2023

right on! Sounds like you have a lot of the foundation for your infra setup, which is great

fzysingularity · on March 28, 2023

Congrats on the launch!

A few questions/thoughts:

- What kind of overheads do you have right now with calling this API?

- What scales have you pressure-tested this with? Demo seems to show few 100s of embeddings. Selfishly, I'd like to see a demo of handling 10M+ vectors to be reasonably certain that any company can truly build infrastructure in this context. I guess I'm more interested in the out-of-core applications where I can really shove all my data in here, and see if the system can handle it.

- (dovetails with the previous one): What kind of access patterns are you seeing today: more indie developers pushing a few 1000s of vectors into a DB, or some heavy users pushing 100K-1M+ vectors?

- Less of a question, but one thought would be to partner with labeling companies to automatically fine-tune embeddings as part of a single embeddings-management platform.

- Would you eventually look to build your own vector DB + metadata / features stores as part of the long-term strategy or try to integrate with existing ones?

jxodwyer1 · on March 28, 2023

Thank you so much for your questions!

- As a managed service there are some overheads. We need to auth, validate and parse the inputs, fetch the index that is getting queried as we then need to use the index’s model to generate the embeddings. Then if the index is fine tuned/customized, we need to transform the embedding to the new vector space, to then call our vector index. We then fetch the metadata of the results from the db and parse the response to send it back.

- We’ve only tested upwards of 1M vectors ~ 1500 dimensions. But, more formal testing is required here and we plan to do so. I’m particularly curious about pg_vector and how it stacks up with other players as keeping the data central is a significant upside. We started with these lower vector indices to get something out there and iterate as a startup. But, scalability is part of what we want long term.

- We see both; we’ve had to turn down a very early lead with 100M+ vectors because it would derail other engineering efforts while we were starting. We’re now much better positioned to tackle that challenge as we have all the foundations.

- We haven’t considered this, but it’s an excellent idea. We’re currently discussing this with the team.

We would love to chat more; we appreciate your questions and feedback. Always happy to riff with someone who has seen issues around these use cases, like yourself. Feel free to reach us at founders@getmetal.io !

fzysingularity · on March 29, 2023

Thanks for your reply.

- I have a good sense of the overheads - I was more curious about the latencies (ms) you are observing with the system today.

- Out of curiosity, why did you pick Redis? Is it mostly due to familiarity and experience with it in the past? I'm curious if you foresee any challenges scaling to larger datasets due to the in-memory limitations.

- I'm assuming you're going with a usage-based model for large volumes of data managed? Do you support spinning down the service (moving things to cold-storage), and auto-scaling things back up when users actually search for things. Wondering how you're thinking about this especially if customers don't use the APIs daily.

- For the 100M+ vectors, what type of data were they dealing with, documents, images or something else?

Thanks!

jamesmcintyre · on March 28, 2023

EDIT: never mind, I didn't read your whole post, looks like you guys are working on an opensource option. Great!

Metal looks awesome. I've been comparing vector db solutions so your simple/abstracted sdk looks awesome. One thing I'd mention is with a solution like this that could be so critical to an app's functionality (and therefore so integrated into various parts of the app) I'd love to see that your team is vowing to give some sort of opensource self-hosted option. I want to root for any startup that is letting devs move faster in this area but there's a fear of committing to a solution that may pivot or be acquired/discontinued. Maybe even vowing a "safe-exit" for customers like I think rethinkdb did.

Good luck, looks awesome!

jxodwyer1 · on March 28, 2023

We agree with the sentiment; we’re currently figuring out the pieces we want to open source, as much of it is just infra (like the ingest pipeline). But the search server and some of our future work around memory will get open-sourced first.

m1117 · on March 28, 2023

This is similar to Pinecone/milvus, correct? What are the advantages of this compared to Pinecone/milvus?

jxodwyer1 · on March 28, 2023

We see ourselves a layer above vectorDB; we use Redis to index the data. We focused on building the ingest pipeline and operations on top of the embeddings, such as clustering and fine-tuning (embedding customization). Ultimately we want to provide the best developer experience possible, and we believe much work is needed here!

ChocoluvH · on March 28, 2023

Haha. In that case you might actually wanna consider FAISS/Milvus instead of Redis.

jxodwyer1 · on March 28, 2023

We’ve looked into FAISS and Milvus. Milvus is possibly an excellent option for us in the future. What’s your experience with these so far?

fzliu · on March 28, 2023

Great to hear that you're considering Milvus. Feel free to reach out if you ever have any questions/comments/concerns.

Just took a look at your docs and product page as well. Keep up the great work!

AmazingTurtle · on March 29, 2023

I tried out milvus. Developer Experience is crap. Documentation lacks some major core concepts. I've been experimenting with it for hours. Eventually I turned my back and said: Why not use pg_vector and scale the fuck out of the cluster? That should bring.. equal performance, as the pg_vector implementation is written in C and the comparing algorithms wouldn't differ too much from milvus.

leobg · on March 28, 2023

hnswlib? Best of the bunch imho

Ozzie_osman · on March 28, 2023

I think those assume you already have the embedding vector calculated, and they just store and retrieve the vectors.

Terretta · on March 30, 2023

This is going to make searching for ML things like converting LLM or transformers for "metal" a wonderful experience.

The Metal framework was announced at WWDC 2014 for iOS, and at WWDC 2015 for OS X and tvOS. Metal is an interface for programming the Graphics Processing Unit (GPU) in your computer. The main advantages of using Metal are:

- provides the lowest overhead access to the GPU, hence it reduces all bottlenecks usually caused by data transferring between the CPU and GPU in other frameworks.

- provides up to 10 times the number of draw calls compared to OpenGL. Metal, however, is not cross-platform like OpenGL is, so it is not meant to be a replacement for OpenGL.

- also allows running compute applications with performance levels comparable to similar technologies such as CUDA or OpenCL.

- has a custom shader language that allows precompiling shaders so they are a lot faster at run time.

- has built-in memory and resource management particularized to these platforms.

https://github.com/MetalKit/metal

https://github.com/MetalPetal/MetalPetal

https://github.com/tlkh/tf-metal-experiments

https://github.com/alexiscn/MetalFilters

Etc.

kacperlukawski · on March 28, 2023

What are your plans for providing some additional metadata except for embeddings? Semantic search often requires additional filtering, as vectors are not all we need. At Qdrant we have a unique mechanism for incorporating metadata filters into HNSW, so they might be applied during vector search phase (no pre- or post-filtering required): https://qdrant.tech/documentation/indexing/#filtrable-index
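
To make the pre-/post-filtering distinction concrete, here is a brute-force sketch of the two approaches. It is not Qdrant's filterable HNSW mechanism, only the two baselines that mechanism is meant to avoid; the item layout and function names are made up for illustration.

```python
def dot(a, b):
    """Dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def post_filter_search(items, query, k, predicate):
    """Take the top-k by similarity first, then drop items failing the filter.

    Can return fewer than k results (even none) when the nearest
    neighbours do not satisfy the predicate.
    """
    ranked = sorted(items, key=lambda it: dot(query, it["vector"]), reverse=True)
    return [it for it in ranked[:k] if predicate(it["meta"])]


def pre_filter_search(items, query, k, predicate):
    """Filter first, then rank only the allowed items.

    Always returns up to k matching results, but a real ANN index cannot
    be reused as-is on an arbitrary filtered subset.
    """
    allowed = [it for it in items if predicate(it["meta"])]
    ranked = sorted(allowed, key=lambda it: dot(query, it["vector"]), reverse=True)
    return ranked[:k]
```

For example, if the two nearest items to a query are both in German and the filter asks for English, `post_filter_search` with `k=2` comes back empty while `pre_filter_search` still returns the best English matches.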

jxodwyer1 · on March 28, 2023

Qdrant is awesome :). Redis also supports metadata filtering, and we’re currently building that feature. We are considering adding a different data store option and Qdrant might be our next choice.

alsodumb · on March 28, 2023

Love the idea and I’ve been looking for something like this. I wrongly assumed that Pinecone offered exactly this and was disappointed to realize that I had to figure out the embedding generation myself.

I am yet to completely explore your website, but do you by any chance let me export the generated embeddings to manage them using say Pinecone?

Also, any chance you guys plan to integrate OCR tools in your pipeline? Say I have images of text, which I know is text and don’t want to use an image model for generating embeddings.

tlowe11 · on March 28, 2023

Thank you! We have an OCR pipeline already so you can upload the files and we’ll process them, chunk the text, create the embeddings and index them. Right now, we support PDFs, but the pipeline is ready to accept images as well. We’re opening those file types this week!

bcjordan · on March 28, 2023

Super cool!

I'm curious, does Metal's version support do anything to solve the problem of "I originally embedded with model A, but now I'd like to take my same data and re-embed with a new model B"? I've heard from others this is a pain point and I've experienced it myself - it feels like there would be some value in storing the embeddings' source data in the cloud to one-click re-embed as well.

jxodwyer1 · on March 28, 2023

Hey! We do support multiple versions of an Index under an App. When you fine-tune an embedding, we autogenerate the new embeddings for the entire dataset into a unique index. We store the raw data uploaded to our system via text or file imports. Although we don’t allow you to easily re-embed this data today, we have this on the roadmap!

yacine_ · on March 28, 2023

Super cool product! In general, peeling off infrastructure costs is always a good idea. And it would be really cool to have different places that keep a pulse on SOTA. I recently discovered instructor-xl performs better than openai's ada in some cases!

https://huggingface.co/spaces/mteb/leaderboard

jxodwyer1 · on March 28, 2023

Thank you! We’ve looked into instructor-xl, and it’s really awesome! We also accept custom embeddings, allowing developers to use whatever model they want. But we want to keep adding models to allow for better experimentation.

hallqv · on March 28, 2023

I’ve been working extensively with embeddings (LLM generated) for the last 3 years, and the problems your product seems to solve have not been any big pain points for me. If you want to discuss other pains related to embs I’m available in DMs.

jxodwyer1 · on March 28, 2023

Hey! I appreciate the comment, and we would love to hear about other pains you've encountered. I can't find a way to DM on HN, but please email us at founders@getmetal.io, and we can connect there!

infrawhispers · on March 28, 2023

Hi! (not a member of Metal) - I am curious about your big pain points. Happy to chat on twitter/email (doesn't appear to be any contact information in your profile).

Thanks!

fudged71 · on March 28, 2023

For the application of search/retrieval, it would be great if you can surface logs and insights into what people are searching for, and even what areas of your data are missing based on searches

sergioprada · on March 29, 2023

Yes! We've been thinking about techniques like pushing the queried embeddings into a datastore to detect anomalies and track outliers. Provide some insights there.

bobvanluijt · on March 29, 2023

Super interesting - might be an opportunity for a Weaviate module as well (Weaviate modules take care of vectorization but are model agnostic)

sergioprada · on March 29, 2023

100% - really admire all the work you're doing with Weaviate. Will reach out!

clark-kent · on March 30, 2023

Big fan of Weaviate.

correlator · on March 28, 2023

Very interesting project, congratulations on the launch! I've been playing with embedding search/clustering on larger documents, and I find that segmentation strategies can be quite tricky and heavily impact results. Do you offer any segmentation strategies via API, or do you expect this potentially personalized feature will be handled by devs on their own servers?

jxodwyer1 · on March 28, 2023

We don’t offer this through the API, yet! You can however run clustering in the UI. We are working on exposing classification so that you can generate clusters on specific topics. We plan to offer both in the API within the next week or two!

crosen99 · on March 28, 2023

This sounds less like Embeddings as a Service and more like Semantic Search (which happens to be using embeddings) as a Service.

jxodwyer1 · on March 28, 2023

Search is one use case we support, but you can perform a few other operations on your data, like clustering or fine-tuning. We're also working on a classification feature. Are there other async jobs you'd like to see?

crosen99 · on March 28, 2023

The problem I'd like solved is that when I want to retrieve chunks of data for retrieval augmented generation, it's challenging to optimize the choice of embeddings model, chunking strategy, and overall retrieval algorithm. I'm not sure if that's the sort of problem you're focused on.

jxodwyer1 · on March 28, 2023

We agree; this is precisely the problem area we’re focusing on!! We’re currently working on the ability for users to specify chunking strategies while providing a ton of guidance on this selection based on their particular data.

crosen99 · on March 28, 2023

In addition to the choices for how to chunk (i.e. defining chunk size, chunk boundaries, chunk overlap, etc.), there's also the question of what actually gets returned once finding the chunks that match. For example, perhaps I have a document with 100 1-page sections where each section is broken into roughly 5 chunks. I may get optimal performance in my RAG application not by retrieving the top K chunks from the index, but rather by returning the top K sections from the document, where sections might be scored based on the number and scores of child chunks. It also might be useful to incorporate section summaries, etc., in the retrieval process.
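
One possible way to roll chunk scores up into section scores, as a sketch only. The scoring rule here (sum of each section's best `top_n` chunk scores) is an assumption chosen for illustration; it rewards sections with several strong chunks, and other rules (max, mean, count-weighted) would be equally reasonable to experiment with.

```python
from collections import defaultdict


def top_sections(chunk_hits, k=3, top_n=2):
    """Rank sections from chunk-level search hits.

    chunk_hits: iterable of (section_id, chunk_score) pairs, e.g. the
    top results of a chunk-level vector search.
    Returns up to k (section_id, section_score) pairs, best first, where
    a section's score is the sum of its top_n chunk scores.
    """
    by_section = defaultdict(list)
    for section_id, score in chunk_hits:
        by_section[section_id].append(score)

    rolled_up = {
        section_id: sum(sorted(scores, reverse=True)[:top_n])
        for section_id, scores in by_section.items()
    }
    return sorted(rolled_up.items(), key=lambda item: item[1], reverse=True)[:k]
```

With hits such as `[("s1", 0.9), ("s2", 0.6), ("s2", 0.5), ("s1", 0.1), ("s3", 0.7)]`, section `s2` outranks `s1` even though `s1` holds the single best chunk, which is the behaviour described above.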

jxodwyer1 · on March 28, 2023

This is great, and that makes a ton of sense! Would you want to define + experiment with these various configurations yourself explicitly, or would you expect a system to determine this automatically? I like the concept of rolling-up chunk scores!

jn2clark · on March 29, 2023

if you want some more options (chunking, models, +more) check here https://github.com/marqo-ai/marqo and an example for RAG using context aware trimming of text for fitting into context windows https://github.com/marqo-ai/marqo/blob/mainline/examples/GPT...

qwick23 · on March 28, 2023

Wouldn't it be better to partner with an existing managed cloud provider like Pinecone or Qdrant? Why Redis at all? :-0

jxodwyer1 · on March 28, 2023

Redis provides indexes for vector similarity. And we have a lot of experience with Redis. We see a future where we can offer more than one datastore, and we’ve been considering Qdrant as the next datastore to support.

crawdog · on March 28, 2023

You should look at Lucene core - they have incorporated vector embeddings in 9.4.x and it could provide you better scale than Redis with durability as well.

https://lucene.apache.org/core/9_4_2/demo/index.html

howon92 · on March 28, 2023

Congrats on launching! Does Metal compete with https://github.com/openai/chatgpt-retrieval-plugin or does it provide a different value?

jxodwyer1 · on March 28, 2023

There’s some overlap with information retrieval for ChatGPT applications. As a managed service, we handle all of the infrastructure and maintenance. Also, we support additional use cases for web applications/backends, such as clustering and fine-tuning. We’re also working on an open-source alternative to the retrieval plugin.

ushakov · on March 28, 2023

I’m wondering about YC’s series of investments in this area

How many of these new AI companies will stick?

jxodwyer1 · on March 28, 2023

Great question; while it’s still super early, we believe that some of the most critical problems to solve will involve making current APIs compatible with AI use cases. Products like ChatGPT Plugins are game changers, but they will still be limited by the APIs they interact with.

Ozzie_osman · on March 28, 2023

Do you support custom or fine-tuned models for generating the embeddings?

jxodwyer1 · on March 28, 2023

Yes, we do! We allow users to run `metal.tune` to determine whether two vectors should be close to each other. Then we use that to recalculate the embeddings similar to the customized embeddings cookbook from OpenAI. Then the queries get embedded and transformed into the same space.
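
A toy illustration of the "transform into the same space" step, assuming the customization is a linear map (a matrix multiplied with each embedding). The matrix below is hand-picked rather than learned from labeled pairs, and none of this is the actual implementation behind `metal.tune`.

```python
import math


def transform(matrix, vector):
    """Apply a linear map: one output component per matrix row."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical "tuned" matrix that down-weights the second dimension,
# standing in for a matrix that would normally be learned from pairs
# labeled as "should be close" or "should be far apart".
TUNED = [
    [1.0, 0.0],
    [0.0, 0.1],
]

doc_vec = [1.0, 1.0]
query_vec = [1.0, -1.0]

before = cosine(doc_vec, query_vec)
after = cosine(transform(TUNED, doc_vec), transform(TUNED, query_vec))
```

The key point is that the same matrix is applied to stored vectors and to incoming query vectors, so similarity is always computed inside the customized space.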

Ozzie_osman · on March 28, 2023

Looked at the docs. It looks like yes!

mattgreenrocks · on March 29, 2023

Most important question: what is your favorite metal album currently, and why is it Periphery V? :)

Had to ask, noticed all the metal references in the docs.

tlowe11 · on March 29, 2023

Fantastic question. It's a few years old but I still love Infest the Rats Nest by King Gizzard. Especially awesome considering they're not a full time metal band. I'll give Periphery V a spin tonight!! Thanks for sharing :D

sergioprada · on March 29, 2023

Rust in Peace by Megadeth.

AmazingTurtle · on March 29, 2023

Wow, seems like YC is backing anything these days

monkeydust · on March 28, 2023

Looks cool, does it work with langchain? If so suggest a short tutorial and video showing how to latch onto the buzz of that offering.

jxodwyer1 · on March 28, 2023

We love langchain! That’s a great idea – we want to provide examples using langchain and look into ways to better integrate into libraries like this.

qwertyuiop_ · on March 28, 2023

I stopped at "send data to our system"

jxodwyer1 · on March 28, 2023

We have some open-source tooling in the works! :) We understand that some users are sensitive to managed services, we’re starting with this, but we’re planning to open source tools to improve developer experience around information retrieval and memory.

modernpink · on March 28, 2023

How would you say your product compares to Pinecone, GCP's Matching Engine or any other product in the space?

jxodwyer1 · on March 28, 2023

It does compare with them, but we want to lower the barrier of entry for any developer to build features that use embeddings. So we want to give regular software engineers superpowers in providing this technology within their stack and out of the box offering the infrastructure and high-level APIs to run operations on top of the vector db.

pbmango · on March 28, 2023

Great demo video - I like the focus on being open and flexible, knowing how much will change in the next year.

jiwidi · on March 28, 2023

So, a vector store/vector db?

jxodwyer1 · on March 28, 2023

We store the vectors, but we also provide additional operations that would require additional code/infra if you just use a vectorDB. We also have the infrastructure in place to ingest all the data, generate the embeddings (we also take raw embeddings), and provide APIs for fine-tuning and clustering. Another big difference coming soon is index versioning, allowing developers to test multiple models/embeddings.

dchuk · on March 29, 2023

So are you spinning up a redis instance/container per tenant?

PaulHoule · on March 28, 2023

Why redis instead of a specialized database like faiss?

jxodwyer1 · on March 28, 2023

Redis provides indexes for vector similarity. And we have a lot of experience with Redis. We have plans to expand into offering other data stores, like Qdrant

youssefabdelm · on March 28, 2023

What method do you use to cluster? HDBSCAN?

jxodwyer1 · on March 28, 2023

We will be adding HDBSCAN in the coming days! Right now we only offer k-means, but for dimensionality reduction we offer PCA and t-SNE.
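
For readers unfamiliar with k-means, a bare-bones version of the standard Lloyd iteration looks roughly like this. It is a teaching sketch with a naive initialization (the first `k` points), not what Metal runs; in practice a library implementation such as scikit-learn's would be the sensible choice.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (assignments, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids
```

Unlike HDBSCAN, this requires choosing `k` up front and assigns every point to a cluster, which is one reason density-based methods are often requested for embedding data.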

flohofwoe · on March 28, 2023

As if googling for Apple's 3D API documentation wasn't already hard enough ;)

blululu · on March 28, 2023

The trademark infringement claims are serious. Metal is more than just a 3D graphics framework. It is a general purpose parallel computing framework, and this application would very much fall within the purview of its trademark. E.g. if you were going to implement an embedding based classifier on iOS/MacOS you would most likely use compute shaders written in Metal. The fact that the website styles are almost identical down to the color palette doesn't help the case: https://www.getmetal.io https://developer.apple.com/metal/

jxodwyer1 · on March 28, 2023

Hey! I'd love to understand what you're referring to with this

arthurcolle · on March 28, 2023

https://developer.apple.com/metal/

jxodwyer1 · on March 28, 2023

Whoa! Thanks for sharing -- we haven't seen this!

yumraj · on March 28, 2023

Please note that this is not a snark, but am genuinely curious since you're a YC company - didn't anyone from YC or from the YC network point you to that?

I'd hoped that proper product naming, and avoiding such minefields, be one of the things someone from YC or YC network would help/advise or at least give input on.

stuartjohnson12 · on March 28, 2023

I think names don't really matter that much in the grand scheme of things, short of being catastrophically bad. Bonus points if you can get the single word .com at some point, bonus points if it's memorable, but you can always rebrand down the road and of the list of things to worry about, I don't think it's very high. Certainly not a minefield.

yumraj · on March 28, 2023

In general, yes I agree.

However, in some cases it can indeed be an issue when there is potential conflict with some very litigious companies.

Edit: I have no idea if it will be an issue in this case or not, but given Apple and similar domain (AI/ML), it may be an issue.

pavlov · on March 28, 2023

Apple is famously protective of its trademarks against small software companies.

I forget the details so I can’t Google it, but twenty years ago there was a case where a Mac developer had a name collision with an Apple product, emailed Steve Jobs, and he replied with “No big deal, change the name.” — the little guy was expected to bear the burden of coming up with a new brand, but Jobs was (in his own view) kind enough not to sue.

PaulHoule · on March 28, 2023

‘metal’ is a trademark infringement lawsuit just waiting to happen. It’s a super-generic name that people are going to confuse with something else.

I use code names for projects like that but I would never name a company something I couldn’t get the domain for without some prefix attached.