Launch HN: Metal (YC W23) – Embeddings as a Service
196 points by tlowe11 on March 28, 2023 | 99 comments
Hey HN! We’re Taylor, James and Sergio – the founders of Metal (https://www.getmetal.io/). You can think of Metal as embeddings as a service. We help developers use embeddings without needing to build out infrastructure, storage, or tooling. Here’s a 2-minute overview: https://www.loom.com/share/39fb6df7fd73469eaf20b37248ceed0f

If you’re unfamiliar with embeddings, they are representations of real world data expressed as a vector, where the position of the vector can be compared to other vectors – thereby deriving meaning from the data. They can be used to create things like semantic search, recommender systems, clustering analysis, classification, and more.
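To make "comparing vectors" concrete, here is a minimal sketch using cosine similarity, one common way to compare embeddings. The three vectors are made-up toy values, not output from any real model, and nothing here reflects how Metal itself computes similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hypothetical values for illustration).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related items should score higher than unrelated ones.
related = cosine_similarity(cat, kitten)
unrelated = cosine_similarity(cat, invoice)
```

Real embeddings have hundreds or thousands of dimensions, but the comparison works the same way.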

Working at companies like Datadog, Meta, and Spotify, we found it frustrating to build ML apps. Lack of tooling, infrastructure, and proper abstraction made working with ML tedious and slow. To get features out the door we’ve had to build data ingestion pipelines from scratch, manually maintain live customer datasets, build observability to measure drift, manage no-downtime deployments, and the list goes on. It took months to get simple features in front of users and the developer experience was terrible.

OpenAI, Hugging Face and others have brought models to the masses, but the developer experience still needs to be improved. To actually use embeddings, hitting APIs like OpenAI is just one piece of the puzzle. You also need to figure out storage, create indexes, maintain data quality through fine-tuning, manage versions, code operations on top of your data, and create APIs to consume it. All of this friction makes it a pain to ship live applications.

Metal solves these problems by providing an end-to-end platform for embeddings. Here’s how it works:

Data In: You send data to our system via our SDK or API. Data can be text, images, PDFs, or raw embeddings. When data hits our pipeline we preprocess by extracting the text from documents and chunking when necessary. We then generate embeddings using the selected model. If the index has a fine-tuning transformation, we transform the embedding into the new vector space so it matches the target data. We then store the embeddings in cold storage for any needed async jobs.
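As a rough illustration of the chunking step, here is a minimal sketch that splits text into overlapping word windows. The function name, the word-based splitting, and the default sizes are assumptions for the example; the post does not say how Metal actually chunks documents.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at
    a boundary still appears whole in at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk would then be embedded separately, so chunk size and overlap directly affect what a query can match.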

From there we index the embeddings for querying. We use HNSW right now, but are planning to support FLAT indexes as well. We currently index in Redis, but plan to make this configurable and provide more options for datastores.

Data Out: We provide querying endpoints to hit the indexes and find the approximate nearest neighbors (ANN). For fine-tuned indexes, we generate embeddings from the base model used and then transform the embedding into the new vector space during the pre-query phase.
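For intuition, the sketch below does an exact (flat) top-k search, which is the baseline that an approximate index trades some accuracy against for speed. The optional `transform` matrix stands in for the pre-query transformation mentioned above. All names are hypothetical and this is not Metal's or Redis's implementation.

```python
import heapq
import math

def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def apply_transform(matrix, v):
    """Multiply a (rows x len(v)) matrix by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def flat_search(index, query, k=3, transform=None):
    """Exact top-k by cosine similarity over every stored vector.

    index: dict mapping id -> vector, already in the index's space.
    transform: optional matrix applied to the query first (assumed to
    play the role of a fine-tuning transformation).
    """
    if transform is not None:
        query = apply_transform(transform, query)
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), key)
        for key, vec in index.items()
    )
    return [(key, score) for score, key in heapq.nlargest(k, scored)]
```

A flat scan touches every vector on each query, which is fine for thousands of vectors and is why approximate structures such as HNSW are used at larger scales.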

Additionally, we provide methods to run clustering jobs on the stored embeddings and visualizations in the UI. We are experimenting with zero-shot classification, by embedding the classes and matching each embedding to the closest class, allowing us to provide a “classify” method in our SDK. We would love feedback on what other async job types would be useful!
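The zero-shot idea can be sketched in a few lines: embed each class label, then assign an item to the class whose embedding is most similar. The `classify` function and the toy vectors below are illustrative assumptions and are not the SDK's actual method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify(item_vector, class_vectors):
    """Return the label whose embedding is closest to item_vector.

    class_vectors: dict mapping label -> embedding of that label's text.
    """
    return max(class_vectors, key=lambda label: cosine(item_vector, class_vectors[label]))

# Hypothetical class embeddings; a real system would get these from a model.
class_vectors = {
    "billing": [0.9, 0.1, 0.1],
    "shipping": [0.1, 0.9, 0.1],
    "returns": [0.1, 0.1, 0.9],
}
label = classify([0.2, 0.8, 0.3], class_vectors)
```

No labeled training data is needed, which is what makes the approach "zero-shot", though quality depends heavily on how well the class names embed.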

Examples of what users have built so far include embedding product catalogs for improved similarity search, personalized in-app messaging with user behavior clusters, and similarity search on images for content creators.

Metal has a free tier that anyone can use, a developer tier for $20/month, and an enterprise tier with custom pricing. We’re currently building an open source product that will be released soon.

Most importantly, we’re sharing Metal with the HN community because we want to build the best developer experience possible, and the only metric we care about is live apps on prod. We’d love to hear your feedback, experiences with embeddings, and your ideas for how we can improve the product. Looking forward to your comments, thank you!



Congrats on the launch. I started exploring applications of generative AI towards technical documentation this week. I quickly realized that embeddings are a key piece of the puzzle and I can give you some initial validation: I definitely don't want to manage this stuff myself and it really seems like I shouldn't need to. Also I am comfortable with services like Firebase so your product immediately makes sense to me because I basically think of it like Firebase for embeddings.

This is probably more feature creep than you can or want to sign up for at the moment, but I also don't really want to deal with manually transforming my Markdown or HTML into the sections of text that you use as input for embeddings. It would be nice if I could just provide URLs to my live documentation or Markdown source code, and your service takes a best guess at how to split it up into sections and then generate embeddings for each of those sections.

Last, I would be happy to talk to you all about docs strategy for your own docs sometime (I'm not looking for work at the moment; I just enjoy helping people with this stuff). You can contact me and learn more about my background via the social links on https://technicalwriting.tools (a blog about technical writing tooling topics that I just spun up).

Good luck!


Great to hear! Humbled by the Firebase comparison as well :) We've talked about the markdown/HTML feature, definitely something we want to build. And I'll ping you!


Such an important problem!

I get the benefit over Pinecone (which wasn't built with LLMs, etc in mind)

How does this compare to Chroma? Feels like it has most of what you're talking about, and already has an open source product live.

https://www.trychroma.com/


> I get the benefit over Pinecone (which wasn't built with LLMs, etc in mind)

What do you mean?

Pinecone was specifically made to be used alongside LLMs and other embedding models. That’s how anyone uses Pinecone.


Chroma is awesome <3 - We have some overlap with them as we store the embeddings. But, we provide additional operations on top of the data, such as clustering/fine-tuning. We're also looking into open-sourcing some tools in the near future!


Postgres has an extension as well (pgvector). I've been using it, great performance, great scaling options (though I'm not even close to testing the limits) and gives you the full flexibility of Postgres.

It's easy enough to define a docker compose file, and deploy it to my environments.


That’s what I’m setting up now. What do you use to create the embedding? OpenAI? Which model?


Try one of the free models on huggingface: https://huggingface.co/sentence-transformers/multi-qa-mpnet-...

You can run it on your laptop and it's free.


That will be equivalent or better than text-davinci-003?


How does it scale with the number of rows?


Hi, regarding a product catalog use case: say I embed our product catalog consisting of 1000 SKUs. Is there a way to update a specific field in a product? A product has a name, description, SKU, etc. that don't change much. But it also has frequently changing info like price, quantity_available, special_offer, etc. How do I update only these fields and still be able to answer a question that customers send to our bot, like:

Do you have product A, and what's the price?

which means we need to get the latest price and quantity_available fields.

Is this possible to do with Metal?


We don’t support this use case yet, but we could by exposing an API to update the non-filterable metadata of the records. This is a cool use case; we would love to learn more about it. Would you want to create embeddings from the product name + description and then have the other attributes returned from the search results? We are very close to supporting this; it's just a matter of exposing a way to update those attributes.


Yes, the static info is mainly product name/code/description/keywords etc. The dynamic fields are price, quantity_available or similar feeds.
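One way to picture the split discussed here: embed only the static text once, keep the fast-changing fields in a separate record keyed by SKU, and join the two at query time. The `Catalog` class, its method names, and the toy vectors are hypothetical and say nothing about Metal's actual API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class Catalog:
    def __init__(self):
        self.vectors = {}   # sku -> embedding of name + description (static)
        self.metadata = {}  # sku -> dict of fast-changing fields

    def add(self, sku, vector, **fields):
        self.vectors[sku] = vector
        self.metadata[sku] = dict(fields)

    def update_fields(self, sku, **fields):
        """Change price, stock, etc. without touching the embedding."""
        self.metadata[sku].update(fields)

    def search(self, query_vector):
        """Return the best-matching SKU together with its latest fields."""
        best = max(self.vectors, key=lambda s: cosine(query_vector, self.vectors[s]))
        return {"sku": best, **self.metadata[best]}

catalog = Catalog()
catalog.add("A", [0.9, 0.1], price=10.0, quantity_available=5)
catalog.add("B", [0.1, 0.9], price=25.0, quantity_available=0)
catalog.update_fields("A", price=8.5, quantity_available=3)
answer = catalog.search([1.0, 0.2])
```

Because the price update never re-embeds anything, it stays cheap even when it happens many times a day.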


The value proposition is not clear to me. You don't generate the embeddings and there are already numerous vector databases. Maybe the versioning part?


Yeah I also don't understand why there are tons of (YC backed) startups providing little to no value. They basically man-in-the-middle the OpenAI GPT Platform. So... Yeah I've been running a pg_vector database with OpenAI Embeddings for 6 months now and I'm a solo hobby dev who experiments with it. Guess I could've built a startup with that knowledge L M A O


I dunno. It took like one line in conda to bring in GPU PyTorch, one for sentence-transformers, one line of Python to initialize it, one line to encode. No worries about somebody else getting data breached, acqui-hired, or struggling to find a sensible, fair, and profitable pricing model.

Clustering with scikit-learn is… easy. Indexing in FAISS is… easy. Maybe it’s hard if you use Rust and it was hard to do this in Python 5 years ago. Dilbert’s Boss probably thinks it is hard but he got fired…


You’re right! If you want to do that in a notebook, it’s pretty straightforward. But if you want to have it running in production, it’s a bit more complicated. Also, providing users with a gui to run these operations without a notebook has resonated with many less ml savvy users. Dilbert's boss probably didn't know much about ml... :)


I don’t use a notebook. I write plain Python scripts for batch jobs (run every day) and the UI is backed by aiohttp and HTMLX. I had no fear when I demoed my app in public for the first time since I’ve used it every day since the beginning of the year and it spins like a top.


Keep in mind that people out there pay a monthly fee for feature flags as a service. There’s definitely a market for OP’s product.


> feature flags as a service

and it's a >$100mm/yr business :) things always get messy when you scale things up beyond a demo on one laptop


To be fair I set up a home grown solution in a “real” environment that worked fine. 10k sign ups per day, 200 requests per second. If you already have a separate analytics platform also paying for feature flags seems hard to justify in most cases.


right on! Sounds like you have a lot of the foundation for your infra setup, which is great


Congrats on the launch!

Few questions/thoughts:

- What kind of overheads do you have right now with calling this API?

- What scales have you pressure-tested this with? Demo seems to show few 100s of embeddings. Selfishly, I'd like to see a demo of handling 10M+ vectors to be reasonably certain that any company can truly build infrastructure in this context. I guess I'm more interested in the out-of-core applications where I can really shove all my data in here, and see if the system can handle it.

- (dovetails with the previous one): What kind of access patterns are you seeing today: more indie developers pushing a few 1000s of vectors into a DB, or some heavy users pushing 100K-1M+ vectors?

- Less of a question, but one thought would be to partner with labeling companies to automatically fine-tune embeddings as part of a single embeddings-management platform.

- Would you eventually look to build your own vector DB + metadata / features stores as part of the long-term strategy or try to integrate with existing ones?


Thank you so much for your questions!

- As a managed service there are some overheads. We need to auth, validate and parse the inputs, fetch the index that is getting queried as we then need to use the index’s model to generate the embeddings. Then if the index is fine tuned/customized, we need to transform the embedding to the new vector space, to then call our vector index. We then fetch the metadata of the results from the db and parse the response to send it back.

- We’ve only tested upwards of 1M vectors ~ 1500 dimensions. But, more formal testing is required here and we plan to do so. I’m particularly curious about pg_vector and how it stacks up with other players as keeping the data central is a significant upside. We started with these lower vector indices to get something out there and iterate as a startup. But, scalability is part of what we want long term.

- We see both; we’ve had to turn down a very early lead with 100M+ vectors because it would derail other engineering efforts while we were starting. We’re now much better positioned to tackle that challenge as we have all the foundations.

- We haven’t considered this, but it’s an excellent idea. We’re currently discussing this with the team.

We would love to chat more; we appreciate your questions and feedback. Always happy to riff with someone who has seen issues around these use cases, like yourself. Feel free to reach us at founders@getmetal.io !


Thanks for your reply.

- I have a good sense of the overheads - I was more curious about the latencies (ms) you are observing with the system today.

- Out of curiosity, why did you pick Redis? Is it mostly due to familiarity and experience with it in the past? I'm curious if you foresee any challenges scaling to larger datasets due to the in-memory limitations.

- I'm assuming you're going with a usage-based model for large volumes of data managed? Do you support spinning down the service (moving things to cold-storage), and auto-scaling things back up when users actually search for things. Wondering how you're thinking about this especially if customers don't use the APIs daily.

- For the 100M+ vectors, what type of data were they dealing with, documents, images or something else?

Thanks!


EDIT: never mind, I didn't read your whole post, looks like you guys are working on an opensource option. Great!

Metal looks awesome. I've been comparing vector db solutions so your simple/abstracted sdk looks awesome. One thing I'd mention is with a solution like this that could be so critical to an app's functionality (and therefore so integrated into various parts of the app) I'd love to see that your team is vowing to give some sort of opensource self-hosted option. I want to root for any startup that is letting devs move faster in this area but there's a fear of committing to a solution that may pivot or be acquired/discontinued. Maybe even vowing a "safe-exit" for customers like I think rethinkdb did.

Good luck, looks awesome!


We agree with the sentiment; we’re currently figuring out the pieces we want to open source, as much of it is just infra (like the ingest pipeline). But the search server and some of our future work around memory will get open-sourced first.


This is similar to Pinecone/milvus, correct? What are the advantages of this compared to Pinecone/milvus?


We see ourselves a layer above vectorDB; we use Redis to index the data. We focused on building the ingest pipeline and operations on top of the embeddings, such as clustering and fine-tuning (embedding customization). Ultimately we want to provide the best developer experience possible, and we believe much work is needed here!


haha. In that case you might actually wanna consider FAISS/Milvus instead of Redis.


We’ve looked into FAISS and Milvus. Milvus is possibly an excellent option for us in the future. What’s your experience with these so far?


Great to hear that you're considering Milvus. Feel free to reach out if you ever have any questions/comments/concerns.

Just took a look at your docs and product page as well. Keep up the great work!


I tried out milvus. Developer Experience is crap. Documentation lacks some major core concepts. I've been experimenting with it for hours. Eventually I turned my back and said: Why not use pg_vector and scale the fuck out of the cluster? That should bring.. equal performance, as the pg_vector implementation is written in c and the comparing algorithms wouldn't differ too much from milvus.


hnswlib? Best of the bunch imho


I think those assume you already have the embedding vector calculated, and they just store and retrieve the vectors.


This is going to make searching for ML things like converting LLM or transformers for "metal" a wonderful experience.

The Metal framework was announced at WWDC 2014 for iOS and at WWDC 2015 for OS X and tvOS. Metal is an interface for programming the Graphics Processing Unit (GPU) in your computer. The main advantages of using Metal are:

- provides the lowest overhead access to the GPU, hence it reduces all bottlenecks usually caused by data transferring between the CPU and GPU in other frameworks.

- provides up to 10 times the number of draw calls compared to OpenGL. Metal, however, is not cross-platform like OpenGL is, so it is not meant to be a replacement for OpenGL.

- also allows running compute applications with performance levels comparable to similar technologies such as CUDA or OpenCL.

- has a custom shader language that allows precompiling shaders so they are a lot faster at run time.

- has built-in memory and resource management particularized to these platforms.

https://github.com/MetalKit/metal

https://github.com/MetalPetal/MetalPetal

https://github.com/tlkh/tf-metal-experiments

https://github.com/alexiscn/MetalFilters

Etc.


What are your plans for providing some additional metadata except for embeddings? Semantic search often requires additional filtering, as vectors are not all we need. At Qdrant we have a unique mechanism for incorporating metadata filters into HNSW, so they might be applied during vector search phase (no pre- or post-filtering required): https://qdrant.tech/documentation/indexing/#filtrable-index


Qdrant is awesome :). Redis also supports metadata filtering, which we’re currently building out. We are considering adding a different data store option and Qdrant might be our next choice.


Love the idea and I’ve been looking for something like this. I wrongly assumed that Pinecone offered exactly this and was disappointed to realize that I had to figure out the embedding generation myself.

I am yet to completely explore your website, but do you by any chance let me export the generated embeddings to manage them using say Pinecone?

Also, any chance you guys plan to integrate OCR tools in your pipeline? Say I have images of text, which I know is text and don’t want to use an image model for generating embeddings.


Thank you! We have an OCR pipeline already so you can upload the files and we’ll process them, chunk the text, create the embeddings and index them. Right now, we support PDFs, but the pipeline is ready to accept images as well. We’re opening those file types this week!


Super cool!

I'm curious, does Metal's version support do anything to solve the problem of "I originally embedded with model A, but now I'd like to take my same data and re-embed with a new model B"? I've heard from others this is a pain point and I've experienced it myself - it feels like there would be some value in storing the embeddings' source data in the cloud to one-click re-embed as well.


Hey! We do support multiple versions of an Index under an App. When you fine-tune an embedding, we autogenerate the new embeddings for the entire dataset into a unique index. We store the raw data uploaded to our system via text or file imports. Although we don’t allow you to easily re-embed this data today, we have this on the roadmap!


Super cool product! In general, peeling off infrastructure costs is always a good idea. And it would be really cool to have different places that keep a pulse on SOTA. I recently discovered instructor-xl performs better than openai's ada in some cases!

https://huggingface.co/spaces/mteb/leaderboard


Thank you! We’ve looked into instructor-xl, and it’s really awesome! We also accept custom embeddings, allowing developers to use whatever model they want. But we want to keep adding models to allow for better experimentation.


I’ve been working extensively with embeddings (LLM generated) for the last 3 years, and the problems your product seems to solve have not been any big pain points for me. If you want to discuss other pains related to embs I’m available in DMs.


Hey! I appreciate the comment, and we would love to hear about other pains you've encountered. I can't find a way to DM on HN, but please email us at founders@getmetal.io, and we can connect there!


Hi! (not a member of Metal) - I am curious about your big pain points. Happy to chat on twitter/email (doesn't appear to be any contact information in your profile).

Thanks!


For the application of search/retrieval, it would be great if you can surface logs and insights into what people are searching for, and even what areas of your data are missing based on searches


Yes! We've been thinking about techniques like pushing the queried embeddings into a datastore to detect anomalies and track outliers, which could provide some insights there.


Super interesting - might be an opportunity for a Weaviate module as well (Weaviate modules take care of vectorization but are model agnostic)


100% - really admire all the work you're doing with Weaviate. Will reach out!


Big fan of Weaviate.


Very interesting project, congratulations on the launch! I've been playing with embedding search/clustering on larger documents, and I find that segmentation strategies can be quite tricky and heavily impact results. Do you offer any segmentation strategies via API, or do you expect this potentially personalized feature will be handled by devs on their own servers?


We don’t offer this through the API, yet! You can however run clustering in the UI. We are working on exposing classification so that you can generate clusters on specific topics. We plan to offer both in the API within the next week or two!


This sounds less like Embeddings as a Service and more like Semantic Search (which happens to be using embeddings) as a Service.


Search is one use case we support, but you can perform a few other operations on your data, like clustering or fine-tuning. We're also working on a classification feature. Are there other async jobs you'd like to see?


The problem I'd like solved is that when I want to retrieve chunks of data for retrieval augmented generation, it's challenging to optimize the choice of embeddings model, chunking strategy, and overall retrieval algorithm. I'm not sure if that's the sort of problem you're focused on.


We agree; this is precisely the problem area we’re focusing on!! We’re currently working on the ability for users to specify chunking strategies while providing a ton of guidance on this selection based on their particular data.


In addition to the choices for how to chunk (i.e. defining chunk size, chunk boundaries, chunk overlap, etc.), there's also the question of what actually gets returned once finding the chunks that match. For example, perhaps I have a document with 100 1-page sections where each section is broken into roughly 5 chunks. I may get optimal performance in my RAG application not by retrieving the top K chunks from the index, but rather by returning the top K sections from the document, where sections might be scored based on the number and scores of child chunks. It also might be useful to incorporate section summaries, etc., in the retrieval process.
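A minimal sketch of that roll-up, assuming each retrieved chunk carries the id of its parent section: sum the chunk scores per section and return the top K sections. Summing is just one possible choice that rewards both the number of matching chunks and their scores; the function name and data shape are made up for the example.

```python
from collections import defaultdict

def top_sections(chunk_hits, k=3):
    """Rank sections by the summed scores of their matching chunks.

    chunk_hits: iterable of (section_id, chunk_score) pairs, e.g. the
    top-N chunks returned by a vector search.
    """
    totals = defaultdict(float)
    for section_id, score in chunk_hits:
        totals[section_id] += score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

hits = [("s1", 0.9), ("s2", 0.6), ("s2", 0.5), ("s3", 0.2)]
best = top_sections(hits, k=2)
```

Here section s2 outranks s1 even though s1 holds the single best chunk, because two moderately good chunks add up to more. Taking the max or mean per section would give a different ranking.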


This is great, and that makes a ton of sense! Would you want to define + experiment with these various configurations yourself explicitly, or would you expect a system to determine this automatically? I like the concept of rolling-up chunk scores!


if you want some more options (chunking, models, +more) check here https://github.com/marqo-ai/marqo and an example for RAG using context aware trimming of text for fitting into context windows https://github.com/marqo-ai/marqo/blob/mainline/examples/GPT...


Wouldn't it be better to partner with an existing managed cloud provider like Pinecone or Qdrant? Why Redis at all? :-0


Redis provides indexes for vector similarity. And we have a lot of experience with Redis. We see a future where we can offer more than one datastore, and we’ve been considering Qdrant as the next datastore to support.


You should look at Lucene core - they have incorporated vector embeddings in 9.4.x and it could provide you better scale than Redis with durability as well.

https://lucene.apache.org/core/9_4_2/demo/index.html


Congrats on launching! Does Metal compete with https://github.com/openai/chatgpt-retrieval-plugin or does it provide a different value?


There’s some overlap with information retrieval for ChatGPT applications. As a managed service, we handle all of the infrastructure and maintenance. Also, we support additional use cases for web applications/backends, such as clustering and fine-tuning. We’re also working on an open-source alternative to the retrieval plugin.


I’m wondering about YC’s series of investments in this area

How many of these new AI companies will stick?


Great question; while it’s still super early, we believe that some of the most critical problems to solve will involve making current APIs compatible with AI use cases. Products like ChatGPT Plugins are game changers, but they will still be limited by the APIs they interact with.


Do you support custom or fine-tuned models for generating the embeddings?


Yes, we do! We allow users to run `metal.tune` to determine whether two vectors should be close to each other. Then we use that to recalculate the embeddings similar to the customized embeddings cookbook from OpenAI. Then the queries get embedded and transformed into the same space.


Looked at the docs. It looks like yes!


Most important question: what is your favorite metal album currently, and why is it Periphery V? :)

Had to ask, noticed all the metal references in the docs.


Fantastic question. It's a few years old but I still love Infest the Rats Nest by King Gizzard. Especially awesome considering they're not a full time metal band. I'll give Periphery V a spin tonight!! Thanks for sharing :D


Rust in Peace by Megadeth.


Wow, seems like YC is backing anything these days


Looks cool, does it work with langchain? If so, I'd suggest a short tutorial and video showing it, to latch onto the buzz of that offering.


We love langchain! That’s a great idea – we want to provide examples using langchain and look into ways to better integrate into libraries like this.


I stopped at "send data to our system"


We have some open-source tooling in the works! :) We understand that some users are sensitive to managed services, we’re starting with this, but we’re planning to open source tools to improve developer experience around information retrieval and memory.


How would you say your product compares to Pinecone, GCP's Matching Engine or any other product in the space?


It does compare with them, but we want to lower the barrier to entry for any developer building features that use embeddings. We want to give regular software engineers superpowers by offering, out of the box, the infrastructure and high-level APIs to run operations on top of the vector DB within their own stack.


Great demo video - I like the focus on being open and flexible, knowing how much will change in the next year.


So, a vector store/vector db?


We store the vectors, but we also provide additional operations that would require additional code/infra if you just use a vectorDB. We also have the infrastructure in place to ingest all the data, generate the embeddings (we also take raw embeddings), and provide APIs for fine-tuning and clustering. Another big difference coming soon is index versioning, allowing developers to test multiple models/embeddings.


So are you spinning up a redis instance/container per tenant?


Why redis instead of a specialized database like faiss?


Redis provides indexes for vector similarity. And we have a lot of experience with Redis. We have plans to expand into offering other data stores, like Qdrant


What method do you use to cluster? HDBSCAN?


We will be adding HDBSCAN in the coming days! Right now we only offer k-means, but for dimensionality reduction we offer PCA and t-SNE.
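For readers who haven't seen it, k-means fits in a few lines. This is a plain-Python sketch with a naive initialization (the first k points) and a fixed number of iterations, chosen to keep the example deterministic. It is not Metal's implementation, and in practice a library such as scikit-learn would be the usual choice.

```python
def kmeans(points, k, iterations=10):
    """Cluster points (lists of floats) into k groups.

    Returns (assignments, centroids), where assignments[i] is the
    cluster index of points[i].
    """
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

points = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
assignments, centroids = kmeans(points, k=2)
```

Unlike k-means, HDBSCAN does not need k chosen up front, which is one reason people ask for it when clustering embeddings.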


As if googling for Apple's 3D API documentation wasn't already hard enough ;)


The trademark infringement claims are serious. Metal is more than just a 3D graphics framework. It is a general purpose parallel computing framework, and this application would very much fall within the purview of its trademark. E.g. if you were going to implement an embedding based classifier on iOS/MacOS you would most likely use compute shaders written in Metal. The fact that the website styles are almost identical down to the color palette doesn't help the case: https://www.getmetal.io https://developer.apple.com/metal/


Hey! I'd love to understand what you're referring to with this



Whoa! Thanks for sharing -- we haven't seen this!


Please note that this is not snark; I'm genuinely curious since you're a YC company - didn't anyone from YC or from the YC network point you to that?

I'd hoped that proper product naming, and avoiding such minefields, would be one of the things someone from YC or the YC network would help/advise on, or at least give input on.


I think names don't really matter that much in the grand scheme of things, short of being catastrophically bad. Bonus points if you can get the single word .com at some point, bonus points if it's memorable, but you can always rebrand down the road and of the list of things to worry about, I don't think it's very high. Certainly not a minefield.


In general, yes I agree.

However, in some cases it can indeed be an issue when there is potential conflict with some very litigious companies.

Edit: I have no idea if it will be an issue in this case or not, but given Apple and similar domain (AI/ML), it may be an issue.


Apple is famously protective of its trademarks against small software companies.

I forget the details so I can’t Google it, but twenty years ago there was a case where a Mac developer had a name collision with an Apple product, emailed Steve Jobs, and he replied with “No big deal, change the name.” — the little guy was expected to bear the burden of coming up with a new brand, but Jobs was (in his own view) kind enough not to sue.


‘metal’ is a trademark infringement lawsuit just waiting to happen. It’s a super-generic name that people are going to confuse with something else.

I use code names for projects like that but I would never name a company something I couldn’t get the domain for without some prefix attached.



