What if your product simply stores a lot of data (e.g. a search engine)? How is tha...

belak · on Aug 2, 2022

That's fair - I added "are working on a specific problem which needs a more complicated setup" to my original comment as a nicer way of referring to edge cases like search engines. I still believe that 99% of applications would function perfectly fine with a single primary DB.

zasdffaa · on Aug 2, 2022

Depends what you mean by a database I guess. I take it to mean an RDBMS.

RDBMSs provide guarantees that web searching doesn't need. For web stuff you can afford to lose a few pieces of data or return not-quite-perfect results. That's just wrong for an RDBMS.

altdataseller · on Aug 2, 2022

What if you are using the database as a system of record that you index into a real search engine like Elasticsearch, for a product where you have tons of data to search through (e.g. text from web pages)?

IggleSniggle · on Aug 2, 2022

With regard to Elasticsearch, you basically opt in to whichever behavior you want/need. You end up in the same place: potentially losing some data points or introducing some "fuzziness" to the results in exchange for speed. When you ask Elasticsearch to behave in a guaranteed atomic manner across all records, performing locks on data, you end up with constraints similar to those of an RDBMS.

Elasticsearch is for search.

If you're asking about "what if you use an RDBMS as a pointer to Elasticsearch" then I guess I would ask: why would you do this? Elasticsearch can be used as a system of record. You could use an RDBMS over top of Elasticsearch without configuring Elasticsearch as a system of record, but then you would be lying when you refer to your RDBMS as a "system of record." It's not a "system of record" for your actual data, just a record of where pointers to actual data were at one point in time.

I feel like I must be missing what you're suggesting here.

altdataseller · on Aug 2, 2022

Having just an Elasticsearch index without also having the data in a primary store like an RDBMS is an anti-pattern and not recommended by almost all experts. Whether you want to call it a “system of record”, I won't argue semantics. But the point is, it's recommended to have your data in a primary store from which you can index into Elasticsearch.
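
To make the pattern concrete, here is a minimal sketch of "primary store plus derived index". It uses SQLite as a stand-in for the RDBMS and a plain in-memory inverted index as a stand-in for Elasticsearch, so no real Elasticsearch API is shown. The table name `pages` and the helper names are hypothetical, chosen only for illustration.

```python
import sqlite3
from collections import defaultdict


def make_primary_store():
    # Stand-in for the RDBMS: this is where the data actually lives.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    return conn


def save_page(conn, page_id, body):
    # Writes go to the primary store first, inside a transaction.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO pages (id, body) VALUES (?, ?)",
            (page_id, body),
        )


def build_index(conn):
    # Stand-in for the search engine: a derived structure that can be
    # thrown away and rebuilt from the primary store at any time.
    index = defaultdict(set)
    for page_id, body in conn.execute("SELECT id, body FROM pages"):
        for token in body.lower().split():
            index[token].add(page_id)
    return index


def search(index, term):
    # Look up a single term; returns matching page ids in sorted order.
    return sorted(index.get(term.lower(), set()))


conn = make_primary_store()
save_page(conn, 1, "cheap flights to Oslo")
save_page(conn, 2, "Oslo weather")

index = build_index(conn)
first_result = search(index, "oslo")

# Losing the index is not losing data: rebuild it from the primary store.
index = build_index(conn)
rebuilt_result = search(index, "oslo")
```

The point of the sketch is only that the index holds nothing that cannot be regenerated from the table; how you actually feed a real search cluster is a separate question.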

zasdffaa · on Aug 3, 2022

Have you a link for this? Never heard of this requirement (but not an elastic user so no surprise).

skeeter2020 · on Aug 2, 2022

This is not typically going to be stored in an ACID-compliant RDBMS, which is where the most common scaling problem occurs. Search engines, document stores, adtech, eventing, etc. are likely going to have a different storage mechanism where consistency isn't as important.

rmbyrro · on Aug 2, 2022

A search engine won't need joins; it needs other things (e.g. text indexing) that can be split up relatively easily.
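
A rough sketch of why a text index splits easily, assuming the simplest scheme: documents are partitioned by id, each shard keeps its own inverted index, and a query fans out to every shard and merges the hits. The class `ShardedIndex` and the modulo placement are hypothetical illustrations, not how any particular search engine does it.

```python
from collections import defaultdict


class ShardedIndex:
    def __init__(self, num_shards):
        # Each shard is an independent inverted index: token -> set of doc ids.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def add(self, doc_id, text):
        # A document lives on exactly one shard, so no cross-shard joins
        # are needed at write time.
        shard = self.shards[doc_id % len(self.shards)]
        for token in text.lower().split():
            shard[token].add(doc_id)

    def search(self, term):
        # Scatter the query to every shard, then gather and merge the results.
        hits = set()
        for shard in self.shards:
            hits |= shard.get(term.lower(), set())
        return sorted(hits)


idx = ShardedIndex(num_shards=3)
idx.add(1, "postgres scaling tips")
idx.add(2, "scaling elasticsearch clusters")
idx.add(3, "sourdough recipes")
idx.add(4, "vertical scaling limits")

scaling_hits = idx.search("scaling")
```

Because each shard answers the query on its own, the shards could sit on separate machines without coordinating with one another, which is much harder to arrange for joins across relational tables.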