In terms of etcd3 vs sqlite3, it is as reliable as most airplane systems that de...

asark · on Feb 27, 2019

> I think the "high availability by redundancy" story is oversold.

More generally, an awful lot of systems would be fine with a SQL mirror (non-failover), some real backups on top, a restore-from-backup event every 3-5 years, and the occasional 1hr maintenance window. Instead they are made "HA" and "redundant" at 10x or more the ops cost and, unless very well done, end up with even more downtime anyway because they're so damn complex.
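To put a rough number on the simple setup described above, here is a back-of-the-envelope sketch. The restore interval and the 1hr window come from the comment; the restore duration and the number of maintenance windows per year are assumptions picked purely for illustration, and planned maintenance is counted as downtime.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def simple_setup_availability(restore_every_years: float,
                              restore_hours: float,
                              windows_per_year: float,
                              window_hours: float = 1.0) -> float:
    """Estimate yearly availability of a non-HA setup.

    Downtime is the restore time amortized over the restore interval,
    plus all maintenance windows (counted as downtime here).
    """
    downtime_hours = (restore_hours / restore_every_years
                      + windows_per_year * window_hours)
    return 1.0 - downtime_hours / HOURS_PER_YEAR


# Assumed inputs: one 4-hour restore every 4 years, four 1hr windows a year.
availability = simple_setup_availability(4, 4, 4)
print(f"{availability:.4%}")
```

With those assumed inputs the simple setup loses about 5 hours a year, which is the figure an HA design would have to beat after accounting for its own complexity-induced outages.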

philips · on Feb 27, 2019

Alternatively, the K3s authors could have embedded a single-node etcd process into Kubernetes using the embed package instead of introducing sqlite.

https://godoc.org/github.com/etcd-io/etcd/embed

This is something the Kubernetes community might consider as well.

smarterclayton · on Feb 27, 2019

Yeah, OpenShift did this from the very first version. It worked pretty well. Memory use was very reasonable from etcd 3.0 on.

tyingq · on Feb 27, 2019

If someone wanted to do it, the lxc/lxd team has a distributed sqlite, so you could adapt that to k3s. Though I suppose that's not a good match for the stated purpose of k3s.

https://github.com/lxc/lxd/blob/master/doc/database.md

https://github.com/CanonicalLtd/dqlite

eikenberry · on Feb 27, 2019

The reliability of sqlite3 is not in question; the reliability of the system running it is. When running in any of the cloud providers, you will get systems that briefly lose their network connection or just go away, sometimes with notification, sometimes without. You will have downtime if you don't plan for HA. So the question is whether avoiding that downtime is worth the added complexity, or put differently, how many 9s you need in your SLA.
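For reference, the "how many 9s" question translates into a yearly downtime budget with simple arithmetic. A minimal sketch, assuming a 365-day year and counting every unavailable hour against the SLA (the function name is made up for this example):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines.

    Two nines means 99%, three nines means 99.9%, and so on.
    """
    unavailability = 10.0 ** (-nines)
    return HOURS_PER_YEAR * unavailability


for n in range(2, 6):
    print(f"{n} nines: {downtime_budget_hours(n):.2f} hours/year")
```

Each extra 9 cuts the budget by a factor of ten, so a node that occasionally disappears for an hour fits comfortably within three nines but not within four.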

detaro · on Feb 28, 2019

Not worried about the reliability of sqlite, but about the system running it, and what happens if it goes down. E.g. if it relies on a single node, and losing that node only means the cluster can't make changes anymore but otherwise continues to run just fine and cleanly recovers once the main DB is back, that's probably a tradeoff that often works. If stuff starts breaking quickly, not so much.

geofft · on Feb 27, 2019

Airplanes also contain expensive hardware. My own desire for reliability via redundancy comes from the fact that commodity hardware (which is what's in most datacenters) likes to fail.

monstrado · on Feb 27, 2019

Interestingly enough, FoundationDB currently uses SQLite's storage engine for persistence.

rickycook · on Feb 27, 2019

pretty sure critical airline systems have redundancy