There are other base images from Google that are smaller than the usual base images and come in handy when deploying applications that run as a single binary.
> Distroless images are very small. The smallest distroless image, gcr.io/distroless/static-debian11, is around 2 MiB. That's about 50% of the size of alpine (~5 MiB), and less than 2% of the size of debian (124 MiB).
Distroless images are tiny, but sometimes the fact that they don't have anything on them other than the application binary makes them harder to interact with, especially when troubleshooting or profiling. We recently moved a lot of our stuff back to vanilla debian for this reason. We figured that the extra 100 MB wouldn't make that big of a difference when pulling for our Kubernetes clusters. YMMV.
I found this to be an issue as well, but there are a few ways around it when you need to debug something. The most useful approach I found was to launch a new container from a standard image (like Ubuntu) that shares the same process namespace, for example:
docker run --rm -it --pid=container:distroless-app ubuntu:20.04
You can then see processes in the 'distroless-app' container from the new container, and then you can install as many debugging tools as you like without affecting the original container.
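A fuller sketch of that workflow, assuming a running container named `distroless-app` (a placeholder name), might look like this:

```shell
# Join the app container's PID namespace with a throwaway Ubuntu container.
# SYS_PTRACE is only needed if you want to attach tracers like strace.
docker run --rm -it --pid=container:distroless-app \
  --cap-add=SYS_PTRACE ubuntu:20.04 bash

# From inside the debug container you can install tools freely without
# touching the original image:
apt-get update && apt-get install -y procps strace

ps aux            # lists the distroless app's processes
strace -p 1       # attach to the app's PID 1 (requires SYS_PTRACE above)
ls /proc/1/root/  # peek at the app container's root filesystem via procfs
```

Everything you install lives in the debug container, so it disappears when you exit and the distroless image stays untouched.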
Alternatively, distroless has debug images you could use as a base instead, which are probably still smaller than many other base images.
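The `:debug` tags of the distroless images bundle a busybox shell. As a rough sketch (the entrypoint has to be overridden to get at the shell):

```shell
# Run a shell in a distroless debug variant:
docker run --rm -it --entrypoint=sh gcr.io/distroless/base-debian11:debug

# Or, in a Dockerfile, swap the base for development builds only:
#   FROM gcr.io/distroless/base-debian11:debug
```

The production image can stay on the non-debug tag while dev/staging builds use the `:debug` variant.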
I've found myself exec-ing into containers a lot less often recently. Kubernetes has ephemeral containers for debugging. This is of limited use to me; the problem is usually lower level (container engine or networking malfunctioning) or higher level (app is broken, and there is no "fix-app" command included in Debian). For the lower-level problems, it's simplest to just ssh to the node (great for a targeted tcpdump). For the higher-level problems, it's easier to just integrate things into your app (I would die without net/http/pprof in Go apps, for example).
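For reference, both techniques mentioned above can be sketched with placeholder names (`mypod`, target container `app`, pprof on port 6060 are all assumptions):

```shell
# Ephemeral debug container attached to a running pod (kubectl 1.23+);
# --target shares the process namespace with the named container.
kubectl debug -it mypod --image=busybox:1.36 --target=app

# Or, if the app already exposes net/http/pprof, skip exec entirely:
kubectl port-forward mypod 6060:6060 &
curl -o cpu.pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
```

The ephemeral container is added to the pod spec without restarting it, so the distroless app container itself never changes.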
I was an early adopter of distroless, though, so I'm probably just used to not having a shell in the container. If you use it every day I'm sure it must be helpful in some way. My philosophy, though, is that as soon as you start having a shell on your cattle, it becomes a pet. It's easy to leave one-off fixes around that are auto-reverted when you reschedule your deployment or whatever. This has never happened to me, but I do worry about it. I'd also say that if you are uncomfortable with how "exec" lets people do anything in a container, you'd probably be even more uncomfortable giving them root on the node itself. And of course it's very easy to break things at that level as well.
Also, if you are running k8s and use the same base image for your app containers, you amortize this cost, as you only need to pull the base layers once per node. So in practice you won't pull that 100 MB many times.
(This benefit compounds the more frequently you rebuild your app containers.)
Doesn't that only work if you used the exact same base? If I build 2 images from debian:11 but one of them used debian:11 last month and one uses debian:11 today, I thought they end up not sharing a base layer because they're resolving debian:11 to different hashes and actually using the base image by exact image ID.
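Right: layers are deduplicated by content digest, so two images share the base layers only if their `debian:11` tag resolved to the same digest at build time. Pinning the digest makes this explicit. A sketch (the digest itself is a placeholder):

```shell
# See which digest a tag currently resolves to on this machine:
docker inspect --format '{{index .RepoDigests 0}}' debian:11

# In a Dockerfile, pinning by digest guarantees identical base layers
# across builds done at different times:
#   FROM debian:11@sha256:<digest>
```

The trade-off is that a pinned digest never picks up security updates on its own; you have to bump it deliberately.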
Base images like alpine/debian/ubuntu get used by a lot of third-party containers too, so if you have multiple containers running on the same device, the marginal cost may in practice be very small until the base image gets an upgrade.
I think this is something people miss a lot when trying to optimize their Docker builds: the trade-off between optimizing for most of your builds vs. optimizing for one specific build. Not easy.
There are some tools that allow you to copy debug tools into a container when needed. I think all that needs to be in the container is tar, and the tool runs `kubectl exec ... tar` in the container. This allows you to get in when needed while still keeping your production attack surface low.
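The underlying trick can be done by hand with a tar pipe (this is also roughly how `kubectl cp` works; `mypod` and the local `debug-tools` directory are placeholder names):

```shell
# Stream a local directory of (ideally static) tool binaries into the
# running container; only tar needs to exist on the remote side:
tar cf - debug-tools | kubectl exec -i mypod -- tar xf - -C /tmp

# Then run the tools from the extracted path:
kubectl exec -it mypod -- /tmp/debug-tools/busybox sh
```

Static binaries (busybox, a static strace build, etc.) are the safe choice here, since a distroless image has no libc or loader for dynamically linked tools to rely on.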
Either way, as long as all your containers share the same base layer it doesn't really matter, since the layers will be deduplicated.
The way I imagine this is best solved is by keeping a compressed set of tools on your host and then mounting those tools into a volume for your container.
So if you have N containers on a host you only end up with one set of tooling across all of them, and it's compressed until you need it.
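A minimal sketch of that idea, with illustrative paths (`/opt/debug-tools` on the host, `myapp:latest` as a stand-in image):

```shell
# One shared, read-only tools directory on the host, mounted into any
# container that needs debugging:
docker run --rm -it \
  -v /opt/debug-tools:/tools:ro \
  myapp:latest

# Inside the container: /tools/busybox sh, /tools/strace, etc.
# Static binaries again work best, since the image itself ships no libc.
```

With N containers on the host there is still only one copy of the tooling, and the read-only mount keeps containers from tampering with it.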
You can decouple your test tooling from your images/containers, which has a number of benefits. One that's perhaps understated is reducing attacker capabilities in the container.
With log4j some of the payloads were essentially just calling out to various binaries on Linux. If you don't have those they die instantly.
It got removed from the README at some point, but the smallest distroless image, gcr.io/distroless/static, is 786 KB compressed -- 1/3 the size of this image of shipping containers[0], and small enough to fit on a 3.5" floppy disk.
So the percentage makes it look impressive, but... you're saving no more than 5MB. Don't get me wrong, I like smaller images, but I feel like "smaller than Alpine" is getting into -funroll-loops territory of over-optimizing.
https://github.com/GoogleContainerTools/distroless