GPUs as a service with Kubernetes Engine are now generally available

minimaxir · on June 19, 2018

With the new discounts on preemptible GPUs (https://cloudplatform.googleblog.com/2018/06/Introducing-imp...), the economics of quickly spinning up a fleet of GPUs with Kubernetes for a quick parallelizable ML task become very interesting. (assuming that Google allows enough GPU quota for a fleet of GPUs for nonenterprise users anyways)

What I want to use Kubernetes + instant-GPU-fleet for deep learning hyperparameter grid searching. (i.e. spin up a lot of preemptible GPUs; for each parameter config, train the model on a single GPU in parallel for linear scanning speed scaling).

Kubeflow (https://github.com/kubeflow/kubeflow) is close to this functionality, but not quite there yet in user-friendlyness. (you have to package everything in a huge Docker container and launch jobs from the CLI; ideally what I want to do is to spawn containers and start training directly from the JupyterHub notebook on the master node)

boulos · on June 19, 2018

Disclosure: I work on Google Cloud (and helped launch Preemptible VMs).

> (assuming that Google allows enough GPU quota for a fleet of GPUs for nonenterprise users anyways

This is actually why we have separate preemptible quota [1], which we grant more freely. You can't stock out our full-price customers, so we're happy to let you spin up tons of V100s (and as of this morning TPUs!).

[1] https://cloud.google.com/compute/quotas#quotas_for_preemptib...

minimaxir · on June 19, 2018

I was wondering why there was a separate quota. That makes much more sense!

jamesblonde · on June 19, 2018

If you are adventurous enough to try a more user-friendly platform for Distributed TensorFlow (Horovod, grid-search using Spark/TensorFlow), have a look at http://www.hops.io (watch video on front page). In Hops, you don't program infrastructure (no YAML file, no Dockerfiles). You can run with 100s of GPUs for grid search using this python code:

  def train(param1, param2, ..):
    # tf code here (import any libraries here)

  dict = { 'lr': [0.001, 0.0001, 0.00001], 'dropout': [0.4, 0.6, 0.8]}
  experiment.launch(spark, train, dict)

The reason this works on clusters is that there is a persistent conda environment installed on all hosts in the cluster for every 'project' (think Github project). The above function 'train' is a pyspark mapper and the python libraries needed in it are found in the local conda environment. You install conda libraries by search/click in a UI, not by writing a Dockerfile. You create your cluster by selecting how many GPUs/memory you need in a UI, not by writing a YML file.

Diclosure: i work on this project.

w1nk · on June 19, 2018

Out of curiosity, why are you grid searching if you have access to google's infrastructure (vizier, etc).

minimaxir · on June 19, 2018

You mean Cloud ML Engine? (https://cloud.google.com/ml-engine/)

Cloud ML Engine is a bit opaque in terms of price efficiency (how powerful is a "training unit"?) but I suppose that's another option.

w1nk · on June 19, 2018

Hm, actually sorry I assumed they had made vizier available as a blackbox service (https://static.googleusercontent.com/media/research.google.c...). If they haven't, you can save money/time by employing better than grid search. There are some re implementations of vizier available, I'll see if I can make ours too (email in profile). I do agree with that billing can be..complex ;)

minimaxir · on June 19, 2018

From the paper synopsis, it's implemented as a part of HyperTune. (https://cloud.google.com/ml-engine/docs/tensorflow/using-hyp...)

kozikow · on June 19, 2018

We have been using GPUs with GKE for a while. At some point, we used 20+ GPUs in an production workflow without any problems.

Everything generally works well, maybe except the initial phase when some containers won't port well from nvidia-docker-compose due to problems with Cuda libraries. Ideally, you need to match the version of Cuda everywhere.

My dev setup for quick experimentation with GPU docker container on GKE: https://tensorflight.blog/2018/02/23/dev-environment-for-gke... .

rjain15 · on June 19, 2018

Are these GPUs on bare metal or virtualized GPUs on VMs

wmf · on June 19, 2018

Everything in GCE/GKE is running inside VMs, but when you attach a GPU you get the whole GPU (via PCI passthrough).

fpgaminer · on June 19, 2018

On a related note, last week I took a dive into Kubernetes on gcloud for a personal project and came out with some interesting knowledge.

First off, this was for a _small_ personal project. Something that I originally intended to run on an f1-micro. I decided to check out Kubernetes mostly to learn, but also to see if it could offer a more maintainable setup (typically I just write a mess of provisioning shell scripts and cloud-init scripts to bring up servers. A bit of a mess to maintain long-term). So basically, I was using Kubernetes "wrong"; its target audience is workloads that intend to use a fleet of machines. But I trudged forward anyway.

This resulted in the first problem. You can't spin up a Kubernetes cluster with just one f1-micro. Google won't let you. I could either do a 3x f1-micro cluster, which would be ~$12/month, or 1x f1-small, should would be about the same price. Contrast with my original plan of a single f1-micro, which is ~$4/mo. Hmm...

Well after playing around I discovered a "bug" in gcloud's tooling. You can spin up a 3x f1-micro cluster. Then add a node pool with just one or two f1-micros in it. Then kill the original node pool. This is all allowed, and results in a final cluster with only one or two nodes in it. Nice. "I know what I'm doing, Google is just being a dick!" I thought. I could still spin up Pods on the cluster, no problem.

Then the second discovery. The Kubernetes console was reporting unschedulable system pods. Turns out, Google has a reason for these minimums.

All the system pods, the pods that help orchestrate the show, provide logging, metrics, etc; they take up a whopping 700 MB of RAM and a good chunk of CPU as well. I was a bit shocked.

I'm sure most developers are just shrugging right now. 700 MB is nothing these days. But remember, my original plan was a single f1-micro which only has 700 MB. This is a personal project, so every bit counts to keep long-term costs down. And, in deep contrast to Kubernetes' gluttony, the app I intend to run on this system only uses ~32 MB under usual loads. That's right; 32 MB. It's a webapp running on a Rust web server.

So hopefully you can imagine my shock at Kube's RAM usage. As I dug in I discovered most all of the services are built using Go. No wonder. I love Go, but it's a memory hog. My mind started imagining what the RAM usage would be like if all these services had been written in Rust...

Point being, 700MB exceeds what one f1-micro can handle. And it exceeds what two f1-micros can handle, because a lot of those services are per-node services. Combined with the base RAM usage of the (surprisingly) bloated Container OS that Google runs on the nodes in the cluster. (Spinning up a Container Optimized image on a GCE instance I measured something like 500 or more RAM usage on a bare install...). And hence why Google won't let you spin up a cluster of less than three f1-micros. You can, however, use a single f1-small since it has 1.7MB of RAM in a single instance.

At this point I resigned myself to just having a cluster of three nodes. shrug the expense of learning, I suppose. And perhaps I could re-use the cluster to host other small projects.

It was at this point I hit another road block. To expose services running on your cluster you, more or less, have to use the LoadBalance feature of Kube. It's convenient; a single line configuration option and BAM your service is live on the internet with a static IP. Except for one small detail that Google barely mentions. Their load balancers cost, at minimum, $18/mo. That's more than my whole cluster! And my original budget was $4/mo...

There are workarounds, but they are ugly. NodePort doesn't work because you can't expose port 80 or 443 with it. You can use host networking or a host port; something like that. Basically build your own load balancer Pod, assign it to a specific node, and manually assign a static IP to that node. (hand waving and roughly recalling the awkward solution I conjured). But it requires manual intervention every time you want to perform maintenance on your cluster. The opposite of what I was trying to achieve.

To sum it all up; you need to be willing to spend _at least_ $30/mo on any given Kubernetes based project.

So I gave up on that idea. For now I've fallen on provisioning shell scripts again. Though I've shoved my application into containers and am using Docker-Compose to at least make it a little nicer deployment.

I also took a few hours to run through the Kubernetes The Hard Way tutorial; thirsty for a deeper understanding of how Kube works under the hood. It's a fascinating system. But after working through the tutorial it became _very_ clear that Kube isn't something you'd want to run yourself. Not unless you have a dedicate sys/devops to manage it.

Also interesting is that Kube falls over when you need to run a relational database. The impedance mismatch is too great. Kube is designed for services that can be spread across many disposable nodes. Not something Postgres/etc are designed well for. So the current recommendation, if you're using a relational database, is to just use traditional provisional or a managed service like Cloud SQL.

P.S. For as long as I've used Google Cloud, I have and continue to be eternally frustrated by the service. It's a complete mess. Last week while doing this exploration I ran into the problem where half my nodes were zombies; never starting and taking an hour to finally die. I had to switch regions to "fix" the problem. Gcloud provides _no_ support by default, even though I'm a paying customer. Rather, you have to pay _more_ for the _privilege_ of talking to someone about problems with the services you're already paying for. Incredibly frustrating, but that's Google's typical M.O.

Not to mention 1) poor, out-dated documentation; 2) The gcloud CLI is abysmally slow to tab-complete even simple stuff; 3) The web console is made of molasses and eats an ungodly amount of CPU resources just sitting and doing nothing; 4) little to no way to restrict billing; the best you can do for most services is just set up an alert and pray that you're awake if shit hits the fan. 5) I'm not sure I can recall a single gcloud command I've run lately that hasn't spewed at least one warning or deprecation notice at me.

puzzle · on June 19, 2018

Were the system pods using all that memory or just reserving it? It's not straightforward to scale them, because the node might run just your tiny Rust server or 20 high traffic web apps. You don't want the log agent to keel over just because of the latter. GKE and many other Kubernetes deployments use something called addon-resizer to determine CPU and RAM given to cluster services. The problem is that, typically, it scales based on node count and the settings are usually conservative on the lower end, i.e. your case of a single node. I think it assumes clusters are all at least 10/15 nodes. On a test cluster, I see the metrics server using only 16MB of RAM, but it requests 104MB. Ironically, the autoscaling nanny in the same pod uses another 8MB.

This is a known issue that is not easy to solve in the general case. I think Tim Hockin ran a conversation about how to autoscale on the very low end at last year's Kubecon, with people like you in mind. The other use case he brought up is how to set up services in a Minikube cluster that might be running in a 2GB VM.

fpgaminer · on June 19, 2018

> Were the system pods using all that memory or just reserving it?

700MB was the sum of all the requested minimum RAM for all those service pods. So yeah, you're probably right that they're ceilings of sorts. Still it's a bit crazy to see a logging service, who's job is merely to haul logs off to a different server, requesting 200MB.

I'm also bewildered by Container Optimized OS's memory consumption. IIRC it was 500MB+ bare; doing nothing. As reported by top. I forget which, but I stood up either Debian Stretch or Ubuntu 18.04 and it was only ~200MB with Docker installed.

puzzle · on June 19, 2018

The logging service numbers are easily explained: the remote server might have transient failures, so the forwarder will cache stuff in memory (the alternative is to just stop reading the logs, but then you risk losing messages if the pod dies in the meantime). You complained about Go, but it doesn't help that the fluentd agent is written in Ruby. There's a new Go rewrite of it, but I don't think GKE or others use it yet.

Was COS a standalone GCE instance or GKE? In the latter case, memory will be used by the usual suspects: fluentd, kubelet, kube-proxy, docker, node-problem-detector. For both, there are also a few Google daemons in Python (ugh).

wstrange · on June 19, 2018

A PaaS solution (like App Engine) would be far more appropriate for what you want to do. Unless your goal is to learn Kubernetes - in which case minikube works just fine.

W.r.t. to the GCP console and tools - I guess it's a preference thing - but I vastly prefer them over the AWS tools. They work fine for me. I like the feature in the GUI where it shows you the equivalent gcloud command line.

fpgaminer · on June 19, 2018

I love App Engine and have used it in the past, but it: 1) is quite old and receives precious little love from Google; 2) isn't great if you want to use a relational database; 3) not a great option for applications that require iron clad security.

To be clear on #1, App Engine has been nothing but reliable for me. Yet it receives few updates; for example, only supporting Python 2.7...

#2: It works great with Datastore, but for SQL you have to use a separate instance or Cloud SQL; either will cost additional money and maintenance. And, last I checked, Postgres was a no-go for App Engine.

#3: It can be hard to secure App Engine apps properly. User data leaking in the logs, for example. And I've encountered a few bugs that lead me to distrust the runtime they use. (I reported the bugs, but still). Where security is of the utmost importance, I have to opt for my own stack.

[This is all Standard Environment. Flexible is brutally expensive.]

thesandlord · on June 20, 2018

1) Node.js was just released. Python 3 is in alpha. Definitely check it out! There is a brand new sandbox that allows us to ship new runtimes much faster now.

2) True, but you would need this no matter where you are running right? Also, I think the new sandbox runtimes (Java8, Node.js, Python3) should support Postgres.

3) Leaking data in the logs can happen with any app though? What in particular are you seeing? WRT sandbox, the new sandbox should be a lot more robust as well.

I'd definitely give App Engine another shot, maybe in the near future. We are definitely investing a lot on it, I'm sorry it hasn't felt that way for a while.

(I work for GCP)

brianwawok · on June 19, 2018

Sorry, this is not a use case they should worry about. Running a $4 server to play with it. You can use minikube for that.

Many of us really really really love GKE and get a lot of use out of it. And yes, I run it myself on about 1h of work a week (no dedicated devops) for my entire startup.

jfoutz · on June 19, 2018

Not sure if they're still doing it, but it was trivial to get a $300 credit to spin up enough hosts to make sense.

the great Kelsey Hightower's tutorials take advantage of the credits. https://github.com/kelseyhightower/kubernetes-the-hard-way/b...

erikb · on June 19, 2018

How about making vanilla k8s usable on-premise first...

ethanwillis · on June 19, 2018

I have a lot of bare metal servers in a rack at my home.. Tried getting kube running well last night actually.. Well if you're using Ubuntu server 18.04 GOOD LUCK. There's plenty of issues in the various repositories around kube. I eventually just tried using Canonical's conjure-up tool to install kube.

You'd think that Canonical's tools would work on their LTS right? wrong.. absolutely wrong. It couldn't even do an openstack install either. Absolutely abysmal.

erikb · on June 20, 2018

Yep and there are altogether problems that come up after you got it running. E.g. you probably want to make use of PVCs, but there is no stable dynamic-storage provider yet. Google isn't even working on it as far as github is showing.