Google open sourced GPipe, a library for training large ML models

visarga · on March 5, 2019

Interesting, if you're a large company that can afford the compute.

zak · on March 5, 2019

Actually, anyone can use a Cloud TPU v2 (one of the accelerators mentioned in the blog post) for free via Colab: https://colab.research.google.com/notebooks/tpu.ipynb

Several other TPU-enabled Colabs are linked from the TFRC website: https://www.tensorflow.org/tfrc/

EForEndeavour · on March 5, 2019

Colaboratory is a great resource for learning and prototyping code, but it has drawbacks that render it useless to anybody who needs to deploy anything compute-intensive. The grandparent commenter has a point despite the availability of free GPU/TPU compute like Colab.

Here are two dealbreakers that come to mind, with quotations from https://research.google.com/colaboratory/faq.html:

1. Vague disclaimers that your VM is reset (even when it's running? Probably, but who knows?) after some unspecified maximum lifetime, and GPUs are "sometimes unavailable" (What does "sometimes" mean? They don't tell you. Does this apply to TPUs as well? Probably, but they don't say). Also, the mere act of manually resetting the VM is "sometimes unavailable"!

Relevant passages: "Code is executed in a virtual machine dedicated to your account. Virtual machines are recycled when idle for a while, and have a maximum lifetime enforced by the system."

"Colaboratory limits how often [resetting all your managed VMs] can be done to prevent undue resource consumption. If an attempt fails please try again later."

"How may I use GPUs and why are they sometimes unavailable? [...] _Long-running background computations, particularly on GPUs, may be stopped._"

2. The only way currently to use Google's free GPU/TPU resources is to connect through a browser running a Colab Notebook, which is Google's homemade flavour of Jupyter Notebook, which adds Google-Drive-like live editing of the same document, but (in my opinion) ruins critical UX/UI features that make vanilla Jupyter Notebooks tolerable to use. Colab has worse keyboard shortcuts, and _sort of_ offers command and edit modes, but half the time you can't see which mode you're in, and it's not obvious or consistent at all how and when to switch modes. Not only does Colab force over-reliance on notebooks and Google Drive as opposed to "proper" version control, but it limits you to a nonstandard Notebook version that makes the code-writing experience far less fluent than it has to be.

tylerl · on March 5, 2019

Yeah, if those are your deal breakers, then you seem to be a prime candidate for the "pay for your own damn resources" tier of pricing model, that way you can have whatever you want.

EForEndeavour · on March 5, 2019

I fully agree! The purpose of my mini-rant was simply to address the parent comment, which seemed to imply that thanks to Colab, paying for compute is obsolete.

syntaxing · on March 5, 2019

I thought TPUs are mainly only for inferencing? You can also select GPU on Colab for free which uses a Tesla K80.

thesandlord · on March 5, 2019

TPU v1 was inference only, Cloud TPUs (v2 and v3) do both.

syntaxing · on March 5, 2019

Interesting, did not know that. Do you know how it compares to a K80?

thesandlord · on March 6, 2019

There are quite a few benchmarks and comparisons online, I work for GCP so I don't want to give a biased answer.

Also, K80s are pretty old these days, though this makes them much cheaper than P100s and V100s. Comparing all of them from a price/perf standpoint (especially with TPUs in the mix) can get tricky because it's different for every use case.

yzh · on March 5, 2019

This is actually not about abusively using the computing power (I believe you can use it on a 8-GPU DGX or your DIY workstation), but how to use the same number of GPUs to train a larger model with a larger input size. Pipeline model parallelism seems to be a promising method to increase accuracy by enlarge the model/data size, IIUC.

novaRom · on March 5, 2019

Tried similar approach several years ago but was unsatisfied with final accuracy results. Pipelining is obvious method to accelerate a sequence of computations, but with respect to SGD it requires some accuracy tolerance because weight updates go out of sync.

aoeusnth1 · on March 5, 2019

It seems they're not actually increasing the minibatch size here. I don't see how the updates for N microbatches of size M will be any more out of sync than a single large minibatch of size N*M.