Why Probabilistic Programming Matters (zinkov.com)
91 points by joeyespo on Jan 5, 2013 | 14 comments


The point of probabilistic programming is that you can explore slightly more complex models very easily.

Coincidentally, a colleague at lunch yesterday made exactly this observation about Stan, a probabilistic programming language he is using (http://mc-stan.org/).

Stan's notation is very similar to, if not the same as, the notation shown here. In Stan, those model{} statements are combined with data{} and parameters{} blocks that plug in the values of interest. The algorithmic details appear to be specified by the command-line invocation.
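For instance, a complete Stan program for estimating the mean and spread of some observations might look like this sketch (variable names are illustrative, not from the essay):

```stan
data {
  int<lower=0> N;   // number of observations
  vector[N] y;      // observed values
}
parameters {
  real mu;             // unknown mean
  real<lower=0> sigma; // unknown standard deviation
}
model {
  y ~ normal(mu, sigma);  // likelihood; inference is handled by Stan's sampler
}
```

The data{} block declares what you supply, parameters{} declares what Stan infers, and model{} ties them together with sampling statements.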

I found the Stan user guide a bit more enlightening than this essay. The essay shows a bunch of models, but it has very little discussion about what the models mean, or how they express the ideas alluded to in the short introductory paragraphs. The Stan UG does a marginally better job of showing why the model looks the way it does, and relating it to the math they're trying to solve. Still, like this essay, the exposition is definitely pitched toward somebody who already knows the field (unfortunately, not me).


I had never heard of probabilistic programming languages and was intrigued but unfortunately also rather confused by the submission. So thank you for the excellent pointer.


Anyway, I wrote this post as a precursor to a more substantial set of writings explaining how to practically use these upcoming languages. Most of what I am trying to show here is that while we use specialized algorithms for most of our machine learning at the moment, that is only a temporary solution until it becomes feasible to just specify your models in one of these languages. The point of the code is to show that many really complicated models, each of which currently has a highly specialized library built just for it, come out to fewer than 10 lines of code in this framework.

For a really good introduction to this topic I recommend one of Josh Tenenbaum's Growing a Mind talks, http://videolectures.net/aaai2012_tenenbaum_grow_mind/

I never realized this article would be read by regular folks and am incredibly confused why something I wrote 6 months ago is suddenly all over the internet now.


This is an amazing talk by Tenenbaum. Thank you for posting it!

I'd really like to subscribe to your blog/site so I can read your more substantial set of writings as they appear -- but I can't seem to find the appropriate RSS or ATOM feed on your site. Am I missing something?


You can subscribe with http://zinkov.com/rss.xml. I get this question a lot; what's the standard way to solve this?


You can enable RSS auto-discovery by adding a tag to your HTML:

  <link rel="alternate" type="application/rss+xml" title="RSS Feed" href="http://zinkov.com/rss.xml" />
Any decent RSS reader (e.g. Google Reader, TTRSS) would then detect the feed if the user tried to subscribe to your homepage. Some browsers and browser extensions also provide a one-click subscribe button based on that tag.

It's also common to just put a visible link somewhere on the page, since not everyone knows about auto-discovery.


Thank you. I subscribed.

The usual practice seems to be to put a link to the rss feed (the one you just gave) somewhere on the archive page or at the top of each post page.

Thanks again.


Stan aims to be the successor to BUGS, and it is made in part by the great Bayesian statistician Andrew Gelman. That said, BUGS was the de facto language for probabilistic modelling for many years, so you may find better documentation for it.


I am confused about whether it's best to post here or in a separate thread, so apologies for doing both, but this provoked me to look for a good intro to probabilistic programming, and I found http://research.microsoft.com/en-us/um/cambridge/projects/in... which is a very easy read.


I only got partway into http://projects.csail.mit.edu/church/wiki/Probabilistic_Mode... but mean to come back -- it looks worthwhile.



The question I'd have about creating a new programming language for a new technique is: what does the syntax give you to compensate for the cost of learning a new language?


This is not an issue of syntax. You can have probabilistic languages with existing syntax--Church, for example, uses Scheme syntax[1].

[1]: http://projects.csail.mit.edu/church/wiki/Mixture_models

You can even embed your new language in another, well-known general-purpose language, letting you reuse the host language's syntax. Haskell is great for this sort of work.
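As a sketch of what such an embedding can look like, here is the classic discrete probability monad in Haskell (all names here are illustrative, not from any particular library):

```haskell
-- A minimal discrete probability distribution: a list of outcomes
-- paired with their probabilities.
newtype Dist a = Dist { runDist :: [(a, Double)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [(f x, p) | (x, p) <- xs]

instance Applicative Dist where
  pure x = Dist [(x, 1.0)]
  Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]

instance Monad Dist where
  return = pure
  -- Sequencing distributions multiplies probabilities along each path.
  Dist xs >>= f = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- runDist (f x)]

-- A biased coin: True with probability p.
coin :: Double -> Dist Bool
coin p = Dist [(True, p), (False, 1 - p)]

-- The probability model reads as ordinary host-language code:
-- flip two fair coins, ask whether both came up heads.
twoHeads :: Dist Bool
twoHeads = do
  a <- coin 0.5
  b <- coin 0.5
  return (a && b)
```

Here the host language's do-notation becomes the modelling notation for free; a serious implementation would of course use smarter inference than exhaustive enumeration.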

This is entirely a question of semantics. And what do probabilistic semantics give you? They make writing complex models for data much more natural. These languages allow you to abstract away much of the incidental complexity of specifying these models, making the logic clearer. In turn, this simplifies reasoning about the problem and working on a solution.

More generally, this is the motivation behind most domain-specific languages. The core idea is to express yourself in a way that fits the domain you're working in. For machine learning, you want to express your program in terms of probability distributions. For writing parsers, you want to express yourself in terms of CFGs. As people like to repeat, programs should be written for people to read and only incidentally for machines to execute, and reading something in terms relevant to the domain is easier than trying to force the domain to fit an existing programming language and paradigm.


The trouble with probabilistic programming is that it requires very unnatural control flow to run inference on the models. There are now solutions like Factorie, Figaro, and Infer.NET that are just libraries you call from your code. The trouble is that, because of the representation, you end up with much more verbose code.

The cost of learning a new language is one you can't avoid even when the tool isn't packaged as a language. To write these programs in any system, you have to get used to thinking of your data as generated by an unobserved process. Thinking that way isn't something you can avoid, and it is actually the main stumbling block.



