What Is Bayesian/Frequentist Inference? (normaldeviate.wordpress.com)
67 points by vectorbunny on Nov 18, 2012 | 27 comments


I don't think this guy understands the debate. A quick summary:

If you think statistics is a big toolbox, some of the tools give different answers that are better or worse in various ways, and you can just take out whatever tool you like, you're a frequentist.

If you think that there's such a thing as a correct probability estimate, and all coherent reasoning is required to come up with consistent answers regardless of which different path was taken to arrive at the same destination, you're a Bayesian. From this perspective, a "confidence interval" isn't a tool that's useful on some occasions, it's just plain crazy and wrong, like a weather forecaster who only tells you the probability that it's raining here xor in Narnia. Sure, the forecast is generated by a process that's sorta related to the correct answer, but by manipulating the imaginary land of Narnia you can make the forecast be basically anything. With Bayesianism there are no degrees of freedom in the likelihood ratio you report. See http://xkcd.com/1132/.

It doesn't do any good to appeal to the idea that Bayesian methods are just one tool in the toolbox. Only frequentists think in terms of toolboxes in the first place.

Also Bayes's Rule is tautologically equivalent to Bayes's Theorem. There's more wrong, but meanwhile, color me unimpressed.


I wouldn't want to go up against Eliezer Yudkowsky casually, but here goes: the guy is basically correct. (Although I also didn't follow his statement that Bayes Theorem != Bayes Rule.)

You can see my essay here, where I express a similar view: http://www.quora.com/What-is-the-difference-between-Bayesian...

Confidence intervals and credibility intervals are both mathematical objects that have well-specified (and different) properties. Confidence intervals are a worst-case technique and posterior probabilities are a sort of average-case technique. It's not "wrong" to say that the worst-case runtime of QuickSort is O(n^2) and it's not wrong to say that, given a uniform probability distribution over inputs, the expected runtime is O(n log n).

Which statement is useful to make depends on your requirements. They're both true.
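The QuickSort analogy can be made concrete by counting comparisons. This is a small illustrative sketch (my own, not from the comment): a first-element-pivot quicksort does ~n²/2 comparisons on already-sorted input (its worst case) but ~n log n on a random permutation (its average case). Both numbers are true statements about the same algorithm.

```python
import random

def quicksort_comparisons(items):
    """Sort with first-element-pivot quicksort; return the number of comparisons made."""
    if len(items) <= 1:
        return 0
    pivot, rest = items[0], items[1:]
    less = [x for x in rest if x < pivot]
    more = [x for x in rest if x >= pivot]
    return len(rest) + quicksort_comparisons(less) + quicksort_comparisons(more)

n = 500
sorted_input = list(range(n))            # worst case for this pivot choice
random_input = random.sample(range(n), n)

worst = quicksort_comparisons(sorted_input)    # exactly n*(n-1)/2 comparisons
typical = quicksort_comparisons(random_input)  # on the order of n log n
print(worst, typical)
```

Neither count is "the" running time of QuickSort; which one matters depends on whether your inputs are adversarial or random, which is exactly the worst-case/average-case distinction above.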

In my "100 independent robots" example, for instance, the credibility interval or posterior probability produces answers that are not helpful for the application (and not particularly intuitive either -- the 70% credibility interval is "wrong" 80% of the time, given a certain value of the parameter). This can be a question of engineering and there's no need to be dogmatic about it.

It is perverse and incorrect to say that "Only frequentists think in terms of toolboxes"! All mathematicians and engineers have access to the whole world of theorems and algorithms and techniques, all of them true, or meeting their specifications. No mathematician would argue that the Chinese remainder theorem is wrong because they are a Galois theorist! And no practitioner of Bayesian methods should argue that confidence intervals that meet their worst-case coverage guarantee are "wrong" because the person uses posterior probabilities.

(My quibble here is with dogmatism, not with Bayesians, because the dogmatic frequentists are just as bad. They just don't hang out on Hacker News.)


I would hesitate a great deal before entering an argument with either Larry Wasserman or Eliezer Yudkowsky, but here goes.

You're right that if, for whatever reason, someone is fascinated by "coverage" then confidence intervals will answer their questions better than Bayesian posteriors. But I think Eliezer's right that there's something very wrong with thinking that "coverage" in this sense is what matters.

Let's consider your example again. In what circumstances is the following actually a useful problem to solve? "Given an observation of one thing from a box, tell me a set of box-types in such a way that for each box-type you'll choose a set including the right one at least 70% of the time."

I can think of some. For example: a mad scientist starts sending you boxes, with instructions to start guessing; he's going to monitor your results on each box-type and if he sees you getting any type of box wrong more than about 30% of the time he'll kill you. Otherwise he'll reward you for nominating fewer box-types each time. But (1) that's a desperately contrived situation and (2) the most diehard Bayesian, in that situation, will produce something like "confidence intervals" because that's what Bayesian decision theory says to do.

Is there any not-so-contrived situation where the problem solved by confidence intervals is actually an important one?

By the way, my best guess about the Rule / Theorem thing is that he's distinguishing between a theorem about conditional probabilities, and a normative rule saying "when you get new information, update your beliefs like so".


> the most diehard Bayesian, in that situation, will produce something like "confidence intervals" because that's what Bayesian decision theory says to do.

I disagree that this is what "Bayesian" decision theory says to do. It's what decision theory says to do, and it's what math says to do, and it's what the constraints require. It's not particularly "Bayesian" -- it's just what you have to do.

If everything that happens to be the correct answer (including frequentist confidence intervals when called for) is now described as Bayesian, then the term has no meaning and we are all Bayesians. :-)

> Is there any not-so-contrived situation where the problem solved by confidence intervals is actually an important one?

What would you do in the case of my 100 robots, where you want 70 of them to come to the correct decision, and they have to make their decisions independently? Having them all calculate a posterior independently works terribly (as I showed, 80% of them come to the wrong conclusion with >73% belief). Confidence intervals work a heck of a lot better.
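The box details of the 100-robots example aren't spelled out in this thread, but the coverage phenomenon it describes is easy to reproduce in an analogous Gaussian setup (my own construction, not the commenter's): observe X ~ N(θ, 1) with prior θ ~ N(0, 1), so the posterior is N(x/2, 1/2). For a true θ far out in the prior's tail, the central 70% credible interval covers θ only a few percent of the time, while the 70% confidence interval covers it 70% of the time for every θ.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 3.0            # true parameter, out in the N(0,1) prior's tail
n_trials = 100_000     # one independent observation per "robot"
z70 = 1.0364           # central 70% quantile of the standard normal

x = rng.normal(theta, 1.0, n_trials)

# Bayesian: prior N(0,1), likelihood N(theta,1)  ->  posterior N(x/2, 1/2)
cred_covers = np.abs(theta - x / 2) <= z70 * np.sqrt(0.5)

# Frequentist: 70% confidence interval is x +/- z70, for any theta
conf_covers = np.abs(theta - x) <= z70

print(cred_covers.mean(), conf_covers.mean())  # roughly 0.06 vs 0.70
```

A Bayesian would reply that nothing is wrong here: θ = 3 was simply improbable under the stated prior, and averaged over the prior the credible interval is calibrated. That is precisely the worst-case versus average-case split the parent comment describes.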

The optimal approach would consider what single algorithm works best when run independently. Finding these solutions (on, e.g., a decentralized POMDP) is an open problem.


I called it Bayesian because it describes what Bayesians will do. It does indeed also describe what anyone else sane will do. The point is that in order to make confidence intervals the right answer you need a situation weird enough to make even Bayesians use confidence intervals.

In the case of your 100 robots, why am I supposed to want 70 of them to come to the correct decision? This seems just like my mad-scientist example: contrived to force confidence intervals (or something very like them) to be the right answer. Can you explain in what sort of situation this would be a sensible thing to care about?


So explicitly state the property you do want to optimize for in your robot example, and state your prior belief, and then crank the handle on the Bayesian reasoning machine to obtain your optimal answer.

You may get an intractable problem that you can't solve exactly, and for a carefully cherry picked objective the frequentist answer might even be a good approximation.


What if you're an Italian seismologist, and you want to produce a prediction which is in some way useful, but still expresses an appropriate level of doubt to a lay audience?


> All mathematicians and engineers have access to the whole world of theorems and algorithms and techniques, all of them true, or meeting their specifications.

Only a crazy mathematician would think nothing of a set of tools that contradict each other. If I recall correctly, frequentist methods often yield different (and therefore contradictory) results depending on the way you look at the data. That's crazy.

I have also read your link: the prior problem can easily be solved by just communicating the likelihood ratios. Those are pretty much indisputable, and the actual contribution of the paper. Let the others start from their own priors. We have to choose one anyway.
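The odds form of Bayes' rule makes this "report the likelihood ratio, let readers supply priors" proposal concrete. A minimal sketch (the likelihood ratio of 9 is an invented example value):

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio

def odds_to_prob(odds):
    return odds / (1 + odds)

lr = 9.0  # the paper reports: data favour hypothesis H over not-H by 9:1

# Each reader supplies their own prior; the reported ratio is the same for all.
for prior_p in (0.01, 0.5, 0.9):
    prior_odds = prior_p / (1 - prior_p)
    post = odds_to_prob(posterior_odds(prior_odds, lr))
    print(prior_p, round(post, 3))
```

A skeptic starting at 1% ends up near 8%, while someone starting at 50% ends up at 90%: same evidence, different priors, and the paper never had to pick one.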


Bayesian methods are also inconsistent wrt infinitesimally small changes in priors. See Wasserman's blog for more details. If you're worried about the Likelihood Principle causing frequentist inconsistency then you're taking a bitter pill in that not everyone actually believes the LP, the assertions leading to it can be debated, and, further, Likelihoodist methods (like most of Fisher's work) are not affected by it while still worrying themselves with coverage instead of posteriors.


> Bayesian methods are also inconsistent wrt infinitely small changes in priors. See Wasserman's blog for more details.

That looks like it would falsify the method instantly. Or are you talking about merely small changes in priors? Anyway, I can't find the details you speak of, do you have a link?

As for the likelihood principle, my math is rusty at the moment, I'll check. I expect however that I will just accept this principle as obvious. People don't believe it? They probably had the wrong teachings, just like most religious people. I'm not sure I know on a gut level how hopelessly wrong people often are. I'm not sure I properly feel the fear that should come with the possibility that I am hopelessly wrong. But I have an idea.


I agree with you and Larry and want to elaborate. Dogmatic Bayesians often come armed with assertions that probability calculus is the "correct" way of doing inference and, further, that there is some kind of ethical maximum achieved when you make your decisions exclusively via inference.

They have strong reason to feel this way, of course. Probability calculus is a very elegant theory that can be constructed from rather sane "desiderata" a la Cox's Theorem. Furthermore, if you think of ethics as an optimization problem, which is fair, then you're likely to be attracted to decision theory as a powerful tool for being more ethical. QED?

Unfortunately, neither of these reasons has the conclusive force that it appears to. The sanity of Cox's postulates and the ethics of optimization are still (arbitrary) choices which serve mostly to create a formalized world for us to live in and study with efficacy. There is no doubt that people have derived great power from similar use of powerful models, but models they are nonetheless.

Asserting that modeling the world based on these postulates is ethically mandatory is weird. Feeling confident that you can become more powerful than others by doing so is a bet.

So, assuming you're a betting type instead of a religious type, would you rather use models that are elegant but incomplete, or ones that are practical and battle-tested? I believe both the Sampling philosophy and the Bayesian philosophy to be mathematically elegant. I have also seen them both used very practically.

And once you're interested in measuring practical performance of these methods, once you decide to pick things like loss functions and philosophies of measurement, then it's easy to find places where either framework fails.

---

So really, in my mind, there are Bayesian methods and Frequentist methods and Bayesians who, probably following Jaynes' hyperbolic and recommended work, feel that there is a certain theory of mind which is correct due to its elegance. Most Engineers can pick between the methods and most mathematicians can pick between the theories.

And having a favorite theory isn't so bad either. I quite like some "Likelihoodists" as well and learn from them every time we talk.

---

I think the best model of the situation is that of logic. It's like we have "First-order logicians" and "Higher-order logicians", each of which espousing a viewpoint that a particular Theory is "more true" than the other. Truly, they both have fortes and foibles, though, and the practical mathematician studies both and uses whichever one lets them prove what they need to. The problem comes into play when "higher-order logicians" suggest that "first-order logicians" are inherently wrong since their very theory prevents them from being able to conceive of a better theory and this reticence is leading to disarray. But then the "higher-order logicians" can't come to a consensus on how to fix things and also take a potentially infinite amount of time to reach a conclusion.

Really, though, our thought is neither bound to first or second order logic. Perhaps this causes us pain from time to time, but it also lets us pick and choose.


> Dogmatic Bayesians often come armed with assertions that probability calculus is the "correct" way of doing inference and, further, that there is some kind of ethical maximum achieved when you make your decisions exclusively via inference.

Put more crudely: "Bayesians are a bit extremist, therefore we shouldn't trust them too much".

> Unfortunately, neither of these reasons have the conclusive force that they appear to. The sanity of Cox's postulates and ethics of optimization are still (arbitrary) choices…

I have read the first two chapters of Jaynes's book, so I must ask: do you know of any other choice that isn't completely insane? You need very few assumptions to get to probability theory.

> …which serve mostly to create a formalized world for us to live in and study with efficacy.

Last time I checked, it looked like our world runs on math (which math is the big question). But even if it doesn't, do you expect we can find anything better to study it? Even if the world is chaotic, it doesn't mean our thinking shouldn't be lawful.


Jaynes is intuitively convincing, but intuition isn't everything. If you buy his interpretation of his desiderata and buy his assertions (like: plausibility is a real) then it's convincing to believe that his argument leads to probability, though I think Cox's Theorem as presented by Jaynes has been disproved? I think it still holds in another form though.

The dogmatism is not that we should forsake math, but instead that we should accept any One True Math. History is littered with the broken careers of those who did so.


Disproved?! frantic search… Oh, you meant for infinite sets. Somehow I'm not surprised. When one can make a 1-1 correspondence between even numbers and natural numbers… Maybe that's one reason why Nick Bostrom says Infinite Ethics is hard? http://www.nickbostrom.com/ethics/infinite.html

Plausibility is a real… What else could it be? I still need something continuous, quantifiable, and totally ordered. Why, might one say? I'm not sure I can explain. Some things we just take for granted. Only a rock would take nothing for granted.


The Goal of Bayesian Inference: Quantify and manipulate your degrees of beliefs. In other words, Bayesian inference is the Analysis of Beliefs.

Bayesian inference is no more about beliefs than logic is (or any scientific inference, really). "M(G) AND C(M) => C(G)" can be rendered as "If you believe that glass is a metal, and you believe that metals are good conductors, then you should also believe that glass is a good conductor". Scientists omit the "if you believe" out of conciseness.

Some subjective Bayesians will tell you that their job is to produce the above. Then they're done. "You said you believe that glass is a metal, so I put that into my Bayesian inference procedure, and it says that you should also believe that glass is a good conductor."

But this is not what science is about! Obviously, "glass is a conductor" strongly contradicts empirical data. We have to challenge every assumption, and possibly change models!

This is why smart Bayesians check the fit of their model, and I would strongly recommend Gelman's Induction and Deduction in Bayesian Data Analysis to any statistician interested in that perspective. It places Bayesianism squarely in the paradigm of traditional scientific analysis.

http://www.rmm-journal.de/downloads/Article_Gelman.pdf


"Metal" refers only to a set of kinds of matter. A set we shaped because it helps us make useful inferences without using too much brain power. Like the fast rules: "Most metals are good conductors", "Most metals are strong", "Most metals are hard", "Most metals are heavy".

Then someone comes and shows you that new material called "glass" that is heavy, hard, and strong (this one is bullet proof). You'd be quick to infer that it is a metal, and therefore probably a good conductor. But didn't we tell you that most metals are opaque?

http://lesswrong.com/lw/no/how_an_algorithm_feels_from_insid...


I agree with much of this, and tend to be fairly ecumenical/pragmatic in my own choice of tools, but there are two things that lead to the "identity statistics" that are only briefly covered here, I think.

One is the entire philosophical debate, e.g. at least some Bayesians think arguments against the coherence of frequentist statistics are damning enough to make it questionable whether the methods should be considered rigorous statistics at all (admittedly this is basically the hardline view) [1].

The other is that it's not always agreed when it's appropriate to look for coverage versus to analyze beliefs, partly due to the philosophical debate, and partly because often what you ultimately want is a decision, and there are arguments for whether you should base decisions on frequentist-coverage machinery, or on belief-update machinery. For example, to move slightly afield from bounding a parameter, let's say we want an estimate of the region in which bombs are likely to fall. This can be formulated in frequentist statistics as a tolerance interval, with two decision thresholds, one for how many bombs we want to bound, and one for how confident we want to be in the bound: we want an interval that includes at least x% of the population with y% confidence, e.g. that with 99% confidence we'll bound 99% of bombs [2]. On the other hand, it can be formulated as a question about belief: essentially, we want to find the range in which we believe (for some suitably conservative definition of belief) we are going to find falling bombs, which Bayesian predictive statistics looks at.

[1] One famous/infamous such argument: http://en.wikipedia.org/wiki/Likelihood_principle

[2] I wrote a bit on why tolerance intervals should really be a more prominent part of the frequentist toolbox: http://www.kmjn.org/notes/tolerance_intervals.html
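The tolerance-interval formulation above can be sketched in a few lines. This uses Howe's approximation for the two-sided normal tolerance factor (a standard textbook method, not taken from the linked note), and assumes the data are roughly normal; the bomb-offset sample is invented for illustration.

```python
import numpy as np
from scipy import stats

def tolerance_k(n, coverage=0.99, confidence=0.99):
    """Howe's approximate two-sided normal tolerance factor: the interval
    x_bar +/- k*s contains at least `coverage` of the population with
    probability `confidence`."""
    z = stats.norm.ppf((1 + coverage) / 2)
    chi2 = stats.chi2.ppf(1 - confidence, df=n - 1)
    return z * np.sqrt((n - 1) * (1 + 1 / n) / chi2)

rng = np.random.default_rng(1)
sample = rng.normal(0.0, 100.0, size=30)   # hypothetical impact offsets, metres
k = tolerance_k(len(sample))
m, s = sample.mean(), sample.std(ddof=1)
print(k, m - k * s, m + k * s)             # 99%/99% tolerance interval
```

Note that k is substantially larger than the naive 99% z-quantile (about 2.58): the extra width is exactly the price of the second, confidence-level decision threshold described above.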


>let's say we want an estimate of the region in which bombs are likely to fall.

That seems like fundamentally the wrong sort of question. You never actually care directly about something like that -- the point of statistics is to inform some decision-making process. And it almost always becomes more obvious how to proceed when you keep in mind what you're actually using the stats for.


> the point of statistics is to inform some decision-making process.

An estimate of the region in which bombs are likely to fall would directly inform your decision-making process ("Don't go over there!") so I don't understand your objection.


I don't know what shardling intended, but here's my take on it: the kind of estimate you want depends on the question you want to answer. I live in California, so the analysis I'm doing on the rockets coming out of Gaza is likely very different from the one being performed by the people living in Tel Aviv. And both of those analyses are different from the ones being performed by Hamas. So there is no such thing as a reliable "estimate of the region in which bombs are likely to fall" independent of the particular question you want to answer.

Another example, from the original article:

"a weather forecaster is good if it rains 95 percent of the times he says there is a 95 percent chance of rain"

It's not clear whether or not there's something special about the number 95, or whether the intent is that a forecaster is good if it rains X% of the time he says there's an X% chance of rain for any X. So consider a 100 day period during which it rains 10 days, and a forecaster who every day predicts a 10% chance of rain. Is that a "good" forecast? If you're a farmer, it might be. If you're planning a picnic, not so much.
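The farmer/picnic point is the standard distinction between calibration and sharpness, and a small simulation (my own sketch, using the Brier score as one common measure) makes it concrete: a forecaster who always says "10%" in a climate where it rains 10% of days is perfectly calibrated, yet a perfect oracle is also calibrated and far more useful.

```python
import random

random.seed(42)
days = 100_000
rain = [random.random() < 0.10 for _ in range(days)]   # it rains ~10% of days

forecast_a = [0.10] * days                       # calibrated but uninformative
forecast_b = [1.0 if r else 0.0 for r in rain]   # a perfect oracle

def brier(forecasts, outcomes):
    """Mean squared error of probability forecasts (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

print(brier(forecast_a, rain))   # about 0.09
print(brier(forecast_b, rain))   # 0.0
```

Both forecasters pass the article's X%-of-the-time test, so calibration alone cannot distinguish the forecast the picnic-planner needs from the one the farmer can live with.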


Because boiling everything down to a simple model like "regions where bombs are likely" is throwing away information.

We do so because it's easier to analyze and think about a simple model than to try to deal with the whole dataset all at once -- except that if you have a specific question, it might be possible to treat that question directly. Because you're not losing information, doing so provides a more accurate result.

In practice this might not matter, and simplified models might be good enough. But as computers become more powerful, it becomes easier to query the data directly.


*sigh* Here we go again. Inferential statistics is unified by a field called decision theory, which is the mathematical formulation of how you choose a "good" mapping from the set of possible outcomes from your experiments to a set of possible decisions.

Bayesian and frequentist are interpretations of probability theory, and they are not the only ones (nor are all interpretations even concerned with a formalization of a notion of "chance"). They are not necessary to statistics.


You mean causal decision theory?

Omega comes to you and presents two boxes. One is transparent and contains $1000. The other is opaque. Then Omega says "I give you 2 choices: either you take the two boxes, or you take only the opaque one. I have studied your brain, and have predicted your choice. If I have predicted that you will take only the opaque box, I have put $1M in it. If I have predicted you will take both boxes, I put nothing in it." Note that when Omega comes to you, the content of the opaque box is already fixed. So. What do you choose?

Decision theory is not solved yet.


http://en.m.wikipedia.org/wiki/Bayes_theorem#Bayes.27s_rule

What is the difference between Bayes Rule and Bayes Theorem?


Also from that Wikipedia article, "The application of Bayes's theorem to update beliefs is called Bayesian inference."

While this guy says,

"Bayesian Inference {\neq} Using Bayes Theorem"

If this guy is correct, he should update the wiki...


Bayes' theorem and Bayes' rule are essentially the same equation. Most people will use the two interchangeably.

Bayesianism is a perspective on how to do modelling under uncertainty. It doesn't reduce to "use Bayes' theorem", even though all Bayesian inference will do that in some fashion.


Hm, I guess the point of contention would be the "to update beliefs"?



