Stat Comput (2009) 19: 111–112 DOI 10.1007/s11222-008-9069-8
BOOK REVIEW
Jim Albert: Bayesian Computation with R. Springer, 2007. ISBN: 978-0-387-71384-7

Nicolas Chopin
ENSAE & CREST-LS, Timbre J120, 13, Avenue Pierre Larousse, 92245 Malakoff Cedex, France
e-mail: [email protected]

Received: 29 February 2008 / Accepted: 16 April 2008 / Published online: 3 May 2008
© Springer Science+Business Media, LLC 2008
According to the back cover, this new book by Jim Albert 'introduces Bayesian modeling by the use of computation using the R language'. It is the latest addition to the new Springer series 'Use R!', and is close in spirit to 'Bayesian Data Analysis' by Gelman et al. (2003) and, to a lesser extent, to 'Bayesian Core' by Marin and Robert (2007). However, Albert's book distinguishes itself from these and similar references by reducing mathematical sophistication to a particularly low level. This simplicity seems welcome for audiences such as undergraduate students, but it should come with a few 'caveat lector' warnings, as discussed below.

Chapter 1 is a brief presentation of the R language. Albert quickly covers R basics, using a 'home-made' dataset collected from his students (which seems to be a nice way to make students more involved). He also introduces the LearnBayes R package he designed especially for the book, and which is available from the CRAN website. This chapter is quite short and covers only about ten R commands, but there are already many good books and on-line tutorials on R, and learning a programming language is best done through personal experimentation, using the examples of the other chapters for instance.
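To give a flavour of these first steps, a minimal sketch of my own (not lifted from the book; I am assuming the student survey is the studentdata dataset shipped with LearnBayes):

## Install and load the book's companion package (available from CRAN)
install.packages("LearnBayes")
library(LearnBayes)
## The 'home-made' student survey mentioned above
data(studentdata)
str(studentdata)   # inspect the variables collected from Albert's students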
[email protected]
It seems to me that Bayes' theorem, that is, the ability to transform a prior probability into a posterior probability using the available information, is the central concept to be discussed in any Bayesian textbook, and that this can be done in a quite non-technical, intuitive way. Similarly, the choice of a particular prior is something that can and should be addressed. What may explain these omissions is that the book is not intended to be self-contained; for instance, the Preface mentions, as a possible use, serving as a companion to another introductory Bayesian text. Yet, going back to the quote that opens this review, it is fair to say that this book does not introduce, but rather illustrates, Bayesian modeling through simple examples coded in R.

Chapters 3 and 4 cover, respectively, single- and multi-parameter models that are simple enough to be tackled with pre-MCMC methods, such as contour plots or exact simulation. Some of them are nice illustrations of simple phenomena, such as the greater robustness of a Student prior, relative to a Gaussian one, with respect to unusual observations. Examples based on baseball or American football may be difficult to grasp for non-American audiences, but they are few. The example on comparing two proportions is a bit odd, as the result depends on a hyper-parameter whose calibration is not discussed. As in the previous chapters (and later in the book), the choice of a particular prior distribution is rarely discussed, and may thus look arbitrary to readers discovering Bayesian analysis.

Chapter 5 introduces pre-MCMC computational tools: standard Monte Carlo, the Laplace approximation, rejection sampling, importance sampling and importance resampling. All the methods are explained and illustrated in very simple terms, yet again the simplicity sometimes goes too far: standard pitfalls such as multimodality (for the Laplace approximation) or the possibility of infinite variance (for the importance sampling weights) are not mentioned, nor is the fact that resampling increases the variance of estimates.
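The infinite-variance pitfall in particular is worth a line of code. The following sketch (my own toy example, not the book's) estimates a posterior mean by self-normalised importance sampling and checks the weights through the effective sample size:

## Toy target: a N(2, 1) 'posterior' known only up to a constant;
## a heavier-tailed Student-t proposal keeps the weight variance finite
set.seed(1)
log_post <- function(theta) dnorm(theta, mean = 2, sd = 1, log = TRUE)
m     <- 10000
theta <- rt(m, df = 3)                           # proposal draws
logw  <- log_post(theta) - dt(theta, df = 3, log = TRUE)
w     <- exp(logw - max(logw))                   # stabilised weights
sum(w * theta) / sum(w)                          # estimate of E[theta | data]
sum(w)^2 / sum(w^2)   # effective sample size: small values signal degeneracy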
Chapter 6 covers MCMC. It starts with a gentle description of discrete Markov chains, then describes the standard flavours of Metropolis–Hastings; generic versions are implemented in the LearnBayes package. Gibbs sampling is introduced in less than one page, but is covered in more detail in Chapter 10. The chapter concludes with standard graphical diagnostics and lengthy examples. In particular, the Cauchy example contains a pedagogical comparison between the Laplace approximation, direct sampling, and various MCMC algorithms.
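The random-walk variant fits in a few lines. A minimal generic sketch (my own; the point is only to show the accept/reject step, not to reproduce the package's samplers):

set.seed(1)
log_post <- function(theta) dt(theta, df = 1, log = TRUE)   # Cauchy-like toy target
n_iter <- 5000
theta  <- numeric(n_iter)                                   # chain starts at 0
for (t in 2:n_iter) {
  prop      <- theta[t - 1] + rnorm(1, sd = 2)              # symmetric proposal
  log_alpha <- log_post(prop) - log_post(theta[t - 1])      # acceptance log-ratio
  theta[t]  <- if (log(runif(1)) < log_alpha) prop else theta[t - 1]
}
plot(theta, type = "l")   # a trace plot, the kind of graphical diagnostic shown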
Chapter 7 addresses hierarchical Bayes modelling, insisting on the 'borrowed strength' derived from hierarchical constructs. Model checking is handled with posterior predictive quantities, resulting in a hybrid kind of Bayesian p-value. The main illustration is a Poisson/Gamma model for heart transplant data.

In Chapter 8, concepts of Bayesian model comparison are discussed, including the Bayes factor. In the same spirit as the rest of the book, justifications are quite limited. The connection between the posterior probability of a one-sided hypothesis and the p-value, in the spirit of Casella and Berger (1987), is mentioned, although in a way that may be more confusing than helpful. Unfortunately, the Jeffreys–Lindley paradox, namely the lack of stability of the Bayes factor as the prior variance goes to infinity, is not mentioned.
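The paradox is easy to demonstrate numerically, so a short illustration might have been worthwhile. A minimal sketch with my own toy numbers: test H0: theta = 0 against H1: theta ~ N(0, tau^2), with a sample mean xbar ~ N(theta, 1/n); the marginal likelihood under H1 is then N(0, tau^2 + 1/n), and the Bayes factor in favour of H0 grows without bound as tau increases, whatever the data.

n    <- 100
xbar <- 0.2   # hypothetical sample mean, two standard errors away from 0
bf01 <- function(tau) {   # Bayes factor of H0 against H1
  dnorm(xbar, 0, sqrt(1 / n)) / dnorm(xbar, 0, sqrt(tau^2 + 1 / n))
}
sapply(c(1, 10, 100, 1000), bf01)   # increases steadily with the prior sd tau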
Chapter 9 focuses on linear regression, using Jeffreys' prior on the pair (β, σ). A non-Gaussian linear regression is later used as an example.

Chapter 10 returns to Gibbs sampling. Again, since conditional distributions and conjugacy are not discussed anywhere in the book, readers are not expected to be able to derive full conditional distributions; instead, these are provided for each example. One of these examples is based on Albert and Chib's (1993) famous paper on probit regression.
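To show what 'being provided the full conditionals' looks like in practice, here is a minimal two-block Gibbs sketch (my own toy example, not one from the book), for normal data with unknown mean mu and precision tau under the standard noninformative prior p(mu, tau) ∝ 1/tau:

set.seed(1)
y <- rnorm(50, mean = 3, sd = 2)   # hypothetical data
n <- length(y); ybar <- mean(y)
n_iter <- 5000
mu <- numeric(n_iter); tau <- numeric(n_iter)
tau[1] <- 1
for (t in 2:n_iter) {
  ## mu | tau, y ~ N(ybar, 1 / (n * tau))
  mu[t]  <- rnorm(1, ybar, sqrt(1 / (n * tau[t - 1])))
  ## tau | mu, y ~ Gamma(n / 2, rate = sum((y - mu)^2) / 2)
  tau[t] <- rgamma(1, n / 2, rate = sum((y - mu[t])^2) / 2)
}
burn <- 1:500
c(mean(mu[-burn]), mean(1 / sqrt(tau[-burn])))   # posterior means of mu and sigma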
Chapter 11 explains how to use WinBUGS within R, and covers a few models, in a spirit similar to the WinBUGS example list available on the web. A section is devoted to the boa package for MCMC convergence diagnostics.

All chapters conclude with about five exercises. These exercises almost always start with a practical problem (e.g. 'Bob claims he has ESP. To test his claim. . .'), sometimes drawn from the applied statistics literature. The questions allow students to discover the proper implementation step by step and, in my view, provide the right amount of hand-holding for the intended audience.

To conclude, I recommend this book to anybody teaching Bayesian statistics at an undergraduate level. It covers interesting examples, datasets and exercises for setting up computer classes or similar activities (which is obviously a very good thing to do). On the other hand, I would not be entirely comfortable recommending this book to students for self-study, because it overlooks too many basic points. Obviously, a workaround is to introduce Bayesian statistics formally first, before having students read the book. Yet the inconsistency between the two presentations may be difficult to handle. The same remark applies to the situation where one would recommend two textbooks, this one and a more formal one, not to mention the added cost. (The book costs $45 on Amazon US and 38 euros on Amazon France.) As said above, my experience is that the formal concepts of Bayesian statistics are relatively easy to teach to second- or third-year undergraduates, even if their background in probability is limited. But, of course, an individual's teaching experience is always a biased, small sample.
References

Albert, J.H., Chib, S.: Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88, 669–679 (1993)
Casella, G., Berger, R.L.: Reconciling Bayesian and frequentist evidence in the one-sided testing problem. J. Am. Stat. Assoc. 82, 106–111 (1987)
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, 2nd edn. Chapman & Hall, London (2003)
Marin, J.-M., Robert, C.P.: Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer, New York (2007)