PSYCHOMETRIKA-VOL. 46, NO. 4. DECEMBER, 1981 REVIEWS
REVIEWS

James B. Ramsey and Gerald L. Musgrave. APL-STAT: A Do-It-Yourself Guide to Computerized Statistics Using APL. Belmont, California: Lifetime Learning Publications, 1980, pp. 250. $14.95.

APL is a computer language which has a small but dedicated following. The "true believers" proclaim APL's conciseness and power; they enjoy writing one line of code to do operations which require FORTRAN programmers to write five, ten, or even twenty lines. One aspect of APL which has captured the interest of many people working in statistics (particularly multivariate statistics) is that APL has built-in capabilities for dealing with arrays of any practical size. Matrix addition, multiplication, and even inversion are primitive (built-in) functions; there's no need to keep track of subscripts, or to write nested loops to do these operations. Instead of a whole subroutine to compute regression weights, all you need is one line: B ← Y ⌹ X.

As an APL enthusiast, I'm always glad to see new additions to the literature, particularly in the area of applications in statistics. APL-STAT is, as its subtitle tells us, "a do-it-yourself guide to computational statistics using APL." It presents a fairly complete treatment of many APL functions, and even has sections for those who are unfamiliar with computers. The leisurely presentation of material is appropriate for beginners, and the authors gradually build toward more complex skills.

A frequent criticism of APL is that it instills a tendency to be too concise, so that no one but the programmer (and not even he or she after a period of time) can understand what the line of code does. Ramsey and Musgrave do a very good job of starting with a section of code written one step at a time, and then showing how to combine the separate lines into tighter groups which retain their readability.
This procedure should help novices avoid panic when seeing long lines of code and will aid them in developing the skills necessary to decipher them.

The coverage of APL functions and their use in statistical contexts omits some of the available APL primitives. Some of the omitted functions which I have found useful are signum, membership, execute, dyadic format, and the circular functions (sine, cosine, etc.). Many statistical techniques covered in beginning courses are treated, though some incompletely or incorrectly. For example, the two-way analysis of variance doesn't include a computation for the interaction, and the discussion of contingency tables talks about expected values without specifying the model which generates them. A few multivariate techniques are discussed, such as multiple regression and weighted regression, as well as simultaneous equation models, though the discussion is much more compact than for the simpler techniques.

The book's merits--leisurely pace, a number of examples, and statistical application--are balanced by several flaws. There are a number of instances of bad programming. I am not just talking about those errors which were deliberately put in--and well noted--as examples of real program development. I am referring to practices which can easily lead to mistakes in writing programs, or can make programs more difficult to use, modify, or understand. For instance, when discussing line labels the authors use the label SIX for line number six; this obscures the reason for labels: to avoid linking lines with their line number. While the label SIX is perfectly valid, and will work correctly no matter where it is
placed, it is very misleading for those unfamiliar with APL. Another misleading practice is the use of the catenate function (comma) as a separator between numbers entered into the computer. While its correct use is explained later in the book, and APL statements with commas as separators will usually work as intended, it is another example of a potentially confusing practice.

Undergraduates or beginning graduate students who have little prior experience with computers are the best candidates for this book. The authors are obviously well aware of the need to explain in detail to beginners, and to build up slowly; their teaching style is excellent. But students should be warned that they may be learning some bad habits which will later have to be changed. Readers with more sophistication would probably be better off with either Polivka and Pakin's APL: The language and its usage, which is the "bible" of APL, or Smillie's APL\360 with statistical examples, both of which were missing from the reference list in APL-STAT.

CITY UNIVERSITY OF NEW YORK
David Rindskopf
REFERENCES

Polivka, Raymond P., & Pakin, Sandra. APL: The language and its usage. Englewood Cliffs, NJ: Prentice-Hall, 1975.
Smillie, Keith W. APL\360 with statistical examples. Reading, MA: Addison-Wesley, 1974.
Paul F. Velleman and David C. Hoaglin. Applications, Basics, and Computing of Exploratory Data Analysis. Boston, Mass.: Duxbury Press, 1981. pp. 354 + xxi. $10.95.

The physical arrangement of the ABCs of EDA is a sequence of nine chapters, which summarize nine of the most useful techniques of Tukey's [1977] exploratory data analysis (EDA). The topics begin with the essentials: stem-and-leaf and letter-value displays, boxplots, and x-y plotting. Additional topics included are resistant techniques of line-fitting and smoothing, coded tables, two-way analysis by median-polish, and rootograms (suspended and otherwise). Each chapter contains a description of its topical technique with several examples, followed by listings of BASIC and FORTRAN computer programs which perform the analyses. Versions of the FORTRAN routines have been included in the most recent release of MINITAB [Ryan, Joiner, & Ryan, 1981] and ABCs of EDA includes as an appendix a users' guide to this new section of that package.

The authors state that the book has two audiences: one audience consists of students of exploratory data analysis and researchers intending to use EDA methods, while the other audience consists of programmers intending to implement the programs included in the book or others like them. For the benefit of the student-audience (and their instructors) the authors provide in the introduction to ABCs of EDA a scheme for integrating the topics included in this book with the topics included in standard introductory statistics courses. And for the benefit of the programmer-audience, the authors provide a route, or "thread," through the book in a different order than that in which the chapters are bound.

One might go further, however, than to say that ABCs of EDA has two audiences; one might say that it is 2.1 booklets bound together. The 0.1 booklet is the MINITAB minimanual, which is extremely useful if you have access to MINITAB.
One of the two booklets is a well-condensed exposition of the essentials of EDA, and could be quite useful as a supplementary text in introductory-level courses in data analysis. The other booklet consists of the computer programs.

The computer programs (i) do graphics (mostly), (ii) in BASIC or FORTRAN (for portability), (iii) on standard terminals (for economy and widespread use). That triple combination makes the computer programs very useful. But it also means that the programs use every trick in the book, and a few tricks that are only in this book; so, while they can be copied by anyone, they can probably only be read by relatively advanced programmers. Relatively advanced programmers are probably sufficiently sophisticated to read the techniques of EDA in Tukey's [1977] heavier volume, and then implement the programs from this book. Beginning students should probably be prevented from looking at the programs (lest they be intimidated by their incomprehensibility) while they read the chapters in between. ABCs of EDA is 2.1 booklets because the audiences for the two booklets are essentially mutually exclusive.

The booklet of ABCs of EDA which is a supplementary text for introductory courses in data analysis provides coverage of exactly the right topics. It is, however, somewhat variable in its clarity and comprehensibility for the student in a first course. The first chapter, on the stem-and-leaf, provides one of the clearest descriptions of this technique available, including (apparently) hand-made displays to show the student what they're supposed to look like. That chapter is a model of clarity. It assumes nothing. However, in the second chapter, the text moves quickly into an algebraic notation for "depth" which many students, unused to functional notations for arbitrary operations like counting and ordering, will find confusing. In the following chapter, the explanation of boxplots is very clear. But it
is followed by an explanation of notched boxplots which does not show any boxplots with notches! The only examples are those produced by the computer programs, which use special characters [( ) or ( )] to show the location of the notches; but they are not notch-like, and if the readers have not seen the notched boxplots in McGill, Tukey, and Larsen [1978], they will have no idea what the word "notch" has to do with anything. Few students will have read the original paper. The resistant line-fitting chapter is very clear. But the smoothing chapter which follows is particularly terse.

Overall, I recommend this booklet as a supplementary text for data analysis courses; it provides the students with a reasonably inexpensive, accurate summary of techniques of EDA which are usefully included in an introductory course. And possession of a printed source provides the students with a (frequently needed) source of security. But instructors would do well to be selective in their use of the text, and have the students avoid the programs.

The programs, while they make terrible reading, are excellent tools for data analysis. With the programs from this book included, MINITAB is probably the best currently available interactive data analysis package for small sets of data. If you don't have these programs, get them. Copy them; but don't read them.

David Thissen
UNIVERSITY OF KANSAS

REFERENCES
McGill, R., Tukey, J. W., & Larsen, W. A. Variations of box plots. The American Statistician, 1978, 32, 12-16.
Ryan, T. A., Joiner, B. L., & Ryan, B. F. Minitab reference manual. University Park, Penn.: Minitab Project, The Pennsylvania State University, 1981.
Tukey, J. W. Exploratory data analysis. Reading, Mass.: Addison-Wesley, 1977.