PSYCHOMETRIKA--VOL. 57, NO. 3, 451-454, SEPTEMBER 1992

REVIEWS
Paul E. Meehl. Why Summaries of Research on Psychological Theories Are Often Uninterpretable. Psychological Reports, Monograph Supplement 1-V66, 1990, 195-244, 50 pp., $3.00.

Meehl's paper can be seen as an important contribution to the classical dispute on null hypothesis testing (see Morrison & Henkel, 1970), but also as a severe criticism of contemporary psychological research. Before I comment on this paper, a summary of its essentials is presented.

First, Meehl confines his discussion of the uninterpretability of psychological research reviews to surveys sharing three properties: (a) theories in so-called "soft areas" (clinical, counseling, personality, and social psychology; theories in these areas are always "weak"), (b) correlational data, which include data from (quasi-)experimental research with one or more covariates, and (c) positive findings consisting of refutation of the null hypothesis.

Secondly, Meehl propounds and defends "a radical and disturbing methodological thesis": "Null hypothesis testing of correlational predictions from weak substantive theories in soft psychology is subject to the influence of ten obfuscating factors whose effects are usually (1) sizeable, (2) opposed, (3) variable, and (4) unknown. The net epistemic effect of these ten obfuscating influences is that the usual research literature review is well-nigh uninterpretable" (p. 197).

The usual situation in testing a substantive theory by predicting some observational relationship is this: there is a substantive theory T, one or more auxiliary theories A1, A2, ... (about, for instance, the validity of a method for data collection, the psyche, etc.), the ceteris paribus clause Cp (because disturbing factors must be neutralized), and certain conditions Cn. The conjunction of T, A1, A2, etc., Cp, and Cn deductively entails that if the observational statement O1 is true, the observational statement O2 will empirically also prove true.
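Meehl's point about the composite nature of theory testing can be put schematically (my rendering, in the review's own notation):

```latex
% The substantive theory T is never tested in isolation: only its
% conjunction with the auxiliary theories A_1, A_2, ..., the ceteris
% paribus clause C_p, and the realized conditions C_n entails the
% observational prediction.
(T \wedge A_1 \wedge A_2 \wedge \cdots \wedge C_p \wedge C_n)
  \;\vdash\; (O_1 \supset O_2)
```

A falsified prediction thus refutes, by modus tollens, only the conjunction as a whole, not T itself.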
The ten obfuscating factors are the following:

1. Loose derivation chain. There is a large "logical distance" between substantive theory and the counternull hypothesis in soft psychology. Very few derivation chains running from the theoretical premises to the predicted observational relation are deductively tight. Therefore, a falsified prediction cannot constitute a strict, strong, definitive falsifier of the substantive theory.

2. Problematic auxiliary theories. In soft psychology it often happens that each auxiliary theory is itself nearly as problematic as the main theory being tested. Therefore, it is not clear what to conclude from a falsified prediction: Is the main theory or an auxiliary theory false?

3. Problematic ceteris paribus clause. Variables which are not explicitly part of the research design must not covary with variables which are explicitly part of the design, but it does not appear possible to guarantee this.

4. Experimenter error. Exactly how much experimenter error occurs, either in experimental manipulation or as experimenter bias in recording observations, is still in dispute, but no knowledgeable psychologist would say that it is so rare as to be of zero importance.

5. Inadequate statistical power. Despite the mathematical truth of Fisher's point, a null result, a failure to reach significance, is regularly counted against a theory. But most of the time statistical power is insufficiently established. Jacob Cohen's classical paper (1962) on the power function has unfortunately been neglected.

6. Crud factor. In psychology and sociology everything correlates with everything. An unknown complex of causal factors is always operating; the so-called "method variance" is only one factor. When N gets big enough, almost all of the correlates will be statistically significant and consequently meaningless. Soft theories are too weak to make precise, numerical predictions, or even to set up a range of admissible values that would be counted as corroborative.

7. Pilot studies. Pilot studies on the basis of which a line of research is dropped are most often not published. However, such a pilot study is itself a research study and a piece of evidence against the theory. Therefore, research literature reviews are mostly distorted pictures of the real research outcomes. In addition, a pilot study used to estimate the number of cases needed in the main study to reach statistical significance provides the occasion to "construct" statistical significance when there is enough time and money. But what is the meaning of this type of corroboration?

8. Selective bias in submitting reports. Quite a few research results are never reported in official journals and the like because the null hypothesis could not be rejected, even though the sample size was adequate, so that, from the power-function standpoint, the results could be trusted. Thus, the value of research literature reviews is distorted.

9. Selective editorial bias. Editors and referees seem to be more favorably disposed toward a clear finding of a refuted H0 than toward one that simply fails to show a trend. This inclination distorts the reviews as well.

10. Detached validation claim for psychometric instruments.
Some researchers generalize the validity of a research instrument, established in a particular situation or context, to other situations or contexts. However, this detached validity claim is more often than not unwarranted.

Therefore, null hypothesis testing does not say much about the worth of the substantive theory being tested. Meehl discusses several suggestions for improving the situation sketched above; I mention only a few of them. Investigators should strive for more precise predictions, for higher statistical power (.9 or better), and for reporting pilot studies. Journal editors should require that statistical tables include means and standard deviations. Journals should have a section reserved for the publication of negative pilot studies. Ph.D.s in psychology should be required to learn undergraduate mathematics beyond cookbook statistics.

One of Meehl's suggestions seems very important to me: "We should accustom ourselves and our students to the idea that there are some interesting causal theories in soft areas that cannot presently be researched" (p. 238). According to Meehl, a misconception is ubiquitous among students and professors studying soft areas: that it must be possible to test scientifically meaningful theoretical conjectures at the present time. Meehl is of the opinion that this mistake is a residue of 1929 operationalism and (misunderstood) logical positivism. He points to the histories of various natural sciences, which show that scientifically meaningful and truly empirical theories have existed that could not, for one reason or another, be tested at the time. I would like to comment upon this point.

Meehl's social scientific frame of reference has an empirical analytical nature.
This explains why he calls theories in so-called soft areas "weak" (from the standpoint of deductive reasoning), why he takes the history of the natural sciences as a model, and why he refers to philosophers of science who are mainly oriented toward the natural sciences. However, even within the empirical analytical philosophy of science, considerable attention has been given to different kinds of rationality according to different contexts of scientific discovery. Five important contexts are the preparation, invention, plausibility, acceptability, and intelligibility contexts. Each of these contexts has its own logic of justification (Nickles, 1980), and these logics of justification correspond with different forms of testing. Hence, I propose to say that scientifically meaningful theories in the so-called soft areas are to be tested differently, rather than saying that they cannot be tested at all. This proposal can be sustained, first, by the literature on the philosophy (or theory) of the social sciences (psychology included), which must not be restricted to the empirical analytical branch of the philosophy of science but extended to the so-called interpretative stream, and, secondly, by the literature on qualitative methodology for the social sciences. Meehl mentions neither of them.

Meehl expresses his bafflement at the failure of the social sciences to achieve the kind of cumulative growth and integration of theoretical knowledge that characterizes the history of the more successful scientific disciplines. However, he does not question whether, and in what way, this demand can be vindicated with regard to the social sciences. Some philosophers and theoreticians are of the opinion that the social sciences cannot be expected to show a cumulative growth of knowledge of the kind found in the natural sciences, because of the nature of their objects of study (Terwee, 1990). Meehl confines his notion of a substantive theory to an explanatory theory and supposes that an explanatory theory is tested by its prediction of some observational relationship. However, there are several kinds of explanations (see, e.g., Mos & Boodt, 1990, on "theoretical explanations," and Searle, 1991, on "intentional explanations"): an explanatory theory does not necessarily lead to a prediction as stated.
Gergen, for example, argues that the value of a psychological theory must be sought in a direction different from prediction and control; a psychological theory can be conceived as a momentary kind of looking glass that may be used to guide our future actions in a certain historically and culturally determined region (see Gergen, 1980, 1982; and Gergen & Davis, 1985).

With regard to qualitative methodology, I would like to make the following remarks about testing a theory. It is always possible, and often useful, to execute a final test of an intended theory as a product. However, there are various opportunities for testing a theory or conceptual system during the process of developing it. I mention only a few: using extreme cases (for example, to get an idea about relevant causes), looking for negative evidence (actively seeking disconfirmation of what you think is true), checking out rival explanations (to avoid, for instance, a kind of theoretical ethnocentrism), and getting feedback from informants (who can act as a panel of judges, evaluating, singly and collectively, the major findings of the study; for many other tactics and techniques see, e.g., Miles & Huberman, 1984; Strauss & Corbin, 1990; and Tesch, 1990). Qualitative methods and techniques can help to arrive at a more relevant and valid substantive theory.

It is interesting to note that Lee J. Cronbach, who wrote a classical article on construct validity together with Paul E. Meehl (Cronbach & Meehl, 1955), expressed the opinion that neither experimental nor correlational designs are sufficient to discover the relevant variables in an educational setting; qualitative methods, namely sensitive observation and protocol analysis, are needed for this purpose (Cronbach, 1975). To arrive at scientifically meaningful substantive theories, qualitative research may be necessary.
Theories which cannot, at the moment, be tested finally or statistically need not evolve in an empirical and rational vacuum.

UNIVERSITY FOR HUMANIST STUDIES, UTRECHT, THE NETHERLANDS
Adri Smaling
References
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145-153.
Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist, 30, 116-127.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.
Gergen, K. J. (1980). Towards intellectual audacity in social psychology. In R. Gilmour & S. Duck (Eds.), The development of social psychology (pp. 239-270). London: Academic Press.
Gergen, K. J. (1982). Toward transformation in social psychology. New York: Springer.
Gergen, K. J., & Davis, K. E. (1985). The social construction of the person. New York: Springer.
Miles, M. B., & Huberman, A. M. (1984). Qualitative data analysis: A sourcebook of new methods. Beverly Hills: Sage.
Morrison, D. E., & Henkel, R. E. (Eds.). (1970). The significance test controversy. Chicago: Aldine.
Mos, L. P., & Boodt, C. P. (1990). Hermeneutics of explanation: Or, if science is theoretical why isn't psychology? In Wm. J. Baker et al. (Eds.), Recent trends in theoretical psychology (Vol. II, pp. 71-84). New York: Springer.
Nickles, T. (Ed.). (1980). Scientific discovery, logic and rationality. Dordrecht/Boston: Reidel.
Searle, J. R. (1991). Intentionalistic explanations in the social sciences. Philosophy of the Social Sciences, 21(3), 332-344.
Strauss, A., & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Newbury Park: Sage.
Terwee, S. J. S. (1990). Hermeneutics in psychology and psychoanalysis. New York: Springer.
Tesch, R. (1990). Qualitative research: Analysis types & software tools. New York: The Falmer Press.
PSYCHOMETRIKA--VOL. 57, NO. 3, 455-457, SEPTEMBER 1992

REVIEWS
P. Arabie, J. Douglas Carroll, and Wayne S. DeSarbo. Three-Way Scaling and Clustering. Newbury Park: Sage Publications, 1987, ISBN 0-8039-3068-2, 92 pp. (Quantitative Applications in the Social Sciences #65.)

Since the majority of the applications of multidimensional scaling techniques are based on the INDSCAL model proposed by Carroll and Chang (1970), a monograph on this topic in the well-known Sage University Paper series Quantitative Applications in the Social Sciences was long overdue (as is this review of it). Three-Way Scaling and Clustering is intended to fill this gap and complements an earlier monograph in the same series, Multidimensional Scaling, by Kruskal and Wish (1978). It is written by three knowledgeable researchers who have each made major contributions to the field of multidimensional scaling. The book constitutes an introduction to the INDSCAL model and primarily focuses on fitting the model by means of the SINDSCAL program. (SINDSCAL is an implementation of the original CANDECOMP algorithm for fitting the INDSCAL model devised by Carroll and Chang.) In addition, the book treats one particular three-way clustering technique, namely INDCLUS, the three-way generalization by Carroll and Arabie (1983) of the ADCLUS model proposed by Shepard and Arabie (1979).

Introducing a methodology like INDSCAL-based multidimensional scaling in a 90-page booklet is far from easy. Without being too technical, one would like to make sure that the reader gains a thorough understanding of the model, can carry out an INDSCAL-based analysis independently, and can interpret the solution adequately. The authors introduce the model via an illustrative application to some well-known data gathered by Rosenberg and Kim (1975). The example convincingly demonstrates the potential benefits of three-way multidimensional scaling.
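For readers who want the model itself: INDSCAL represents the distance between objects i and j as judged by subject k as d_ijk = [Σt w_kt (x_it − x_jt)²]^½, where the x's are coordinates in the common group space and w_kt is subject k's weight for dimension t. A minimal sketch, with hypothetical coordinates and weights not taken from the book, illustrates the subject-weight idea, and also why negative weights are problematic:

```python
import math

def indscal_distance(x_i, x_j, w_k):
    """Weighted Euclidean (INDSCAL) distance between two objects.

    x_i, x_j: coordinates of the objects in the common group space.
    w_k: dimension weights of subject k.
    Returns nan when the weighted squared distance is negative,
    in which case no distance is defined.
    """
    sq = sum(w * (a - b) ** 2 for w, a, b in zip(w_k, x_i, x_j))
    return math.sqrt(sq) if sq >= 0 else float("nan")

x1, x2 = (0.0, 1.0), (1.0, 0.0)
print(indscal_distance(x1, x2, (1.0, 1.0)))   # equal weights: ordinary Euclidean, sqrt(2)
print(indscal_distance(x1, x2, (4.0, 0.25)))  # this subject stretches dimension 1
print(indscal_distance(x1, x2, (-2.0, 0.5)))  # negative weight: squared distance < 0, nan
```

With all weights equal to one the model reduces to ordinary two-way scaling; a negative weight makes the weighted squared distance negative, so the model no longer yields a distance at all.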
The example also allows the authors to point out some potential pitfalls in interpreting the results of an INDSCAL analysis. For example, the so-called unique orientation of the dimensions in the common space or object space does not hold for planes in which all subjects weigh the dimensions equally. This point is fully discussed in the next section, where the INDSCAL model is treated more formally. In this section, the authors mention the problem of negative subject weights only in passing ("The problem, of course, with negative weights having a larger absolute value is that they have no substantive interpretation," p. 18) and fail to point out that in the presence of negative weights the INDSCAL model no longer defines distances. The section on the INDSCAL model concludes with another illustrative application using the well-known Miller-Nicely data (Miller & Nicely, 1955). While this example constitutes a truly nice application of the INDSCAL model, it requires considerable substantive knowledge of acoustics to appreciate it fully, and I would not be surprised if most readers of the Quantitative Applications in the Social Sciences series lacked this knowledge.

In the next section, the authors discuss the practical issues involved in carrying out an INDSCAL-based analysis. There are several algorithms for fitting the INDSCAL model, none of which can be considered a de facto standard. However, because of space limitations and the introductory nature of the book, it was impractical to discuss all of them. Instead, the authors selected a single algorithm, namely the CANDECOMP
method developed by Carroll and Chang (1970), and a single implementation of this algorithm, namely the SINDSCAL program distributed by AT&T Bell Laboratories. The nitty-gritty of setting up the input for SINDSCAL is covered in great detail. While a choice like this is always hard to make, I would be surprised if SINDSCAL were nowadays the prevailing software for fitting the INDSCAL model. To make up for this somewhat one-sided choice, the authors include a section entitled "Other three-way MDS spatial representations," in which a discussion of other methods for fitting INDSCAL is confounded with a discussion of some other models for three-way multidimensional scaling. This section is far from representative, and some parts, such as the discussion of TUCKALS3, are far more technical than the rest of the book. In Appendix C, two other widely used methods for fitting the INDSCAL model (besides the CANDECOMP algorithm described in Appendix B) are discussed in some detail, namely the maximum likelihood method developed by Ramsay (1977) and implemented in MULTISCALE II, and the ALSCAL procedure devised by Takane, Young, and de Leeuw (1977) and made available through statistical packages such as SAS and SPSS. An important method that is missing here is the SMACOF approach to fitting the INDSCAL model, as proposed by de Leeuw (1980) and implemented by Heiser and Stoop (1986).

In the last major section, an expository introduction to the INDCLUS model (Carroll & Arabie, 1983) is given. That such a three-way clustering procedure should be considered a complement to an INDSCAL-based analysis is nicely illustrated on the Rosenberg-Kim data previously analyzed with INDSCAL. The mathematical programming procedure developed by Carroll and Arabie for fitting this model is briefly described. The discussion of both the model and the method is rather cursory, and it can be questioned whether it was wise to include such a chapter in this volume.
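The INDCLUS model itself is simple to state: subject k's similarity between objects i and j is modeled as the sum of that subject's weights for the (possibly overlapping) clusters containing both objects, plus an additive constant, s_ijk ≈ Σm w_km p_im p_jm + c_k, where p_im ∈ {0, 1} indicates cluster membership. A toy sketch, with hypothetical memberships and weights not taken from the book:

```python
def indclus_similarity(p_i, p_j, w_k, c_k):
    """Model similarity of two objects for subject k under INDCLUS.

    p_i, p_j: binary cluster-membership vectors of the two objects.
    w_k: subject k's (nonnegative) cluster weights.
    c_k: subject k's additive constant.
    """
    # Sum the weights of exactly those clusters containing both objects.
    return sum(w for w, a, b in zip(w_k, p_i, p_j) if a and b) + c_k

# Three overlapping clusters; the two objects share only the first one,
# so the modeled similarity is that cluster's weight plus the constant.
p_1, p_2 = (1, 1, 0), (1, 0, 1)
print(indclus_similarity(p_1, p_2, w_k=(0.6, 0.3, 0.2), c_k=0.1))  # 0.6 + 0.1
```

The ADCLUS model of Shepard and Arabie (1979) is the special case in which all subjects share one set of weights; INDCLUS lets the weights w_k vary over subjects, exactly as INDSCAL does for dimension weights.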
Given the space limitations, it might have been more worthwhile to discuss fully some important aspects of INDSCAL-based three-way scaling that are now hardly dealt with at all. One such aspect concerns nonmetric fitting of the INDSCAL model. On page 10, the authors state that the distinction between metric and nonmetric approaches has proved quite valuable for two-way multidimensional scaling, but that it seems empirically less so for three-way MDS. Not only do the authors fail to provide any references to substantiate this assertion, they also neglect to mention that some methods for fitting the INDSCAL model, such as ALSCAL and SMACOF, allow for ordinal data, while a procedure like MULTISCALE II enables the estimation of optimal parametric monotonic transformations of the proximity data. Another issue that deserves more attention is the correct analysis of subject weights. While the authors mention that the recommendation by Schiffman, Reynolds, and Young (1981) to use directional statistics has been undermined by recent research by Jones (1983) and by Hubert, Golledge, Constanza, Gale, and Halperin (1984), they fail to provide the reader with specific guidelines on how to compare the weights of different groups of subjects correctly.

Summing up, Three-Way Scaling and Clustering is a much-needed addition to the Quantitative Applications in the Social Sciences series. The monograph provides a highly readable introduction to the INDSCAL model and should be useful to a very broad audience. However, the reader should realize that not all aspects of three-way multidimensional scaling based on the INDSCAL model are represented equally well.

UNIVERSITY OF GHENT, BELGIUM
Geert De Soete
References

Carroll, J. D., & Arabie, P. (1983). INDCLUS: An individual differences generalization of the ADCLUS model and the MAPCLUS algorithm. Psychometrika, 48, 157-169.
Carroll, J. D., & Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition. Psychometrika, 35, 283-319.
de Leeuw, J. (1980). Majorization algorithms for individual differences in multidimensional scaling. Paper presented at the Symposium on Multidimensional Scaling and Interindividual Differences, XXIInd International Congress of Psychology, Leipzig.
Heiser, W. J., & Stoop, I. (1986). Explicit SMACOF algorithms for individual differences scaling (Report RR-86-14). Leiden: University of Leiden, Department of Data Theory.
Hubert, L. J., Golledge, R. G., Constanza, C. M., Gale, N., & Halperin, W. C. (1984). Nonparametric tests for directional data. In G. Bahrenberg, M. M. Fischer, & P. Nijkamp (Eds.), Recent developments in spatial data analysis (pp. 171-189). Aldershot, England: Gower.
Jones, C. L. (1983). A note on the use of directional statistics in weighted Euclidean distances multidimensional scaling models. Psychometrika, 48, 473-476.
Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling. Beverly Hills: Sage Publications.
Miller, G. A., & Nicely, P. E. (1955). An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America, 27, 338-352.
Ramsay, J. O. (1977). Maximum likelihood estimation in multidimensional scaling. Psychometrika, 42, 241-266.
Rosenberg, S., & Kim, M. P. (1975). The method of sorting as a data-gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.
Schiffman, S. S., Reynolds, M. L., & Young, F. W. (1981). Introduction to multidimensional scaling. New York: Academic Press.
Shepard, R. N., & Arabie, P. (1979). Additive clustering: Representation of similarities as combinations of discrete overlapping properties. Psychological Review, 86, 87-123.
Takane, Y., Young, F. W., & de Leeuw, J. (1977). Nonmetric individual differences multidimensional scaling: An alternating least squares method with optimal scaling features. Psychometrika, 42, 7-67.