PSYCHOMETRIKA--VOL. 51, NO. 3, 479-481 SEPTEMBER 1986 COMPUTATIONAL PSYCHOMETRICS
A NOTE ON ROY'S LARGEST ROOT
WARREN F. KUHFELD
THE UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL
The name "Roy's largest root" and similar names are used in practice to label two different but functionally related statistics--one proportional to an F, and the other, a squared canonical correlation. This note presents the logic that leads to the two formulations, states which statistic some popular statistical packages use, and shows the possible source of this inconsistency in the original work of Roy (1953) and Heck (1960). Key words: Roy's largest root, MANOVA, canonical correlation, multivariate test statistics.
A survey of popular statistical packages and an assortment of secondary sources has revealed inconsistencies in the definition of Roy's largest root statistic. The name "Roy's largest root," or some variation on it, is used in different sources to refer to two different statistics. This can be a source of confusion among students and researchers who are not aware of the existence of a second definition. This note will explore the nature and source of the inconsistencies in the definition of Roy's largest root.

Consider a one-way MANOVA design where $Y$ is an $(n \times p)$ matrix of dependent variables, and $X$ is an $(n \times q)$ full rank design matrix. The common MANOVA test statistics such as Roy's largest root are functions of the matrices $H$, $E$, and $T$. $H$ is the matrix of sums of squares and cross products for hypothesis,

$$H = Y'[X(X'X)^{-1}X' - \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}']Y,$$

where $\mathbf{1}$ is an $(n \times 1)$ vector of ones. $E$ is the matrix of sums of squares and cross products for error,

$$E = Y'[I - X(X'X)^{-1}X']Y,$$

where $I$ is an $(n \times n)$ identity matrix. $T$ is the matrix of sums of squares and cross products for total, $T = H + E$.

Let $\theta_1$ be defined as the largest root of $HT^{-1}$ and $\lambda_1$, the largest root of $HE^{-1}$. It can be shown that $\theta_1 = \lambda_1/(1 + \lambda_1)$, and $\lambda_1$ is bounded by zero and infinity while $\theta_1$ is bounded by zero and one.

$\lambda_1$ is the statistic that follows from generalizing univariate ANOVA from the analysis of a single dependent variable to the analysis of the linear combination of several dependent variables that maximizes $F$. The $F$ for a linear combination of the columns of $Y$, say $Yz$ for a coefficient vector $z$, is $[(z'Hz)/df_h]/[(z'Ez)/df_e]$. Because the dfs have no effect on the maximization, $F$ is maximized when $(z'Hz)/(z'Ez)$ is maximized with respect to $z$. This occurs when $z$ is $z_1$, the characteristic vector associated with the largest root of the characteristic equation $(H - \lambda E)z = 0$, which has the same roots as $HE^{-1}$.
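The definitions above can be checked numerically. The following is a minimal sketch in Python with NumPy and is not part of the original note: the simulated data, the group sizes, and the helper names `largest_root` and `ratio` are assumptions made purely for illustration. It builds $H$, $E$, and $T$ for a small one-way design, solves both characteristic equations, and checks that $\theta_1 = \lambda_1/(1 + \lambda_1)$ and that no other coefficient vector gives a larger ratio $(z'Hz)/(z'Ez)$ than $z_1$.

```python
import numpy as np

def largest_root(A, B):
    """Largest root r and vector z of (A - r B) z = 0 (A symmetric, B positive definite)."""
    L = np.linalg.cholesky(B)                           # B = L L'
    M = np.linalg.solve(L, np.linalg.solve(L, A).T).T   # L^{-1} A L^{-T}
    vals, vecs = np.linalg.eigh((M + M.T) / 2)          # eigenvalues in ascending order
    z = np.linalg.solve(L.T, vecs[:, -1])               # map back: z = L^{-T} v
    return vals[-1], z

rng = np.random.default_rng(0)

# Hypothetical one-way design: 3 groups of 10 observations, p = 2 dependent variables.
groups = np.repeat(np.arange(3), 10)
n = groups.size
X = np.eye(3)[groups]                                   # (n x q) full rank indicator matrix
Y = rng.normal(size=(n, 2)) + np.outer(groups, [0.8, -0.3])

ones = np.ones((n, 1))
P_X = X @ np.linalg.solve(X.T @ X, X.T)                 # X (X'X)^{-1} X'
P_1 = ones @ ones.T / n                                 # 1 (1'1)^{-1} 1'

H = Y.T @ (P_X - P_1) @ Y                               # hypothesis SSCP
E = Y.T @ (np.eye(n) - P_X) @ Y                         # error SSCP
T = H + E                                               # total SSCP

lam1, z1 = largest_root(H, E)                           # (H - lambda E) z = 0
theta1, _ = largest_root(H, T)                          # (H - theta T) z = 0

def ratio(z):
    """SSh / SSe for the composite Yz."""
    return (z @ H @ z) / (z @ E @ z)

print(np.isclose(theta1, lam1 / (1 + lam1)))            # relation between the two roots
print(np.isclose(ratio(z1), lam1))                      # z1 attains the largest root
print(all(ratio(z) <= lam1 + 1e-9 for z in rng.normal(size=(1000, 2))))
```

Under these assumptions each of the three checks should print `True`. The Cholesky step is only one convenient way to turn the generalized problem into an ordinary symmetric eigenproblem; any routine that returns the roots of $HE^{-1}$ would serve equally well.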
$\lambda_1$, the largest root, is proportional to the $F$ from a univariate ANOVA of the optimum linear composite, $Yz_1$. ($\lambda_1 = SS_h/SS_e$, where $SS_h$ and $SS_e$ are sums of squares for hypothesis and error, respectively.) $\theta_1$ is the statistic that follows from generalizing univariate multiple regression from the prediction of a single dependent variable to the prediction of a combination of several
Requests for reprints should be sent to Warren F. Kuhfeld, The L. L. Thurstone Psychometric Laboratory, Davie Hall 013-A, The University of North Carolina, Chapel Hill, NC 27514. The author wishes to thank Lyle Jones, Mark Appelbaum, and Elliot Cramer for their comments on an earlier version of this note. SAS® is a registered trademark of SAS Institute Inc., Cary, NC. SPSS® is a registered trademark of SPSS Inc., Chicago, IL. SPSS^x™ is a trademark of SPSS Inc., Chicago, IL. BMDP® is a registered trademark of BMDP Statistical Software, Los Angeles, CA. 0033-3123/86/0900-A004500.75/0 © 1986 The Psychometric Society
dependent variables. It equals the squared multiple correlation that would result from predicting the linear combination of the dependent variables that is maximally related to the independent variables. It can be shown that this optimal composite is the same $Yz_1$ as above. It can also be shown that $\theta_1$ is the largest squared canonical correlation between the columns of $Y$ and the columns of $X$. Maximizing the squared canonical correlation is equivalent to maximizing $(z'Hz)/(z'Tz)$ with respect to $z$, leading to the characteristic equation $(H - \theta T)z = 0$, which has the same roots as $HT^{-1}$.

Roy's largest root has been defined as both $\lambda_1$ and $\theta_1$. For example, Eaton (1983) states on page 349, "... reject $H_0$ for large values of $\lambda_1$. This is called Roy's maximum root test." Olson (1976) in Table 1 defines "Roy's largest root R" as "$l_1$", the largest root of "$H(H + E)^{-1}$", which is identical to $\theta_1$ in the notation used here.

Popular statistical packages do not agree on a definition either. The SAS GLM and CANCORR procedures (SAS Institute Inc., 1985) define "ROY'S MAXIMUM ROOT CRITERION" (GLM) and "ROY'S GREATEST ROOT" (CANCORR) to be $\lambda_1$. SPSS® (Hull & Nie, 1981) and SPSS^x (SPSS Inc., 1983) MANOVA use $\theta_1$ for "ROYS". BMDP® (Dixon, et al., 1983) P4V uses $\theta_1$ for "MXROOT", which the manual defines as "Roy's largest root statistic."

A possible cause of the inconsistencies can be seen from Roy (1953) and Heck (1960).
Roy (1953) states: "Three different types of hypotheses will be discussed here, namely, (i) the hypothesis of equality of covariance matrices of two p-variate normal populations, (ii) the hypothesis of equality of k means for each of p variates for k p-variate normal populations with the same covariance matrix (which is formally tied up with the general problem of testing a linear hypothesis), and (iii) the hypothesis that in a ($p_1 + p_2$)-variate normal population the set of, say, the first $p_1$ variates is uncorrelated with the set of the last $p_2$ variates (sec. 5)." In later parts of section 5, Roy develops statistical tests for the three situations. Hypothesis (ii), MANOVA, leads to a characteristic equation with $\lambda_1$ as the largest root, and hypothesis (iii), canonical correlation, leads to a characteristic equation with $\theta_1$ as the largest root. Heck's (1960) charts are for $\theta_1$ in the canonical correlation situation, and for $\theta_1 = \lambda_1/(1 + \lambda_1)$ in the MANOVA situation.

Since both $\theta_1$ and $\lambda_1$ are largest roots and are discussed by Roy, neither one can claim exclusive ownership of the title "Roy's largest root." The definition of Roy's largest root depends on the hypothesis being tested. It is the largest root of the characteristic equation that follows from the hypothesis being tested and it is transformed to $\theta_1$ to enter Heck's tables.

It is unfortunate that $\theta_1$ and $\lambda_1$, two statistics as different as the univariate $F$ and $R^2$, are often given the same name. It is even more unfortunate that many sources mention only one definition as if it were the only definition. It does not matter which statistic is used since each is a simple known function of the other, but to avoid possible confusion, it is important to know that there are two legitimate definitions of Roy's largest root, and it is important to know which definition was used when a Roy's largest root statistic is computed.
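Because each statistic is a simple known function of the other, a value reported under either definition can be converted once the definition is known. A minimal sketch in Python follows. The function names are hypothetical, and the inverse relation $\lambda_1 = \theta_1/(1 - \theta_1)$ is obtained here by solving $\theta_1 = \lambda_1/(1 + \lambda_1)$ for $\lambda_1$ rather than quoted from the note.

```python
def lambda_to_theta(lam):
    """Convert the largest root of HE^{-1} to the largest root of HT^{-1}."""
    if lam < 0:
        raise ValueError("lambda must be non-negative")
    return lam / (1.0 + lam)

def theta_to_lambda(theta):
    """Convert the largest root of HT^{-1} to the largest root of HE^{-1}."""
    if not 0 <= theta < 1:
        raise ValueError("theta must lie in [0, 1)")
    return theta / (1.0 - theta)

print(lambda_to_theta(3.0))   # 0.75
print(theta_to_lambda(0.75))  # 3.0
```

For instance, a value taken from a program that reports $\lambda_1$ would presumably be passed through `lambda_to_theta` before being compared with charts tabulated for $\theta_1$, while a value already reported as $\theta_1$ would be used unchanged.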
Ambiguous or inconsistent definitions are not limited to the Roy's largest root statistic, of course. As users of statistical programs, we need to be periodically reminded that statistics that are computed by different programs or referred to by different sources will not necessarily be the same, even if they have been given the same name.

References

Dixon, W. J., Brown, M. B., Engelman, L., Frane, J. W., Hill, M. A., Jennrich, R. I., & Toporek, J. D. (Eds.). (1983). BMDP statistical software. Los Angeles: University of California Press.
Eaton, M. L. (1983). Multivariate statistics, A vector space approach. New York: Wiley & Sons.
Heck, D. L. (1960). Charts of some upper percentage points of the distribution of the largest characteristic root. Annals of Mathematical Statistics, 31, 625-642.
Hull, C. H., & Nie, N. H. (Eds.). (1981). SPSS Update 7-9. New York: McGraw Hill.
Olson, C. L. (1976). On choosing a test statistic in multivariate analysis of variance. Psychological Bulletin, 83, 579-586.
Roy, S. N. (1953). On a heuristic method of test construction and its use in multivariate analysis. Annals of Mathematical Statistics, 24, 220-238.
SAS Institute Inc. (1985). SAS User's Guide: Statistics, Version 5 Edition. Cary, NC: Author.
SPSS Inc. (1983). SPSS^x User's Guide. New York: McGraw Hill.