PSYCHOMETRIKA--VOL. 3, NO. 2 JUNE, 1938
EXTENSIONS OF FACTORIAL SOLUTIONS

HARRY H. HARMAN
The University of Chicago

A method is developed for extending any type of factor solution to new tests. The theoretical basis for this approximating scheme is thoroughly investigated, and then a simplification in the technique is introduced for practical purposes. An example is presented which illustrates the procedure of extending a factor solution to three new tests simultaneously.

1. Introduction

The writer was asked by a psychologist who was interpreting a factor solution whether it was possible to obtain the correlation of a test that was not previously in the test battery with one of the factors. I thought the problem had practical value and sufficient statistical interest, so I undertook to solve it. While I was putting the final touches on my method of solution, my judgment as to the desirability of an answer to this problem was corroborated: a similar article by Paul S. Dwyer appeared in this journal.* The method presented in the present article is quite different from Dwyer's method, although each has its basis in least-squares theory.

Before proceeding to an analysis of the problem, it might be well to note when such a problem may arise. It may happen that at the end of a testing program in which a factorial solution was obtained, a new test is given to the same subjects. Then the factor solution can be extended to include this test by the method to be described.
All that need be done is to obtain the intercorrelations of this new test with the old tests; no new factor analysis need be performed. Another occurrence of this problem will be in the analysis of a battery of tests involving some general intelligence tests. General intelligence tests complicate a factorial solution in a geometric or algebraic sense. The complexity of such a test is equal to the number of factors, while "in a simple structure every trait is of complexity less than r [number of factors]."† Such tests of great complexity can be omitted from the factor analysis, and then the factor solution can be extended to include all the tests. The descriptions of the general intelligence tests may then be in terms of every factor.

* Dwyer, P. S., "The Determination of the Factor Loadings of a Given Test from the Known Factor Loadings of Other Tests," Psychometrika, 1937, 2, pp. 173-178.
† Thurstone, L. L., The Vectors of Mind. Chicago: The University of Chicago Press, 1935, p. 155.

The technique of extending a factorial solution which will be described is not limited to any particular type of factor analysis. Assume that a factor analysis has been made of the n − 1 tests 1, 2, ..., n − 1 and that the correlation of test n with a factor F is then desired. The necessary statistics are the intercorrelations of the n tests 1, 2, ..., n − 1, n and the correlations of the original n − 1 tests with F. The determinant of intercorrelations between tests, augmented in the first row and column by the correlations of the tests with the factor F, is written as follows:
$$
\Delta(x) \;=\;
\begin{vmatrix}
1 & r_{F1} & r_{F2} & \cdots & r_{F,n-1} & x \\
r_{1F} & 1 & r_{12} & \cdots & r_{1,n-1} & r_{1n} \\
r_{2F} & r_{21} & 1 & \cdots & r_{2,n-1} & r_{2n} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
r_{n-1,F} & r_{n-1,1} & r_{n-1,2} & \cdots & 1 & r_{n-1,n} \\
x & r_{n1} & r_{n2} & \cdots & r_{n,n-1} & 1
\end{vmatrix}
$$
where x is the desired r_Fn, and all other correlations are known. Whence Δ is a quadratic function of x. We propose to solve this problem under the condition that the value of x shall make the multiple correlation coefficient R_F(12...n) a minimum.* In the following section we shall present the theory basic to this assumption, which yields the desired solution. Then we shall modify the procedure to obtain a practical solution, and finally, in the last section, we shall present an example of actual data illustrating the method.
2. Theoretical Treatment

In order to simplify the following work, we make one definition. Let Δ_ij (i, j = F, 1, 2, ..., n) be the cofactor of r_ij in Δ. Thus, Δ_FF is the cofactor of the element in the first row and column of Δ, and Δ_Fn is the cofactor of x = r_Fn. The multiple correlation coefficient, in terms of determinants, is given by the following formula:†

* A side condition of some sort is required to solve the problem. Dwyer (op. cit.) assumes that the observed correlations of the new test with the original tests are equal to the respective correlations obtained from the factor pattern, without any residuals. It is thus seen that his technique differs from the one in this paper in its original assumptions.
† For a discussion of the solution of partial and multiple correlations by means of determinants see Holzinger, K. J., Statistical Methods for Students in Education. Boston: Ginn and Company, 1928, pp. 312-313.
(1)
$$
R_{F(12\cdots n)} \;=\; \frac{\sqrt{\Delta_{FF} - \Delta}}{\sqrt{\Delta_{FF}}}\,.
$$
This R is a function of x since it involves Δ, which is a function of x. Now the problem is to find a value of x which will minimize R. Applying the methods of maxima and minima of functions,* we first differentiate R with respect to x, set this derivative equal to zero and solve for x, and then put this value of x into the second derivative of R to see if it is positive or negative. Differentiating R with respect to x, we have

(2)
$$
\frac{dR}{dx} \;=\; \frac{-\,\dfrac{d\Delta}{dx}}{2\sqrt{\Delta_{FF}}\,\sqrt{\Delta_{FF} - \Delta}}\,.
$$
Now the derivative of Δ with respect to x is given by†

$$
\frac{d\Delta}{dx} \;=\;
\begin{vmatrix}
0 & r_{F1} & \cdots & r_{F,n-1} & x \\
0 & 1 & \cdots & r_{1,n-1} & r_{1n} \\
\vdots & \vdots & & \vdots & \vdots \\
0 & r_{n-1,1} & \cdots & 1 & r_{n-1,n} \\
1 & r_{n1} & \cdots & r_{n,n-1} & 1
\end{vmatrix}
\;+\;
\begin{vmatrix}
1 & r_{F1} & \cdots & r_{F,n-1} & 1 \\
r_{1F} & 1 & \cdots & r_{1,n-1} & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
r_{n-1,F} & r_{n-1,1} & \cdots & 1 & 0 \\
x & r_{n1} & \cdots & r_{n,n-1} & 0
\end{vmatrix}
$$

which, on expanding the determinants, reduces to

(3)
$$
\frac{d\Delta}{dx} \;=\; \Delta_{nF} + \Delta_{Fn} \;=\; 2\Delta_{Fn}\,,
$$
since Δ is symmetrical. Substituting the value of dΔ/dx from (3) into (2), we have

(4)
$$
\frac{dR}{dx} \;=\; \frac{-\Delta_{Fn}}{\sqrt{\Delta_{FF}}\,\sqrt{\Delta_{FF} - \Delta}}\,.
$$
The derivative of R with respect to x is thus seen to vanish when

(5)
$$
\Delta_{Fn} = 0\,,
$$

which is a linear equation in x and thus has only one root.

* A general discussion of the maxima and minima of functions of a single independent variable is presented in Granville, W. A., Smith, P. F., and Longley, W. R., Elements of the Differential and Integral Calculus. Boston: Ginn and Company, 1934, pp. 182-184.
† The derivative of a determinant of the nth order is the sum of the n determinants obtained by differentiating each column, in turn, and leaving all the remaining columns unchanged.
We may set
(6)
$$
\Delta_{Fn} = -ax + b\,,
$$

where a is the complementary minor obtained by deleting the first and last rows and the first and last columns from Δ, and hence is positive. The coefficient of x is always negative, as can readily be seen. If the order of Δ is odd, then the sign attached to Δ_Fn is plus, the order of Δ_Fn is even, and the sign attached to the minor of x in Δ_Fn is minus, so that the sign attached to a is minus. If the order of Δ is even, then the sign attached to Δ_Fn is minus, the order of Δ_Fn is odd, and the sign attached to the minor of x in Δ_Fn is plus, so that again Δ_Fn = −ax + b. If we put the value of Δ_Fn from (6) into (4), we finally have
(7)
$$
\frac{dR}{dx} \;=\; \frac{ax - b}{\sqrt{\Delta_{FF}}\,\sqrt{\Delta_{FF} - \Delta}}\,.
$$
Differentiating this expression, we obtain

(8)
$$
\frac{d^2R}{dx^2} \;=\; \frac{2a(\Delta_{FF} - \Delta) - (-ax + b)\,\dfrac{d\Delta}{dx}}{2\sqrt{\Delta_{FF}}\,\sqrt{(\Delta_{FF} - \Delta)^3}}\,.
$$
Now we have to inspect the second derivative to see whether it is positive or negative for x = b/a, which value makes the first derivative vanish. It is sufficient for this purpose to merely look at the numerator in (8). For this value of x the second term is zero. It has been remarked in the preceding paragraph that a is positive. Furthermore, (Δ_FF − Δ) > 0, or else the multiple correlation coefficient would be imaginary, as is evident from formula (1).* Hence the second derivative is positive and R is a minimum for x = b/a.

A multiple correlation coefficient is increased on the addition of a test whether this new test correlates positively or negatively with the preceding tests. Hence to ask for the multiple correlation coefficient to be a minimum on the addition of a new test is the same as to ask for the multiple correlation to remain the same with the new test included as it was without this test. This is actually the requirement we shall make in a practical problem. Instead of inquiring directly for the minimum value of the multiple correlation coefficient, we shall set the multiple correlation of the factor with respect to all the tests equal to the multiple correlation coefficient of the factor with

* If Δ_FF − Δ = 0, then R_F(12...n) = 0 and there is no maximum or minimum.
respect to the tests in the original analysis, i.e., with the new test excluded.
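The condition Δ_Fn = 0 lends itself to a direct numerical check. The following sketch, in modern matrix notation and on invented correlations for three old tests (not data from this article), builds the augmented determinant Δ(x), solves the linear cofactor equation for x, and verifies that the resulting multiple correlation of the factor on all n tests equals that on the original n − 1 tests alone:

```python
import numpy as np

# Hypothetical data: 3 old tests, a factor F, and one new test n.
# These correlations are invented for illustration only.
r_F = np.array([0.70, 0.60, 0.50])   # correlations of the old tests with F
R   = np.array([[1.00, 0.40, 0.30],  # intercorrelations of the old tests
                [0.40, 1.00, 0.50],
                [0.30, 0.50, 1.00]])
r_n = np.array([0.50, 0.45, 0.35])   # correlations of the new test with the old tests
n = len(r_n) + 1                     # the new test is test number n

def augmented(x):
    """Delta(x): rows and columns ordered F, 1, ..., n-1, n, with x = r_Fn."""
    D = np.empty((n + 1, n + 1))
    D[1:n, 1:n] = R
    D[0, 1:n] = D[1:n, 0] = r_F
    D[n, 1:n] = D[1:n, n] = r_n
    D[0, 0] = D[n, n] = 1.0
    D[0, n] = D[n, 0] = x
    return D

def cofactor(M, i, j):
    minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

# Delta_Fn is linear in x, so two evaluations determine it exactly.
c0 = cofactor(augmented(0.0), 0, n)
c1 = cofactor(augmented(1.0), 0, n)
x_star = -c0 / (c1 - c0)             # root of Delta_Fn(x) = 0

# With x = x_star, the multiple R of F on tests 1..n equals that of F on
# tests 1..n-1 alone, using R^2 = (Delta_FF - Delta) / Delta_FF.
D = augmented(x_star)
R2_with = 1.0 - np.linalg.det(D) / cofactor(D, 0, 0)
A = augmented(0.0)[:n, :n]           # the factor and the old tests only
R2_without = 1.0 - np.linalg.det(A) / cofactor(A, 0, 0)
print(round(x_star, 4), np.isclose(R2_with, R2_without))
```

Two evaluations suffice to pin down Δ_Fn because, as shown above, x survives in only one position of the minor, so the cofactor is exactly linear in x.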
3. Example

To illustrate the technique of extending a factor solution, we consider an example of 355 cases who have taken forty-one tests and three group intelligence tests, namely Otis (advanced), Kuhlmann-Anderson, and Morgan, which we shall designate by O, K, and M, respectively.* The intercorrelations of all tests are known, and the factor pattern (or structure, since the factors are uncorrelated) of all but the intelligence tests is also known.† Then the problem is to find the factor weights, or correlations with the factors, of the three intelligence tests. The extension of the original factor pattern to all three tests O, K, and M can be done simultaneously.

Since the calculation of regression weights is extremely laborious for forty-one variables, we first reduce the number of variables without seriously lowering the multiple correlation coefficients. This reduction in the number of variables is effected by adding the tests that best measure each factor and dividing this sum by the standard deviation of the sum to get new standardized composite tests. In this example the groups of tests (1, 6, 20, 26, 27, 28, 29, 34, 35, 36a, 74, 75), (2, 3-4, 8), (11, 13, 17, 18, 22, 25b, 77, 78, 80, 81, 82, 94), (53, 54, 55, 60), and (61, 62, 67, 68) were reduced to the composite tests

TABLE I
Intercorrelations of Tests
        A      E      V      T      D
E     .570
V     .562   .697
T     .593   .579   .611
D     .328   .619   .577   .421

O     .591   .672    …     .585   .614
K     .507   .722   .839   .601   .596
M      …      …     .824   .617   .655
* The data of this example are taken from Preliminary Report on Spearman-Holzinger Unitary Trait Study, No. 9. Prepared at the Statistical Laboratory, Department of Education, The University of Chicago, 1936, Tables 3 and 8. The factor weights of the intelligence tests as given in Table 8 were not obtained directly by bi-factor analysis (the procedure is described on p. 3 of Report 9).
† By a factor pattern is meant the set of linear equations expressing the tests in terms of the factors, and by a factor structure is meant the table of correlations of the tests with the factors. The elements of a factor structure are identical with the coefficients in the factor pattern if, and only if, the factors are orthogonal or uncorrelated.
A, E, V, T, and D, respectively.* The remaining six tests were not employed since they did not measure any group factor significantly. The intercorrelations of the five composite tests, together with their correlations with the three intelligence tests, are presented in Table I. The reduced factor pattern is given in Table II, where the general, mental speed, motor speed, spatial, verbality, attention, and mechanical ability factors are represented by u, a, m, e, v, t, and d, respectively.

TABLE II
Reduced Factor Pattern

Test     u       e
A      .678    .160
E      .845    .381
V      .835    .484
T      .766    .411
D      .599    .536
The regression weights will be computed by the Doolittle method,† which will be modified slightly so that these weights may be obtained first without the intelligence tests and then with them. In Table III the preliminary work in getting the regression weights is outlined. Slight discrepancies may be found in Table III due to the fact that all the work was carried to four decimal places and then rounded to three for printing purposes. The x's appearing in the lower half of Table III stand for the particular correlation of the intelligence test and the factor of the block in which they are located. For example, the very last x in the table stands for r_Md.

From the upper half of Table III, we can write the regression coefficients for estimating each of the factors in terms of A, E, V, T, and D. For the factor u these are as follows:

(9)
$$
\begin{aligned}
\beta_{uD.AEVT} &= .021\\
\beta_{uT.AEVD} &= .269\\
\beta_{uV.AETD} &= .330\\
\beta_{uE.AVTD} &= .386\\
\beta_{uA.EVTD} &= .106.
\end{aligned}
$$

* For a description of these tests see Preliminary Report on Spearman-Holzinger Unitary Trait Study, No. 1.
† For a discussion of the Doolittle method see Holzinger, K. J., Swineford, F., and Harman, H. H., Student Manual of Factor Analysis (Planographed). Prepared at the Statistical Laboratory, Department of Education, The University of Chicago, 1937, pp. 32-36.
TABLE III
Work-sheet for Computing Regression Coefficients
(Upper half: Doolittle reductions for the composite tests A, E, V, T, and D. Lower half: extensions to the intelligence tests O, K, and M. The numerical entries are not legible in this copy.)
The β weights for the factors a, m, e, v, t, and d are similarly obtained. The multiple correlation coefficient for estimating a factor k from A, E, V, T, and D is given by

(10)
$$
R^2_{k(AEVTD)} \;=\; \sum_{j=A}^{D} \beta_{kj}\, r_{jk} \qquad (k = u, a, m, e, v, t, d),
$$

where β_{kA.EVTD} has been written β_{kA}, and similarly for the other letters E, V, T, D. The squares of the multiple correlation coefficients thus obtained are as follows:

(11)
$$
\begin{aligned}
R^2_{u(AEVTD)} &= .8925,\\
R^2_{a(AEVTD)} &= .7161,\\
R^2_{m(AEVTD)} &= .0471,\\
R^2_{e(AEVTD)} &= .3719,\\
R^2_{v(AEVTD)} &= .5732,\\
R^2_{t(AEVTD)} &= .3269,\\
R^2_{d(AEVTD)} &= .5075.
\end{aligned}
$$
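Formula (10) is the standard normal-equations identity for a multiple correlation. As a sketch on invented numbers (not the values of Table III), the β weights solve the normal equations, and the sum of β_j r_jk gives R²:

```python
import numpy as np

# Sketch of formula (10) on hypothetical correlations (not the article's data):
# the regression weights beta solve the normal equations R_pp beta = r_pk,
# and R^2 = sum_j beta_j * r_jk.
R_pp = np.array([[1.0, 0.5, 0.3],   # intercorrelations of three predictors
                 [0.5, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
r_pk = np.array([0.6, 0.5, 0.4])    # correlations of the predictors with factor k
beta = np.linalg.solve(R_pp, r_pk)  # what the Doolittle method computes by hand
R2 = beta @ r_pk                    # formula (10)
print(np.round(beta, 4), round(R2, 4))
```

The Doolittle method of the text is a hand-computation scheme for exactly this elimination; `np.linalg.solve` performs the same reduction numerically.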
It may be remarked that if the original factor analysis had been carried to completion, i.e., including the appraisals of the factors, then the work of the upper half of Table III would practically be done,* and the correlations in (11) would be known. The work of extending the factorial solution to the intelligence tests would then begin at this point.

From the lower half of Table III, we obtain the regression coefficients for estimating each of the factors in terms of six tests: A, E, V, T, D, and O, K, or M. For each of the seven factors there are six beta coefficients in the extension to O, K, or M. Thus, to get the seven factor weights for each intelligence test, 126 beta weights must be computed. To save space, we shall carry through the complete details only in getting the correlation of the test O with the u factor. The regression coefficients for estimating the factor u from A, E, V, T, D, and O are as follows:
(12)
$$
\begin{aligned}
\beta_{uO.AEVTD} &= -4.5680 + 5.6689x,\\
\beta_{uD.AEVTO} &= .4581 - .5419x,\\
\beta_{uT.AEVDO} &= .2856 - .0204x,\\
\beta_{uV.AETDO} &= 3.5376 - 3.9811x,\\
\beta_{uE.AVTDO} &= 1.1483 - .9457x,\\
\beta_{uA.EVTDO} &= .2288 - .1526x,
\end{aligned}
$$

where x = r_Ou. Applying formula (10) with k = u and j ranging over A, E, V, T, D, O, we have

* Except for the three columns headed O, K, and M.
$$
R^2_{u(AEVTDO)} \;=\; 4.5732 - 9.1355x + 5.6689x^2.
$$

Setting R²_{u(AEVTDO)} equal to R²_{u(AEVTD)}, we have

$$
5.6689x^2 - 9.1355x + 4.5732 = .8925\,,
$$

or

(13)
$$
x^2 - 1.6115x + .6493 = 0\,.
$$

The discriminant of equation (13) is zero, as would be expected from the discussion in Section 2, so that the two roots of this equation coincide. Solving equation (13), we obtain
r_Ou = x = .806.

As a check on the preceding work, we note that the value of x which makes R_u(AEVTDO) a minimum must reduce β_{uO.AEVTD} to zero and the remaining β values in (12) to those in (9). This is true, within errors of rounding, as may readily be verified. On the other hand, we could set β_{uO.AEVTD} equal to zero and obtain the required correlation x. Thus the actual computation of the multiple correlations would be eliminated unless the complete method of equating the multiple correlations (with and without the new test) and solving a quadratic equation in x were desired as a check. Now setting β_{uO.AEVTD} = −4.5680 + 5.6689x = 0 is equivalent to setting the negative or any multiple of this β equal to zero. Hence from Table III, we may set x − .806 = 0 and directly obtain x = .806. Thus the desired correlations may be read off directly from Table III.

In a similar manner we obtain the six remaining factor loadings for test O, and also the factor loadings for tests K and M. These new approximated factor weights are presented in Table IV.

TABLE IV
Extended Factor Pattern
Test     u       a       m       e       v       t       d
O      .806     …       …       …       …       …      .051
K      .773   −.019    .089    .016    .29     .024    .079
M      .784   −.005    .023    .067    .254    .053    .107
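The arithmetic of the worked check above can be reproduced directly. A small sketch using the rounded coefficients printed in the text (so the discriminant vanishes only to within rounding error):

```python
# Equating R^2_u(AEVTDO) = 4.5732 - 9.1355 x + 5.6689 x^2 to
# R^2_u(AEVTD) = .8925 gives a quadratic in x = r_Ou with coincident roots.
a, b, c = 5.6689, -9.1355, 4.5732 - 0.8925
disc = b * b - 4 * a * c   # ~0: the two roots coincide, up to rounding
x = -b / (2 * a)           # the double root
print(round(x, 3))         # prints 0.806, the loading r_Ou
```

Because the discriminant is (nearly) zero, the double root is simply −b/2a; no square root need be extracted.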
By the method outlined in this section, we can insert a test into any factorial solution (orthogonal or oblique factors) and obtain the structure weights, or correlations of the test with the factors. These weights are also the coefficients in the factor pattern in the case of uncorrelated factors. In case the factors are correlated, it is also possible to obtain the linear equation expressing the new test in terms of the oblique factors, if desired.*

One advantage of the method of approximating the correlation of a test with a factor as presented in this paper is that the correlation obtained is never greater than the correlation that would have arisen from the complete factor analysis including this test. This method of obtaining the desired correlation may then be considered a conservative one.† The error in the correlation, which is bound to appear in any approximate scheme, is at least fixed in sign or direction.

* See Holzinger, K. J., and Harman, H. H., "Relationships between Factors Obtained from Certain Analyses," The Journal of Educational Psychology, Vol. XXVIII, 1937, pp. 339-341.
† To illustrate how the technique developed in this paper differs from Dwyer's technique, the present method was applied to his example (op. cit., pp. 174-175). His values r_{8A1} = .3708 and r_{8A2} = −.7826 become r_{8A1} = .3347 and r_{8A2} = −.6317. Dwyer's values yield zero residual correlations of the new test, while the values obtained by the method of this paper yield small negligible residuals such as normally appear in a factor analysis.