PSYCHOMETRIKA, VOL. 20, NO. 3, SEPTEMBER, 1955
LEAST SQUARES ESTIMATES AND OPTIMAL
CLASSIFICATION
HUBERT E. BROGDEN
PERSONNEL RESEARCH BRANCH, THE ADJUTANT GENERAL'S OFFICE, DEPARTMENT OF THE ARMY*

A simple algebraic development is given showing that criterion estimates derived by the usual multiple regression procedures are optimal for personnel classification. It is also shown that, for any assignment of men to jobs, the sum of the multiple regression criterion estimates will equal the sum of the actual criterion scores.

In earlier papers (1, 2), the author contended that least squares estimates of job proficiency will place men in jobs in the most efficient way possible with the given predictor battery, and that the average estimated job proficiency obtained by the use of such least squares estimates will equal the average actual job proficiency of assigned personnel. This paper seeks to establish these two points more rigorously.
Definition of Symbols

$C_{ij}$ = the performance of individual $i$ in job $j$.

$\hat{C}_{ij}$ = estimates of the $C_{ij}$, each derived by regression equations from the same battery of tests and the same universe of individuals. It is assumed that the zero- and higher-order regressions involving the tests and the $C_{ij}$ are linear.†

$\bar{C}_{ij}$ = the average $C_{ij}$ value for a subset of individuals having the same pattern of scores on the battery of tests.

$X$ = an allocation matrix with elements $x_{ij}$ taking on values of zero and one. The $x_{ij}$ entries for any individual contain a single entry of one, and the $x_{ij}$ entries for job $j$ contain $Q_j$ entries of one. The remaining entries are zeros. The arrangement of ones in $X$ corresponds to the placement of men in jobs. Using $X$ to symbolize any possible allocation of men to jobs is convenient and facilitates algebraic manipulation. In computing an allocation sum (to be defined), the cross-products of $\hat{C}_{ij}$ and $x_{ij}$ are summed. When $x_{ij}$ is one, the corresponding $\hat{C}_{ij}$ is included in the sum; when $x_{ij}$ is zero, the corresponding $\hat{C}_{ij}$ is excluded. Thus, $X$ represents any arrangement of zeros and ones, good or poor, consistent with the limitations already imposed, except that such an arrangement must be based solely upon the scores on the battery of classification tests. In other words, $X$ represents any allocation of men to jobs consistent with the conditions of the problem.

$K_j$ = a set of constants, one for each job. The $K_j$ are assumed to have numerical values such that, when each individual is allocated to the job in which $(\hat{C}_{ij} + K_j)$ is highest, the number allocated to each job corresponds to the number specified by the quota for that job.

$X'$ = a particular $X$, with the $x'_{ij}$ for each individual taking on a value of one for the job in which $(\hat{C}_{ij} + K_j)$ is highest. $X'$ otherwise conforms to the limitations imposed on $X$.

$\sum_{i,j} \hat{C}_{ij} x_{ij}$ = the allocation sum. From the definition of an allocation matrix, the allocation sum is simply the sum, across all individuals, of the $\hat{C}_{ij}$ for the job to which each is assigned by a given allocation matrix.

$Q_j$ = the quota for job $j$.

* The opinions expressed are those of the author and are not to be construed as reflecting official Department of the Army policy.

† In practice, the $C_{ij}$ would obviously not be available for all individuals in each job. Regression equations applying to the same universe can be estimated through a series of validation studies, with a separate study being necessary for each job. In actual use, the $\hat{C}_{ij}$ could then be computed for each applicant in each job.
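The definitions of $X$, $X'$, $K_j$, $Q_j$, and the allocation sum can be made concrete with a short sketch. It assumes NumPy, stores the $\hat{C}_{ij}$ as an individuals-by-jobs array, and uses 0-based job indices. The function names `allocate`, `allocation_sum`, and `quotas` are hypothetical illustrations, not part of the paper. Ties in $(\hat{C}_{ij} + K_j)$ are broken in favor of the lowest job index, a detail the paper does not address.

```python
import numpy as np

def allocate(c_hat: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Build X' (individuals x jobs): a single 1 per row, in the job where
    c_hat[i, j] + k[j] is highest (ties go to the lowest job index)."""
    n_people, n_jobs = c_hat.shape
    best_job = np.argmax(c_hat + k, axis=1)  # k broadcasts across rows
    x_prime = np.zeros((n_people, n_jobs), dtype=int)
    x_prime[np.arange(n_people), best_job] = 1
    return x_prime

def allocation_sum(c: np.ndarray, x: np.ndarray) -> float:
    """Sum over i, j of c[i, j] * x[i, j]: each person's score in the assigned job."""
    return float(np.sum(c * x))

def quotas(x: np.ndarray) -> np.ndarray:
    """Q_j: the number of individuals an allocation matrix places in each job."""
    return x.sum(axis=0)
```

For example, with estimates `[[3, 1], [2.5, 2], [1, 3]]` and $K = (0, 0)$, the first two individuals go to the first job and the third to the second, so the quotas are (2, 1) and the allocation sum is 8.5. Raising the second constant to 1 moves the second individual to the second job, giving quotas (1, 2) and an allocation sum of 8.0. In this sense the $K_j$ act as adjustments that steer the allocation toward the required quotas.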
The Proof

We seek to demonstrate that

$$\sum_{i,j} \hat{C}_{ij} x'_{ij} = \sum_{i,j} C_{ij} x'_{ij} \ge \sum_{i,j} C_{ij} x_{ij}.$$
Consider a subset of individuals having an identical pattern of scores on the battery of tests basic to the $\hat{C}_{ij}$. Since we have specified that $x'_{ij}$ and $x_{ij}$ are to be based solely upon the test scores, it follows that both will remain constant in summing across individuals within such a subset. Then, for such a subset,
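$$\sum_{i,j} (\hat{C}_{ij} + K_j)\,x'_{ij} = \sum_j \Big( \sum_i \hat{C}_{ij} x'_{ij} + \sum_i K_j x'_{ij} \Big) \qquad (1)$$

$$= \sum_j \Big( x'_{ij} \sum_i \hat{C}_{ij} + \sum_i K_j x'_{ij} \Big). \qquad (2)$$

Similarly, it follows that

$$\sum_{i,j} (\hat{C}_{ij} + K_j)\,x_{ij} = \sum_j \Big( x_{ij} \sum_i \hat{C}_{ij} + \sum_i K_j x_{ij} \Big). \qquad (3)$$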
As the number of individuals, $N$, approaches infinity, the number in the subset approaches infinity. Now the criterion means of subgroups with identical score patterns are the basic data for graphic plotting of zero- and higher-order regression lines. If the regression system is linear, points representing the criterion means will fall on or near the regression lines. As the number in the subgroups approaches infinity, the difference between $\bar{C}_{ij}$, the criterion mean for the subgroup, and $\hat{C}_{ij}$, the predicted value derived from a linear regression equation, will approach zero. Consequently, it is also true that, for the subset, $\sum_i C_{ij}$, the sum of the criterion scores, approaches equality to $\sum_i \hat{C}_{ij}$.
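The limiting argument can be illustrated by simulation. The sketch below assumes NumPy, a hypothetical battery of two tests each scored 0, 1, or 2, and a criterion for a single job that is linear in the test scores plus normal noise. None of these specifics come from the paper. It fits an ordinary least squares equation and reports the largest gap, over all score patterns, between a subgroup's criterion mean $\bar{C}_{ij}$ and its regression prediction $\hat{C}_{ij}$. The function name `subgroup_gap` is hypothetical.

```python
import numpy as np

def subgroup_gap(n: int, seed: int = 0) -> float:
    """Largest |subgroup criterion mean - regression prediction| over score patterns."""
    rng = np.random.default_rng(seed)
    # Hypothetical battery: two tests, each scored 0, 1 or 2.
    tests = rng.integers(0, 3, size=(n, 2))
    # Assumed linear criterion for one job, plus unit-variance normal noise.
    c = 1.0 + 2.0 * tests[:, 0] - 1.0 * tests[:, 1] + rng.normal(size=n)
    # Ordinary least squares with an intercept column.
    design = np.column_stack([np.ones(n), tests])
    beta, *_ = np.linalg.lstsq(design, c, rcond=None)
    c_hat = design @ beta
    gaps = []
    for pattern in np.unique(tests, axis=0):
        in_subset = np.all(tests == pattern, axis=1)
        # c_hat is constant within a subset, so its mean is the prediction itself.
        gaps.append(abs(c[in_subset].mean() - c_hat[in_subset].mean()))
    return max(gaps)
```

Under these assumptions the gap shrinks roughly in proportion to the inverse square root of the subgroup size. If the simulated criterion were made nonlinear in the test scores, the gaps would not vanish as $N$ grows, which is why the linearity assumption is needed.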
The basis for the equivalence of $\bar{C}_{ij}$ and $\hat{C}_{ij}$ within a subset having an identical pattern of scores might also be stated as follows. It is a basic principle of least squares prediction that the mean is the point about which the sum of the squares of the deviations is minimal. $\bar{C}_{ij}$, hence, is the best least squares estimate of the criterion scores of individuals with an identical pattern of test scores. If the regression system is linear, $\hat{C}_{ij}$ also provides the best least squares estimate. Hence, as $N$ approaches infinity, the two must coincide.

From our definition of $X'$, we know that, for such a subset,

$$\sum_{i,j} (\hat{C}_{ij} + K_j)\,x'_{ij} \ge \sum_{i,j} (\hat{C}_{ij} + K_j)\,x_{ij}. \qquad (4)$$
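Inequality (4) holds individual by individual. Each row of $X'$ has its single one in a column where $\hat{C}_{ij} + K_j$ is largest, while each row of any $X$ has its single one in some column, so for every individual $i$ in the subset

$$\sum_j (\hat{C}_{ij} + K_j)\,x'_{ij} = \max_j \,(\hat{C}_{ij} + K_j) \ge \sum_j (\hat{C}_{ij} + K_j)\,x_{ij}.$$

Summing this over the individuals in the subset gives (4).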
From equations 2 and 3,

$$\sum_j \Big( x'_{ij} \sum_i \hat{C}_{ij} + \sum_i K_j x'_{ij} \Big) \ge \sum_j \Big( x_{ij} \sum_i \hat{C}_{ij} + \sum_i K_j x_{ij} \Big). \qquad (5)$$
Substituting $\sum_i C_{ij}$ for $\sum_i \hat{C}_{ij}$, we obtain

$$\sum_j \Big( x'_{ij} \sum_i C_{ij} + \sum_i K_j x'_{ij} \Big) \ge \sum_j \Big( x_{ij} \sum_i C_{ij} + \sum_i K_j x_{ij} \Big). \qquad (6)$$
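The substitution in (6) is where the second conclusion of the paper enters. For an allocation based only on the test scores, replacing the estimates by the criterion scores changes the allocation sum by an amount that becomes negligible relative to $N$. The simulation sketch below rests on hypothetical assumptions: NumPy, two tests scored 0 to 2, two jobs whose criteria are linear in the test scores plus noise, and criterion scores available for every individual in every job. It uses a deliberately arbitrary allocation rule that sends individuals with an even total score to the first job and the rest to the second. The function name `mean_allocation_gap` is hypothetical.

```python
import numpy as np

def mean_allocation_gap(n: int, seed: int = 1) -> float:
    """|sum(C_hat * x) - sum(C * x)| / n for a fixed, score-based allocation x."""
    rng = np.random.default_rng(seed)
    tests = rng.integers(0, 3, size=(n, 2))  # hypothetical two-test battery
    design = np.column_stack([np.ones(n), tests])
    # Assumed true weights: rows = intercept, test 1, test 2; columns = jobs.
    weights = np.array([[1.0, 0.5], [2.0, 0.0], [-1.0, 1.5]])
    c = design @ weights + rng.normal(size=(n, 2))
    # One least squares regression per job (lstsq fits both columns at once).
    beta, *_ = np.linalg.lstsq(design, c, rcond=None)
    c_hat = design @ beta
    # Arbitrary allocation that depends only on the test scores.
    job = tests.sum(axis=1) % 2
    x = np.zeros((n, 2))
    x[np.arange(n), job] = 1.0
    return abs(np.sum(c_hat * x) - np.sum(c * x)) / n
```

Dividing by $n$ reports the discrepancy per individual. Under the linearity assumption this per-individual gap tends to zero as $n$ grows, whichever score-based rule is used for `job`.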
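We may also write

$$\sum_j \Big( \sum_i \hat{C}_{ij} x'_{ij} + \sum_i K_j x'_{ij} \Big) = \sum_j \Big( \sum_i C_{ij} x'_{ij} + \sum_i K_j x'_{ij} \Big) \ge \sum_j \Big( \sum_i C_{ij} x_{ij} + \sum_i K_j x_{ij} \Big). \qquad (7)$$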
Since (7) holds for any subset, it holds in summing over all individuals. In summing over individuals within any job, $K_j$ is a constant and may be factored out. Both $\sum_i x'_{ij}$ and $\sum_i x_{ij}$ are, from the definitions of $X'$ and $X$, equal to $Q_j$. Hence, we have

$$\sum_j \Big( \sum_i \hat{C}_{ij} x'_{ij} + K_j Q_j \Big) = \sum_j \Big( \sum_i C_{ij} x'_{ij} + K_j Q_j \Big) \ge \sum_j \Big( \sum_i C_{ij} x_{ij} + K_j Q_j \Big) \qquad (8)$$
or

$$\sum_{i,j} \hat{C}_{ij} x'_{ij} + \sum_j K_j Q_j = \sum_{i,j} C_{ij} x'_{ij} + \sum_j K_j Q_j \ge \sum_{i,j} C_{ij} x_{ij} + \sum_j K_j Q_j \qquad (9)$$
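and, consequently, since the common term $\sum_j K_j Q_j$ appears in every member of (9) and may be dropped,

$$\sum_{i,j} \hat{C}_{ij} x'_{ij} = \sum_{i,j} C_{ij} x'_{ij} \ge \sum_{i,j} C_{ij} x_{ij}. \qquad (10)$$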
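The estimate side of (10) can be checked exhaustively on a tiny problem: no allocation that meets the quotas gives a larger sum of $\hat{C}_{ij}$ than $X'$. The sketch assumes NumPy and works backwards from the paper's setup. It chooses the $K_j$ first and reads the quotas off the resulting $X'$, whereas the paper assumes that suitable $K_j$ exist for given quotas. It then enumerates every allocation with those quotas. It also confirms that the $K_j$ term equals $\sum_j K_j Q_j$ for every such allocation, which is why that term drops out between (9) and (10). The helper names `feasible_allocations` and `check_optimality` are hypothetical.

```python
import itertools
import numpy as np

def feasible_allocations(quotas):
    """Yield every 0/1 allocation matrix with one job per individual and
    quotas[j] individuals in job j."""
    n_jobs = len(quotas)
    # One label per position to fill, e.g. quotas (1, 2) -> [0, 1, 1].
    slots = [j for j, q in enumerate(quotas) for _ in range(q)]
    n = len(slots)
    # Distinct orderings of the labels = distinct ways to fill the quotas.
    for perm in set(itertools.permutations(slots)):
        x = np.zeros((n, n_jobs), dtype=int)
        x[np.arange(n), list(perm)] = 1
        yield x

def check_optimality(c_hat, k):
    """Return (allocation sum for X', best allocation sum over all X with the
    same quotas, whether sum_ij K_j x_ij equals sum_j K_j Q_j for every X)."""
    n = c_hat.shape[0]
    x_prime = np.zeros_like(c_hat, dtype=int)
    x_prime[np.arange(n), np.argmax(c_hat + k, axis=1)] = 1
    q = x_prime.sum(axis=0)  # quotas implied by this choice of K_j
    best = -np.inf
    k_term_constant = True
    for x in feasible_allocations(q):
        best = max(best, float(np.sum(c_hat * x)))
        k_term_constant = k_term_constant and bool(np.isclose(np.sum(k * x), k @ q))
    return float(np.sum(c_hat * x_prime)), best, k_term_constant
```

Exhaustive enumeration is practical only for a handful of individuals. For realistic problems, the same optimum would typically be found with an assignment or transportation-problem solver.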
We have, then, established two generalizations. First, we have shown that, as $N$ approaches infinity, the predicted criteria for a set of jobs derived by the use of linear multiple regression equations yield, upon assignment of men to jobs, an allocation sum that is equal to or higher than that obtained by any other assignment of individuals to jobs that is based on the test scores. Second, we have shown that, for any given assignment of men to jobs, the allocation sum obtained when regression estimates of the criterion are used becomes, as $N$ approaches infinity, identical with that obtained when the criterion scores themselves are used.

REFERENCES

1. Brogden, H. E. An approach to the problem of differential prediction. Psychometrika, 1946, 11, 139-154.
2. Brogden, H. E. Increased efficiency of selection resulting from replacement of a single predictor with several differential predictors. Educ. psychol. Meas., 1951, 11, 173-195.

Manuscript received 9/29/54
Revised manuscript received 11/15/54