Quality and Quantity, 7 (1973) 189-196
Elsevier Scientific Publishing Company, Amsterdam. Printed in The Netherlands
SOME PROBLEMS OF DATA STANDARDIZATION
ZYGMUNT GOSTKOWSKI
Polish Academy of Sciences, Warsaw
ABSTRACT
The procedure of aggregating component indicators into one composite index of a multi-dimensional phenomenon under study requires prior standardization of the original raw data in which the component indicators are expressed. Standardization inevitably involves an arbitrary choice of the mathematical formula by which common measurement units are established for all component indicators. It is argued that by choosing one of several possible standardization formulae, a researcher, most often unknowingly, ascribes differential weights to particular component indicators. Two approaches to the problem of the arbitrariness of this choice are possible: (1) assessing the margin of "error of arbitrariness," or "standardization effect," and interpreting the substantive results obtained after aggregation within that margin; (2) evaluating the equivalencies of things established by the alternative standardization formulae, in order to choose the formula ensuring the most acceptable equivalence as judged by certain objective norms or criteria. Both approaches are discussed, and numerical examples illustrating each of them are presented using two kinds of standardization: one based on percentages, the other on the deviation from the mean divided by the standard deviation.
Quite often, in order to describe a complex phenomenon under study fully, researchers aggregate its component indicators into one composite index, which is treated as a synthetic measure. It is believed that studies based on such synthetic indices yield more comprehensive, i.e. more informative, results. Most frequently, in aiming at greater informativeness, one does not attach enough importance to the fact that the aggregation of component indicators must be preceded by their standardization, i.e. by a transformation that reduces the original raw data to a comparable form and thus makes aggregation possible. The usual approach to the standardization of original ("raw") variables consists in a more or less routine application of certain mathematical formulae. The purpose of standardization is to make the variables comparable (and "summable"), i.e. to express them in "the same" measurement units. In other words, the standardization procedure is meant to enforce comparability on originally incomparable things such as hospital beds, physicians, TV sets, electricity consumption, etc. Put differently, standardization is performed because the variables are expressed in different measurement units. However, this definition oversimplifies the whole matter.

Let us take a concrete example to illustrate the point. Imagine two variables relating to health services: physicians and pharmacists. Apparently, both variables are expressed in "the same" measurement units (persons). But even intuitively one feels that they cannot sensibly be compared and aggregated without prior standardization. The variable "pharmacists" has much smaller values, as pharmacists are always less numerous than physicians. This could also be expressed by saying that the range of the former variable is much smaller than that of the latter. Thus, if the two variables are to be ascribed equal weight,¹ one physician cannot be considered equivalent, or comparable, to one pharmacist. The problem of standardization, then, boils down to the question: how many physicians should be taken as one unit, equivalent to how many pharmacists taken as one unit? From this example one can see that standardization is necessary not only when the raw variables refer to different things, but also when they differ in absolute values and ranges. Of course, this point is even more relevant where the variables to be standardized are originally expressed in terms of quite different things and, necessarily, in quite different measurement units.

There are several methods of transforming raw variables in order to obtain standardized values. Most often, the competing methods at a researcher's disposal are the rank (or decile), percentage, and mean-on-standard-deviation transformations.
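The three transformations just listed can be sketched in a few lines of Python. The figures for physicians and pharmacists below are made up, and the unweighted-sum composite and all function names are illustrative assumptions rather than the author's procedure. In the raw data the standard deviation of physicians is roughly seven times that of pharmacists, so a raw sum simply reproduces the physicians' ordering; once both variables are standardized, the second and third regions swap places.

```python
from statistics import mean, pstdev

# Hypothetical figures (not from the paper): physicians and pharmacists
# per 10,000 population in five regions.
physicians = [12.0, 18.0, 25.0, 9.0, 30.0]
pharmacists = [2.0, 3.5, 2.5, 1.0, 4.0]

def rank_transform(xs):
    """Rank transformation: 1 = smallest value (ties are not handled here)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def percent_of_max(xs):
    """Percentage transformation: the largest value is taken as 100%."""
    top = max(xs)
    return [100.0 * x / top for x in xs]

def z_score(xs):
    """Mean-on-standard-deviation transformation, (x - mean) / sd."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def composite(transform):
    """Unweighted sum of the two transformed variables for each region."""
    return [a + b for a, b in zip(transform(physicians), transform(pharmacists))]

raw_sum = composite(lambda xs: xs)  # no standardization at all
print(rank_transform(raw_sum))                     # [2, 3, 4, 1, 5]
print(rank_transform(composite(z_score)))          # [2, 4, 3, 1, 5]
print(rank_transform(composite(percent_of_max)))   # [2, 4, 3, 1, 5]
```

Here the two standardized composites happen to agree with each other; with other data they need not, which is the problem taken up next.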
In this connection, the crucial question is how to establish objective criteria for choosing the standardization method most valid for the data and/or the substantive problem at hand. The question is crucial because, depending on the standardization method chosen, a researcher may obtain different substantive results. This can be revealed immediately by comparing, for example, two rank orders of the subjects under investigation, established on the basis of two different composite indices, each derived from differently standardized component variables. Quite often such orders do not coincide. Another way of assessing the effect of standardization methods would be to analyse the standardization functions (the shape of their curves), or to compare the intervals between subjects in terms of the raw as against the standardized variables, as well as in terms of several alternatively standardized variables.
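The interval comparison mentioned above can be sketched for a single variable. The values are hypothetical and the helper names are illustrative. Both the percentage and the mean-on-standard-deviation formulae are linear rescalings of one variable, so each interval between neighbouring subjects keeps the same share of the total range; the rank transformation makes all intervals equal. This suggests that, for the first two formulae, disagreements between composite rank orders come from the relative scale each formula gives the components, i.e. from implicit weights, rather than from distortion of any single variable.

```python
from statistics import mean, pstdev

# Hypothetical raw values of one variable for five subjects (not from the paper).
raw = [9.0, 12.0, 18.0, 25.0, 30.0]

def percent_of_max(xs):
    top = max(xs)
    return [100.0 * x / top for x in xs]

def z_score(xs):
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0] * len(xs)
    for r, i in enumerate(order, start=1):
        out[i] = r
    return out

def interval_profile(xs):
    """Gaps between successive subjects (sorted) as shares of the total range."""
    s = sorted(xs)
    gaps = [b - a for a, b in zip(s, s[1:])]
    total = sum(gaps)
    return [g / total for g in gaps]

for name, values in [("raw", raw), ("percentage", percent_of_max(raw)),
                     ("mean-on-SD", z_score(raw)), ("rank", ranks(raw))]:
    print(f"{name:10}", [round(p, 3) for p in interval_profile(values)])
```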
What approaches are possible when a researcher wants to assess to what extent his standardization has influenced the results of the study? This is, at the same time, a question about the implicit unequal weights attached to particular variables as a result of the choice of a given standardization method. Two general approaches to such problems are possible: (1) to accept that the choice of a standardization method must be rather arbitrary and to assess only the margin of the "error of arbitrariness," or "standardization effect," caused by the necessity of standardizing the data; or (2) to evaluate the equivalencies established by the alternative standardization methods in order to choose the method that ensures the most acceptable and plausible equivalencies. Let us give some examples illustrating the two approaches.

(Ad. 1) In the present writer's research on 50 countries, the dimensions studied were: economic potential, measured by two component indicators, viz. GNP per capita in US dollars and energy consumption per capita in coal equivalents; mass media exposure (a proxy for attitudinal modernization), measured by the number of radios, newspaper copies, and television sets per 1000 population; and high-level manpower potential (a proxy for future high-level manpower), measured by the number of secondary-level and third-level students per 100,000 population. Three points in time, 1955, 1960, and 1965, were selected for diachronic analysis. The component indicators were aggregated into a composite index for each of the three dimensions, using the "ideal country" in Euclidean space as a reference point. The purpose of the study was to answer the question whether the developmental gaps between the "rich" and the "poor" countries in each of the three fields had been increasing or decreasing, and at what rate, in the two quinquennia.²

Two alternative methods of standardization were used, and each yielded somewhat different results. The first method (denoted henceforth "version I") consisted in the application of the usual formula $(x - \bar{x})/\sigma$. This means that, for a given variable, all its original values, for 50 countries and the three selected years, were averaged; then the deviations from that mean were computed and divided by the standard deviation. The second method (denoted henceforth "version II") was performed on a simple percentage basis. For each component variable, its maximal value was identified (from among 50 × 3 = 150 values) and taken as 100%. It appeared that, except for the variable "newspaper copies," whose maximum, located in Great Britain, fell in the year 1955,
all the remaining maxima occurred in the terminal year, i.e. 1965, and were located in the United States, the single exception being second-level students, whose maximum was attained by Japan. After the standardized component indicators (variables) had been aggregated into the three composite indices measuring the attained development levels, and the gaps had been calculated, the results were as shown in Table I. It can be seen there that the absolute figures expressing the magnitude of the gaps, for the same year and the same dimension, are quite different depending on the standardization method. This is understandable: version I uses the standard deviation as its unit, whereas the unit used in version II is a percentage point. Despite this difference, both versions show the same general picture: on all three dimensions the gap between rich and poor countries has increased. Rather unexpected, however, are the differences in the rate of that increase, shown by the percentage figures placed between the years in Table I. These differences must be considered a standardization effect. Interestingly, this effect is very small in the case of mass media exposure and quite considerable in the two remaining fields. More importantly, however, the dynamics of the gap increase in the two quinquennia show one feature regardless of the standardization method used: the rate of increase is always higher in the period 1960-1965 than in 1955-1960. This means that there has been an acceleration of the gap increase in the decade 1955-1965.

TABLE I
Results of the Measurement of Developmental Gaps Between Rich and Poor Countries, Depending on the Method of Data Standardization, in Three Fields of Development, and for 50 Countries

                                       Gap measures
Field of development /
type of standardization       1955    1955-60    1960    1960-65    1965
Economic potential
   Version I                  1.902     18%      2.249     22%      2.755
   Version II                34.815     25%     43.369     30%     56.179
Exposure to mass media
   Version I                  1.996      7%      2.145     22%      2.625
   Version II                42.194      9%     46.065     22%     56.149
High-level manpower potential
   Version I                  1.481     16%      1.723     39%      2.386
   Version II                28.407     17%     33.283     29%     42.914

Percentage figures in the 1955-60 and 1960-65 columns express the increase of the gap over each quinquennium.
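The quinquennial rates in Table I follow from the gap measures by simple arithmetic, which the Python sketch below reproduces (the dictionary and function names are illustrative). Recomputed from the three-decimal gap measures, the rates agree with the printed percentages to within one point; the largest discrepancy is high-level manpower, version I, 1960-1965, which comes out at about 38.5% against the printed 39%. In every row the second rate exceeds the first, i.e. the acceleration holds under both standardizations.

```python
# Gap measures for 1955, 1960 and 1965 as printed in Table I.
TABLE_I = {
    ("economic potential", "version I"): (1.902, 2.249, 2.755),
    ("economic potential", "version II"): (34.815, 43.369, 56.179),
    ("mass media exposure", "version I"): (1.996, 2.145, 2.625),
    ("mass media exposure", "version II"): (42.194, 46.065, 56.149),
    ("high-level manpower", "version I"): (1.481, 1.723, 2.386),
    ("high-level manpower", "version II"): (28.407, 33.283, 42.914),
}
# Quinquennial increases printed in Table I (per cent, 1955-60 and 1960-65).
PRINTED = {
    ("economic potential", "version I"): (18, 22),
    ("economic potential", "version II"): (25, 30),
    ("mass media exposure", "version I"): (7, 22),
    ("mass media exposure", "version II"): (9, 22),
    ("high-level manpower", "version I"): (16, 39),
    ("high-level manpower", "version II"): (17, 29),
}

def quinquennial_increases(gaps):
    """Percentage increase of the gap over 1955-60 and over 1960-65."""
    g55, g60, g65 = gaps
    return 100 * (g60 - g55) / g55, 100 * (g65 - g60) / g60

for (field, version), gaps in TABLE_I.items():
    first, second = quinquennial_increases(gaps)
    print(f"{field:20} {version:10} {first:5.1f}% {second:5.1f}% "
          f"(printed {PRINTED[(field, version)]})")
```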
If, however, we wanted to assess the intensity of that acceleration more specifically, we would again encounter difficulties resulting from the standardization effect: according to version II, the acceleration of the gap increase in the field of economic potential was much more pronounced than according to version I; conversely, the rate of increase of the gap for high-level manpower in the period 1960-1965 was much higher according to version I than according to version II. The above analysis of the figures in Table I shows how one can discount the effect of standardization in interpreting research results. That analysis concerned very general measures of central tendency in two groups of countries. Similar problems of interpretation arise when one is interested in more concrete knowledge, e.g. the relative positions of particular countries on the ladder of development levels calculated from differently standardized data. The methodological question that must be answered is: what is the margin of translocations, or shifts, of countries on the rank scale owing to the different standardizations? This is especially important when a researcher wants to follow a country's movements on the development scale at successive points in time. Some results of this type of analysis were obtained in the research referred to above. Rank shifts as well as their magnitude were recorded by dimension and year; the results are shown in Table II. The information contained in Table II is very interesting: the dimensions (or rather the constellations of component variables making up the composite indices) differ greatly in their sensitivity to the standardization effect.
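A rank-shift tally of the kind summarized in Table II can be sketched in Python. The sketch borrows the artificial GNP and energy figures of Table III (not the 50-country data), uses a single year instead of pooling three, standardizes by version I and version II, and aggregates as the Euclidean distance from an "ideal country". The exact definition of the ideal country is not given in this paper; the sketch assumes it holds the highest standardized value on every component. The population standard deviation is assumed; switching to the sample standard deviation would rescale both variables by the same factor and leave the ranks unchanged.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Artificial GNP and energy figures of Table III (countries A-E).
DATA = {
    "gnp": {"A": 250, "B": 1000, "C": 300, "D": 270, "E": 290},
    "energy": {"A": 680, "B": 610, "C": 705, "D": 800, "E": 593},
}

def version_i(values):
    """Version I: (x - mean) / standard deviation (population SD assumed)."""
    xs = list(values.values())
    m, s = mean(xs), pstdev(xs)
    return {k: (v - m) / s for k, v in values.items()}

def version_ii(values):
    """Version II: each value as a percentage of the variable's maximum."""
    top = max(values.values())
    return {k: 100.0 * v / top for k, v in values.items()}

def distance_to_ideal(standardize):
    """Euclidean distance of each country from an assumed 'ideal country'
    holding the highest standardized value on every component."""
    std = {var: standardize(vals) for var, vals in DATA.items()}
    ideal = {var: max(s.values()) for var, s in std.items()}
    return {c: math.sqrt(sum((ideal[var] - std[var][c]) ** 2 for var in std))
            for c in DATA["gnp"]}

def rank_order(distances):
    """Rank 1 = smallest distance, i.e. the country closest to the ideal."""
    ordered = sorted(distances, key=distances.get)
    return {c: r for r, c in enumerate(ordered, start=1)}

rank_i = rank_order(distance_to_ideal(version_i))
rank_ii = rank_order(distance_to_ideal(version_ii))
shifts = Counter(abs(rank_i[c] - rank_ii[c]) for c in rank_i)
print(rank_i)   # {'D': 1, 'B': 2, 'C': 3, 'A': 4, 'E': 5}
print(rank_ii)  # {'B': 1, 'C': 2, 'D': 3, 'E': 4, 'A': 5}
print(shifts)   # Counter({1: 4, 2: 1})
```

With these five countries every rank moves, country D by two places; the `Counter` plays the role of one column of Table II.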
TABLE II
Shifts of Countries on the Rank Scale owing to Standardization Effect, by the Range of Shifts, Dimensions and Years
[Columns: Range of Shifts (Ranks); Number of Shifts for Economic Potential, Exposure to Mass Media, and High-level Manpower, each in 1955, 1960 and 1965.]

Thus, the dimension "economic potential" (with its two component variables) is the most stable and insensitive, as the shifts owing to different standardizations are very few and never exceed 3 ranks (one case only!). Next in insensitivity to standardization comes the dimension "mass media exposure," with 9 cases of 4-rank shifts and 3 cases of 5-6-rank shifts. The dimension "high-level manpower" is even more sensitive. The full explanation of these facts cannot be given here because it requires a more detailed analysis. Most probably, however, the differential influence of the two standardization methods results from the varying weights unknowingly ascribed to particular component variables depending on the standardization method used. The analysis should reveal these weights.

But a researcher can do something about the standardization difficulties even without the results of such analysis. He may try to discount the "error of arbitrariness" in interpreting and evaluating the standardized indices. Thus, in studying changes in the positions of various countries on the dimension "economic potential," he may decide to treat as actual, and not artifactual, only those shifts which exceed 3 ranks. Of course, his situation in the case of the other two dimensions is much worse, as the margin of error appears to be much wider.

(Ad. 2) A different approach to standardization difficulties would consist in "equivalence analysis." For instance, using two alternative standardization formulae, a researcher might compare the "portions" of things or persons established as units by the two standardizations. This comparison would lead to some kind of evaluation of the quantitative relations between such "portions" in the light of certain technological norms and/or empirical knowledge. Such norms exist in certain fields such as health services or high-level manpower (e.g., the optimal ratio between the number of physicians and nurses, or between the number of engineers and middle-level technicians).
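The comparison of unit "portions" can be sketched in Python under explicit assumptions: the physician and nurse figures are hypothetical, and so is the norm of 4 nurses per physician, which stands in for whatever technological norm a researcher would actually accept. Under the mean-on-standard-deviation formula one unit of a variable is one standard deviation of its raw values; under the percentage formula it is one hundredth of its maximum. Giving the two variables equal weight therefore equates one physician with a certain number of nurses, and that implied ratio can be compared with the norm.

```python
import math
from statistics import pstdev

# Hypothetical figures (not from the paper): physicians and nurses
# per 10,000 population in five countries.
physicians = [5, 8, 12, 20, 15]
nurses = [10, 30, 36, 50, 90]

# Hypothetical norm, for illustration only: 4 nurses per physician.
NORM = 4.0

def portion_version_i(xs):
    """Raw quantity making up one unit under (x - mean)/sd: one SD."""
    return pstdev(xs)

def portion_version_ii(xs):
    """Raw quantity making up one unit (1%) under percentage-of-maximum."""
    return max(xs) / 100.0

def nurses_per_physician(portion):
    """Nurses implicitly treated as equivalent to one physician."""
    return portion(nurses) / portion(physicians)

implied = {
    "version I": nurses_per_physician(portion_version_i),
    "version II": nurses_per_physician(portion_version_ii),
}
# Log-ratio distance treats "twice the norm" and "half the norm" alike.
closest = min(implied, key=lambda k: abs(math.log(implied[k] / NORM)))
for name, ratio in implied.items():
    print(f"{name}: 1 physician ~ {ratio:.2f} nurses")
print("closer to the norm:", closest)
```

The log-ratio criterion is one possible choice of distance from the norm, made here for illustration.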
The standardization that yields unitary "portions" closer to the normative or otherwise reasonable relations could be considered more acceptable. This approach would, however, be restricted to a smaller number of variables, viz. those that can be compared and evaluated in the light of some objective norms and criteria.

An example illustrating this approach may be given in connection with percentage standardization. Deciding that the percentage basis (i.e. 100%) of variable X will be the highest value attained in a certain country means that this value (or this "portion" of something) is considered equivalent to the highest value of another variable Y attained by some other (or sometimes the same) country. Now, such a solution, if applied mechanically, may lead to unintended and unjustified weighting in favour of one of the variables. A simple example using artificially cooked-up figures will illustrate this. Imagine one has obtained the following raw and standardized values of two variables, GNP per capita and energy consumption per capita, for countries A, B, C, D, and E (see Table III).

TABLE III
            GNP per Capita in US Dollars       Energy Consumption per Capita in Coal Equivalents
Country     Raw Values   Standardized Values   Raw Values   Standardized Values
A              250              25                680              85
B             1000             100                610              76
C              300              30                705              88
D              270              27                800             100
E              290              29                593              74

Inspection of the standardized values of both variables reveals that GNP is generally expressed in much smaller figures than energy consumption. If one were to aggregate these two variables into one composite index (for example, by computing developmental distances from the two 100% figures), the share of GNP in the values of the composite index would be much smaller than that of energy consumption. This would mean that GNP has been ascribed less weight. The effect arises because country B, which sets the percentage basis, happens to have an exceptionally high GNP, far outdistancing the remaining countries. Everything, then, depends on whether this value may be accepted as "seriously" as the equivalent 100% value for energy consumption in country D. This question must be settled by the researcher on the basis of his judgement and evaluation of the two 100% "portions." If, for example, country B were a sort of "anomaly" (like Kuwait or Venezuela) and country D a "normal" country, the equivalence would be questionable. Country B should then rather be eliminated from this analysis and a different percentage basis established for GNP. It should be mentioned in this connection that a similar effect might also appear if the standard deviation were used as the basis of standardization. A few extremely high values could inflate the standard deviation used as the denominator, which would entail smaller standardized values of the variable, i.e. its lesser weight.
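The Table III example can be checked and extended with a short Python sketch. It recomputes the standardized values from the raw figures, then measures the share of GNP in a simple additive composite of the two percentages; that additive composite is an illustrative assumption, not the distance-based index mentioned in the text. Dropping the "anomalous" country B and re-establishing the GNP percentage basis on the next highest value (country C) raises the GNP share for the remaining countries to roughly one half. The last line shows the same outlier inflating the standard deviation of GNP about fifteenfold.

```python
from statistics import pstdev

# Artificial figures of Table III.
GNP = {"A": 250, "B": 1000, "C": 300, "D": 270, "E": 290}
ENERGY = {"A": 680, "B": 610, "C": 705, "D": 800, "E": 593}

def percent_of_max(values):
    """Percentage standardization: the highest value is taken as 100%."""
    top = max(values.values())
    return {k: 100.0 * v / top for k, v in values.items()}

def gnp_share(gnp_pct, energy_pct):
    """Share of GNP in a simple additive composite of the two percentages
    (an illustrative aggregation, not the paper's distance-based index)."""
    return {k: gnp_pct[k] / (gnp_pct[k] + energy_pct[k]) for k in gnp_pct}

gnp_pct, energy_pct = percent_of_max(GNP), percent_of_max(ENERGY)
print({k: round(v) for k, v in gnp_pct.items()})     # standardized GNP, Table III
print({k: round(v) for k, v in energy_pct.items()})  # standardized energy, Table III
print({k: round(v, 2) for k, v in gnp_share(gnp_pct, energy_pct).items()})

# Drop the "anomalous" country B and re-establish the GNP percentage basis.
without_b = {k: v for k, v in GNP.items() if k != "B"}
gnp_pct_rebased = percent_of_max(without_b)
energy_pct_rest = {k: energy_pct[k] for k in without_b}  # basis D is unchanged
print({k: round(v, 2) for k, v in gnp_share(gnp_pct_rebased, energy_pct_rest).items()})

# The same outlier also inflates the standard deviation used in version I.
print(round(pstdev(list(GNP.values())), 1), round(pstdev(list(without_b.values())), 1))
```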
Final Remarks
The problems discussed in the present paper seem rarely to be recognized by researchers. The problem of weights is usually posed in isolation from the difficulties of standardization. In trying to elaborate methods for deriving weights, one should not forget that even the most sophisticated and mathematically elegant methods will be of no use if uncontrolled weighting is introduced at the basic level of data transformation and processing.
¹ Standardization is also necessary when unequal weights are intended, unless these weights happen to be identical with the "natural," original proportion between the raw units, which is highly unlikely.
² The method of constructing a general measure of the gap between two groups of countries is described in a previous paper by the present author: Z. Gostkowski (1972), "How to Measure Developmental Gaps Between Rich and Poor Countries. A Methodological Proposal," in Z. Gostkowski (ed.), Toward a System of Human Resources Indicators for Less Developed Countries. Papers prepared for a UNESCO Research Project (Warsaw-Wroclaw: Ossolineum, 1972).