ESTIMATION AND PREDICTION IN THE SPATIAL LINEAR MODEL O. BERKE Institut für Biometrie, Epidemiologie und Informationsverarbeitung, Tierärztliche Hochschule Hannover, D-30559 Hannover, Germany (E-mail:
[email protected])
(Received 25 February 1997; accepted 15 January 1998)
Abstract. Often in environmental monitoring studies interesting ecological factors will be observed at several locations repeatedly over time. Generally these space-time data are subject to a sequential spatial data analysis. In geostatistics, spatial data describing an environmental phenomenon like the pH value in precipitation at several locations are regarded as a realisation from a stochastic process. Component models are used to interpret the spatial variation of the process. Decomposing the spatial process into single components is based on the theory of linear models. Trend surface analysis is seen to be the geostatistical method for best linear unbiased estimation (BLUE) of the trend component, whereas universal kriging is equivalent to best linear unbiased prediction (BLUP) of the realisation of the spatial process. Furthermore trend surface analysis and universal kriging are shown to agree with the estimation of fixed effects and prediction of fixed and random effects in mixed linear models. Since estimation and prediction for spatial data result in different interpolations the differences are explained also graphically by example. The example uses acid-precipitation monitoring data. The extension of these spatial methods for application to space-time problems by combination with dynamic linear models is treated in the discussion. Keywords: acid-precipitation, best linear unbiased estimation, best linear unbiased prediction, environmental monitoring, geostatistics, mixed linear models, trend surface analysis, universal kriging
1. Introduction In this article, the pH value in precipitation observed at several locations is spatially analysed for demonstration of certain geostatistical methods useful for application on monitoring data. Environmental protection and sustainable development provide the need for reduction and control of man-made environmental pollution. For that reason, chemicalphysical monitoring of the ecosystem by continuing collection of data to establish whether explicitly stated quality control conditions are being met is of basic interest. Urfer (1993) declares ‘the major challenge for research on forest decline over the next decade is the interpretation of the effects of multiple stress factors on the functioning of forest ecosystems’. By use of the proportional odds model to describe the effects of location, tree height, and tree social status on changes over time in the health of individual trees, Urfer et al. (1994) find striking patterns of temporal change that differ across location. Hence in the discussion of the article by Urfer et al., Ware (1994) hopes that the results will lead to ‘identification of the specific air Water, Air, and Soil Pollution 110: 215–237, 1999. © 1999 Kluwer Academic Publishers. Printed in the Netherlands.
216
O. BERKE
pollutants or other environmental insults that threaten European forests’. Furthermore in the context of a forest ecological monitoring project, Busch, Brechtel and Urfer (1994) point out that investigations on the hydrological site conditions and possible change over space and time are of vital interest for environmental decision making. Geostatistics were mainly developed for application in geological sciences. Now the use of geostatistical techniques is increasingly becoming more widespread. Zimmerman and Harville (1991) give an example for the spatial analysis of agricultural field experiments. Other applications belong to atmospheric environmental problems (Christakos and Thessing, 1993) as well as soil pollution problems (cf. Stein, 1994). Haslett and Raftery (1989) and Sampson and Guttorp (1992) are references for the assessments of wind power resources and solar radiation, respectively. The behaviour of spatial data is similar to that of time series data. At a first view they look totally erratic. But observations taken at near by locations are more alike than observations further away. Hence, the data are assumed to be realisations from random variables that are positively correlated in space. In contradiction to time series models, where the autocorrelation extends only in one direction: backwards. In the spatial case correlation is present in all directions. Therefore, spatial data can be thought of as resulting from observations on a stochastic process Z = { Z(s) : s ∈ D } . It is worth to mention that there are three broad types of spatial data: geostatistical data, lattice or regional data, and point pattern. From these geostatistical-type problems are distinguished most clearly from lattice-type problems by the ability of the spatial index s to vary continuously within the spatial domain D. Point pattern arise when the variable of interest is the location of an event. To interpret spatial data it is common to use stochastic processes. On the other hand, spatial inferences are drawn from linear models and in particular from the general linear model or Aitken model. In geostatistical terminology the method of linear estimation is called trend surface analysis and the corresponding method of linear prediction is called universal kriging. Both methods, trend surface analysis as well as universal kriging, are often said to be optimal spatial interpolation techniques. Trend surface analysis results in estimating the deterministic mean structure E(Z(s)) of the process under study, which is also called the large scale variation. Universal kriging aims to predict an unobserved value of a certain realisation of the process, i.e. the large and small scale variation. Beside this it is common practice to decide between estimation of fixed effects and prediction of random effects in the context of mixed linear models. Therefore the aim of this article is to connect the geostatistical methods of trend surface analysis and universal kriging with the estimation of fixed effects and prediction of random effects within mixed linear models. This may be a starting point for
ESTIMATION AND PREDICTION IN THE SPATIAL LINEAR MODEL
217
practitioners who are familiar with the mixed linear model to get into the notions of geostatistics. The development proceeds as follows. Section 2 gives details on geostatistical component models. Section 3 and 4 reviews the methods of trend surface analysis and universal kriging, respectively. Whilst section 5 relates these geostatistical methods to methods usual in the analysis of mixed linear models; section 6 illustrates them by example. Section 7 concludes this article with a discussion, including the problems of unknown second order moments and spatiotemporal prediction.
2. Geostatistical Component Models In time series analysis the observed data are regarded as representing one realisation of a stochastic process. For an interpretation of the data, often the classical decomposition model is used. With that the data are the sum of a trend, seasonal, and random noise component. The time series components are introduced to model deviations from the assumption of mean stationarity. In geostatistics component models are also in use. But here the components are modelling proportions of the total process variation. Therefore the spatial process Z is mostly modelled by models consisting of three or two components, viz.: Z(·) = µ(·) + η(·) + ε(·) and Z(·) = µ(·) + δ(·) respectively, where µ(·) represents the deterministic trend component or large scale variation, η(·) is a mean square continuous, zero-mean stationary process for the small scale variation, ε(·) is a zero-mean white noise process modelling the nugget effect (which consists of the micro scale variation and/or measurement errors) that is uncorrelated with η(·), and δ(·) = η(·)+ε(·) is a stochastic component which is a zero-mean stationary process. See Cressie (1993, p. 113) for a more comprehensive discussion of this topic. To measure the spatial dependence structure or second order properties of the process one may use the covariogram (sometimes called the covariance function) or the variogram which are functions of translation d, or in the case of isotropy, i.e. direction invariance, they are functions of the distance kdk, between any two observable random variables at the locations s and s + d. The covariogram of the Z process is defined as follows (cf. Cressie, 1993, p. 84): σZ (d) := Cov(Z(s), Z(s + d)) = σδ (d) = ση (d) + σε (d).
218
O. BERKE
Because ε(·) is white noise it follows immediately that 2 d=0 σε σε (d) = . 0 d>0 The variogram 2γ (d) is defined by 2γZ (d) := V ar (Z(s) − µ(s)) − (Z(s + d) − µ(s + d)) . When the trend component is constant, i.e. µ(s) = µ(s + d) ≡ µ, the variogram of the Z process equals the variogram of the residual process 2γZ (d) = 2γδ (d) = 2γη (d) + 2γε (d). The reason for use of the variogram 2γ (d) or the semivariogramm γ (d) is, that they are better graphical tools for the practitioners and generally variogram estimators are less biased than the covariogram estimators (cf. Cressie, 1993, Sec. 2.41). Geostatisticians prefer the variogram over the covariogram because it is defined in cases (intrinsic stationary processes) when the covariogram is not. But this is just a theoretical reason. For estimation of µ(·), σ (·) or γ (·) one must assume the observed process to be ergodic, which is an stronger assumption than second order stationary and hence contains the case of intrinsic stationary (cf. Cressie, 1993, p. 57f.). However, for a second order stationary process the variogram and covariogram are equivalent structure functions possessing the relationship 2γ (d) = 2 σ (0) − σ (d) . The aim of the statistical analysis described here is to make linear inferences about Z. Hence, one has to describe the distributional characteristics of Z at least by its first and second order moments, e.g. µ(·) and σ (·), respectively. This is done within the framework of linear models and needs the distinction between the µ(·) and δ(·) component. For this reason define: Z = ( Z(s1 ), ..., Z(sn ) )0 µ = ( µ(s1 ), ..., µ(sn ) )0 6 η = ( Cov(η(si ), η(sj )) )ni,j =1 and let I denote the n-dimensional identity matrix. Then write E(Z) = µ Cov(Z) = 6 δ = 6 η + σε2 I.
ESTIMATION AND PREDICTION IN THE SPATIAL LINEAR MODEL
219
3. Trend Surface Analysis For estimation of the mean structure µ(·) several methods have been proposed. Trend surface analysis is a spatial estimation method used to interpolate given observations Z(si ) at locations s1 , ..., sn within a region D by a multidimensional generalisation of polynomial regression models (cf. Watson, 1971; Ripley, 1981, p. 29; Agterberg, 1984; Haining, 1987). Often practitioners of geostatistics use trend surface analysis for simple linear regression of the data on the sample location co-ordinates only. Then the small scale variation is taken to be white noise, thus imposing no spatial correlation structure. This is not necessary and may be even false. The approach given here is based on general linear models. Two competitors of the trend surface method are median polishing and twodimensional spectral analysis (cf. Cressie, 1993, p. 116; Renshaw and Ford, 1983). Median polish is resistant to outliers. This method is most suitable for gridded spatial data. For nongridded data the method can be adopted in connection with an appropriately chosen regular lattice overlay (cf Cressie, 1993, p. 193). Unfortunately, the results of this method may then hardly depend on the practitioners choice. On the other hand, two-dimensional spectral analysis is most suitable for spatially periodic phenomena in connection with large data sets. Consider a two component model Z(s) = µ(s) + δ(s),
s = (x, y)0 ∈ D ⊂ IR 2
(1)
with µ(s) = 6jk=1 fj −1 (s) βj −1 being an unknown linear combination of known functions f0 (s), ..., fk−1 (s), s ∈ D, and unknown but fixed parameters β0 , ..., βk−1 . Although the functions f (s) are written as a function of spatial co-ordinates x = (x1 , ..., xn )0 and y = (y1 , ..., yn )0 , any other non-spatial explanatory variable may be associated with them. For example, by use of a Geographical Information System (GIS) additional covariables may become available for all relevant locations. Further, the δ(·) component is a zero-mean, stationary process with known covariogram. However, in practice the covariogram is generally unknown and has to be estimated. Examples of the most often used trend surfaces µ(·) for processes in the plane (IR 2 ) are the polynomials of degree up to two:
flat linear quadratic
β0 1 β0 1 + β1 x + β2 y β0 1 + β1 x + β2 y + β3 x2 + β4 xy + β5 y2
In particular the ‘linear’ trend surface is given by three known functions of the spatial co-ordinates of the sampling locations f0 (s) = 1, f1 (s) = x, and f2 (s) = y, and three unknown trend parameters β = (β0 , β1 , β2 )0 .
220
O. BERKE
To estimate the unknown parameters β and hence to estimate the trend component µ(·) is a standard least squares or multiple regression problem within the framework of the general linear model: Z = F β + δ,
E(δ) = 0,
Cov(δ) = 6 δ
where F is a n×k matrix whose (i, j )th element is fj −1 (si ), β = (β0 , , ..., βk−1 )0 and δ = (δ1 , , ..., δn )0 . The large scale variation parameters β are of interest and the covariance matrix 6 is assumed to be known and structured accordingly to the covariogram. Now assume the matrices F and 6 to have full column rank, i.e. 6 is nonsingular. Then the generalised least squares (GLS) estimator of β is the best linear unbiased estimator (BLUE), which is given by the Gauss-Markov-Aitken theorem (cf. Rao and Toutenbourg, 1995, p. 97) as −1 0 −1 b β GLS = (F0 6 −1 δ F) F 6 δ Z.
Then the best linear unbiased estimator of the trend component or mean structure µ(·) at any interesting location s0 ∈ D is given with f(s0 ) = (f0 (s0 ), ..., fk−1 (s0 ))0 by b β GLS . µ(s0 ) = f(s0 )0b E(Z(s 0 )) = b As mentioned above, in practise the covariance matrix 6 is unknown as well b in the formula for the BLUE b as β. Replacing 6 by an estimate 6 β GLS gives the ‘empirical’ best linear unbiased estimator: EBLUE. This aspect is treated in the discussion.
4. Universal Kriging Kriging is a minimum-mean-squared error method of prediction based on observations from a single realisation of a spatial process. In other words, the predicted values belongs to the same realisation of the spatial process like the observed sample data. Three types of linear spatial prediction are simple, ordinary, and universal kriging. The distinction between these is given with respect to several generalisations on the large scale component. In simple kriging the trend component is assumed known. On the other hand, in ordinary and universal kriging the trend component is unknown, however it is assumed constant or polynomial of higher order, respectively. For universal kriging the process Z is modelled by the two component model (1) with polynomial trend component µ(s) = 6jk=1 fj −1 (s) βj −1 and assumed known covariogram as in the section above. The concept of optimality is that of best linear unbiased prediction (BLUP).
ESTIMATION AND PREDICTION IN THE SPATIAL LINEAR MODEL
221
Finding the BLUP or universal kriging predictor is a solution to a constrained optimisation problem. This problem is solved by Goldberger (1962) using the Lagrange method and results in (cf. Matheron, 1969; Cressie, 1993, p. 151) b0 = f(s0 )0 b Z β GLS + σ 0 6 −1 (Z − F b β GLS ) where σ 0 = Cov(Z(s0 ), Z). The universal kriging predictor can be expressed in terms of the variogram instead of the covariogram as well. So the spatial process needs only to be intrinsic stationary if the variogram is known. If the used structure function, covariogram or variogram, is unknown replacing the corresponding quantities in the BLUP by estimates gives the ‘empirical’ best linear unbiased predictor: EBLUP. Implications of this approach are examined in the discussion. Universal kriging is not the only way to predict or interpolate spatial observations (cf. Cressie, 1993, Sec. 5.9). Among several alternatives median polish kriging, intrinsic random function (IRF) kriging and the thin-plate spline approach are worth mentioning. Median polish kriging (cf. Cressie, 1986) provides an outlier resistant or robust way of kriging in the presence of trend, yet yield results similar to those of universal kriging. Median polish kriging is given by the combination of estimating the trend surface using the median polish algorithm first and then perform universal kriging with constant trend polynom (i.e. ordinary kriging) for the median polish residuals. Intrinsic random functions (IRF) are an spatial modelling approach similar to the Box-Jenkins time series models. Christensen (1990) shows that the predictions based on intrinsic random functions are identical to those obtained from universal kriging. The methods differ only in how the structure function is estimated. Especially in connection with restricted maximum likelihood (REML) estimation of the structure function, universal kriging is analogous to IRF kriging. Furthermore, under certain conditions universal kriging and the thinplate spline approach are equivalent to one another. Nevertheless, universal kriging offers in principle a more flexible modelling strategy than splines (cf. Kent and Mardia, 1994). However, Laslett (1994) used kriging and splines on several real data sets to compare their predictive performance. In these case studies kriging never performs worse than splines, and sometimes kriging outperforms splines. Furthermore, many comparative studies found that universal kriging was superior to other methods, see for example Brus et al. (1996) and their references.
5. Mixed Linear Models The acronym BLUP for best linear unbiased predictor is part of the terminology of mixed linear models and refers to the ‘estimation’ of random effects, which is then called ‘prediction’ as opposed to the estimation of fixed effects (cf. Robinson, 1991). In what follows, to call universal kriging a method of prediction, one has to
222
O. BERKE
transfer the universal kriging method from the framework of general linear models to that of mixed linear models. The mixed linear model is a linear model in which some of the parameters are unobservable and random. The model can be written: Z = F β + H u + ε,
E(Z) = F β,
Cov(Z) = σ 2 (HGH0 + R),
where Z is a vector of observable random variables, F and H are known matrices, β is the vector of unknown parameters having fixed values (fixed effects), u is a vector of unobservable variables (random effects), and ε is a vector randomnoise of 2 G 0 (measurement errors) such that E(u) = 0, E(ε) = 0 and Cov u = σ 0 R ε where G and R are known positive definite matrices and σ 2 is a positive constant. The so called mixed model equations are similar in spirit to normal equations, however they simultaneously provide estimates b β and predictors b u. The mixed model equations are defined by Henderson (1950) as " b # 0 −1 0 −1 β FR Z F R F F0 R−1 H = . 0 −1 0 −1 u H R F H R H +G b H0 R−1 Z The solution to the mixed model equations are b β = (F0 WF)−1 F0 WZ −1 β) b u = (H0 R−1 H + G−1 )−1 H0 R (Z − F b where W = R−1 − R−1 H(H0 R−1 H + G−1 )−1 H0 R−1 . The estimator of the fixed effects b β is equivalent to the BLUE or Aitken estimator for β. This result follows from an identity for the inverse of the observational covariance matrix σ −2 (HGH0 + R)−1 = σ −2 W (cf. Henderson et al., 1959). Therefore, the estimator for fixed effects F b β is identical to the trend surface estimator b F β GLS used in geostatistics. The equivalence between the mixed linear model BLUP and the geostatistical universal kriging predictor is mentioned by Christensen (1987, p. 227). A proof for this result is implicit in the arguments given in Christensen (1991, p. 293) for the special case of covariogram models that are linear in the elements of θ. However, most of the widely used covariogram models are non-linear in the elements of θ. Based on the ideas by Robinson (1991), a proof for the non-linear case can be found in Berke (1996a). However, Campbell (1991) points out that the geostatistical BLUP is derived for more general than the classical mixed linear models by use of the variogram, which is the geostatistical innovation in this area.
ESTIMATION AND PREDICTION IN THE SPATIAL LINEAR MODEL
223
6. Example: Monitoring the Acid Precipitation in Lower Saxony
The emission of certain chemicals by fossil energy consumption leads to air pollution and further far reaching damage. Particularly, the deposition of air pollutants on the ground, measured as wet deposition or acid rain, results in reduced quality and quantity of drinking water resources. The discussion of water resources acidification leads to the foundation of many environmental research projects. In 1985 a monitoring project started to monitor the monthly open area precipitation, i.e. the wet deposition, of Lower Saxony. The aim of the project is to describe the spatial and temporal variation of the precipitation quality (cf. Hölscher et al., 1994). The data were collected from the Lower Saxony Precipitation Network of 39 sites in north-western Germany. Beside other parameters of water quality, the acid precipitation is measured by the pH value, which is the negative logarithm of hydrogen ion concentration. (This is one of those cases in which two scientific disciplines are sharing the same term for different objectives, i.e. statistical parameters are to be distinguished from hydrological parameters.) From the database a sample of 39 pH measurements for April 1987 is taken to describe the currently discussed spatial methods of trend surface analysis and universal kriging by example. Table I provides the site location and site numbers as well as corresponding pH data and their ranks for April 1987. Each observation is indexed by si = (east, north)0 = (xi , yi )0 ∈ IR 2. For example at site 1, Bad Rothenfelde, s1 = (0.30, 0.29)0 and the realisation of Z(s1 ) is z(s1 ) = 5.03 with rank 18. The network map of all sites is given in Figure 1. From that map it is seen that the entire monitoring region is covered by the network.
Figure 1. Lower saxony precipitation network.
224
O. BERKE
TABLE I pH measurements and their ranks for the 39 sites of the Lower Saxony Precipitation Network, for April 1987 Site 1 4 5 11 12 13 14 15 16 17 18 23 24 25 26 28 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 50 52 53 54
Bad Rothenfelde Dörenberg I (hillside) Dörenberg I (hilltop) Osnabrück (city) Damme-Harlinghausen Trillen-Berg Kreuzberg Lingen-Baccum Bad Bentheim Neuenhaus-Itterbeck Rütenbrock Edewecht Hude-Hasbruch Ovelgönne-Colmar Friedeburg Emden-Knock Ahlhorn Bückeberge Schwaförden Brockel Osterholz-Scharmbeck Stade-Mulsum Drangstedt Stuvenwald Scharnebeck Siemen Knesebeck Unterlüß Berkhof Norderney (city) Norderney (airport) Holzminden-Schießhaus Hann.-Münden-Hemeln Northeim-Westerhof Riefensbeek Seesen-Hohestein Diekholzen-Petze Königslutter Buer-Osterwalde
East
North
pH
Rank
0.30 0.28 0.28 0.28 0.26 0.24 0.22 0.15 0.10 0.06 0.11 0.28 0.37 0.36 0.24 0.10 0.31 0.50 0.45 0.57 0.44 0.54 0.42 0.64 0.78 0.91 0.82 0.73 0.63 0.13 0.15 0.57 0.61 0.71 0.78 0.74 0.64 0.84 0.32
0.29 0.31 0.31 0.36 0.43 0.41 0.45 0.42 0.38 0.43 0.53 0.63 0.61 0.70 0.73 0.72 0.56 0.33 0.49 0.61 0.71 0.76 0.78 0.71 0.68 0.59 0.48 0.53 0.46 0.83 0.83 0.20 0.10 0.16 0.16 0.21 0.29 0.32 0.35
5.03 4.53 4.47 6.00 6.00 6.07 5.77 6.13 5.93 6.50 6.60 6.43 5.73 5.83 5.97 6.27 6.77 4.93 5.13 5.33 5.90 5.87 5.70 4.83 5.03 4.63 4.63 4.87 4.97 5.03 4.35 4.43 4.63 4.67 4.60 4.67 5.10 5.63 4.57
18 4 3 31.5 31.5 33 24 34 29 37 38 36 24 26 30 35 39 14 20 21 28 27 23 12 18 8 8 13 15 18 1 2 8 10.5 6 10.5 19 22 5
ESTIMATION AND PREDICTION IN THE SPATIAL LINEAR MODEL
225
TABLE II Percentile statistics 100% 75% 50% 25% 0%
6.1. E XPLORATORY
maximum 3rd quartile median 2nd quartile minimum
z(39) z(30) z(20) z(10.5) z(1)
= = = = =
6.77 5.97 5.13 4.67 4.35
SPATIAL DATA ANALYSIS
To get some idea of the data the statistical analysis starts with an exploratory spatial data analysis. Figure 2 shows the stem-and-leaf plot. Behind the exploratory spatial data analysis is an assumption that the data are a sample from a spatial process, therefore the stem-and-leaf plot should not be interpreted as estimating the distribution from which the data are sampled.
Figure 2. Stem-and-leaf plot for pH values using data of Table I.
Some helpful explanatory summary statistics for this sample of size n = 39 are the range R = 2.42 and the interquartile range I QR = 1.30 which follow from the percentiles, given in Table II, based on the ordered data of Table I. At first look, one may think of the data being skewed to the right. If this is the case, transforming the data to achieve a symmetrically or Gaussian distribution is often possible. Box and Cox (1964) suggested a systematic method of finding an appropriate transformation. In environmental applications the data often represent concentrations which typically follow skewed distributions. On the other hand, the skewness may arise from a spatial trend. This maybe true here, because the pH value is an index for logarithmic transformed concentration data that is one of the transformations proposed by Box and Cox (1964). Thus to get a view of the spatial distribution of the pH values, Figure 3 shows a triple scatterplot (cf. Anscombe, 1973) of the data given in Table I. The threshold values used in defining the triple scatterplot are the percentiles listed above. This plot indicates the presence of spatial correlation, i.e. the data exhibit the tendency to come in spatial clusters for the various pH levels. Further, one may see some kind of spatial trend with
226
O. BERKE
decreasing pH levels from the north-west to the south-east. Thus a linear trend surface of polynomial order 1 is assumed to fit adequately the trend component of the precipitation process.
Figure 3. Triple scatterplot for pH in open area precipitation, for April 1987.
Exceptions from this trend are very low observations on the island of Norderney (no. 43 and 44) and one observation in the east of Hanover at site no. 53 Königslutter, which is relatively high. With respect to the stem-and-leaf plot these three observations are not far away from the rest of the data and hence, are not indicated as outliers now. Nevertheless, these observations are of special attention, they are included in the following analysis since statistical modelling is a process of learning about the data. An analysis of the residuals may give an answer to the outlier problem. 6.2. S PATIAL
MODELLING , ESTIMATION , AND PREDICTION
It is assumed that the sample Z = (Z(s1 ), ..., Z(s39 ))0 at locations s1 , ..., s39 ∈ D ⊂ IR 2, from which the observations z(s1 ), ..., z(s39 ) are taken, represent a potentially infinite number of random variables indexed by the location index si , i ∈ IN. The set of all random variables Z(si ), si ∈ D form a spatial process Z = { Z(s) : s ∈ D ⊂ IR 2 }. This process is modelled by a two component model as in (1). Let the spatial trend being a linear function of the site co-ordinates only, si = (xi , yi )0 ∈ IR 2 , i = 1, ..., n: E(Z) = µ = F β using matrix notation for β= (β0 β1 β2 )0 and F = (f(s1 )0 , ..., f(sn )0 )0 with f(si )0 = (1 xi yi )0 , or equivalently F = (f0 (s), ..., f2 (s)) with f0 (s) = 1, f1 (s) = x, and
ESTIMATION AND PREDICTION IN THE SPATIAL LINEAR MODEL
227
f2 (s) = y. From the explanatory spatial data analysis it is decided to model the mean structure by a polynomial trend surface of order 1. A constant mean assumption is seen to be unrealistic. On the other hand, a higher order trend surface would result in overfitting caused by the small number of data at hand and with respect to the triple scatterplot, see Figure 3. This is supported by an ordinary least squares analysis as well. Linear statistical inferences like BLUE and BLUP are also based on the second order moments of the process under study. To estimate the vector of unknown trend parameters β by generalised least squares, one must know the covariance matrix for the sample vector Z. Here the covariogram σZ (d, θ) of the process Z and hence the covariance matrix 6(θ) of the sample Z is not known a prior. To complete the model identification a valid parametric structure function (e.g. a model for the covariogram or semivariogram) has to be selected. A general class of parametric structure functions is the Matérn class (cf. Handcock and Stein, 1993). Furthermore, any linear combination of the members of this class result in valid models of structure functions. Only a few of them (for special values of the parameters) are used in most of the applications. For a review of these models see Cressie (1993, Sec. 2.3 and 2.5). Often the spherical model is recommended to represent the spatial dependence structure of deposition data (see, Cressie, 1993, p. 265; Berke and Busch, 1994). Another structure model with similar characteristics to the spherical model is the exponential model. The formula for the exponential model with nugget effect is as follows σ (d, θ) =
θ1 + θ2 θ2 exp{−d/θ3 }
d=0 . d>0
The parameters θ1 and θ2 are called nugget effect and partial sill respectively, θ3 can be called the asymptotic range. Since the use of the spherical model is increasingly being discouraged (cf. Haslett, 1989; Zimmerman and Zimmerman, 1991) an exponential model is fitted to the data. For estimating the unknown model parameters several methods are available (cf. Cressie, 1993, Sec. 2.6). Some may be called the geostatistical methods, e.g. weighted least squares estimation (WLSE). These are used to fit a chosen structure model to empirical estimates of the function at several distances and need large samples. The choice of the distance classes may affect the parameter estimates seriously when based on few data (n ≤ 50). Other methods are the traditional variance components methods for mixed linear models, e.g. minimum norm quadratic estimation and maximum likelihood estimation. Variance components methods fit the model parameters directly to the data. Further, they do not depend on an arbitrary chosen distance class. Zimmerman and Zimmerman (1991) compare several estimators for stationary isotropic Gaussian processes. From that WLSE seems to be preferable.
228
O. BERKE
When the data are affected by an polynomial trend, Zimmerman and Harville (1991) propose using the restricted maximum likelihood (REML) method in connection with a grid search procedure. The REML estimator b θ has the property not to depend on β. However, one must assume the process Z to be Gaussian and check this after fitting the model. On the other hand, simulation studies with nonGaussian data turned out (cf. Kitanidis, 1985) that the REML approach – which assumes normality – gave much better results than other variance components estimation methods. Hence REML is more robust against departures from the Gaussian assumption than other methods. Further, assume the process Z is isotropic and the model of the covariogram is known. To find an appropriate structure model one of the general model selection criteria can be applied, e.g. Akaike information criterion. In addition, examination of the empirical semivariogram or covariogram of the detrended data together with the estimated model is advantageous. To solve the problem of maximising the profile loglikelihood for θ the search procedure was restricted to find the REML parameter estimates in the grid of θ2 ≤ 0.4 and 0.1 ≤ b θ3 ≤ 0.5 with increments equal to 0.05 ≤ b θ1 ≤ 0.15, 0.1 ≤ b 0.01. The REML estimates are b θ = (b θ1 , b θ2 , b θ3 )0 = (0.07, 0.4, 0.25)0 . Estimation of β is performed simultaneously with that of θ. The trend parameter estimates are b β = (βb0 , βb1 , βb2 )0 = (5.63, −1.17, 0.30)0 . Compared to the covariogram, the semivariogram has characteristics which makes it a better graphical tool. Figure 4 shows the isotropic empirical semivariogram for the trend surface residuals Z − Fb β, based on the method-of-moments estimator (Cressie, 1993, p. 69), and the REML estimated exponential semivariogram model. Substantial deviation between the two are not observable, i.e. the empirical estimates are close to the REML estimated parametric model. So one hopes that the grid search produced good parameter estimates. The suspicion of a hole effect is treated in the discussion.
Figure 4. Empirical semivariogram (o) for the detrended data and REML estimated exponential model (—).
The estimated trend surface or the EBLUE for the fixed effects Fb β is given in Figure 5. This is based on the empirical generalised least squares estimate b β. Figure 6 shows the map of the EBLUP of the fixed and random effects for 1134
ESTIMATION AND PREDICTION IN THE SPATIAL LINEAR MODEL
229
Figure 5. Empirical trend surface of polynomial order 1 for pH in open area precipitation, for April 1987.
locations of a regular grid laid over the sampling domain D, which result from empirical universal kriging point predictions for the observed realisation of the acid precipitation process in April 1987. By comparison of the predictions (EBLUP) with the triple scatterplot (Figure 3) one may conclude that kriging works well here. However, in Figure 7 the half prediction error map is given. It is based on one half of the nominal 95% prediction interval for Z(s0 ), s0 ∈ D, under the assumption of normality, see e.g. Cressie (1993, p. 155) for the formula. The error map shows the quality of the predictions, they are not better than 0.74 in degrees of pH, which is the range of the smallest prediction interval. The range of the prediction interval increases with the distance to the nearest monitoring site. This results in inaccurate predictions at the edge and in sparsely sampled regions of Lower Saxony.
230
O. BERKE
Figure 6. Empirical universal kriging prediction for pH in open area precipitation, for April 1987.
Figure 7. Error map of one half of the 95% prediction interval for the predicted pH data.
ESTIMATION AND PREDICTION IN THE SPATIAL LINEAR MODEL
6.3. D IAGNOSTIC
231
CHECKING
After the model for the data is proposed (model identification) and the unknown parameters are estimated (model specification) one must check the fit of the model (model validation). Crossvalidation is a prefered method to check the fit of the model. This is done by deleting some of the data and using the remaining data to predict the deleted observations. This method is documented elsewhere; see for example Cressie (1993, p. 101). However, by crossvalidation only the predictive properties of the model are evaluated. Furthermore, the statistical assumptions about the observed process have to be examined. Here the data are assumed to come from an isotropic Gaussian process. To validate the modelling assumptions all together is rather difficult. Hence, checking them one by one seems reasonable. But to do so is a conditional procedure, conditioned on the remaining assumptions to hold. Now consider the two component model Z(s) = µ(s)+δ(s) and call b δ the vector of (estimation or trend surface) residuals. To check for normality the Shapiro-Wilk W-Test (cf. Shapiro and Wilk, 1965) is applied to the prewhitened residuals, i.e. the prewhitened stochastic part of the model, that is β ). δ = 6(b θ)−1/2 ( z − Fb 6(b θ)−1/2 b The corresponding value for the Shapiro-Wilk test statistic is W = 0.975 which is larger than W39, 0.5 = 0.971 the 50 percentage point for a sample of size n = 39. Based on the W value and the normal probability plot, see Figure 8, the normality assumption holds. The normal probability plot is based on the Blom scores. Blom scores are approximations to the exact expected order statistics for the normal distribution, which appear to fit better than other normal scores (cf. Christensen, 1987). Further, the normal probability plot can be used to explore the presence of outliers. In such cases reveal outliers as extreme points in the plot lying off the linear relationship exhibited by the mass of the points in the plot (cf. Barnett and Lewis, 1994, p. 304). With respect to Figure 8 there are no untypical data. Thus observations from the sites 43, 44 and 53 which are in section 6.1 suspected to be outlying are conform with the model. Next the stationarity of the process will be examined. From the exploratory data analysis it is deduced that the observed values are not a sample from a stationary process, because non-stationarity in the mean structure was detected. Therefore the detrended data, i.e. the residuals b δ must be checked for stationarity. For linear inferences (BLUE, BLUP) the residual process is assumed to be only weak stationary, i.e. stationary in the mean and covariance. To check the mean stationarity of the trend surface residuals one can calculate the correlation of b δ with the corresponding spatial co-ordinates and inspect the residual plots. In addition, for a mean-stationary process the semivariogram must level out and reach a sill, which can be seen from Figure 4. Therefore it
232
O. BERKE
Figure 8. Normal probability plot for the prewhitened residuals against the Blom normal scores.
is concluded that the estimated residuals are mean stationary. Further, the trend surface residuals are uncorrelated with the estimated mean structure. Hence, the polynomial trend surface is assumed to fit.
Figure 9. Directional empirical semivariograms for the east-west (4) and north-south (2) directions.
The isotropy assumption will be checked by visual inspection of some empirical semivariograms. Following Clark (1979, p. 14), isotropy or direction invariance can be deduced if the semivariograms for several directions coincide. The directional empirical semivariograms for the main directions (north-south, east-west) are shown in Figure 9. In general, the directional semivariograms do not differ too much and also agree with the estimated isotropic exponential model. In Figure 10 the directional semivariograms for the northeast-southwest and northwestsoutheast directions show a slightly different pattern. Just as for the other two
ESTIMATION AND PREDICTION IN THE SPATIAL LINEAR MODEL
233
Figure 10. Directional empirical semivariograms for the northeast-southwest (4) and northwest-southeast (2) directions.
directional semivariograms these two may be well represented by an exponential model with the same range and partial sill, but they exhibit different nugget effects. Maybe, this is a hint for anisotropy, but not for an easy to handle geometric anisotropy. Guttorp and Sampson (1994) review several different methods for structure function estimation from monitoring data when the underlying spatial process is not isotropic. For convenience this behaviour is regarded as an effect of estimating the empirical semivariograms with only a few data and the assumption of isotropy is adopted.
7. Summary and Discussion In geostatistics spatial data are assumed to represent a sample from a spatial process. As in time series analysis, the spatial data are modelled by using component models, but now, the components represent proportions of the total process variation. The two component model is useful in the context of linear regression models, whereas the three component model is more appealing for interpreting the spatial data from a mixed model point of view. Trend surface analysis and universal kriging are the geostatistical methods for estimation and prediction within the framework of linear models. Both methods are demonstrated by an application to acid-precipitation data. The link between universal kriging and best linear unbiased prediction in mixed linear models is already mentioned by Christensen (1987) and Robinson (1991). Here the link between trend surfaces and best linear unbiased estimation is elaborated. Because the defining matrix H in the mixed linear model is unknown, the mixed model is of little practical relevance. But the mixed model brings up the geostatistical methods
234
O. BERKE
in a new light and allows for further interpretation and deeper understanding of the results derived by geostatistics. For example, methods like REML which are useful for variance components estimation in mixed linear models are from this point of view of natural choice to estimate the unknown structure parameters of a spatial process. Particularly in such cases where the spatial sample is small so that traditional geostatistical methods fails, e.g. in environmental monitoring data sets. BLUE and BLUP are based on the knowledge of the spatial structure parameter θ of the observed random variables. However, in practice neither β nor θ is known. Therefore trend surface estimation as well as universal kriging suffer from the circular problem that optimal trend estimation and optimal spatial structure estimation are dependent on one another. However, it is common practice to substitute the unknown structure parameter by an estimator, yielding the so called estimated or empirical results: EBLUE and EBLUP, respectively. As a consequence the mean square error of estimation as well as the mean square error of prediction is affected by this approach and will in general be subject to underestimation (cf. Zimmerman and Cressie, 1992). Since the geostatistical methods are frequently in use to environmental data which come from phenomena distributed in space as well as in time the extension of the pure spatial methods for application on space-time gproblems should be considered. In the monitoring of an environmental or ecological phenomenon, as in the example just described, the spatial observations are taken repeatedly over time. Hence, the data are regarded as part of a realisation from a space-time process or more precisely as a temporal sequence of spatial processes. With that the data may be analysed within the framework of dynamic linear models (cf. Thiébaux, 1991; Goodall and Mardia, 1994). By combining the Kalman filter recursions (cf. West and Harrison, 1989, p. 108), which give the optimal temporal predictor, with the trend surface estimator and universal kriging predictor Berke (1996b) develops optimal linear spatiotemporal predictors. These spatiotemporal predictors can be interpreted as prior and posterior predictors from the Bayesian point of view as well as in the settings of mixed linear models (cf. Berke, 1996a). A recent paper by Huang and Cressie (1996) gives similar results together with an example. The analysis of acid precipitation data often starts by transforming the pH data using the Box-Cox power transformation method. Here the Shapiro-Wilk W-test assured normality on the original scale. Hence transformation of the data was not needed. Further, in many applications the data will be weighted by the amount of precipitation. This is useful if the objective of the analysis is to calculate the deposition rather than the concentration. The example given here was motivated by forest decline which is regarded as an effect of acid rain, i.e. low hydrogenion concentrations or equivalently low pH values. Charlson and Rodhe (1982) investigate the variation of rainwater acidification. They explain how the pH value is controlled by certain chemicals. Further it is concluded that the pH value will vary in space and time according to the concentrations of the controlling factors.
ESTIMATION AND PREDICTION IN THE SPATIAL LINEAR MODEL
235
Estimation of the structure function from only 39 data points may seem difficult for practitioners of geostatistics. In environmental monitoring the data sets are often of such small sizes, see Cressie (1993, Sec. 4.6) for the same application with only 19 data points. The approach taken here is based on the result of the preceding explorative data analysis and adaptation of variance components methods, i.e. REML estimation, which are traditionally used in the settings of mixed linear models. The spatial trend of decreasing pH level from the north-west to the south-east is an effect reflecting the generally north-westerly winds in connection with anthropogenous air pollution in Lower Saxony. On the other hand one may wonder that the directional semivariograms did not identify anisotropy with the trend, i.e. wind direction. A reason for this may be the fact, that samples from several rain events during one month were averaged. The local wind direction can change considerably for different rain events. As a result from the exploratory data analysis three observations at boundary sites are suspected to be outliers. With respect to the normal probability plot this suspicion could be rejected here. But more work should be spent on this problem. However, there has been less work on the subject of outliers, influence and residuals in regression models when observations are spatially correlated. An exception is the paper by Martin (1992). Furthermore with respect to Figure 4 and the Figures 8 and 9 one may argue that the data show the pseudo periodic structure called hole effect. This suspicion is due to the drop of all the empirical semivariograms between distances d = 0.50 and 0.75. This may reflect a violation of the assumptions, e.g. mean stationarity. However, the estimated trend surface residuals are not correlated with the estimated trend surface. Hence they are not trend contaminated. To best of the authors knowledge a hole effect has never been observed in acid-precipitation data. Anyway and with reference to Journal and Huijbregts (1978, p. 170) the experimental hole effect that is open to a douptful interpretation can simply be ignored for estimation purposes. Further, one of the referees points out, that the empirical semivariogram estimates would be within simulation based tolerance intervalls for the fitted theoretical semivariogram model. These are the reasons why this phenomenon is referred here to be an effect of the small sample size.
Acknowledgement The author is indebted to Dr. R. J. Martin at the University of Sheffield for helpful comments and suggestions and Professor W. Urfer at the University of Dortmund for many discussions. Furthermore the author wishes to thank Dr.-Ing. J. Hölscher and A. Gertsmeier at the Department for Ecology of Lower Saxony in Hildesheim (NLÖ) for providing the acid precipitation data. This work was sup-
236
O. BERKE
ported by the Graduate College on ‘Applied Statistics’ of the German Research Foundation (DFG).
References Agterberg, F. P.: 1984, ‘Trend Surface Analysis’, in G. L. Gaile and C. J. Willmott (eds.), Spatial Statistics and Models, D. Reidel Publ. Co., Dordrecht, pp. 147–171. Anscombe, F. J.: 1973, The American Statistican 27, 17. Barnett, V. and Lewis, T.: 1994, Outliers in Statistical Data, third edition. Wiley, New York. Berke, O.: 1996a, ‘Linear Estimation and Prediction Based on Environmental Monitoring Data with an Application to Acid Precipitation’, Research Report 96/1, University of Dortmund, Department of Statistics. Berke, O.: 1996b, ‘On Spatiotemporal Prediction for Online Monitoring Data’, Research Report 96/9, University of Dortmund, Department of Statistics. Submitted for publication to Communications in Statistics – Theory and Methods. Berke, O. and Busch, U.: 1994, ‘Statistische Analyse von Räumlich und Zeitlich Verteilten Daten’, in E. Kublin and R. Lasser (eds.), Proceedings of the Study Group on Ecology. German Region of the International Biometric Society 6, pp. 17–32. Box, G. E. P. and Cox, D. R.: 1964, Journal of the Royal Statistical Society B 26, 211. Brus, D., De Gruijter, J., Marsman, B., Visschers, R., Bregt, A., Breeuwsma, A. and Bouma, J.: 1996, Environmetrics 7, 1. Busch, U., Brechtel, H. M. and Urfer, W.: 1994, ‘Statistical Methods for the Evaluation of Hydrological Parameters in Landuse Planning’, in V. Barnett and K. F. Turkman (eds.), Statistics for the Environment 2: Water Related Issues, Wiley, Chichester, pp. 319–344. Campbell, K.: 1991, Statistical Science 6, 32. Charlson, R. J. and Rodhe, H.: 1982, Nature 295, 683. Christakos, G. and Thessing, G. A.: 1993, Atmospheric Environment 27A, 1521. Christensen, R.: 1987, Plane Answers to Complex Questions: The Theory of Linear Models, Springer, New York. Christensen, R.: 1990, ‘The equivalence of predictions from universal kriging and intrinsic random function kriging’, Mathematical Geology 22, 655. Christensen, R.: 1991, Linear Models for Multivariate, Time Series, and Spatial Data, Springer, New York. Cressie, N.: 1986, Journal of the American Statistical Association 81, 625. Cressie, N.: 1993, Statistics for Spatial Data, revised edition. Wiley, New York. Clark, I.: 1979, Practical Geostatistics, Applied Science Publishers, London. Goldberger, A. S.: 1962, Journal of the American Statistical Association 57, 369. Goodall, C. and Mardia, K. V.: 1994, ‘Challenges in Multivariate Spatio-Temporal Modelling’, Proceedings of the 17th International Biometric Conference 1, pp. 1–17. Guttorp, P. and Sampson, P. D.: 1994, ‘Methods for Estimating Heterogeneous Spatial Covariance Functions with Environmental Applications’, in G. P. Patil and C. R. Rao (eds.), Handbook of Statistics XII: Environmental Statistics. North-Holland, Amsterdam, pp. 661–689. Haining, R. P.: 1987, Technometrics 29, 461. Handcock, M. S. and Stein, M. L.: 1993, Technometrics 35, 403. Haslett, J. and Raftery, A. E.: 1989, Applied Statistics 38, 1. Henderson, C. R.: 1950, The Annals of Mathematical Statistics 21, 309. Henderson, C. R., Kempthorne, O., Searle, S. R. and von Krosigk, C. N.: 1959, Biometrics 13, 192. Hölscher, J., Rost, J. and Walther, W.: 1994, Wasser und Boden 1, 20. Huang, H.-C. and Cressie, N.: 1996, Computational Statistics and Data Analysis 22, 159.
ESTIMATION AND PREDICTION IN THE SPATIAL LINEAR MODEL
237
Journal, A. G. and Huijbregts, C. J.: 1978, Mining Geostatistics, Academic Press, London. Kent, J. T. and Mardia, K. V.: 1994, ‘The Link Between Kriging and Thin-Plate Splines’, in F. P. Kelly (ed.) Probability, Statistics and Optimization, Wiley, New York, pp. 325–339. Kitanidis, P.: 1985, Water Resources Bulletin 23, 557. Laslett, G. M.: 1994, Journal of the American Statistical Association 89, 391. Martin, R. J.: 1992, Communications in Statistics. Theory and Methods 21, 1183. Matheron, G.: 1969, ‘Le Krigeage Universel’, Cahiers du Centre de Morphologie Mathematique, No. 1. Fontainebleau, France. Rao, C. R. and Toutenburg, H.: 1995, Linear Models: Least Squares and Alternatives, Springer, New York. Renshaw, E. and Ford, E. D.: 1983, Applied Statistics 32, 51. Ripley, B. D.: 1981, Spatial Statistics, Wiley, New York. Robinson. G. K.: 1991, Statistical Science 6, 15. Sampson, P. D. and Guttorp, P.: 1992, Journal of the American Statistical Association 87, 108. Shapiro, S. S. and Wilk, M. B.: 1965, Biometrika 52, 591. Stein, A., 1994, Geoderma 62, 199. Thiébaux, H. J.: 1991, Environmetrics 2, 5. Urfer, W.: 1993, ‘Statistical Analysis of Climatological and Ecological Factors in Forestry: Spatial and Temporal Aspects’, in V. Barnett and K. F. Turkman (eds.), Statistics for the Environment, Wiley, Chichester, pp. 335–345. Urfer, W., Schwarzenbach, F. H., Kötting, J. and Müller, P.: 1994, Environmental and Ecological Statistics 1, 171. Watson, G. S.: 1971, Mathematical Geology 3, 215. Ware, J. H.: 1994, Environmental and Ecological Statistics 1, 196. West, M. P. and Harrison, P. J.: 1989, Bayesian Forecasting and Dynamic Models, Springer, New York. Zimmerman, D. L. and Cressie, N.: 1992, Annals of the Institute of Statistical Mathematics 44, 27. Zimmerman, D. L. and Harville, D. A.: 1991, Biometrics 47, 223. Zimmerman, D. L. and Zimmerman, M. B.: 1991, Technometrics 33, 77.