J. Appl. Math. & Computing Vol. 21(2006), No. 1 - 2, pp. 477 - 483
Website: http://jamc.net
ARM USING INDIVIDUAL ESTIMATOR FOR VARIANCE

JONG-CHUL OH∗, YUN LU AND YUHONG YANG
Abstract. Different nonparametric regression procedures perform well under different conditions. A combining method, adaptive regression by mixing (ARM), was proposed for random designs, and a version for fixed designs (ARMC) was introduced as well. In this article we focus on using an individual variance estimate in the combining algorithm (ARMI). Both the prediction performance of each procedure and its individual $\hat\sigma^2$ must be estimated in order to assign weights to the different procedures. Simulation results show that ARMI performs better than, or similarly to, the CV estimator.

AMS Mathematics Subject Classification: 62G05, 62G08
Key words and phrases: ARM, ARMI, CV, performance criteria
1. Introduction

Regression estimation includes parametric and nonparametric approaches, and quite a number of different estimators have been proposed for the regression function. Yang [5] proposed a practical combining method, adaptive regression by mixing (ARM), which combines estimators of a regression function based on the same data. Yang's results showed that, under mild conditions, ARM performs optimally in rates of convergence. Oh et al. [3] provided a practically feasible weighting method, ARMC, for one-dimensional regression with fixed design, combining three regression approaches; their results suggest that ARMC is a good choice for model combination when one common estimator of $\sigma^2$ is used for all procedures. In this article we propose an algorithm, ARMI, which combines the same three regression procedures as ARMC but differs in the method for estimating the variance: $\hat\sigma^2$ must be estimated in order to assign weights to the three procedures, and while ARMC uses one common estimator of $\sigma^2$ for all procedures, ARMI estimates $\sigma^2$ separately for each regression procedure.

Received September 24, 2005. ∗Corresponding author.
© 2006 Korean Society for Computational & Applied Mathematics and Korean SIGCAM.
In the next section some preliminaries, polynomial regression, smoothing splines and local regression, are introduced. Section 3 contains the ARMI algorithm and CV for model selection. Finally, we present practical performance through simulation studies and give conclusions.

2. Preliminaries

2.1 Regression analysis

Regression estimation includes parametric and nonparametric approaches. A parametric regression model assumes that the form of the regression function $f$ is known except for finitely many unknown parameters; the model may depend on the parameters in a linear or nonlinear fashion. Using an appropriate estimation methodology, such as least squares, the data can be used to estimate the parameters and thereby estimate $f$. Nonparametric regression techniques, in contrast, do not assume that $f$ takes a parametric form, which gives great flexibility. Nonparametric techniques rely more heavily on the data for information about $f$ than their parametric counterparts, which require very specific, quantitative information about the form of $f$. Parametric regression is more appropriate when theory or past experience provides detailed knowledge about the process under study, while nonparametric methods are best suited for inference when there is little or no prior information available about the regression curve.

2.2 Polynomial regression

Polynomial regression estimators are an important cornerstone in the theory of nonparametric regression [2]. A polynomial regression model has $x$, $x^2$, $x^3$ and so forth as independent variables. The problem with polynomial regression is that it runs into perfect multicollinearity quite quickly as terms are added. In low dimensions, polynomial regressions are not flexible enough to capture sudden changes in slope, especially at irregular intervals; in high dimensions, they tend to fail due to perfect multicollinearity.

2.3 Smoothing splines

Splines are generally defined as piecewise polynomials [2] in which curve (or line) segments are constructed individually and then pieced together; in a spline model, a turning point is represented by a spline knot. There are different types of splines. Spline regression models have substantially greater flexibility than polynomial regression models in low dimensions and are generally less likely to generate perfect multicollinearity in higher dimensions.
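As a concrete illustration of the two approaches just described, here is a minimal Python sketch of our own (the paper's simulations use S-Plus 2000): a global polynomial fit and a smoothing spline fitted to the same noisy data. The toy signal, polynomial degree and smoothing level s are arbitrary illustrative choices.

```python
# A minimal illustrative sketch (not from the paper): polynomial regression
# versus a smoothing spline on the same noisy one-dimensional data.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
n = 100
x = np.arange(1, n + 1) / n                                # fixed design x_i = i/n
y = np.sin(4 * np.pi * x) + rng.normal(scale=0.5, size=n)  # toy signal + noise

# Polynomial regression: global least-squares fit in x, x^2, ..., x^5.
poly_fit = np.polyval(np.polyfit(x, y, deg=5), x)

# Smoothing spline: piecewise cubics; s controls the roughness penalty.
spline_fit = UnivariateSpline(x, y, k=3, s=n * 0.25)(x)
```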
2.4 Local regression

Local regression was originally called locally weighted regression ([1], [4]). In local regression the size of a neighborhood is referred to as the bandwidth, and the neighborhoods overlap. The objective of local regression is to identify the model that is appropriate for each data segment. Local regression models a relation between a predictor variable (or variables) $X$ and a response variable $Y$. For a fitting point $x$ and a bandwidth $h(x)$, only observations within the window $(x - h(x), x + h(x))$ are used to estimate $f(x)$, with the weights for the $x_i$ depending on their distance from $x$. The bandwidth must be chosen to balance the bias-variance trade-off; the simplest choice is a constant bandwidth $h(x) = h$.

2.5 Performance criteria

Suppose there are several families $C(\Lambda) = \{f_\lambda : \lambda \in \Lambda\}$ of estimators for the regression function, where $\Lambda$ represents some index set. The problem to be considered is the selection of a best estimator $\hat f_\lambda$ of $f$ from among the elements of $\{f_\lambda : \lambda \in \Lambda\}$. There are certain criteria which are widely accepted and used. The loss in estimating $f$ on $[0, 1]$ is defined as
$$L(\lambda) = \int_0^1 \bigl(f(x) - \hat f_\lambda(x)\bigr)^2\, dx.$$
$L(\lambda)$ represents a natural measure of the closeness of $\hat f_\lambda$ to $f$. The expected value of $L(\lambda)$ is called the risk, i.e. $R(\lambda) = E\,L(\lambda)$. Both $L(\lambda)$ and $R(\lambda)$ provide assessments of an estimator's performance, with smaller values of the criteria indicating better estimation. A value of $\lambda$ that minimizes the loss provides a best estimate of $f$, among the $\lambda$'s considered, for the particular data set in question, while the value of $\lambda$ that minimizes the risk can be viewed as best for prediction of future responses or for estimation of $f$ in repeated sampling.

3. ARM Using Individual Variance Estimator

In this section the regression setting $Y_i = f(x_i) + \epsilon_i$ with fixed design points $x_i = i/n$ ($i = 1, 2, \ldots, n$) is considered. The error term $\epsilon_i$ is assumed to follow a normal distribution with mean 0 and unknown variance $\sigma^2$. Our goal is to estimate the regression function $f$ based on the data $Z^n = (x_i, Y_i)_{i=1}^n$. Adaptive regression by mixing (ARM) is used to combine different procedures ([3], [5]). We consider three regression methods: polynomial regression ($j = 1$), smoothing spline ($j = 2$) and local regression ($j = 3$). Cross-validation is used as the criterion for model selection.

3.1 ARM algorithm using individual $\hat\sigma^2$
We split the data into two parts according to the $x$ values. The first part is used for estimation by each regression procedure ($j = 1, 2, 3$) and the second part is used to assess prediction performance and assign weights to the procedures. Since $\sigma^2$ must be estimated in order to assign the weights, here we estimate $\sigma^2$ separately for each regression procedure. We assume the observations are ordered in $x$. The detailed combining algorithm is as follows.
Step 0. Obtain estimates $\hat f_{n,j}(x; Z^n)$ based on the data $Z^n = (x_i, Y_i)_{i=1}^n$ using regression procedure $j$, $j = 1, 2, 3$.

Step 1. Split the data into two parts,
$$Z^{(1)} = (x_{2l-1}, Y_{2l-1})_{l=1}^{n/2}, \qquad Z^{(2)} = (x_{2l}, Y_{2l})_{l=1}^{n/2},$$
and rearrange the data so that $Z' = (x'_i, Y'_i)_{i=1}^{n}$, where $Z^{(1)} = (x'_i, Y'_i)_{i=1}^{n/2}$ and $Z^{(2)} = (x'_i, Y'_i)_{i=n/2+1}^{n}$.

Step 2. Obtain estimates $\hat f_{n/2,j}(x; Z^{(1)})$ based on $Z^{(1)}$ for $1 \le j \le 3$. Estimate the individual variance $\sigma^2$ by
$$\hat\sigma^2_{n/2,j} = \frac{1}{n/2 - df}\sum_{i=1}^{n/2}\bigl(Y'_i - \hat f_{n/2,j}(x'_i)\bigr)^2.$$

Step 3. For each $j$, evaluate predictions. For $Z^{(2)}$, predict $Y'_i$ by $\hat f_{n/2,j}(x'_i)$ and compute
$$E_j = \frac{(2\pi)^{-n/4}\exp\Bigl(-\sum_{i=n/2+1}^{n}\bigl(Y'_i - \hat f_{n/2,j}(x'_i)\bigr)^2 \big/ \bigl(2\hat\sigma^2_{n/2,j}\bigr)\Bigr)}{\hat\sigma_{n/2,j}^{\,n/2}}.$$

Step 4. Compute the current weight for each procedure $j$:
$$W_j = \frac{E_j}{\sum_{l=1}^{3} E_l}.$$

Step 5. The final estimator is
$$\hat f(x) = \sum_{j=1}^{3} W_j \hat f_{n,j}(x).$$
This combined estimator is compared with the CV estimator by simulations in Section 4.
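The steps above translate directly into code. Below is a minimal Python sketch of the ARMI weighting scheme under stated simplifications: the authors' S-Plus implementation is not available, a Gaussian-kernel smoother stands in for local regression, and the degrees of freedom df of each fit are treated as fixed constants supplied by the caller. The helpers fit_poly, fit_spline and fit_kernel are hypothetical, not the paper's procedures.

```python
# A minimal sketch of ARMI (Steps 0-5), under the simplifying assumptions
# noted in the text; the three fitters are illustrative stand-ins.
import numpy as np
from scipy.interpolate import UnivariateSpline

def fit_poly(x, y, xnew, deg=5):
    return np.polyval(np.polyfit(x, y, deg), xnew)

def fit_spline(x, y, xnew):
    return UnivariateSpline(x, y, k=3, s=len(x) * 0.25)(xnew)

def fit_kernel(x, y, xnew, h=0.05):
    # Kernel-weighted local averaging, a simple stand-in for local regression.
    w = np.exp(-0.5 * ((xnew[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def armi(x, y, xnew, fitters, dfs):
    """Combine the fitters with ARMI weights; x is assumed sorted."""
    n = len(y)
    x1, y1 = x[0::2], y[0::2]          # Z(1): estimation half (Step 1)
    x2, y2 = x[1::2], y[1::2]          # Z(2): weighting half
    m = len(y1)
    log_e = []
    for fit, df in zip(fitters, dfs):
        resid1 = y1 - fit(x1, y1, x1)
        sigma2 = np.sum(resid1**2) / (m - df)    # individual sigma^2 (Step 2)
        resid2 = y2 - fit(x1, y1, x2)            # prediction errors (Step 3)
        # log E_j up to the common (2*pi)^(-n/4) factor, which cancels in W_j
        log_e.append(-(n / 4) * np.log(sigma2) - np.sum(resid2**2) / (2 * sigma2))
    log_e = np.array(log_e)
    w = np.exp(log_e - log_e.max())              # stabilized exponentiation
    w /= w.sum()                                 # weights W_j (Step 4)
    # Final estimator: weighted mix of full-data fits (Steps 0 and 5).
    return sum(wj * fit(x, y, xnew) for wj, fit in zip(w, fitters))
```

A call like armi(x, y, xnew, [fit_poly, fit_spline, fit_kernel], dfs=[6, 10, 8]) would return the combined fit at xnew; the df values here are placeholders, since the true degrees of freedom depend on each procedure's tuning.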
3.2 Cross-validation (CV) algorithm for model selection

Instead of using ARM to obtain a combined estimator, we can use cross-validation to do model selection. The prediction residual sum of squares is used as the criterion, and the model that generates the smallest prediction residual sum of squares is selected. As in the ARM algorithm, half of the data is used for estimation and the other half for prediction.
Step 0. Obtain estimates $\hat f_{n,j}(x; Z^n)$ based on the data $Z^n = (x_i, Y_i)_{i=1}^n$ using regression procedure $j$, $j = 1, 2, 3$.

Step 1. Split the data into two parts,
$$Z^{(1)} = (x_{2l-1}, Y_{2l-1})_{l=1}^{n/2}, \qquad Z^{(2)} = (x_{2l}, Y_{2l})_{l=1}^{n/2},$$
and rearrange the data so that $Z' = (x'_i, Y'_i)_{i=1}^{n}$, where $Z^{(1)} = (x'_i, Y'_i)_{i=1}^{n/2}$ and $Z^{(2)} = (x'_i, Y'_i)_{i=n/2+1}^{n}$.

Step 2. Obtain estimates $\hat f_{n/2,j}(x; Z^{(1)})$ based on $Z^{(1)}$ for $1 \le j \le 3$.

Step 3. For each $j$, evaluate predictions. For $Z^{(2)}$, predict $Y'_i$ by $\hat f_{n/2,j}(x'_i)$. CV is used as the criterion to select the best regression procedure:
$$CV(\hat f_j) = \sum_{i=n/2+1}^{n}\bigl(Y'_i - \hat f_{n/2,j}(x'_i)\bigr)^2.$$

Step 4. The final estimator is $\hat f_{n,j}(x; Z^n)$, where $j$ is the procedure generating the smallest $CV$.
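For comparison, here is a sketch of the CV selection rule under the same assumptions, reusing the hypothetical fitters from the ARMI sketch above.

```python
# A minimal sketch of CV model selection (Steps 0-4 above): choose the
# procedure with the smallest prediction residual sum of squares on Z(2).
import numpy as np

def cv_select(x, y, xnew, fitters):
    x1, y1 = x[0::2], y[0::2]          # Z(1): estimation half
    x2, y2 = x[1::2], y[1::2]          # Z(2): prediction half
    press = [np.sum((y2 - fit(x1, y1, x2)) ** 2) for fit in fitters]
    best = int(np.argmin(press))       # j with the smallest CV
    return fitters[best](x, y, xnew)   # refit the winner on all the data
```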
4. Experiments

In this section we demonstrate the performance of ARMI in simulations; ARMI stands for ARM using individual $\hat\sigma^2$. We consider the following function as the true underlying regression function on $[0, 1]$:
$$f(x) = \bigl(e^{-200(x-0.2)^2} + e^{-200(x-0.8)^2}\bigr)\big/\sqrt{0.005\pi}. \qquad (1)$$

To simplify the presentation, assume that the design points are equally spaced on the unit interval, so the responses $Y_i$ are taken at the points $x_i = i/n$. Here the sample size is taken to be $n = 100$, with $\sigma = 0.5$ or $1.0$ under the constant-variance condition. A typical realization of the data for (a) $\sigma = 0.5$ and (b) $\sigma = 1.0$ is plotted in Figure 1.
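As a sketch of this setup (the paper's own S-Plus 2000 code is not shown), the data for one replication could be generated as follows.

```python
# Generate one realization of the simulation data: n = 100 equally spaced
# design points and Gaussian noise around the two-peak function (1).
import numpy as np

def f_true(x):
    return (np.exp(-200 * (x - 0.2) ** 2)
            + np.exp(-200 * (x - 0.8) ** 2)) / np.sqrt(0.005 * np.pi)

rng = np.random.default_rng(1)
n, sigma = 100, 0.5                  # sample size; sigma = 0.5 or 1.0
x = np.arange(1, n + 1) / n          # fixed design x_i = i/n
y = f_true(x) + rng.normal(scale=sigma, size=n)
```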
Figure 1. Typical realization of data for the true underlying regression function (1): (a) $\sigma = 0.5$, (b) $\sigma = 1.0$. (Two scatter panels of $y$ against $x$ on $[0, 1]$.)
The squared $L_2$ risk is used as a measure of discrepancy in estimating the regression function. The risk is unknown, since it depends on the unknown regression function $f$, but Monte Carlo methods can be used to approximate it. In our case a large number of new $x$ values ($n.xnew = 500$), say $x_i$, $1 \le i \le 500$, are generated independently from the uniform distribution on $[0, 1]$. The loss of the estimator is calculated by
$$L = \frac{1}{n.xnew}\sum_{i=1}^{n.xnew}\bigl(f(x_i) - \hat f_{n,\lambda}(x_i)\bigr)^2,$$
and the squared $L_2$ risk is computed based on $runct = 100$ replications:
$$R = \frac{1}{runct}\sum_{k=1}^{runct} L_k.$$
All the simulations are conducted using S-Plus 2000.
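A minimal sketch of this Monte Carlo loop follows, assuming an estimate(x, y, xnew) interface; the hypothetical armi and cv_select sketches above fit this interface once their fitters are fixed, e.g. via a lambda.

```python
# Approximate the squared L2 risk of an estimator by Monte Carlo: average
# the loss L over runct replications, evaluating each fit at n.xnew fresh
# uniform points. f_true is the two-peak function (1), passed in explicitly.
import numpy as np

def mc_risk(estimate, f_true, runct=100, n=100, sigma=0.5, n_xnew=500, seed=2):
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1) / n                    # fixed design
    losses = []
    for _ in range(runct):
        y = f_true(x) + rng.normal(scale=sigma, size=n)
        xnew = rng.uniform(size=n_xnew)            # fresh evaluation points
        losses.append(np.mean((f_true(xnew) - estimate(x, y, xnew)) ** 2))
    risk = float(np.mean(losses))
    se = float(np.std(losses, ddof=1) / np.sqrt(runct))  # standard error
    return risk, se
```

For instance, mc_risk(lambda x, y, xnew: armi(x, y, xnew, fitters, dfs), f_true) gives a risk and standard error in the spirit of the ARMI column of Table 1, though not its exact values.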
Table 1 is listed below to illustrate the performance of the model selection criterion and of ARMI for different $\sigma$ values; "poly" stands for polynomial regression, "locreg" for local regression, and "smsp" for smoothing spline. The results suggest that on average ARMI generates smaller or similar risks compared with the model selection criterion CV.

Table 1. Comparing model selection by CV with ARMI for different $\sigma$ values: $n = 100$, $n.xnew = 500$, $runct = 100$.
σ            poly     locreg   smsp     CV       ARMI
0.5   risk   7.2992   0.0732   0.0506   0.0593   0.0649
      s.e.   0.0458   0.0018   0.0020   0.0023   0.0021
1.0   risk   7.3980   0.1806   0.1808   0.1803   0.1792
      s.e.   0.0463   0.0054   0.0059   0.0057   0.0055
Table 1 shows that polynomial regression generates much higher risk than the other two methods, and consequently it is assigned very low weight. The results match intuition: since the true underlying regression function (1) has two peaks, polynomial regressions are not expected to perform well. Because local regression and smoothing spline have similar risks, CV and ARMI perform similarly; moreover, the combined risk is close to the risk of the best regression procedure. When there is no extreme case, the results suggest that ARMI is a good choice for model combination.

5. Conclusions

In this article we presented two algorithms, ARMI and CV, and compared the performance of ARMI with the model selection criterion CV. One-dimensional regression with fixed design and three regression procedures (poly, locreg and smsp) were considered. The data are split into two parts: the first part is used for estimation and the second for prediction. The results suggest that on average ARMI generates smaller or similar risks compared with model selection by CV; moreover, the risk is close to the risk of the best regression procedure. Therefore, when there is no extreme case, the results suggest that ARMI is a good choice for model combination.
References

1. W. S. Cleveland, Robust locally weighted regression and smoothing scatterplots, Journal of the American Statistical Association 74 (1979), 829-836.
2. R. L. Eubank, Spline Smoothing and Nonparametric Regression, 1st edition, Marcel Dekker, Inc., 1988.
3. J. C. Oh, Y. Lu and Y. Yang, Adaptive regression by mixing for fixed design, The Korean Communications in Statistics 12 (2005), 713-727.
4. C. J. Stone, Consistent nonparametric regression, Annals of Statistics 5 (1977), 595-620.
5. Y. Yang, Adaptive regression by mixing, Journal of the American Statistical Association 96 (2001), 574-588.

Jong-Chul Oh received his MS and Ph.D. at Korea Advanced Institute of Science and Technology (KAIST) in 1989 and 1995, respectively. Since 2001 he has been an Associate Professor at Kunsan National University. His research interests concern kernel estimation, Bayesian estimation and fuzzy theory.

Faculty of Mathematics, Informatics and Statistics, Kunsan National University, Kunsan 573-701, Korea
e-mail: [email protected]

Yun Lu received her MS at Iowa State University in 2002.

Yuhong Yang received his Ph.D. at Yale University in 1996. Since 2003 he has been an Associate Professor at the University of Minnesota. His main research interest is nonparametric function estimation. He has worked on several topics in this field, concerning issues such as the intrinsic capability of statistical estimation and classification, model selection, and combining procedures to share individual strengths.

School of Statistics, 313 Ford Hall, 224 Church Street S.E., Minneapolis, MN 55455
e-mail: [email protected]