Statistics and Computing 13: 221–226, 2003 C 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.
Improved generalized Fourier amplitude sensitivity test (FAST) for model assessment SHOUFAN FANG∗ , GEORGE Z. GERTNER† , SVETLANA SHINKAREVA∗∗ , GUANGXING WANG∗ and ALAN ANDERSON‡ ∗ Department of Natural Resources and Environmental Sciences, University of Illinois at Urbana-Champaign, W503 Turner Hall, Urbana, IL 61801, USA † Department of Natural Resources and Environmental Sciences, Department of Statistics, University of Illinois at Urbana-Champaign, 1102 S. Goodwin Ave., Urbana, IL 61801, USA
[email protected] ∗∗ Department of Psychology, University of Illinois at Urbana-Champaign, 603 E. Daniel Street, Champaign, IL 61820, USA ‡ U.S. Army Construction Engineering Research Laboratories, P.O. Box 9005, Champaign, IL 61826-9005, USA
Received June 2002 and accepted December 2002
The Fourier amplitude sensitivity test (FAST) can be used to calculate the relative variance contribution of model input parameters to the variance of predictions made with functional models. It is widely used in the analyses of complicated process modeling systems. This study provides an improved transformation procedure of the Fourier amplitude sensitivity test (FAST) for non-uniform distributions that can be used to represent the input parameters. Here it is proposed that the cumulative probability be used instead of probability density when transforming non-uniform distributions for FAST. This improvement will increase the accuracy of transformation by reducing errors, and makes the transformation more convenient to be used in practice. In an evaluation of the procedure, the improved procedure was demonstrated to have very high accuracy in comparison to the procedure that is currently widely in use. Keywords: Fourier amplitude sensitivity test (FAST), model assessment, sensitivity analysis, uncertainty assessment
Introduction Process models often involve very complex and nonlinear mathematical expressions that attempt to simulate the underlying mechanisms of different modeling systems. Many of those models are not amenable to analytically derived uncertainty relationships. The Fourier amplitude sensitivity test (FAST) is one of the commonly used computational methods to investigate the sensitivity of uncertainty sources in predictions derived from complex functional models. It provides a method for the partitioning of the variance of model predictions due to uncertainty in input parameters of the model. It allows the systematic assessment of the influence of model input parameters on model predictions and provides a foundation for error management. It can be used to asC 2003 Kluwer Academic Publishers 0960-3174
sess how uncertainty due to input parameters propagates through the modeling system. Because of the advantages of FAST, it has been applied in sensitivity analysis of highly nonlinear models for complicated dynamic systems, such as climatic and meteorological models (Collins and Avissar 1994, Liu and Avissar 1996, Rodriguez-Camino and Avissar 1998); chemical rate reaction models (Cukier et al. 1973, Cukier et al. 1975, Schaibly and Shuler 1973); topographic models for spatially explicit soil erosion models (Crosetto and Tarantola 2001, Wang et al. 2000, Wang et al. 2001); and transportation models for underground nuclear waste disposal (Saltelli and Bolado 1998, Saltelli et al. 1999). FAST is based on the Fourier series and the distribution of model input parameters. It was originally developed for
222
Fang et al.
functional models whose input parameters were uniformly distributed. Cukier et al. (1973) and Schaibly and Shuler (1973) developed the procedure to transform the sine waves based on Fourier series expansion into the uniform distributions of input parameters. Their procedure has been improved in order to obtain true uniform distributions of the input parameters (Cukier et al. 1978, Koda et al. 1979, Saltelli et al. 1999). In order to extent FAST for models with non-uniform distributions, a transformation procedure has been developed (Cukier et al. 1978, Collins and Avissar 1994). Theoretically, the generalized transform procedures can be used for all continuous distributions. However, this generalized transformation procedure is based on the probability density functions (PDF) for the input parameters and a linear approximation. In practice, this procedure can cause errors when the probability density functions are highly nonlinear. Such errors will cause FAST to produce incorrect results in model uncertainty assessment. When FAST is used to assess complex modeling systems, it is not always intutitive when these errors occur, which can result in an erroneous uncertainty assessment. There is a need to improve the generalized transformation procedure in order to make it more accurate, practical and reliable in practice. The objective of this study is to improve the generalized tranformation procedure of FAST and to evaluate the improved procedure.
Review of FAST FAST was developed by Cukier et al. (1973). The principle of FAST is that a model (a function) can be expanded into a Fourier series. The Fourier coefficients can be used to estimate the mean and variance of the model. Further, combined with specific frequencies for input parameters, Fourier coefficients can also be used to estimate the partial variance (contributed model proportional variance) of individual input parameters of the model. The most important tasks in the use of FAST are choosing frequencies for input parameters and determining the values of the input parameters corresponding to the original sampled points of the search variable s (transformation process). Cukier et al. (1973) and Schaibly and Shuler (1973) have developed an equation to choose proper frequencies according to the number of input parameters in the model being assessed. According to the selected frequencies, they also suggested the minimum sample size for the search variable s. Initially, the transformation procedure is generally established based on the following transformation function (Cukier et al. 1973): (0)
xi = xi eu i [sin(ωi s)]
(1)
where xi is the (to be transformed) sample of the ith input pa(0) rameter of a model, xi is the nominal (“best”) value of xi , u i is a function to be determined, ωi is the frequency assigned to xi , and s is the search variable and its values are determined for
sample size N : 2π k , k = 1, . . . , N (2) N Due to the use of odd frequencies and the properties of Fourier series, in application of FAST, s = πr ( 2 j−r2 −1 ), j = 1, . . . , r and r = N /2. Based on equation (1), different tranformation functions have been specified for uniformly distributed input parameters (Schaibly and Shuler 1973, Koda et al. 1979, Saltelli et al. 1999). Considering the distribution function, Cukier et al. (1978) presented a general differential equation for solving the transformation function: du i (v) π(1 − v 2 )1/2 pi [u i (v)] =1 (3) dv where v = sin(ωi s), u i has the same meaning as it does in equation (1), and pi is the distribution function. Theoretically, equation (3) can be used to solve transformation functions associated with any continuous distribution. However, other than for simple distributions, the transformation can be difficult to solve. Based on the work of Cukier et al. (1978), Collins and Avissar (1994) developed a transformation procedure for non-uniform distributions. They introduced another parameter S, which has the following relation with the search variable s: s=
π Si = arcsin(sin ωi s) For one sampling point of the search variable s, such as, s j , it has a corresponding Si, j , an S value for the ith input parameter xi . Let Si, j be the difference between Si, j and Si, j+1 , determined respectively by s j and s j+1 . The relationship among Si, j , Si, j , the corresponding xi ’s, and PDF of xi ’s is (Collins and Avissar 1994): Si, j+1 = Si, j + Si, j = Si, j + (xi, j+1 − xi, j )[ p(xi, j+1 ) + p(xi, j )]/2
(4)
where p(xi, j ) is the probability density of xi at its jth value. Assuming that the change of probability density of xi is linear between a pair of its values, each value of xi is determined based on both its immediate preceding value and the increment of the corresponding S. Therefore, Collins and Avissar (1994, equation (B25) on p. 701) derived the following equation to transform xi from Si : 2 xi, j − [ p(xi, j ) − p (xi, j ) + 2a(Si, j+1 − Si, j )]/a xi, j+1 = if a = 0 xi, j − (Si, j+1 − Si, j )/ p(x) a=0 (5) where a = [ p(xi, j+1 ) − p(xi, j )]/(xi, j+1 − xi, j ). Providing the search variable s and p(xi ) are known, the linear assumption makes transformation of non-uniform distributions possible. However, the linear assumption is not reasonable when the curvature of the PDF is high. When the PDF has considerable curvature, at the same start point the change rates of PDF can vary for different endpoints (see Fig. 1(A)). Based on
Fourier amplitude sensitivity test (FAST) for model assessment
223
where P(x·, j ) is the cumulative probability of x·, j , the jth sampled point of an input parameter. When integrating the probability increments, the cumulative probability of x·, j is equal to SM(S j ): j [P(x·,k ) − P(x·,k−1 )] P(x·, j ) = P(x·, j ) − P(x·,0 ) = k=1
=
j
P(x·,k ) =
k=1
Fig. 1. Difference between using a probability density and using a cumulative probability in transformation. When probability density is used in the transformation (A), at point a, slopes of probability density between line segments ab, ab , and ab are different when the linear approximation is applied. When the cumulative probability is used in the transformation (B), given probabilities P and P can determine their corresponding values of the random variable through its cumulative probability function without any assumption about linearity
equation (5), when xi, j+1 is to be determined, the difference between this value and the value of its previous neighbor is unknown. Thus, the average change rate (a in equation (5)) of PDF is also unknown. When search techniques are used to find the acceptable average change rate of PDF, instability can occur. Another problem is that sampling error will be accumulated in the transformation process, because the previous transformed value of xi is used in the transformation of the new value of xi . In addition, with equation (5), it is also possible that the term under the radical can be negative.
Improvement Parameter S has an interval of [−0.5, 0.5] (Collins and Avissar 1994). Considering the integration of increments of S, the sum of S until the jth point, SM(S j ), is: SM(S j ) =
j k=1
Sk =
j
j
Sk = SM(S j )
(7)
k=1
where P(x·,0 ) = P(X ≤ min{x· }) = 0. Combining equations (6) and (7), the relationship between the cumulative probability at a sampled point of an input parameter and its corresponding S value is: P(x·, j ) = SM(S j ) = S j + 0.5
(8)
Based on equation (8), it is not difficult to obtain the value of the input parameter corresponding to a value of S, which can be converted from a sampled point of the search variable s. Figure 1(B) illustrates the process to find the values from their given corresponding cumulative probabilities. Based on S j and F(x· ), CDF of an input parameter, x·, j is determined by: x·, j = F −1 [P(x·, j )] = F −1 (S j + 0.5)
(9)
where F −1 [P(x· )] is the inverse function of the CDF of x· . When the inverse function in equation (9) is explicit, the exact value of x·, j can be computed. Otherwise, the use of some optimal search technique is necessary to determine the approximate value of x·, j using its CDF. Because the CDF is a monotonic increasing function, its error has only one (global) minimum when search techniques are used. There is no local minimum to disrupt the search direction. In searching the points with a specific cumulative probability, the error will always be smaller than the pre-determined threshold.
Evaluation (Sk − Sk−1 ) = S j − S0 = S j + 0.5
k=1
(6) where S0 is set to be the lower bound of S, which is −0.5. Based on equation (6), the largest and smallest sums of S are respectively: max{SM(S j )} = max{S j } − S0 = 0.5 + 0.5 = 1.0 and min{SM(S i )} = min{Si } − S0 = −0.5 + 0.5 = 0 The interval of the sum of S is equal to the interval of a cumulative probability function (CDF), which is [0, 1.0]. In equation (4), the increment of S (S) is the increment of probability based on the linear assumption used for the probability density. Using the CDF, S is the difference of two CDF values: S j = P(x·, j+1 ) − P(x·, j ) = P(x·, j )
A five-parameter (toy) model with known statistical properties was used to compare the original and improved generalized transformation procedures.
Experimental design A five input parameter model was analyzed using FAST: y=
5
xi
(10)
i=1
In this model, the theoretical partial variance (proportional variance) contributed by each input parameter can be explicitly calculated. The five input parameters were assumed to have beta, exponential, gamma, normal, and Weibull distributions, respectively. In order to avoid infinity, bounds were also given to the unbounded distributions. The distribution parameters and the variances of these distributions are listed in Table 1.
224
Fang et al.
Table 1. The distributions of parameters and their frequencies used in sampling with FAST Bounds Model parameter x1 x2 x3 x4 x5 ∗
Distribution
Distribution parameter(s)
Variance
Lower
Upper
Frequency (ω)
Exponential Weibull∗ Normal Beta Gamma∗
0.5 0.0, 1.5, 3.0 0.0, 0.25 1.5, 2.5 0.0, 0.5, 3.5
0.25 0.237 0.25 0.0469 0.875
0.0 0.0 −4.0 0.0 0.0
20.0 20.0 4.0 1.0 20.0
11 21 27 35 39
The three parameters of the distribution are of location, scale, and shape.
The frequency of each input parameter and sample size were determined based on Cukier et al. (1975, refer to Table VI). Sample size was set to be 196 (= 5 max {frequencies} + 1). When using the improved generalized procedure (equation (9)), for all distributions except the exponential distribution, Golden Section (Fibonacci Number) was used to search the approximate inverse CDF’s. The threshold for acceptable approximate values in the search process was set to be 0.0001 (i.e., its error in probability is smaller than 0.0001 when a value is accepted as the transformed value.) For the exponential distribution, the explicit inverse of the CDF was used. In transformation using the original generalized procedure (equation (5)), the slope of the probability density function was estimated using the average slope of the 4 segments, that is, between the previous value and its 1/20, 2/20, 3/20, and 4/20 of the remaining interval.1 When the value transformed based on the average slope was larger/smaller than the upper/lower bound, the bound was taken to be the transformed value. When the term
under the radical in equation (5) was negative, the absolute value of that term was used.
Results and analysis The histograms of the imput parameters transformed using equation (9) were similar to their corresponding PDF’s (Fig. 2). Sample variances of the input parameters had a very small difference from their corresponding theoretical variances (Table 2). The relationship between the samples of the input parameters and their corresponding S values clearly demonstrated the difference of the two transformation procedures (Fig. 3). In all five distributions, samples transformed using the improved procedure had a very clear and distinct relationship between xi ’s and their corresponding S values. Transformation in the exponential distribution had no error using the improved procedure, since the transformed values are computed from their explicit inverse CDF. The similarity of the relationship between input parameters and their S implies that the transformed samples of other distributions had very small errors. With the original generalized transformation procedure, the relationship between xi ’s and their S are less well defined in Fig. 2. The errors due to the linear assumption and the accumulation of these errors appeared in the relationship between the transformed samples and S. When S has similar values, the transformed values could vary over a wide range. Assessment of equation (10) using FAST was totally different using the two transformation procedures (Table 2). Partial variances obtained based on the sample transformed using the Table 2. Sample variances and proportional variance contribution (partial variances) based on equation (10) obtained using the two generalized transformation procedures with FAST Sample variance
Fig. 2. Histograms (.1) of the input parameters in equation (8). [(a) to (e) are figures of x1 to x5 , respectively] transformed using equation (7), and their corresponding theoretical probability density functions (.2). Sample size used to generate histograms: 196
Partial variance
Parameter Original
Improved Original
Improved
Truth
x1 x2 x3 x4 x5
0.246 0.231 0.250 0.047 0.869
0.1351 0.1498 0.1591 0.0399 0.5160
0.1507 0.1429 0.1507 0.0283 0.5275
33.670 0.858 1.018 0.049 1.283
0.8150 0.0266 0.0403 0.0002 0.1180
Fourier amplitude sensitivity test (FAST) for model assessment
225
Fig. 3. The relationship of samples of each parameter drawn using both the original and improved sampling procedures for non-uniform distributions in FAST versus parameter S
original procedure were not close to their true partial variances, while the partial variances with the improved procedure were close. When the improved transformation procedure was used, the sample variances of all parameters are very close to the true values (Tables 1 and 2). Other than for the parameter x4 , the original transformation procedure provides very poor variance estimates. Based on the sample from the improved procedure, partial variances were very similar to their corresponding true partial variances. The largest difference of partial variances was smaller than 0.016 (1.6%) (Table 2).
to the analysis of the approximation of FAST using the uniform distribution (Cukier et al. 1975). In practice, FAST is used for assessment of complicated nonlinear process models where reliable methods for variance partitioning are needed. The improved generalized transformation procedure of FAST for all distributions targets those process models. Although not presented here, the uncertainty partitioning has been found to be more realistic for different complex nonlinear modeling systems using the improved procedure over the original procedure.
Discussion and conclusion
We are grateful to the U.S. Army Construction Engineering Research Laboratory (USACERL) and the Strategic Environmental Research and Development Program (SERDP) for providing support for the study.
When the input parameters of a model have different types of distributions, the quality of model assessment using FAST depends on the accuracy of transformation. High accuracy of transformation provides more reliable uncertainty analysis. When the sample is transformed using the improved generalized procedure, FAST partitioned uncertainty in prediction is similar to the true uncertainty distribution. Our FAST results are comparable
Acknowledgment
Note 1. Remaining interval equals the difference between upper/lower bound of the last sampled point when increment of S is positive/negative.
226
References Collins C. and Avissar R. 1994. An evaluation with the Fourier Amplitude Sensitivity Test (FAST) of which land surface parameters are of greatest importance in atmospheric modeling. Journal of Climate 7: 681–703. Crosetto M. and Tarantola S. 2001. Uncertainty and sensitivity analysis: Tools for GIS-based model implementation. International Journal of Geographical information Science 15(5): 415–437. Cukier R.I., Fortuin C.M., Shuler K.E., Petschek A.G., and Schaibly J.H. 1973. Study of the Sensitivity of coupled reaction systems to uncertainties in rate coefficients. I. Theory. Journal of Chemical Physics 59: 3873–3878. Cukier R.I., Levine H.B., and Shuler K.E. 1978. Nonlinear sensitivity analysis of multiparameter model systems. Journal of Computational Physics 26: 1–42. Cukier R.I., Schaibly J.H., and Shuler K.E. 1975. Study of the Sensitivity of coupled reaction systems to uncertainties in rate coefficients. III. Analysis of the approximation. Journal of Chemical Physics 63: 1140–1149. Koda M., McRae G.J., and Seifeld J.H. 1979. Automatic sensitivity analysis of kinetic mechanisms. International Journal of Chemical Kinetics 11: 427–444. Liu Y. and Avissar R. 1996. Sensitivity of shallow convective precip-
Fang et al. itation induced by land surface heterogeneities to dynamical and cloud microphysical parameters. Journal of Geophysical Research 101(D3): 7477–7497. Rodriguez-Camind E. and Avissar R. 1998. Comparison of three land-surface schemes with the Fourier amplitude sensitivity test (FAST). Tellus 50A(3): 313–332. Saltelli A. and Bolado R. 1998. An alternative way to compute Fourier amplitude sensitivity test (FAST). Computational Statistics and Data Analysis 26: 445–460. Saltelli A., Tarantola S., and Chan K.P.-S. 1999. A quantitative modelindependent method for global sensitivity analysis of model output. Technometrics 41(1): 39–56. Schaibly J.H. and Shuler K.E. 1973. Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. II. Applications. Journal of Chemical Physics 59: 3879–3888. Wang G., Fang S., Gertner G.Z., and Anderson A. 2000. Uncertainty propagation and partitioning in spatial prediction of topographical factor for RUSLE. In: Proceedings of Fourth International Conference on Spatial Uncertainty. Amsterdam, Holland. pp. 717–722. Wang G., Fang S., Shinkareva S., Gertner G.Z., and Anderson A. 2001. Uncertainty propagation and error budgets in spatial prediction of topographical factor for revised universal soil loss equation (RUSLE). Transactions of American Society of Agricultural Engineer 45(1): 109–118.