Anal Bioanal Chem (2005) 383: 3–5 DOI 10.1007/s00216-005-3408-9
ANALYTICAL CHALLENGE

Manfred Reichenbächer · Jürgen W. Einax

Solution to the Quality Assurance Challenge 1

Published online: 29 July 2005 © Springer-Verlag 2005
Solution

The objective of regression analysis is to use the mathematical expression relating response to concentration to predict the concentrations of unknown samples. This means calculating the intercept a0 and the slope a1 of a straight-line regression by the least-squares method, after the linearity and other validation data of the mathematical model have been tested. To do this, we must first work out the total number n of independent values. Table 1, which lists the results from the calibration measurements, contains a data set of 36 measurements, which reduces to n = 18 independent values. The double measurement of a standard does not yield two independent results, because the main error arises in the step in which the azo compound is formed (chemical equilibrium!), not in measuring the absorbance. Therefore, the mean values of the double measurements listed in Table 1 (the y_ij values) are used in all further calculations. The replicates of each standard do yield independent results, however, so the number of degrees of freedom is df = n − 2 = 16.

Data for the straight-line regression, calculated using Excel's standard functions: intercept a0 = −0.00270714 A; slope a1 = 3.3071429 A L mg⁻¹ N; residual error s_y.x = 0.0017695 A.

The analytical error s_x.0 is calculated by use of the equation:

  s_x.0 = s_y.x / a1   (1)

with the result s_x.0 = 0.0005351 mg L⁻¹ N. The precision of the analytical method, expressed as the relative standard deviation RSD, is calculated by use of the equation:

  RSD = (s_x.0 / x̄) · 100%   (2)

which gives the result RSD = 0.94%.

M. Reichenbächer · J. W. Einax (✉)
Institute of Inorganic and Analytical Chemistry, Friedrich-Schiller-Universität, Lessingstraße 8, 07743 Jena, Germany
E-mail: [email protected]
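The calibration statistics above can be reproduced with a short least-squares computation. This is a minimal sketch, not the authors' spreadsheet: it uses the replicate means y_ij transcribed from Table 1 and plain Python sums for Eqs. (1) and (2).

```python
# Straight-line calibration fit and precision (Eqs. 1 and 2).
# Data: replicate means y_ij (A) at each standard x_i (mg/L N), from Table 1.
x_levels = [0.017, 0.033, 0.049, 0.065, 0.081, 0.097]
y = {
    0.017: [0.05215, 0.05115, 0.05295],
    0.033: [0.10715, 0.10955, 0.10785],
    0.049: [0.16085, 0.16045, 0.16105],
    0.065: [0.20965, 0.21020, 0.21005],
    0.081: [0.26675, 0.26340, 0.26655],
    0.097: [0.31820, 0.31850, 0.31795],
}

xs = [x for x in x_levels for _ in range(3)]
ys = [v for x in x_levels for v in y[x]]
n = len(xs)                            # 18 independent values
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (yv - y_bar) for x, yv in zip(xs, ys))

a1 = s_xy / s_xx                       # slope, A L/mg
a0 = y_bar - a1 * x_bar                # intercept, A
resid_ss = sum((yv - (a0 + a1 * x)) ** 2 for x, yv in zip(xs, ys))
s_yx = (resid_ss / (n - 2)) ** 0.5     # residual error, df = 16
s_x0 = s_yx / a1                       # analytical error, Eq. (1)
rsd = 100 * s_x0 / x_bar               # relative standard deviation, Eq. (2)
```

Running the sketch reproduces the reported values: a1 ≈ 3.3071 A L mg⁻¹ N, a0 ≈ −0.0027 A, s_y.x ≈ 0.0017695 A, and RSD ≈ 0.94%.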
Validation of the mathematical model

Validation of the regression model is necessary to verify that the chosen model adequately describes the relationship between the two variables x and y. For our example this means verifying whether the best model is a straight line or whether the data are better described by a curve. Several procedures must be performed to verify the linearity of the regression model. Although frequently used, the correlation coefficient r is not a criterion for verifying linearity! The value of r calculated for a quadratic regression is always higher (and therefore closer to 1.0) than the value of r calculated for a linear regression. Because replicate measurements are made, ANOVA (ANalysis Of VAriance) must be used to check for lack of fit to the regression, to verify whether the model chosen is the correct one. The lack-of-fit test is a one-sided test performed by comparing the ratio:

  F̂ = MS_LOF / MS_PE   (3)

with the F distribution for (k − 2) and (n − k) degrees of freedom. MS_LOF represents the lack of fit, whereas MS_PE describes the pure error (the measurement error). If this ratio is significant at the chosen significance level, one concludes that the model is inadequate, because the variation of the group means about the line cannot be explained in terms of pure experimental uncertainty. If MS_LOF and MS_PE are comparable, the model is justified. The mean squares, MS, are obtained by dividing the respective sums of squares, SS, by their corresponding degrees of freedom, df. The degrees of freedom of the lack of fit, df_LOF, are given by k − 2, where k is the number of levels (the number of different x values), and the degrees of freedom of the pure error, df_PE, are given by n − k, where n is the total number of observations, including all independent replicate measurements.
The pure error sum of squares, SS_PE, and the sum of squares due to the lack of fit, SS_LOF, are given by Eqs. (4) and (5), respectively:

  SS_PE = Σ_{i=1..k} Σ_{j=1..n_i} (y_ij − ȳ_i)²   (4)

  SS_LOF = Σ_{i=1..k} n_i (ȳ_i − ŷ_i)²   (5)

Table 1 ANOVA table and data for the Cochran test

  x_i (mg L⁻¹)  y_ij (A)                     ȳ_i        ŷ_i        SS_j          s²_j
  0.017         0.05215  0.05115  0.05295   0.0520833  0.0535143  1.6267·10⁻⁶   8.1333·10⁻⁷
  0.033         0.10715  0.10955  0.10785   0.1081833  0.1064286  3.0467·10⁻⁶   1.5233·10⁻⁶
  0.049         0.16085  0.16045  0.16105   0.1607833  0.1593429  1.8667·10⁻⁷   9.3333·10⁻⁸
  0.065         0.20965  0.21020  0.21005   0.2099667  0.2122571  1.6167·10⁻⁷   8.0833·10⁻⁸
  0.081         0.26675  0.26340  0.26655   0.2655667  0.2651714  7.0617·10⁻⁶   3.5308·10⁻⁶
  0.097         0.31820  0.31850  0.31795   0.3182167  0.3180857  1.5167·10⁻⁷   7.5833·10⁻⁸

  ŷ = −0.00270714 + 3.307143 x;  r = 0.99983
  n = 18;  k = 6;  n_i = 3;  df_LOF = 4;  df_PE = 12
  SS_PE = 1.2235·10⁻⁵;  MS_PE = 1.01958·10⁻⁶
  SS_LOF = 3.78643·10⁻⁵;  MS_LOF = 9.46607·10⁻⁶
  Σ_j s²_j = 6.1175·10⁻⁶
  F̂ = 9.284;  F(P=99%; df1=df_LOF; df2=df_PE) = 5.412

Cochran's criterion compares the largest group variance with the sum of all the variances:

  Ĉ = s²_max / Σ_j s²_j   (6)
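The lack-of-fit ANOVA and the Cochran criterion can both be checked numerically. The sketch below recomputes the Table 1 quantities from the replicate means, using the regression coefficients reported in the text; it is an illustration, not the authors' original calculation.

```python
# Lack-of-fit ANOVA (Eqs. 3-5) and Cochran's test (Eq. 6) on the Table 1 data.
a0, a1 = -0.00270714, 3.3071429        # calibration coefficients from the text
data = {
    0.017: [0.05215, 0.05115, 0.05295],
    0.033: [0.10715, 0.10955, 0.10785],
    0.049: [0.16085, 0.16045, 0.16105],
    0.065: [0.20965, 0.21020, 0.21005],
    0.081: [0.26675, 0.26340, 0.26655],
    0.097: [0.31820, 0.31850, 0.31795],
}
k = len(data)                                    # 6 concentration levels
n = sum(len(v) for v in data.values())           # 18 observations

# Eq. (4): pure error - scatter of replicates about their group means
ss_pe = sum((yij - sum(ys) / len(ys)) ** 2
            for ys in data.values() for yij in ys)
# Eq. (5): lack of fit - scatter of group means about the fitted line
ss_lof = sum(len(ys) * (sum(ys) / len(ys) - (a0 + a1 * x)) ** 2
             for x, ys in data.items())

ms_pe = ss_pe / (n - k)          # df_PE = 12
ms_lof = ss_lof / (k - 2)        # df_LOF = 4
f_hat = ms_lof / ms_pe           # Eq. (3); compare with F(99%; 4; 12) = 5.412

# Eq. (6): Cochran's criterion, largest variance over the sum of variances
variances = [sum((yij - sum(ys) / len(ys)) ** 2 for yij in ys) / (len(ys) - 1)
             for ys in data.values()]
c_hat = max(variances) / sum(variances)          # compare with C_crit = 0.6161
```

The computed F̂ ≈ 9.28 exceeds the 99% critical value (lack of fit is significant), while Ĉ ≈ 0.577 stays below 0.6161 (variances homogeneous), matching the article's conclusions.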
^ is then compared to the critical value of C at a parC ticular significance, P 2for instance P=95%. Using the ^ is calcuvalues of s2i and sj presented in Table 1, C j lated to be 0.5772. The critical value of C for P=95%, k=6, and nj=3 is 0.6161. It follows that the data can be regarded as homoscedastic. Answer to question (b): The variances are homogeneous in the working range.
where yij is one of the ni replicate measurements made at xi, ni is the number of replicate measurements made at xi, SSi is the sum of the squares of the data values at each i, yi is the mean value of the replicates yij at xi, and ^yi is the value of yi at xi, as estimated by the regression function. All replicates at xi have the same estimated value ^yi . The results from the ANOVA are presented in Table 1. The test value F^ is somewhat higher than the F-table value with P=99%, so the straight line model is inadequate for describing the relationship between y and x, because the variation of the group means about the line cannot be explained solely by pure experimental uncertainty. Note that the linear regression model is confirmed if the mean values of the replicates ðyi Þ and the residuals are used along with the Mandel test for mathematical linearity. Answer to question (a): The linear regression function does not adequately describe the relationship between the variables x and y.
^xf ¼
Homoscedasticity Homogeneity of the variances in the working range is an important validation test. If the condition of homogeneity of variance or homoscedasticity is violated, the simple least-squares procedure cannot be used without reducing the reliability of the estimation. The homoscedasticity may be verified using various tests. If all groups have the same number of values (in our example nj=3), Cochran’s test [ISO Standard 5725-2 (1994)] is appropriate for multiple comparison of variances. Cochran’s criterion is based on comparing s2max with all the other variances:
In the absence of systematic errors the values 1.0 and 0, respectively, are included in the confidence interval of the slope (CI a1,f) and the confidence interval of the intercept (CI a0,f), respectively, of the recovery function ^xf ¼ f ðxc Þ: CI a0,f and CI a1,f are calculated by use of Eqs (8) and(9), respectively. sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi x2 1 þ CIa0;f ¼ sa0;f tðP ; dfÞ sa0;f ¼ sy:x;f ð8Þ n SSxx
Accuracy of the mean In analytical chemistry, a calibration line cannot be used to determine an analyte in a sample when the sample matrix is known to interfere with the analyte. In this case, the matrix causes a systematic error and the analytical result is wrong. Therefore, to validate the analytical method it is important to rule out any possible matrix effect (which affects the accuracy of the mean). Various methods can be used to reveal the presence of matrix effects. If the matrix is known exactly or the interfering elements are known (for instance, Fe(III) ions in our example), the recovery function is the best method for determining possible matrix effects. This is performed by adding the interfering elements (2 mg L1 Fe(III) ions) to all calibration standards and analyzing them in the same manner as before. The mean values ^y f ; are obtained from the data, as are the regression constants from the calibration, a0 and a1. Using these, the concentration ^xf can be calculated: ^y f a0 a1
ð7Þ
  CI a1,f = s_a1,f · t(P, df),  with  s_a1,f = s_y.x,f / sqrt(SS_xx)   (9)
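The confidence-interval tests of Eqs. (8) and (9) can be sketched in a few lines. The input values (a0,f, a1,f, s_y.x,f, SS_xx, x̄, t) are the recovery-experiment statistics reported in this solution; the code simply evaluates the two formulas and checks whether 0 and 1.0 fall inside the intervals.

```python
from math import sqrt

# Systematic-error checks via the recovery function (Eqs. 8 and 9).
n, df = 18, 16
x_bar = 0.057                 # mg/L, mean of the calibration standards
a0_f, a1_f = -0.0005573, 1.07852
s_yx_f = 0.00062              # mg/L, residual error of the recovery function
ss_xx = 0.00448               # (mg/L)^2
t = 2.120                     # t(P=95%, df=16)

s_a0 = s_yx_f * sqrt(1 / n + x_bar ** 2 / ss_xx)    # Eq. (8)
s_a1 = s_yx_f / sqrt(ss_xx)                          # Eq. (9)
ci_a0 = (a0_f - t * s_a0, a0_f + t * s_a0)
ci_a1 = (a1_f - t * s_a1, a1_f + t * s_a1)

no_constant_error = ci_a0[0] <= 0.0 <= ci_a0[1]      # is zero inside CI a0,f?
no_proportional_error = ci_a1[0] <= 1.0 <= ci_a1[1]  # is 1.0 inside CI a1,f?
```

Evaluating the sketch reproduces the reported standard errors (s_a0,f ≈ 0.00055 mg/L, s_a1,f ≈ 0.00926): zero lies inside CI a0,f but 1.0 lies outside CI a1,f, i.e. a proportional systematic error.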
Results from the recovery experiments (all terms from the recovery experiments are marked with the subscript f):

  a0,f = −0.0005573 mg/L;  a1,f = 1.07852;  x̄ = 0.057 mg/L;  n = 18;  df = 16
  s_y.x,f = 0.00062 mg/L;  SS_xx,f = 0.00448 (mg/L)²;  t(P=95%, df) = 2.120
  s_a0,f = 0.0005478 mg/L;  CI for a0,f = −0.00056 ± 0.00116 mg/L
  s_a1,f = 0.0092615;  CI for a1,f = 1.07852 ± 0.01963

– Test for a constant systematic error. Range of CI for a0,f: −0.0017 to 0.0006. Because zero is included in the confidence interval of the intercept, the matrix does not cause a constant systematic error.
– Test for a proportional systematic error. Range of CI for a1,f: 1.059 to 1.098. Because 1.0 does not fall within the confidence interval of the slope, the matrix causes a proportional systematic error.

Answer to question (c): The iron-containing matrix results in a systematic error. The calibration line cannot be used to determine nitrite ions in the iron-containing matrix.

Precision of the method

The test for accuracy requires that the matrix does not significantly affect the precision of the method. This test is performed by means of an F-test. Because they are in the same measurement units, the residual error of the recovery function, s_y.x,f, is compared with the analytical error of the calibration function, s_x.0:

  F̂ = s²_y.x,f / s²_x.0   (10)

with the result F̂ = 1.3423. The F̂ value calculated is compared with the critical F value at the chosen significance level: F(P=95%, df1=df2=16) = 2.334. Note that F̂ is smaller than F.

Answer to question (d): The iron-containing matrix does not affect the precision of the analytical method.

Alternative analytical methods

Because we have detected a significant matrix effect from the iron in the matrix, the analytical method chosen cannot be used to determine nitrite ions in factory wastewater. How should we proceed? There are various approaches we could take. We can isolate either the interfering elements or the analyte itself, for instance by modern solid-phase extraction (SPE) techniques, but these methods are time-consuming. A better solution is to use a matrix-independent analytical method, for example the standard addition method or HPLC (ion chromatography). In the first case we can use a simple photometric method.

Answer to question (e): The standard addition method should be used.
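As a closing numeric check, the precision F-test of Eq. (10) can be reproduced from the two error terms quoted in this solution (a minimal sketch, not part of the original calculation):

```python
# Precision F-test, Eq. (10): does the matrix degrade precision?
s_yx_f = 0.00062      # mg/L, residual error of the recovery function
s_x0 = 0.0005351      # mg/L, analytical error of the calibration function
f_hat = s_yx_f ** 2 / s_x0 ** 2
f_crit = 2.334        # F(P=95%, df1=df2=16)
matrix_affects_precision = f_hat > f_crit
```

F̂ ≈ 1.34 < 2.334, so the test does not flag a precision difference, in line with answer (d).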