Accred Qual Assur (2011) 16:99–102 DOI 10.1007/s00769-010-0744-z
DISCUSSION FORUM
Key metrological issues in proficiency testing – response to “Metrological compatibility – a key issue in further accreditation” by K. Heydorn
Ilya Kuselman • Ales Fajgelj
Received: 21 December 2010 / Accepted: 23 December 2010 / Published online: 9 January 2011 © Springer-Verlag 2011
Abstract A discussion of proficiency testing (PT) topics started by Heydorn (Accred Qual Assur 15:643–645, 2010) is continued in the present paper. The role of PT in the accreditation of testing/analytical laboratories, the use of consensus values (average or weighted average, median, observed standard deviation, etc.), and the metrological background of PT schemes are discussed. It is shown that metrological traceability, comparability, and compatibility, as well as the commutability of a reference material, are the key issues of any PT scheme that uses a certified reference material as test items. The metrological compatibility of PT results in such schemes is a property demonstrating the closeness of the PT results to the certified value in comparison with the measurement uncertainty of their difference. The metrological background is especially important for the selection and use of PT schemes with a limited number of participants (fewer than 30), as detailed in the IUPAC/CITAC Guide on this topic published in 2010 in Pure Appl Chem 82(5):1099–1135.

Keywords Proficiency testing · Measurement uncertainty · Metrological traceability · Comparability and compatibility · Commutability of a reference material

Papers published in this section do not necessarily reflect the opinion of the Editors, the Editorial Board and the Publisher. A critical and constructive debate in the Discussion Forum or a Letter to the Editor is strongly encouraged!

I. Kuselman (✉)
National Physical Laboratory of Israel (INPL), Danciger “A” Bldg, Givat Ram, 91904 Jerusalem, Israel

A. Fajgelj
International Atomic Energy Agency (IAEA), Department of Nuclear Sciences and Applications, Vienna International Centre, 1400 Vienna, Austria
Introduction

In his paper [1], Heydorn refers to the VIM [2] and the standard ISO/IEC 17025 [3] to argue that accreditation authorities should require testing/analytical laboratories to evaluate measurement uncertainty correctly. Laboratories are obliged to participate in proficiency testing (PT) because, the author supposes, in this way they demonstrate their competence in evaluating measurement uncertainty. In that paper, this competence is interpreted as the metrological compatibility of a set of PT results, which can be tested with the T-statistic based on the deviations of the results from their weighted average (a consensus value), with the measurement uncertainties used to calculate the weights [1]. The author raises no concern about applying the T-statistic even when the number N of laboratories participating in the PT (i.e., the size of the set) is limited to two. The metrological compatibility of PT results with an available certified value (of a reference material) is tested “simply by including this measurement result in the set” [1]. Thus, the author also raises no concern about the difference between the information contained in a laboratory test result of such a set and that contained in the certified value, about their metrological traceability and comparability, or about the commutability of the reference material. It therefore seems that the author of [1] has a problem only with understanding the recently published IUPAC/CITAC Guide on the selection and use of PT schemes for a limited number of participants [4]. In the IUPAC/CITAC Guide [4], certified reference materials (CRMs) are used in PT as test items; a laboratory’s performance is evaluated from the deviation of the laboratory’s result from the certified value of the test items; the measurement uncertainty declared by the laboratory is used for its internal performance evaluation; and the metrological compatibility of the PT results with the certified value is interpreted as a parameter characterizing the collective/joint performance of the laboratories participating in the PT.

What is really required by accreditation authorities from laboratories participating in PT? Which key metrological issues should be taken into account in the selection and use of a PT scheme? What can be demonstrated using PT results, and what cannot? What consequences follow when only a limited number N of laboratories (fewer than 30) are able to participate in a PT scheme? These are the questions discussed in the present response to Heydorn’s paper [1].
The role of PT results and their measurement uncertainties in accreditation

An accredited testing/analytical laboratory, or a laboratory applying for accreditation, shall have in place procedures for evaluating measurement uncertainty and for participating in PT programs [3]. The accreditation body should take the laboratory’s PT results into account as evidence of its competence. According to ILAC guidance [5], the acceptability criteria used by the accreditation body may be based on the z-score, or on the ζ-score and the En number, which are formulated using the measurement uncertainty evaluated by the laboratory. Conversely, measurement uncertainty can also be evaluated from PT results [6]. However, neither the standard ISO/IEC 17025 [3] nor the ILAC guidance [5] and policy [7] establishes requirements for a testing laboratory’s competence in correctly evaluating the measurement uncertainty of its PT results. Moreover, the new PT standard ISO/IEC 17043 [8] “explicitly avoids any mention of a means of recognition of competence” [9]. Since the ILAC policy [7] and the standard ISO/IEC 17043 [8] were both recently adopted and published, the role of PT results and their measurement uncertainties in accreditation is unlikely to change in the near future, contrary to Heydorn’s prediction [1]. One reason is that evaluating measurement uncertainty in testing/analytical laboratories is still not a trivial task; indeed, the development of the 3rd edition of the EURACHEM/CITAC Guide for quantifying uncertainty in analytical measurements will be discussed at a forthcoming workshop [10].
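For readers less familiar with these criteria, the commonly used definitions can be sketched as follows: x is the laboratory result, x_a the assigned (e.g., certified) value, σ_p the target standard deviation, u the standard uncertainties and U the expanded uncertainties. This is a minimal Python sketch, not taken from this paper; the function names are hypothetical, the z and ζ forms follow the usual harmonized-protocol definitions [13], and the E_n form assumes expanded uncertainties with comparable coverage for both values. The |z| ≤ 2 / |z| ≥ 3 bands are the conventional interpretation, assumed here.

```python
import math


def z_score(x: float, x_assigned: float, sigma_p: float) -> float:
    """z = (x - x_a) / sigma_p, with sigma_p an externally set target SD."""
    return (x - x_assigned) / sigma_p


def zeta_score(x: float, u_x: float, x_assigned: float, u_assigned: float) -> float:
    """zeta = (x - x_a) / sqrt(u(x)^2 + u(x_a)^2), using standard uncertainties."""
    return (x - x_assigned) / math.hypot(u_x, u_assigned)


def en_number(x: float, U_x: float, x_ref: float, U_ref: float) -> float:
    """E_n = (x - x_ref) / sqrt(U(x)^2 + U(x_ref)^2), using expanded uncertainties."""
    return (x - x_ref) / math.hypot(U_x, U_ref)


def z_verdict(z: float) -> str:
    """Conventional bands (assumed): |z| <= 2 satisfactory, |z| >= 3 unsatisfactory."""
    if abs(z) <= 2.0:
        return "satisfactory"
    if abs(z) < 3.0:
        return "questionable"
    return "unsatisfactory"


if __name__ == "__main__":
    # Hypothetical numbers: certified value 10.0 (u = 0.1), target SD 0.5.
    x, x_cert, u_cert, sigma_p = 11.2, 10.0, 0.1, 0.5
    z = z_score(x, x_cert, sigma_p)
    print(round(z, 2), z_verdict(z))
    # Same deviation, increasingly large declared laboratory uncertainty.
    for u_lab in (0.2, 0.6, 1.2):
        print(u_lab, round(zeta_score(x, u_lab, x_cert, u_cert), 2))
```

For a fixed deviation, the ζ-score shrinks as the declared uncertainty u_lab grows, whereas the z-score depends only on the external σ_p; this is why uncertainty-based scores can look better for a laboratory that simply declares larger uncertainties.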
Use of consensus values

A consensus value is a value reached by a group of laboratories as a whole and acceptable to every laboratory/member/participant of this group. Examples are the average (mean) of laboratory results, the weighted average (as in the T-statistic), the median, the observed standard deviation, etc. In general, a consensus value reached by the participants of one group is not necessarily acceptable to the participants of another group. The worldwide application of consensus values in PT is based on the following two hypotheses. The first (statistical) hypothesis supposes that a large enough number of laboratories is a representative statistical sample of a population, i.e., of an infinite number of such laboratories. The sample may then be characterized by the average or median and the observed standard deviation of the laboratory measurement results, which deviate insignificantly from the population values. The second (chemical metrological) hypothesis is that a population of laboratories with different staff will use different measurement methods, equipment, and reagents at different times, transforming any systematic error into a random error. However, there is no evidence that these hypotheses hold in any particular case. For example, even the population mean of measurement results (of an infinite number of laboratories) may not coincide with the “true” value of an analyte concentration. Therefore, the question is when and for what purpose consensus values can be applied [11]. For PT schemes with a limited number N of participants (fewer than 30), a sample value may differ significantly from the population value. For example, for a sample of size N = 30 drawn from a normal distribution of PT results, the sample average can differ from the population mean by up to 0.36 standard deviations at a confidence level of 0.95, while the sample standard deviation can differ from the population value by more than 25 % (relative) at the same confidence level. Naturally, as N decreases, the differences between the corresponding sample and population parameters increase.
Thus, consensus values are less reliable for schemes with N < 30 than for schemes with N ≥ 30 [12].
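The two figures quoted for N = 30 can be reproduced under the stated assumption of normally distributed PT results, taking the population standard deviation σ as the unit: the 95 % half-width of the sample mean is z₀.₉₇₅/√N, and the 95 % range of s/σ follows from (N − 1)s²/σ² having a χ² distribution with N − 1 degrees of freedom. This is a minimal, standard-library-only Python sketch; the function names are hypothetical, and the χ² quantile is found by bisection on a series expansion rather than taken from a statistics package.

```python
import math
from statistics import NormalDist


def chi2_cdf(x: float, k: int) -> float:
    """Chi-square CDF with k degrees of freedom, computed as the regularized
    lower incomplete gamma function P(k/2, x/2) via its power series."""
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the sum y^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= y / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


def chi2_ppf(p: float, k: int) -> float:
    """Chi-square quantile by bisection (adequate for the small k used here)."""
    lo, hi = 0.0, k + 10.0 * math.sqrt(2.0 * k) + 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_deviation_bounds(n: int, conf: float = 0.95):
    """For n results from a normal population with SD sigma, return
    (max |mean - mu| / sigma, (lower, upper) bounds of s / sigma) at level conf."""
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    mean_halfwidth = z / math.sqrt(n)
    df = n - 1
    lower = math.sqrt(chi2_ppf(alpha / 2.0, df) / df)
    upper = math.sqrt(chi2_ppf(1.0 - alpha / 2.0, df) / df)
    return mean_halfwidth, (lower, upper)


if __name__ == "__main__":
    for n in (30, 20, 10, 5):
        m, (lo, hi) = sample_deviation_bounds(n)
        print(f"N = {n:2d}: |mean - mu| <= {m:.2f} sigma; s/sigma in [{lo:.2f}, {hi:.2f}]")
```

For N = 30 the sketch gives a mean half-width of about 0.36σ and s/σ between about 0.74 and 1.26 (roughly ±26 %), consistent with the figures in the text; both intervals widen as N decreases.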
Metrological background

A metrological background for the selection and use of PT schemes is important for any number N of participants [8], especially when N < 30 [4]. Global acceptance of measurement/analytical/test results is linked to their metrological traceability, comparability, and compatibility, as well as to the commutability of the reference material applied. Each of these properties is a necessary, but not sufficient, condition for the concept “tested once, accepted everywhere”; the concept requires all of these conditions to be satisfied simultaneously. A traceable assigned/certified value of the test items (portions of a CRM unknown to the PT participants) is one of the key metrological issues in PT. The reason is that measurement results traceable to the same reference are comparable regardless of N, the result values, and their associated measurement uncertainties. When a CRM is used, externally set performance criteria acceptable to all participants are calculated as a z-score, i.e., the deviation of a PT result from the certified value divided by an external (target) standard deviation of PT results. If the information necessary to set external performance criteria is not available, if the measurement uncertainty of the certified value is not negligible, or if the laboratories work according to their own fit-for-purpose criteria (so that a uniform criterion is inapplicable to all participants), the information included in the measurement uncertainties reported by the laboratories may be helpful for assessing their proficiency using the ζ-score or the En number [13]. The PT provider should select an optimal PT scheme depending on the available CRMs (or on the provider’s ability to develop such materials for PT purposes) and on the scores suitable for assessing the participants’ performance. The CRM applied should be commutable with (i.e., adequate to, or matching) the routine samples analyzed by the PT participants. Otherwise, the difference in chemical composition/matrix between the CRM and the routine samples analyzed by a laboratory will increase the deviation of the laboratory’s PT result from the certified value of the CRM. Since laboratory performance is assessed individually for each PT participant, the collective/group/joint performance of the laboratories that participated in the PT remains unassessed, even when the performance of the majority of participants is found to be successful. The key issue for this task is metrological compatibility, the property of a set of measurement results that characterizes their closeness to each other in comparison with the measurement uncertainty of their difference [2].
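The compatibility of a set of results can be illustrated in two ways: the VIM-style pairwise check [2] and a weighted-average T-statistic of the kind discussed in [1]. This Python sketch rests on assumptions not stated in this paper: the results are uncorrelated, so u(x_i − x_j) = √(u_i² + u_j²); the chosen multiple κ of that uncertainty is 2; and T is taken as Σ(x_i − x̄_w)²/u_i² with weights 1/u_i², to be compared with a χ² quantile on N − 1 degrees of freedom. Function names and data are hypothetical.

```python
import math
from itertools import combinations


def pairwise_compatible(results, kappa=2.0):
    """VIM-style check: every pair satisfies |x_i - x_j| <= kappa * u(x_i - x_j),
    assuming uncorrelated results so that u(x_i - x_j) = sqrt(u_i^2 + u_j^2)."""
    for (xi, ui), (xj, uj) in combinations(results, 2):
        if abs(xi - xj) > kappa * math.hypot(ui, uj):
            return False
    return True


def weighted_mean(results):
    """Consensus value: weighted average with weights 1 / u_i^2."""
    weights = [1.0 / u ** 2 for _, u in results]
    return sum(w * x for w, (x, _) in zip(weights, results)) / sum(weights)


def t_statistic(results):
    """T = sum((x_i - x_w)^2 / u_i^2) about the weighted mean x_w (assumed form)."""
    xw = weighted_mean(results)
    return sum((x - xw) ** 2 / u ** 2 for x, u in results)


if __name__ == "__main__":
    # Hypothetical PT results as (value, standard uncertainty).
    lab_results = [(10.1, 0.10), (9.8, 0.15), (10.4, 0.20), (10.0, 0.10)]
    print(pairwise_compatible(lab_results), round(t_statistic(lab_results), 2))
    # Inflating every declared uncertainty by a factor k divides T by k^2.
    inflated = [(x, 3 * u) for x, u in lab_results]
    print(pairwise_compatible(inflated), round(t_statistic(inflated), 2))
```

With these hypothetical data the pair (9.8, 0.15) and (10.4, 0.20) fails the pairwise check and T ≈ 6.3; tripling every uncertainty leaves the weighted mean unchanged, makes the set pass, and reduces T to about 0.70. A small T therefore says little about how well the uncertainties themselves were evaluated.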
When a PT scheme in the field of radioactivity measurements is based on a consensus value, the T-statistic discussed in [1] is probably a suitable tool for assessing the metrological compatibility of the PT results and the joint performance of the laboratories that participated in the PT. Note, however, that the T-statistic is not helpful for assessing laboratory competence in measurement uncertainty evaluation because, like the ζ-score and the En number, it is in general smaller (better) when the uncertainties are larger. When portions of a CRM are used as test items in a PT scheme and a laboratory’s individual performance is assessed by the difference between the laboratory result and the metrologically traceable certified value, the performance of all PT participants as a group should be assessed in the same manner to maintain the traceability advantage of the scheme. In this way, the closeness of the results obtained in the PT to the measurement results used for the certification of the reference materials is tested. Passing this test successfully allows one to expect that the PT results will be “accepted everywhere” [14]. Therefore, the IUPAC/CITAC Guide [4] evaluates the metrological compatibility of PT results from the difference between their average or median value and the certified value, in comparison with the measurement uncertainty of that difference. When the PT results are normally distributed, this difference is evaluated using Student’s t-distribution. If the hypothesis that the PT results are normally distributed is rejected, non-parametric statistics with the sign criterion are powerful enough for this evaluation [4]. More technical/mathematical details are available elsewhere [15, 16].
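The group-level comparison with the certified value can be illustrated with a simplified Python sketch. It is not the Guide’s [4] exact procedure: the uncertainty model u(d)² = s²/N + u_cert², the use of the mean rather than the median in the t branch, and the caller-supplied Student t quantile are assumptions made here; the sign test is the standard exact binomial form. Function names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def t_compatibility(results, c_cert, u_cert, t_crit):
    """Compare the mean of the PT results with the certified value c_cert.

    Simplified model (an assumption, not the Guide's exact formula):
    d = mean - c_cert, u(d) = sqrt(s^2 / N + u_cert^2); the results are judged
    compatible if |d| <= t_crit * u(d), with t_crit a Student t quantile."""
    n = len(results)
    d = mean(results) - c_cert
    u_d = math.sqrt(stdev(results) ** 2 / n + u_cert ** 2)
    return abs(d) <= t_crit * u_d, d, u_d


def sign_test_p(results, c_cert):
    """Two-sided exact sign test of H0: median of the PT results = c_cert.
    Results exactly equal to c_cert are discarded."""
    above = sum(1 for x in results if x > c_cert)
    below = sum(1 for x in results if x < c_cert)
    n = above + below
    k = min(above, below)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical results of N = 10 laboratories on a CRM certified at 10.00 (u = 0.03).
    pt = [10.12, 9.95, 10.30, 10.05, 9.88, 10.21, 10.02, 10.15, 9.97, 10.09]
    # 2.262 is the two-sided 95 % Student t quantile for N - 1 = 9 degrees of freedom.
    print(t_compatibility(pt, 10.00, 0.03, 2.262))
    print(sign_test_p(pt, 10.00))
```

With these data the mean exceeds the certified value by 0.074, while t_crit·u(d) is about 0.113, and the sign test (7 results above, 3 below) gives p ≈ 0.34, so neither test indicates incompatibility. Using N − 1 degrees of freedom ignores those of u_cert; a Welch–Satterthwaite estimate would be a natural refinement.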
Conclusions

1. The assessment of laboratory competence in the evaluation of measurement/analytical/test uncertainty based on a laboratory’s PT results is not addressed by the current ILAC policy or by the standard ISO/IEC 17043.
2. Consensus values (average or weighted average, median, observed standard deviation, etc.) are less reliable for PT schemes with a limited number of participants (fewer than 30). No mathematical statistical tool can overcome this limitation. The use of a certified reference material (unknown to the participants) as test items is the metrological solution to this problem.
3. Metrological traceability, comparability, and compatibility, as well as the commutability of a reference material, are the key issues of the metrological background of any PT scheme applying a certified reference material. Each of these issues is a necessary, but not sufficient, condition for the concept “tested once, accepted everywhere”.
4. The metrological compatibility of PT results in schemes using a certified reference material is a property demonstrating the closeness of the results to the certified value in comparison with the measurement uncertainty of their difference.
References

1. Heydorn K (2010) Accred Qual Assur 15:643–645
2. BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, OIML (2007) International vocabulary of metrology – basic and general concepts and associated terms (VIM), 3rd edn. ISO, Geneva. JCGM 200:2008. Available online at: http://www.bipm.org/en/publications/guides/vim.html
3. ISO/IEC 17025 (2005) General requirements for the competence of testing and calibration laboratories, 2nd edn. ISO, Geneva
4. Kuselman I, Fajgelj A (2010) IUPAC/CITAC Guide: Selection and use of proficiency testing schemes for a limited number of participants – chemical analytical laboratories (IUPAC Technical Report). Pure Appl Chem 82(5):1099–1135. Available online at: http://www.citac.cc (Publications)
5. ILAC (2004) Use of proficiency testing as a tool for accreditation in testing. G22:2004. Available online at: http://www.ilac.org/guidanceseries.html
6. Magnusson B, Hovind H, Krysell M, Naykki T (2008) CITAC News, pp 27–29. Available online at: http://www.citac.cc (Newsletters)
7. ILAC (2010) ILAC policy for participation in proficiency testing activities. P9:11. Available online at: http://www.ilac.org/procseries.html
8. ISO/IEC 17043 (2010) Conformity assessment – general requirements for proficiency testing. ISO, Geneva
9. Tholen DW (2008) Accred Qual Assur 13:727–730
10. EURACHEM/CITAC Workshop (2011) Recent developments in measurement uncertainty. The revised EURACHEM/CITAC Guide. Available online at: http://eurachem2011.fc.ul.pt
11. Pankratov I, Elhanany S, Henig S, Zaritsky S, Ostapenko I, Kuselman I (2010) Accred Qual Assur 15:459–466
12. Belli M, Ellison SLR, Fajgelj A, Kuselman I, Sansone U, Wegscheider W (2007) Accred Qual Assur 12:391–398
13. Thompson M, Ellison SLR, Wood R (2006) The international harmonized protocol for the proficiency testing of analytical chemistry laboratories. Pure Appl Chem 78(1):145–196
14. Kuselman I, Belli M, Ellison SLR, Fajgelj A, Sansone U, Wegscheider W (2007) Accred Qual Assur 12:563–567
15. Kuselman I (2006) In: Fajgelj A, Belli M, Sansone U (eds) Combining and reporting analytical data. RSC Special Publication No. 307, Royal Society of Chemistry, Cambridge, pp 229–239
16. Kuselman I (2006) Accred Qual Assur 10:466–470 and 659–663