ISO/TR 27877:2021 — Statistical Methods for Proficiency Testing by Interlaboratory Comparison

A comprehensive guide to robust statistical evaluation of proficiency testing data for laboratory quality assurance

Introduction to ISO/TR 27877 and Proficiency Testing

ISO/TR 27877:2021 provides comprehensive guidance on statistical methods for evaluating proficiency testing (PT) data obtained from interlaboratory comparisons. Proficiency testing is a cornerstone of laboratory quality assurance, enabling laboratories to demonstrate technical competence and identify measurement biases through periodic comparison with peer laboratories. This Technical Report fills a critical gap by consolidating robust statistical approaches that go beyond classical assumptions, addressing real-world challenges such as small sample sizes, multiple outliers, and non-normal distributions commonly encountered in PT schemes.

For laboratories seeking ISO/IEC 17025 accreditation, participation in proficiency testing programs analyzed according to ISO/TR 27877 is now considered an essential element of demonstrating measurement traceability and technical competence.

The standard focuses on performance statistics that are both intuitive and statistically sound. It covers traditional z-scores alongside robust alternatives including z’-scores, zeta-scores, and En numbers, each suited to different scenarios depending on the availability and reliability of assigned values and uncertainty estimates. A key contribution is the guidance on handling censored data, extreme values, and multimodal distributions that violate normality assumptions.

Statistical Methods and Performance Metrics

The core of ISO/TR 27877 lies in its structured approach to calculating and interpreting performance metrics. The classical z-score, defined as (x − Xpt) / σpt, remains widely used, but the standard emphasizes the importance of selecting robust estimators for the assigned value Xpt and the standard deviation σpt. Algorithm A from ISO 13528 is recommended for robust analysis, providing immunity to the influence of up to 20% outliers.
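The iteration behind Algorithm A can be sketched in a few lines. This is a minimal illustration that follows the commonly published description of ISO 13528 Algorithm A (start from the median and scaled MAD, winsorize at 1.5 robust standard deviations, update, repeat). The function name `algorithm_a`, the tolerance, and the iteration cap are assumptions made for this example, not values taken from the standard.

```python
import statistics


def algorithm_a(values, tol=1e-6, max_iter=100):
    """Return a robust mean and robust standard deviation (x*, s*).

    Sketch of the winsorizing iteration usually described as Algorithm A.
    tol and max_iter are illustrative choices, not prescribed values.
    """
    # Initial estimates: median and scaled median absolute deviation.
    x_star = statistics.median(values)
    s_star = 1.483 * statistics.median([abs(v - x_star) for v in values])
    if s_star == 0:
        # More than half of the results are identical; nothing to iterate.
        return x_star, s_star

    for _ in range(max_iter):
        delta = 1.5 * s_star
        low, high = x_star - delta, x_star + delta
        # Pull results outside [low, high] back to the nearest limit.
        winsorized = [min(max(v, low), high) for v in values]
        new_x = statistics.mean(winsorized)
        new_s = 1.134 * statistics.stdev(winsorized)
        converged = abs(new_x - x_star) < tol and abs(new_s - s_star) < tol
        x_star, s_star = new_x, new_s
        if converged:
            break
    return x_star, s_star


results = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 15.0]  # made-up data, one gross outlier
robust_mean, robust_sd = algorithm_a(results)
```

With the made-up data above, the robust mean stays close to 10 while the arithmetic mean is pulled above 10.7 by the single extreme result.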

| Metric | Formula | Application | Robustness |
|---|---|---|---|
| z-score | (x − Xpt) / σpt | General PT with reliable σ | Low |
| z’-score | (x − Xpt) / MAD | Small participant numbers | High |
| ζ-score (zeta) | (x − Xpt) / √(ux² + upt²) | When lab uncertainty is critical | Moderate |
| En number | (x − Xpt) / √(Ulab² + Uref²) | Calibration PT schemes | Moderate |

A common pitfall in PT data analysis is the inappropriate use of classical z-scores when the participant population is small (n < 20). In such cases, the robust z’-score based on the median and MAD (median absolute deviation) provides significantly more reliable performance assessment.
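The four formulas in the table translate directly into code. The sketch below mirrors them term by term; the function and parameter names are chosen for this example, and `z_prime_score` uses the median of the participant results as the assigned value, as the paragraph above describes.

```python
import math
import statistics


def z_score(x, x_pt, sigma_pt):
    """Classical z-score: deviation from the assigned value in units of sigma_pt."""
    return (x - x_pt) / sigma_pt


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def z_prime_score(x, values):
    """Robust score as defined in the table: median as assigned value, MAD as scale."""
    return (x - statistics.median(values)) / mad(values)


def zeta_score(x, x_pt, u_x, u_pt):
    """Zeta-score: uses the standard uncertainties of the lab and the assigned value."""
    return (x - x_pt) / math.sqrt(u_x**2 + u_pt**2)


def en_number(x, x_pt, U_lab, U_ref):
    """En number: uses the expanded uncertainties of the lab and the reference."""
    return (x - x_pt) / math.sqrt(U_lab**2 + U_ref**2)


# Made-up numbers, for illustration only.
peers = [9.8, 9.9, 10.0, 10.1, 10.2]
print(z_score(10.4, 10.0, 0.2))
print(z_prime_score(10.3, peers))
print(zeta_score(10.5, 10.0, 0.3, 0.4))
print(en_number(10.5, 10.0, 0.6, 0.8))
```

Keeping each metric as a separate function makes it explicit which inputs a scheme needs: σpt for the z-score, the full set of participant results for the z’-score, and uncertainty estimates for the zeta-score and En number.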

The standard also addresses the interpretation of combined results across multiple rounds or multiple measurands. The sum of z-scores (RSZ) and sum of squared z-scores (RSSZ) are introduced as tools for detecting persistent bias or excessive variability that individual z-scores might miss. For an individual result, |z| ≤ 2 is considered satisfactory, 2 < |z| < 3 questionable, and |z| ≥ 3 unsatisfactory, but the standard stresses the need to consider these thresholds alongside graphical tools such as Youden plots and Mandel’s h and k statistics.
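A short sketch shows why combined scores can reveal what single scores hide. The thresholds are the ones quoted above; the function names are hypothetical, and the sums are computed exactly as described in the text, without any rescaling by the number of results (some schemes apply one, which is not shown here).

```python
def classify(z):
    """Map a z-score to the conventional performance category."""
    magnitude = abs(z)
    if magnitude <= 2:
        return "satisfactory"
    if magnitude < 3:
        return "questionable"
    return "unsatisfactory"


def sum_of_z(z_scores):
    """Signed sum: grows when the scores share a sign (persistent bias)."""
    return sum(z_scores)


def sum_of_squared_z(z_scores):
    """Sum of squares: grows with overall scatter regardless of sign."""
    return sum(z * z for z in z_scores)


# Four rounds from one laboratory (made-up values).
rounds = [1.5, 1.8, 1.6, 1.9]
categories = [classify(z) for z in rounds]  # every round is "satisfactory"
total = sum_of_z(rounds)                    # about 6.8, all in one direction
total_sq = sum_of_squared_z(rounds)         # about 11.66
```

Every round passes on its own, yet the signed sum is large because all four scores are positive, which is the persistent-bias pattern the combined statistics are meant to expose.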

Engineering Design Insights and Practical Implementation

From an engineering perspective, ISO/TR 27877 offers several valuable design insights. First, the choice of robust statistical estimators directly impacts the sensitivity of the PT scheme. Using the median instead of the mean reduces the influence of extreme results but also reduces statistical efficiency. The standard recommends Algorithm A (Huber M-estimator) as a practical compromise, providing a high breakdown point with acceptable efficiency.

Second, the standard provides practical workflows for handling non-normal data distributions. When the data exhibit significant skewness or kurtosis, transformation techniques (logarithmic, Box-Cox) are recommended before applying performance metrics. The standard includes worked examples demonstrating how transformation affects z-score interpretation, a rarely covered but critically important topic.
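The effect of a logarithmic transformation on scoring can be illustrated with a small right-skewed data set. This sketch makes two assumptions of its own: the assigned value is the median, and the scale is the MAD multiplied by 1.4826 (the usual factor that makes it comparable to a standard deviation for normal data). The data and function names are invented for the example, and the Box-Cox transformation is not shown.

```python
import math
import statistics


def robust_z(x, values):
    """z-like score using the median and the scaled MAD (assumed estimators)."""
    med = statistics.median(values)
    scale = 1.4826 * statistics.median([abs(v - med) for v in values])
    return (x - med) / scale


def robust_z_log(x, values):
    """Same score computed after a natural-log transformation (positive data only)."""
    return robust_z(math.log(x), [math.log(v) for v in values])


# Right-skewed, strictly positive results (made-up values).
results = [1.0, 1.2, 1.5, 2.0, 2.5, 3.5, 8.0]
raw = robust_z(8.0, results)          # exceeds 3 on the original scale
logged = robust_z_log(8.0, results)   # below 2 on the log scale
```

The highest result looks unsatisfactory on the original scale but unremarkable after the transformation, which is why the choice of scale has to be settled before scores are interpreted.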

Implementing the robust statistical framework from ISO/TR 27877 in automated PT evaluation software can reduce false-positive outlier flags by up to 40% compared to classical methods, significantly improving the fairness and credibility of proficiency testing programs.

Frequently Asked Questions

Q: Can ISO/TR 27877 be applied to PT schemes with very few participants (e.g., 5-8 laboratories)?
A: Yes. The standard specifically addresses this scenario and recommends using robust statistics (median/MAD) rather than classical mean/SD. However, for fewer than 5 participants, graphical assessment using Mandel’s h statistic may be more appropriate than formal z-score evaluation.
Q: How does ISO/TR 27877 relate to ISO 13528?
A: ISO/TR 27877 complements ISO 13528 by providing additional statistical background, worked examples, and guidance on advanced topics such as censored data handling, measurement uncertainty integration, and multi-analyte assessment. ISO 13528 remains the primary procedural standard for PT scheme design.
Q: What is the recommended approach when data contains multiple extreme outliers?
A: The standard recommends iterative application of robust estimators (Algorithm A) with graphical diagnostics using kernel density estimates or Q-Q plots. A stepwise approach works well: first detect outliers with robust estimates, then investigate assignable causes, and finally recompute the metrics with the remaining data.
Q: Is the zeta-score always preferable to the classical z-score?
A: No. The zeta-score incorporates participant measurement uncertainty, making it more informative when accurate uncertainty estimates are available. However, when uncertainties are poorly estimated or unreliably reported, the simpler z-score may provide more stable results.
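For the multiple-outlier question above, the stepwise approach can be sketched as follows. The estimators (median and MAD scaled by 1.4826), the cut-off of 3, and the function name are assumptions made for this illustration. The code only flags candidates; the investigation of assignable causes is a manual step that should come before any result is actually excluded.

```python
import statistics


def stepwise_outlier_review(values, threshold=3.0):
    """Flag extreme results with robust estimates, then summarize the rest.

    Returns (flagged, mean_of_retained, sd_of_retained).
    """
    med = statistics.median(values)
    scale = 1.4826 * statistics.median([abs(v - med) for v in values])
    if scale == 0:
        raise ValueError("robust scale is zero; too many identical results")

    # Step 1: detect candidates using the robust location and scale.
    flagged = [v for v in values if abs(v - med) / scale >= threshold]
    # Step 2 (manual, not coded): investigate assignable causes for each flag.
    # Step 3: recompute summary statistics from the remaining results.
    retained = [v for v in values if abs(v - med) / scale < threshold]
    return flagged, statistics.mean(retained), statistics.stdev(retained)


# Made-up data with one high and one low extreme value.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 15.0, 4.0]
flagged, mean_rest, sd_rest = stepwise_outlier_review(data)
```

Because the median and MAD are barely affected by the two extreme values, both are flagged in a single pass, whereas a classical mean and standard deviation computed from all eight results would be inflated by the very values under review.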
