Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
ISO/IEC TS 25058:2022 represents a landmark extension of the SQuaRE framework into the domain of artificial intelligence systems. As AI systems — particularly those based on machine learning — become embedded in critical applications ranging from medical diagnosis to autonomous vehicles to financial decision-making, the need for a structured, multi-dimensional approach to AI system quality becomes urgent. Traditional software quality models are insufficient for AI systems because AI behavior is learned from data rather than explicitly programmed, introducing unique quality considerations around training data quality, model robustness, explainability, and fairness.
TS 25058 adapts the ISO/IEC 25010 quality model framework to the unique characteristics of AI systems. It introduces new quality characteristics and sub-characteristics relevant to AI, refines existing characteristics to address AI-specific concerns, and defines measurement methods appropriate for evaluating AI system quality. The specification addresses the full AI system lifecycle — from data collection and model training through deployment, monitoring, and retraining — recognizing that AI quality is not a static property but must be continuously evaluated as data distributions shift and operational contexts evolve.
The specification is closely aligned with other ISO/IEC AI standards, including ISO/IEC 22989 (AI concepts and terminology), ISO/IEC 23053 (framework for AI systems using machine learning), and ISO/IEC 42001 (AI management system). Together, these standards form a governance framework for AI systems that spans terminology, system structure, quality evaluation, and organizational management.
Unlike traditional software, where quality depends primarily on code correctness, the quality of an AI system is determined largely by the quality of its training data. TS 25058 defines data quality characteristics that must be evaluated as part of AI system quality assessment:
| Characteristic | AI-Specific Sub-Characteristics | Evaluation Approach |
|---|---|---|
| Data Suitability | Data completeness, data representativeness, data balance, data relevance | Statistical analysis of training data distribution; comparison with target population demographics; coverage analysis for feature space |
| Data Accuracy | Label accuracy, feature accuracy, annotation consistency | Inter-annotator agreement measures (Cohen’s kappa, Fleiss’ kappa); holdout validation set for label verification |
| Data Timeliness | Data currency, concept drift detection, data freshness | Prediction accuracy monitoring over time; drift detection with the population stability index (PSI) or the Kolmogorov-Smirnov (KS) test; tracking of data age distribution |
| Data Provenance | Source traceability, transformation transparency, lineage completeness | Maintain data lineage documentation; implement data version control; audit data collection and processing pipelines |
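
The inter-annotator agreement check listed under Data Accuracy can be illustrated with a small calculation. The following is a minimal sketch of Cohen's kappa for two annotators, written with the Python standard library only. The function name `cohens_kappa` and the example labels are assumptions made for illustration; they are not taken from the specification.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Made-up labels for eight items (illustrative only).
annotator_1 = ["cat", "cat", "dog", "dog", "cat", "dog", "cat", "dog"]
annotator_2 = ["cat", "cat", "dog", "cat", "cat", "dog", "dog", "dog"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

Here the annotators agree on 6 of 8 items (0.75) while chance agreement is 0.5, so kappa is 0.5. What counts as an acceptable kappa is a project decision and should be recorded alongside the label accuracy target.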
Beyond data quality, TS 25058 defines model-specific quality characteristics that address the unique properties of AI/ML models:
| Characteristic | Description | Measurement Approach |
|---|---|---|
| Model Accuracy | Degree to which model outputs match correct or expected values | Standard ML metrics (precision, recall, F1, AUC-ROC, MAE, RMSE) evaluated on representative test sets; disaggregated by relevant subgroups |
| Model Robustness | Ability to maintain prediction quality under perturbed inputs or changing conditions | Adversarial testing (FGSM, PGD); noise injection testing; distribution shift robustness evaluation; out-of-distribution detection performance |
| Explainability | Degree to which model decisions can be understood by humans | Feature importance analysis (SHAP, LIME); counterfactual explanation generation; interpretability metrics for different stakeholder groups |
| Fairness and Bias | Degree to which model decisions are free from systematic discrimination | Statistical (demographic) parity, equal opportunity, and equalized odds metrics; bias audit across protected attributes |
| Uncertainty Quantification | Degree to which the model accurately communicates confidence in its predictions | |
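
Two of the fairness measures named in the table can be computed directly from predictions and a protected attribute. The sketch below assumes binary labels and predictions (0 or 1) and compares two groups. The function name `fairness_gaps` and the sample data are hypothetical and serve only to show the arithmetic.

```python
def _rate(values):
    """Mean of a list of 0/1 values, or NaN when the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def fairness_gaps(y_true, y_pred, group, group_a, group_b):
    """Return (statistical parity difference, equal opportunity difference).

    Both are computed as the value for group_a minus the value for group_b.
    """
    def selection_rate(g):
        # Share of positive predictions within the group.
        return _rate([p for p, s in zip(y_pred, group) if s == g])

    def true_positive_rate(g):
        # Share of actual positives in the group that were predicted positive.
        return _rate([p for t, p, s in zip(y_true, y_pred, group)
                      if s == g and t == 1])

    spd = selection_rate(group_a) - selection_rate(group_b)
    eod = true_positive_rate(group_a) - true_positive_rate(group_b)
    return spd, eod


# Made-up data: four individuals in each of two groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
spd, eod = fairness_gaps(y_true, y_pred, group, "a", "b")
print(f"statistical parity difference: {spd:.2f}")
print(f"equal opportunity difference:  {eod:.2f}")
```

In this example group `a` is selected at a rate of 0.75 against 0.25 for group `b`, and its true positive rate is 1.0 against 0.5, so both gaps are 0.5. A value near zero indicates similar treatment on that particular measure; the two measures can disagree, which is one reason to report more than one.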
TS 25058 provides a quality framework that should be integrated throughout the AI system development lifecycle. In the design phase, the quality model characteristics inform requirements specification — teams should explicitly document which quality characteristics are relevant, the target levels to be achieved, and the evaluation methods to be used. This proactive approach prevents the common pitfall of treating quality evaluation as a post-hoc activity.
During data preparation, teams should evaluate data quality characteristics from TS 25058, documenting data provenance, assessing representativeness, and verifying label quality. Data quality issues discovered at this stage are far less costly to address than issues discovered after model deployment.
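One simple way to assess representativeness at this stage is to compare group shares in the training data against the shares expected in the target population. The following sketch assumes the target shares are known from an external source; the group names, the shares, and the `TOLERANCE` threshold are all hypothetical values chosen for illustration.

```python
from collections import Counter


def representation_gaps(training_groups, target_shares):
    """Training share minus target share for each group in target_shares."""
    if not training_groups:
        raise ValueError("training_groups must not be empty")
    counts = Counter(training_groups)
    n = len(training_groups)
    return {g: counts.get(g, 0) / n - share
            for g, share in target_shares.items()}


# Made-up training set composition and target population shares.
training_groups = ["18-34"] * 50 + ["35-54"] * 40 + ["55+"] * 10
target_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

gaps = representation_gaps(training_groups, target_shares)

TOLERANCE = 0.10  # hypothetical threshold, to be set per project
under_represented = sorted(g for g, gap in gaps.items() if gap < -TOLERANCE)
print("under-represented groups:", under_represented)
```

With these numbers the `55+` group makes up 10% of the training data against a 35% target share, so it is flagged. A check like this covers only one attribute at a time; coverage of combinations of attributes needs a separate analysis.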
During model development and evaluation, the model quality characteristics provide a comprehensive evaluation framework that goes beyond simple accuracy metrics. Teams should evaluate models across all relevant characteristics — robustness, explainability, fairness, and uncertainty — not just predictive performance. This multi-dimensional evaluation often reveals trade-offs: improving robustness may slightly reduce accuracy, and increasing fairness may require accepting higher error rates for some groups. These trade-offs should be documented and managed explicitly.
During deployment and operations, TS 25058 guides the implementation of continuous monitoring for model quality degradation. Key monitoring elements include data drift detection, prediction distribution monitoring, and regular retraining triggers. The specification emphasizes that AI system quality is not a one-time evaluation but a continuous process that must keep pace with changing data distributions and operational contexts.
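Data drift detection of the kind described above is often implemented with the population stability index (PSI). The following is a minimal sketch for a single numeric feature, assuming fixed bin edges chosen from the reference data. The function name, the sample values, and the `PSI_ALERT` threshold of 0.25 are assumptions: 0.25 is a commonly cited rule of thumb, not a value taken from the specification.

```python
import math


def population_stability_index(expected, actual, bin_edges, eps=1e-6):
    """PSI between a reference sample and a current sample of one feature."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def bin_fractions(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            # Bin index = number of edges at or below v.
            counts[sum(v >= edge for edge in bin_edges)] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up feature values: training-time reference and two production windows.
reference = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
stable_window = list(reference)
shifted_window = [0.6, 0.7, 0.8, 0.9, 0.8, 0.9, 0.7, 0.95]
edges = [0.25, 0.5, 0.75]

psi_stable = population_stability_index(reference, stable_window, edges)
psi_shifted = population_stability_index(reference, shifted_window, edges)

PSI_ALERT = 0.25  # rule-of-thumb threshold, an assumption for this sketch
needs_review = psi_shifted > PSI_ALERT
print(f"stable: {psi_stable:.3f}, shifted: {psi_shifted:.3f}, review: {needs_review}")
```

An identical window yields a PSI of zero, while the shifted window exceeds the alert threshold and would raise a review or retraining trigger. In practice the bin edges are usually fixed once from the reference data so that successive windows remain comparable.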
For engineers implementing TS 25058, a practical starting point is to create an AI system quality specification document that maps each relevant characteristic of the quality model to specific measures, target values, evaluation methods, and monitoring approaches. This document serves as the quality contract between development teams, operations teams, and business stakeholders, establishing shared expectations for AI system behavior across its operational life.
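Such a quality specification can also be kept in machine-readable form so that targets are checked automatically. The sketch below shows one possible shape, using a dataclass per requirement. The class name `QualityRequirement`, its fields, and every measure and target value are hypothetical examples, not content prescribed by the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityRequirement:
    characteristic: str      # quality characteristic being addressed
    measure: str             # how it is quantified
    target: float            # agreed target value
    higher_is_better: bool   # direction of the target
    evaluation_method: str   # how and when it is evaluated
    monitoring: str          # how it is tracked in operation

    def is_met(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


# Illustrative entries only; real targets come from stakeholder agreement.
spec = [
    QualityRequirement("Model Accuracy", "F1 on held-out test set", 0.90, True,
                       "offline evaluation per release",
                       "periodic re-scoring on labelled samples"),
    QualityRequirement("Fairness and Bias",
                       "absolute statistical parity difference", 0.05, False,
                       "bias audit per release", "monthly audit report"),
    QualityRequirement("Data Timeliness", "PSI per input feature", 0.25, False,
                       "comparison against training reference",
                       "drift check on each production window"),
]


def evaluate(requirements, observed):
    """Map each measure to True, False, or None when no value was observed."""
    return {r.measure: (r.is_met(observed[r.measure])
                        if r.measure in observed else None)
            for r in requirements}


observed = {"F1 on held-out test set": 0.92,
            "absolute statistical parity difference": 0.08}
report = evaluate(spec, observed)
for measure, status in report.items():
    print(f"{measure}: {status}")
```

In this run the accuracy target is met, the fairness target is missed, and the drift measure is reported as `None` because no observation was supplied. Keeping unmeasured requirements visible in the report makes gaps in monitoring as explicit as failed targets.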