ISO/IEC 25012:2008 — SQuaRE Data Quality Model

Software engineering — SQuaRE — Data quality model

Introduction to ISO/IEC 25012

ISO/IEC 25012 addresses a critical dimension of software quality that is often overlooked: the quality of data itself. As organizations increasingly rely on data-driven decision making, machine learning, and business intelligence, the quality of underlying data becomes paramount. Poor data quality leads to flawed analytics, incorrect business decisions, and regulatory non-compliance. This standard defines a data quality model that categorizes quality attributes into fifteen characteristics viewed from two complementary perspectives: inherent and system-dependent.

The data life cycle is often longer than the software life cycle. While software may be replaced every few years, critical data can persist for decades — making data quality a long-term strategic concern, not just a project-level issue. Data quality problems discovered late can be extremely costly to remediate.

The standard recognizes that data quality affects all information technology projects where data is exchanged, processed, and used between computer systems and users. Several factors drive the need for systematic data quality management: acquisition of data from organizations with unknown or weak quality processes, the existence of defective data contributing to unsatisfactory outcomes, dispersion of data across multiple owners and systems with inconsistent semantics, and the coexistence of legacy and modern systems that must interoperate. The data quality model provides a structured framework for addressing these challenges.

The Fifteen Data Quality Characteristics

ISO/IEC 25012 views data quality from two points of view, inherent and system-dependent. Because several characteristics belong to both, the fifteen characteristics fall into three groups: inherent only, system-dependent only, or both. This classification is one of the standard’s most distinctive features, as it recognizes that some quality attributes are properties of the data itself while others emerge from the interaction between data and the systems that manage it. Understanding this distinction is essential for designing effective data quality improvement programs.

Perspective | Characteristics | Description
Inherent only | Accuracy, Completeness, Consistency, Credibility, Currentness | Relate to the data itself: its values, relationships, and business rules
Inherent and system-dependent | Accessibility, Compliance, Confidentiality, Efficiency, Precision, Traceability, Understandability | Depend on both the data content and the capabilities of the computer system
System-dependent only | Availability, Portability, Recoverability | Achieved through hardware, software, and infrastructure capabilities

Inherent Data Quality

Inherent data quality refers to data’s intrinsic potential to satisfy needs regardless of the system storing it. Accuracy comprises syntactic accuracy (values conforming to domain rules, e.g., “Mary” not “Marj”) and semantic accuracy (values correctly representing real-world entities, e.g., the correct name for the right person). Completeness measures whether all expected attributes have values for each entity instance. Consistency ensures data is free from contradictions across related entities. Credibility captures the degree to which users regard data as true and believable, often tied to the trustworthiness of the data source. Currentness addresses whether data is of the right age for its context — a railway timetable must be updated with sufficient frequency to remain useful.
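To make two of these inherent characteristics concrete, here is a minimal sketch in Python of one way completeness and currentness could be checked. It assumes a hypothetical customer entity whose expected attributes are name, email, and birth_date, treats None or an empty string as "no value", and measures completeness at record level (complete records divided by total records). The maximum tolerated age is likewise an illustrative assumption that would come from the context of use, as in the timetable example.

```python
from datetime import datetime, timedelta

# Hypothetical expected attributes for a customer entity (an assumption, not from the standard).
EXPECTED_ATTRIBUTES = ("name", "email", "birth_date")

def is_complete(record, expected=EXPECTED_ATTRIBUTES):
    """True if every expected attribute is present and non-empty."""
    return all(record.get(attr) not in (None, "") for attr in expected)

def completeness_ratio(records, expected=EXPECTED_ATTRIBUTES):
    """Complete records / total records; an empty set is treated as complete."""
    records = list(records)
    if not records:
        return 1.0
    return sum(is_complete(r, expected) for r in records) / len(records)

def is_current(last_updated, now, max_age):
    """True if the data is no older than the context of use tolerates."""
    return now - last_updated <= max_age

customers = [
    {"name": "Mary", "email": "mary@example.com", "birth_date": "1990-04-12"},
    {"name": "John", "email": "", "birth_date": None},
]
print(completeness_ratio(customers))  # 0.5

# A timetable refreshed 9 days ago, in a context that tolerates at most 7 days.
print(is_current(datetime(2024, 1, 1), datetime(2024, 1, 10), timedelta(days=7)))  # False
```

An attribute-level variant (filled attribute slots divided by all expected slots) is equally plausible; which one fits depends on how the organization defines its completeness requirement.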

For each characteristic, the standard provides practical measurement examples. Record field syntactic accuracy is measured as the ratio of syntactically accurate records to total records — a simple yet powerful quality metric that any data team can implement immediately.
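The syntactic accuracy ratio can be sketched in a few lines of Python. The domain rules below (a small first-name vocabulary and a five-digit postcode pattern) are hypothetical stand-ins for whatever syntactic domains a real data set defines. A record counts as syntactically accurate only if every checked field conforms to its rule.

```python
import re
from typing import Callable, Dict

# Hypothetical domain rules: the name vocabulary and postcode pattern are
# illustrative assumptions, not rules taken from the standard.
KNOWN_FIRST_NAMES = {"Mary", "John", "Ana"}
POSTCODE = re.compile(r"\d{5}")

RULES: Dict[str, Callable[[str], bool]] = {
    "first_name": lambda v: v in KNOWN_FIRST_NAMES,
    "postcode": lambda v: POSTCODE.fullmatch(v) is not None,
}

def syntactic_accuracy_ratio(records, rules=RULES):
    """Syntactically accurate records / total records."""
    records = list(records)
    if not records:
        return 1.0
    accurate = sum(
        1
        for r in records
        if all(field in r and rule(r[field]) for field, rule in rules.items())
    )
    return accurate / len(records)

records = [
    {"first_name": "Mary", "postcode": "10115"},
    {"first_name": "Marj", "postcode": "10115"},  # "Marj" is outside the name domain
    {"first_name": "John", "postcode": "1011"},   # postcode has too few digits
    {"first_name": "Ana", "postcode": "20095"},
]
print(syntactic_accuracy_ratio(records))  # 0.5
```

Note that this only checks syntactic accuracy: a record storing "Mary" for a person actually named Ana would still pass, because catching that requires semantic checks against the real-world entity.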

System-Dependent Data Quality

System-dependent data quality depends on the technological domain and infrastructure. Availability ensures data can be retrieved by authorized users and applications when needed, including during concurrent access and maintenance operations like backup. Portability addresses the ability to install, replace, or move data between systems while preserving existing quality. Recoverability ensures data can be restored after failures through commit/synch point mechanisms, rollback capabilities, and backup-recovery procedures. These characteristics are heavily influenced by architecture decisions and infrastructure investments.
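The commit/synch point and rollback mechanisms mentioned above can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch assuming a hypothetical account table. It covers only transaction-level recovery; backup-recovery procedures and availability under concurrent access are outside its scope. Using the connection as a context manager commits the transaction on success and rolls it back if the body raises, so a failed transfer leaves the data at the last committed state.

```python
import sqlite3

# Hypothetical account table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100), (2, 50)")
conn.commit()  # synch point: the state we can always return to

def transfer(conn, src, dst, amount):
    """Move funds atomically; on failure, roll back to the last commit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())

print(transfer(conn, 1, 2, 30))   # True
print(transfer(conn, 1, 2, 500))  # False: the partial debit is rolled back
print(balances(conn))             # {1: 70, 2: 80}
```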

Engineering Design Insights

From an engineering perspective, ISO/IEC 25012 provides several critical insights for data-intensive system design. The standard’s dual-perspective classification is particularly valuable because it separates data content issues from infrastructure issues, two problem domains that require fundamentally different solutions and skill sets. Data engineers can use this classification to assign ownership: business domain experts own inherent quality, IT infrastructure teams own system-dependent quality, and the seven characteristics that span both perspectives (such as Compliance and Confidentiality) call for shared ownership.
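One way to operationalize this ownership split is a lookup from each characteristic to its perspective and from each perspective to responsible teams. The classification below follows the standard's fifteen characteristics as listed in the table; the team names and the decision to route the overlapping group to both teams are illustrative assumptions, not something the standard prescribes.

```python
from enum import Enum

class Perspective(Enum):
    INHERENT = "inherent"
    BOTH = "inherent and system-dependent"
    SYSTEM = "system-dependent"

# ISO/IEC 25012 characteristics grouped by perspective.
CHARACTERISTICS = {
    **{c: Perspective.INHERENT for c in (
        "Accuracy", "Completeness", "Consistency", "Credibility", "Currentness")},
    **{c: Perspective.BOTH for c in (
        "Accessibility", "Compliance", "Confidentiality", "Efficiency",
        "Precision", "Traceability", "Understandability")},
    **{c: Perspective.SYSTEM for c in (
        "Availability", "Portability", "Recoverability")},
}

# Hypothetical ownership policy; team names are illustrative.
OWNERS = {
    Perspective.INHERENT: ("business domain experts",),
    Perspective.SYSTEM: ("IT infrastructure",),
    Perspective.BOTH: ("business domain experts", "IT infrastructure"),
}

def owners_for(characteristic):
    """Return the teams responsible for a given characteristic."""
    return OWNERS[CHARACTERISTICS[characteristic]]

print(len(CHARACTERISTICS))       # 15
print(owners_for("Accuracy"))     # ('business domain experts',)
print(owners_for("Compliance"))   # ('business domain experts', 'IT infrastructure')
```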

A common mistake is treating all data quality problems as data cleaning issues. Inherent quality problems like inaccuracy or inconsistency typically require domain-specific business rules, validation logic, and process improvements. System-dependent problems like poor availability or weak recoverability require infrastructure investment, architectural changes, and redundancy planning — fundamentally different remediation strategies.

The standard includes specific measurement examples for each characteristic. Confidentiality can be measured through encryption coverage as an inherent measure and through penetration test success rates as a system-dependent measure. Efficiency can be measured by comparing actual storage usage against optimized benchmarks. The Compliance characteristic is particularly relevant in regulated industries: the standard provides separate measures for inherent compliance (data content conforming to regulations like GDPR or HIPAA) and system-dependent compliance (technical architecture ensuring regulatory conformance). This distinction maps directly to real-world compliance implementation challenges.
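The paragraph above names encryption coverage and storage efficiency as measures but does not give formulas. The sketch below assumes simple ratio forms for both: encrypted sensitive fields divided by all sensitive fields, and optimized benchmark size divided by actual storage size. The schema, its sensitive and encrypted flags, and the byte counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldInfo:
    name: str
    sensitive: bool  # hypothetical flag, e.g. personal data under GDPR or HIPAA
    encrypted: bool  # hypothetical flag: stored encrypted at rest

def encryption_coverage(fields):
    """Encrypted sensitive fields / all sensitive fields (assumed ratio form)."""
    sensitive = [f for f in fields if f.sensitive]
    if not sensitive:
        return 1.0
    return sum(f.encrypted for f in sensitive) / len(sensitive)

def storage_efficiency(optimized_bytes, actual_bytes):
    """Optimized benchmark size / actual size; 1.0 means no excess storage."""
    return optimized_bytes / actual_bytes

schema = [
    FieldInfo("customer_id", sensitive=False, encrypted=False),
    FieldInfo("email", sensitive=True, encrypted=True),
    FieldInfo("diagnosis", sensitive=True, encrypted=False),
    FieldInfo("ssn", sensitive=True, encrypted=True),
]
print(encryption_coverage(schema))    # 2 of 3 sensitive fields are encrypted
print(storage_efficiency(750, 1000))  # 0.75
```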

From a practical standpoint, the standard’s measurement framework enables organizations to establish quantitative quality targets for each characteristic, monitor them over time, and drive data quality improvement initiatives with clear metrics. Organizations implementing data governance programs will find the fifteen-characteristic model provides an excellent checklist for defining their data quality dimensions and establishing measurement baselines.
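Setting targets and monitoring them over time can start as simply as comparing each characteristic's latest measurement against a threshold and against its baseline. In this minimal sketch the target values and measurement histories are illustrative assumptions, and the first value in each history is treated as the baseline.

```python
# Hypothetical targets per characteristic; values are illustrative, not from the standard.
TARGETS = {"Completeness": 0.98, "Accuracy": 0.95, "Availability": 0.999}

def breaches(measurements, targets=TARGETS):
    """Characteristics whose latest measurement falls below its target."""
    return sorted(
        name
        for name, history in measurements.items()
        if name in targets and history and history[-1] < targets[name]
    )

def trend(history):
    """Change from the baseline (first) measurement to the latest one."""
    return history[-1] - history[0]

measurements = {
    "Completeness": [0.91, 0.95, 0.99],
    "Accuracy": [0.97, 0.96, 0.93],
    "Availability": [0.9995, 0.9991, 0.9993],
}
print(breaches(measurements))                 # ['Accuracy']
print(trend(measurements["Accuracy"]) < 0)    # True: declining since the baseline
```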

Frequently Asked Questions

Q1: What is the difference between inherent and system-dependent data quality?
A: Inherent data quality refers to the data’s intrinsic properties — its values, accuracy, and consistency regardless of the system. System-dependent data quality depends on the capabilities of the computer system that stores and processes the data.
Q2: How does ISO/IEC 25012 relate to ISO/IEC 25010?
A: 25012 defines the data quality model, while 25010 defines the product quality model for ICT products. Data is both a target of its own quality model (25012) and a component of ICT products covered by 25010.
Q3: Can these data quality characteristics be measured quantitatively?
A: Yes. The standard provides example measures for each characteristic, typically expressed as ratios (e.g., accurate records / total records) or counts (e.g., number of non-conforming items).
Q4: What types of data does the standard cover?
A: It covers data retained in structured format within computer systems, including all data types (character strings, numbers, dates, images, sounds) and relationships between data. It does not cover embedded device or real-time sensor data not retained for processing.
