ISO/IEC 25022: Measurement of Quality in Use

Systems and Software Engineering — SQuaRE Quality Measurement Division — Measuring Software Quality from the User’s Perspective

1. Understanding Quality in Use Measurement

ISO/IEC 25022 belongs to the Quality Measurement Division (ISO/IEC 2502n) of the SQuaRE series and defines how to measure quality in use: the degree to which a product or system can be used by specific users to meet their needs and achieve specific goals with effectiveness, efficiency, satisfaction, freedom from risk, and context coverage in specific contexts of use. It replaces the earlier technical report ISO/IEC TR 9126-4:2004 and aligns with the quality in use model defined in ISO/IEC 25010.

What distinguishes quality in use from other forms of quality measurement is its focus on the outcomes of human-system interaction rather than intrinsic product properties. While product quality metrics (ISO/IEC 25023) examine the software itself — its code complexity, response times, or defect counts — quality in use measures what happens when real users apply the system to real tasks in real environments. This outcome-oriented perspective is essential for understanding whether a system genuinely delivers value to its stakeholders.

Quality in use depends not only on the product quality of the software or computer system, but also on the particular context in which the product is being used — including user factors, task factors, and physical and social environmental factors. Comparisons are only valid when measures are made in the same context of use.

2. The Five Quality in Use Characteristics and Their Measures

The standard defines measures organized under five top-level characteristics, some with subcharacteristics, forming a comprehensive measurement framework.

2.1 Effectiveness and Efficiency

Effectiveness measures capture the accuracy and completeness with which users achieve specified goals. Typical measures include task completion rate (proportion of users who successfully complete a task), occurrence of errors during task execution, and critical error rate. Efficiency measures relate these accomplishments to the resources expended — most commonly time (task duration, time to first successful use) but also cognitive effort and material costs. For example, “Time to complete a specified task — mean” is a general (G) efficiency measure applicable across virtually all systems, while “Time to learn to use a specified function” is a specialized (S) measure relevant for training-intensive applications.
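As a rough illustration of the effectiveness and efficiency measures named above, the following Python sketch computes a task completion rate, a mean error count, and a mean task time from observed attempts. The `TaskAttempt` record layout and the sample numbers are hypothetical, and averaging time over successful attempts only is an assumption made here, not a rule taken from the standard.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """One user's attempt at a specified task (hypothetical record layout)."""
    completed: bool   # did the user reach the specified goal?
    seconds: float    # time spent on the attempt
    errors: int       # errors observed during the attempt


def task_completion_rate(attempts):
    """Effectiveness: proportion of attempts that ended in success."""
    return sum(a.completed for a in attempts) / len(attempts)


def mean_errors_per_attempt(attempts):
    """Effectiveness: average number of errors observed per attempt."""
    return mean(a.errors for a in attempts)


def mean_task_time(attempts):
    """Efficiency: mean time over successful attempts only (an assumption)."""
    return mean(a.seconds for a in attempts if a.completed)


# Illustrative data, not measurements from any real study.
attempts = [
    TaskAttempt(True, 120.0, 0),
    TaskAttempt(True, 150.0, 1),
    TaskAttempt(False, 300.0, 4),
    TaskAttempt(True, 90.0, 0),
]

print(task_completion_rate(attempts))     # 0.75
print(mean_errors_per_attempt(attempts))  # 1.25
print(mean_task_time(attempts))           # 120.0
```

Whether failed attempts should count toward mean task time is a design decision that should be fixed before data collection, since it changes the result and affects comparability across evaluations.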

2.2 Satisfaction

Satisfaction is a multi-faceted characteristic with four subcharacteristics: usefulness (the degree to which users believe the product helps them achieve their goals), trust (user confidence that the product will perform as intended), pleasure (the degree of enjoyment from use), and comfort (physical ergonomic acceptability). Each subcharacteristic has dedicated measures, typically based on psychometric questionnaires using validated Likert-scale instruments. Satisfaction measurement requires rigorous psychometric methodology: questionnaire items should demonstrate reliability (a Cronbach’s alpha of at least 0.7 is a commonly used threshold) and validity (construct, content, and criterion-related).

| Characteristic | Subcharacteristic | Example Measure (General) | Application Domain |
|---|---|---|---|
| Effectiveness | (none) | Task completion rate | All interactive systems |
| Efficiency | (none) | Time to complete a task (mean) | Productivity applications |
| Satisfaction | Usefulness | User-perceived usefulness score | Enterprise software |
| Satisfaction | Trust | User confidence rating | E-commerce, banking |
| Satisfaction | Pleasure | Enjoyment rating | Games, creative tools |
| Satisfaction | Comfort | Physical discomfort rating | VR/AR, mobile devices |
| Freedom from Risk | Economic risk | Potential financial loss per incident | Financial systems |
| Freedom from Risk | Health & Safety | Rate of user injury incidents | Medical devices, automotive |
| Freedom from Risk | Environmental | Probability of environmental harm | Industrial control systems |
| Context Coverage | Context completeness | Proportion of intended contexts supported | Accessibility-critical systems |
| Context Coverage | Flexibility | Number of additional contexts usable | Cross-platform products |

2.3 Freedom from Risk and Context Coverage

Freedom from risk measures address the mitigation of economic, health and safety, and environmental risks arising from insufficient product quality. These measures are particularly critical in safety-related systems (ISO 26262, IEC 62304) where poor usability can directly lead to harm. Context coverage comprises context completeness (the degree to which a system works across all specified contexts) and flexibility (its ability to function in contexts beyond those initially specified). These measures are essential for accessible and inclusive design, ensuring systems serve users with diverse abilities, in varied environments, and across evolving use cases.

An unacceptable level of risk can result from poor usability in actual use, which in turn can stem from weak product usability or from weaknesses in other product quality characteristics. When designing safety-critical systems, quality in use measures should be integrated into the applicable risk management process, for example ISO 14971 for medical devices or ISO 26262 for road vehicles.
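The two context coverage measures described in this section reduce to simple set arithmetic once the contexts of use have been enumerated. The sketch below is one possible reading, assuming each context can be judged as either usable or not usable; the context names are invented for illustration.

```python
def context_completeness(specified, supported):
    """Proportion of the specified contexts of use in which the system is usable."""
    specified = set(specified)
    return len(specified & set(supported)) / len(specified)


def flexibility_count(specified, supported):
    """Number of usable contexts beyond those originally specified."""
    return len(set(supported) - set(specified))


# Hypothetical contexts of use for an example product.
specified = {"desktop-office", "mobile-outdoor", "screen-reader", "low-bandwidth"}
supported = {"desktop-office", "mobile-outdoor", "screen-reader", "tablet-kiosk"}

print(context_completeness(specified, supported))  # 0.75
print(flexibility_count(specified, supported))      # 1
```

In practice, deciding that a context is "supported" is itself the outcome of effectiveness, efficiency, and satisfaction measurements taken in that context, so the binary sets here stand in for a threshold decision made elsewhere.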

3. Engineering Design Insights for Quality in Use Programs

3.1 Normalization and Benchmarking Strategy

The standard identifies five approaches for interpreting quality in use measures: conformance (comparison with business requirements), benchmarking (comparison with competitor or legacy systems), time series analysis (trend tracking across versions), proficiency comparison (comparison with expert users), and population norms (using historical databases). For engineering teams, the most impactful strategy is establishing a baseline early in development. Running formative evaluations with as few as 5-8 representative users during prototyping can uncover roughly 80% of usability issues, an estimate usually traced to the Nielsen-Landauer problem-discovery model and dependent on how likely each user is to encounter a given problem. Summative evaluation for statistical confidence typically requires 20+ users per user group.

Establishing this baseline early also reduces the cost of quality in use defects: by a commonly cited rule of thumb, fixing a usability problem after release costs 10-100x more than addressing it during design.
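The "5-8 users" guidance can be checked against the problem-discovery model usually attributed to Nielsen and Landauer, in which the expected share of problems found by n users is 1 - (1 - p)^n. The sketch below assumes p = 0.31, the average detection probability reported in their work; real projects can differ considerably, so p is left as a parameter.

```python
import math


def proportion_found(n_users, p_detect=0.31):
    """Expected share of problems found by n_users: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p_detect) ** n_users


def users_needed(target, p_detect=0.31):
    """Smallest number of users whose expected discovery share reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_detect))


for n in (3, 5, 8):
    print(n, round(proportion_found(n), 2))
# 3 0.67
# 5 0.84
# 8 0.95

print(users_needed(0.80))  # 5
```

With a lower detection probability, for example p = 0.15 for problems that only some users encounter, the same 80% target needs noticeably more participants, which is why the figure should be read as an estimate under stated assumptions rather than a guarantee.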

3.2 Integrating Quality in Use into the Development Lifecycle

The standard explicitly links quality in use measurement to four development stages: requirements specification (setting target values), formative evaluation of prototypes (identifying problems early), summative evaluation (comparing design alternatives), and quality assurance/control (verifying the implemented system). A practical recommendation is to specify quality in use requirements quantitatively in the system requirements specification (SRS), e.g., “The system shall achieve a task completion rate >= 95% for experienced users on the first attempt, with mean task time <= 3 minutes.” This transforms quality in use from a post-hoc validation activity into a design-driven engineering practice.
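A quantitative requirement like the example above can be checked mechanically against summative test data. The following sketch is one way to do so, not a procedure prescribed by the standard: it compares the observed values with the targets and also reports a Wilson score interval for the completion rate, since a point estimate from a small sample can meet the target by chance. The function names and sample data are hypothetical.

```python
import math
from statistics import mean


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def check_requirement(completed, times_min, target_rate=0.95, max_mean_min=3.0):
    """Compare observed results with target values from the requirement."""
    n = len(completed)
    successes = sum(completed)
    rate = successes / n
    low, high = wilson_interval(successes, n)
    mean_time = mean(times_min)
    return {
        "completion_rate": rate,
        "rate_ci": (low, high),
        "mean_time_min": mean_time,
        "meets_point_targets": rate >= target_rate and mean_time <= max_mean_min,
        "lower_bound_meets_target": low >= target_rate,
    }


# Illustrative data: 19 of 20 users completed the task on the first attempt.
completed = [True] * 19 + [False]
times_min = [2.5] * 10 + [3.0] * 10

result = check_requirement(completed, times_min)
print(result["completion_rate"], result["mean_time_min"])  # 0.95 2.75
print(result["meets_point_targets"])       # True
print(result["lower_bound_meets_target"])  # False
```

In this example the observed rate meets the 95% target, but the lower end of the interval is well below it, so the data do not yet demonstrate conformance with confidence. Whether the requirement refers to the point estimate or to a confidence bound is worth stating explicitly in the SRS.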

3.3 Psychometric Rigor in Satisfaction Measurement

Satisfaction measures typically rely on questionnaire-based instruments. The standard emphasizes that these instruments must demonstrate psychometric validity. For engineering teams building custom satisfaction questionnaires, this means: using multi-item scales (3-5 items per construct) rather than single questions, ensuring items are reviewed by domain experts for content validity, pilot-testing with representative users, and computing Cronbach’s alpha to verify internal consistency. Substituting ad-hoc single-question satisfaction ratings for properly validated instruments is a common engineering shortcut that can produce misleading results, particularly when making high-stakes decisions about product direction.
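For teams computing internal consistency themselves, Cronbach's alpha for a k-item scale is k/(k-1) multiplied by (1 - sum of item variances / variance of total scores). A minimal sketch using only the Python standard library follows; the response matrix is made-up pilot data on a 1-5 Likert scale, and sample variance is used throughout.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(responses[0])                      # number of items in the scale
    items = list(zip(*responses))              # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical pilot data: 5 respondents, 3 items measuring one construct.
responses = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]

print(round(cronbach_alpha(responses), 2))  # 0.92
```

A sample of five respondents is far too small for a dependable estimate and is used here only to keep the arithmetic easy to follow. Alpha also says nothing about validity, so it complements rather than replaces expert review and pilot testing.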

4. Frequently Asked Questions

Q: How does quality in use differ from usability?
A: In the SQuaRE quality model, usability is a subset of quality in use consisting of effectiveness, efficiency, satisfaction, and context coverage. Quality in use additionally includes freedom from risk. While ISO 9241-11 defines usability with effectiveness, efficiency, and satisfaction, the SQuaRE model extends this to encompass broader stakeholder concerns including economic, health, and environmental risk mitigation.
Q: Can quality in use be measured before the system is fully implemented?
A: Yes. Quality in use can be estimated during development using prototypes. Formative evaluation with low-fidelity or high-fidelity prototypes can identify quality in use problems early. However, the definitive measurement of quality in use requires the implemented system operating in its intended context with real users performing real tasks.
Q: What is the relationship between ISO/IEC 25022 and ISO 9241-11?
A: ISO/IEC 25022 and ISO 9241-11 share compatible definitions of effectiveness, efficiency, and satisfaction. The quality in use measures in ISO/IEC 25022 can be used as measures of usability as defined in ISO 9241-11. Annex C of the standard provides detailed guidance on this alignment.
Q: How many users are needed for a reliable quality in use evaluation?
A: For formative evaluations aimed at identifying problems, 5-8 representative users per user group are typically sufficient. For summative evaluations aimed at statistical validation against target values, the required sample size depends on the desired confidence level and effect size, but typically ranges from 20-40 users per group following ISO/TS 20282-2 guidelines.
