Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
ISO/IEC 25022 is a key standard within the SQuaRE ISO/IEC 2502n Quality Measurement Division. It defines how to measure quality in use: the degree to which a product or system can be used by specific users to meet their needs and achieve specific goals with effectiveness, efficiency, satisfaction, freedom from risk, and context coverage in specific contexts of use. It replaces the earlier technical report ISO/IEC TR 9126-4:2004 and aligns with the updated quality in use model defined in ISO/IEC 25010.
What distinguishes quality in use from other forms of quality measurement is its focus on the outcomes of human-system interaction rather than intrinsic product properties. While product quality metrics (ISO/IEC 25023) examine the software itself — its code complexity, response times, or defect counts — quality in use measures what happens when real users apply the system to real tasks in real environments. This outcome-oriented perspective is essential for understanding whether a system genuinely delivers value to its stakeholders.
The standard defines measures organized under five top-level characteristics, some with subcharacteristics, forming a comprehensive measurement framework.
Effectiveness measures capture the accuracy and completeness with which users achieve specified goals. Typical measures include task completion rate (proportion of users who successfully complete a task), occurrence of errors during task execution, and critical error rate. Efficiency measures relate these accomplishments to the resources expended — most commonly time (task duration, time to first successful use) but also cognitive effort and material costs. For example, “Time to complete a specified task — mean” is a general (G) efficiency measure applicable across virtually all systems, while “Time to learn to use a specified function” is a specialized (S) measure relevant for training-intensive applications.
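As a minimal sketch of how the effectiveness and efficiency measures above might be computed from usability test logs: the `TaskAttempt` record and the function names are illustrative assumptions, not definitions taken from the standard. Computing mean task time over successful attempts only is also an assumed convention.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """Hypothetical record of one user's attempt at one specified task."""
    user_id: str
    completed: bool     # did the user reach the specified goal?
    duration_s: float   # time on task, in seconds
    errors: int         # errors observed during the attempt


def task_completion_rate(attempts):
    """Effectiveness: proportion of attempts that ended in success."""
    if not attempts:
        raise ValueError("no attempts recorded")
    return sum(a.completed for a in attempts) / len(attempts)


def mean_task_time(attempts):
    """Efficiency: mean duration of successful attempts (assumed convention)."""
    times = [a.duration_s for a in attempts if a.completed]
    if not times:
        raise ValueError("no completed attempts")
    return mean(times)


def mean_errors_per_attempt(attempts):
    """Effectiveness: average number of errors per attempt."""
    if not attempts:
        raise ValueError("no attempts recorded")
    return sum(a.errors for a in attempts) / len(attempts)


# Made-up observations, for illustration only.
attempts = [
    TaskAttempt("u1", True, 95.0, 0),
    TaskAttempt("u2", True, 120.0, 1),
    TaskAttempt("u3", False, 240.0, 3),
    TaskAttempt("u4", True, 85.0, 0),
]

print(task_completion_rate(attempts))     # 0.75
print(mean_task_time(attempts))           # 100.0
print(mean_errors_per_attempt(attempts))  # 1.0
```

Keeping the raw attempt records rather than only the aggregates makes it possible to recompute the measures per user group or per task later.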
Satisfaction is a multi-faceted characteristic with four subcharacteristics: usefulness (the degree to which users believe the product helps them achieve their goals), trust (user confidence that the product will perform as intended), pleasure (the degree of enjoyment from use), and comfort (physical ergonomic acceptability). Each subcharacteristic has dedicated measures, typically based on psychometric questionnaires using validated Likert-scale instruments. The standard emphasizes that satisfaction measurement requires rigorous psychometric methodology — questionnaire items must demonstrate reliability (Cronbach’s alpha >= 0.7) and validity (construct, content, and criterion-related).
| Characteristic | Subcharacteristic | Example Measure (General) | Application Domain |
|---|---|---|---|
| Effectiveness | — | Task completion rate | All interactive systems |
| Efficiency | — | Time to complete a task (mean) | Productivity applications |
| Satisfaction | Usefulness | User-perceived usefulness score | Enterprise software |
| Satisfaction | Trust | User confidence rating | E-commerce, banking |
| Satisfaction | Pleasure | Enjoyment rating | Games, creative tools |
| Satisfaction | Comfort | Physical discomfort rating | VR/AR, mobile devices |
| Freedom from Risk | Economic risk | Potential financial loss per incident | Financial systems |
| Freedom from Risk | Health & Safety | Rate of user injury incidents | Medical devices, automotive |
| Freedom from Risk | Environmental | Probability of environmental harm | Industrial control systems |
| Context Coverage | Context completeness | Proportion of intended contexts supported | Accessibility-critical systems |
| Context Coverage | Flexibility | Number of additional contexts usable | Cross-platform products |
Freedom from risk measures address the mitigation of economic, health and safety, and environmental risks arising from insufficient product quality. These measures are particularly critical in safety-related systems (ISO 26262, IEC 62304) where poor usability can directly lead to harm. Context coverage comprises context completeness (the degree to which a system works across all specified contexts) and flexibility (its ability to function in contexts beyond those initially specified). These measures are essential for accessible and inclusive design, ensuring systems serve users with diverse abilities, in varied environments, and across evolving use cases.
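The two context coverage measures can be illustrated with simple set arithmetic. This is a sketch under the assumption that each context of use can be given a label and judged as simply supported or not supported. The context labels and function names are hypothetical.

```python
def context_completeness(specified, supported):
    """Proportion of the specified contexts of use that the system supports."""
    specified = set(specified)
    if not specified:
        raise ValueError("at least one specified context is required")
    return len(specified & set(supported)) / len(specified)


def additional_contexts(specified, supported):
    """Flexibility: contexts that work although they were never specified."""
    return set(supported) - set(specified)


# Hypothetical context labels, for illustration only.
specified = {"desktop-mouse", "desktop-screenreader", "mobile-touch", "mobile-voice"}
supported = {"desktop-mouse", "desktop-screenreader", "mobile-touch", "tablet-touch"}

print(context_completeness(specified, supported))  # 0.75
print(additional_contexts(specified, supported))   # {'tablet-touch'}
```

In practice "supported" would itself be the outcome of an evaluation in that context, for example meeting the effectiveness and efficiency targets there, rather than a yes/no flag.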
The standard identifies five approaches for interpreting quality in use measures: conformance (comparison with business requirements), benchmarking (comparison with competitor or legacy systems), time series analysis (trend tracking across versions), proficiency comparison (comparison with expert users), and population norms (using historical databases). For engineering teams, the most impactful strategy is establishing a baseline early in development. Formative evaluations with as few as 5-8 representative users during prototyping can surface roughly 80% or more of usability problems (per the Nielsen-Landauer problem-discovery model, which assumes each user uncovers about 31% of the problems on average), while summative evaluation for statistical confidence typically requires 20+ users per user group.
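The small-sample rule of thumb for formative evaluation is usually derived from the problem-discovery model of Nielsen and Landauer, in which the expected share of problems found by n users is 1 - (1 - L)^n. The sketch below uses L = 0.31, the average they reported. The real discovery rate varies by product and task, so treat that default as an assumption rather than a constant.

```python
def proportion_found(n_users, discovery_rate=0.31):
    """Expected share of usability problems found by n users: 1 - (1 - L)^n."""
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= discovery_rate <= 1.0:
        raise ValueError("discovery_rate must be between 0 and 1")
    return 1.0 - (1.0 - discovery_rate) ** n_users


# Tabulate the expected coverage for a few sample sizes.
for n in (1, 3, 5, 8):
    print(n, round(proportion_found(n), 2))
```

With the default rate, five users are expected to find about 84% of the problems. With a lower discovery rate, noticeably more users are needed to reach the same coverage.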
The standard explicitly links quality in use measurement to four development stages: requirements specification (setting target values), formative evaluation of prototypes (identifying problems early), summative evaluation (comparing design alternatives), and quality assurance/control (verifying the implemented system). A practical recommendation is to specify quality in use requirements quantitatively in the system requirements specification (SRS), e.g., “The system shall achieve a task completion rate >= 95% for experienced users on the first attempt, with mean task time <= 3 minutes.” This transforms quality in use from a post-hoc validation activity into a design-driven engineering practice.
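A quantitative requirement like the example above can be checked mechanically against summative test data. The following is a minimal sketch. `QualityInUseTarget` and `check_conformance` are hypothetical names, and the observations are made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class QualityInUseTarget:
    """Hypothetical container for the two thresholds in the example requirement."""
    min_completion_rate: float   # 0.95 means 95% of first attempts succeed
    max_mean_task_time_s: float  # 180.0 means a mean task time of 3 minutes


def check_conformance(first_attempt_success, task_times_s, target):
    """Compare measured values with the target and report both."""
    if not first_attempt_success or not task_times_s:
        raise ValueError("need at least one observation for each measure")
    rate = sum(first_attempt_success) / len(first_attempt_success)
    mean_time = mean(task_times_s)
    passes = (rate >= target.min_completion_rate
              and mean_time <= target.max_mean_task_time_s)
    return {"completion_rate": rate, "mean_task_time_s": mean_time, "passes": passes}


# Made-up summative results: 19 of 20 experienced users succeed on the first try.
target = QualityInUseTarget(min_completion_rate=0.95, max_mean_task_time_s=180.0)
successes = [True] * 19 + [False]
times_s = [150.0] * 20

result = check_conformance(successes, times_s, target)
print(result)
```

A point estimate that just meets the threshold says little about the wider user population. Reporting a confidence interval alongside the pass/fail verdict is a reasonable extension.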
Satisfaction measures typically rely on questionnaire-based instruments. The standard emphasizes that these instruments must demonstrate psychometric validity. For engineering teams building custom satisfaction questionnaires, this means: using multi-item scales (3-5 items per construct) rather than single questions, ensuring items are reviewed by domain experts for content validity, pilot-testing with representative users, and computing Cronbach’s alpha to verify internal consistency. Substituting ad-hoc single-question satisfaction ratings for properly validated instruments is a common engineering shortcut that can produce misleading results, particularly when making high-stakes decisions about product direction.
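For teams that want to run the internal consistency check themselves, Cronbach's alpha can be computed directly from the response matrix using the textbook formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). This is a sketch using only the Python standard library. The response data is made up.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items per scale")
    if any(len(r) != k for r in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up 5-point Likert responses: 5 respondents, 3 items for one construct.
responses = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Alpha depends on the number of items as well as their correlation, so a high value on a long scale does not by itself show that the items measure a single construct.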