ISO/IEC 25045 is part of the Quality Evaluation Division (ISO/IEC 2504n) within the SQuaRE series. It provides a specialized evaluation module for measuring the recoverability sub-characteristic of software reliability. For practicing engineers, its main value is the disturbance injection methodology: a systematic, repeatable way to quantify how well a system withstands and recovers from operational faults and unexpected events.
The standard defines two primary quality measures. Autonomic maturity is scored on a five-level scale:
| Autonomic Level | Score | Description | Detection Example |
|---|---|---|---|
| Basic | 0 | Manual management via reports and product manuals | Help desk calls operators about user complaints |
| Managed | 1 | Management software automates IT tasks | Operators monitor a single management console |
| Predictive | 2 | Tools analyze changes and recommend actions | Autonomic manager notifies operator of a potential problem |
| Adaptive | 3 | Components collectively monitor, analyze, and take action with minimal intervention | System detects and analyzes the problem without human involvement and may initiate recovery |
| Autonomic | 4 | Fully automated management by business rules and policies | End-to-end autonomic detection, analysis, and recovery |
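
The scale above maps naturally onto an integer enumeration. The sketch below shows one way to record a level per disturbance and fold the scores into a single number between 0 and 1. The aggregation (mean score divided by the maximum score of 4) and the names `AutonomicLevel` and `normalized_autonomic_score` are assumptions made for illustration, not formulas quoted from the standard.

```python
from enum import IntEnum


class AutonomicLevel(IntEnum):
    """Scores taken from the autonomic level table."""
    BASIC = 0
    MANAGED = 1
    PREDICTIVE = 2
    ADAPTIVE = 3
    AUTONOMIC = 4


def normalized_autonomic_score(levels):
    """Return the mean observed score divided by the maximum score.

    Assumption: this simple normalization is illustrative only; consult
    the standard for the exact definition of its quality measures.
    """
    if not levels:
        raise ValueError("at least one disturbance result is required")
    max_score = int(max(AutonomicLevel))  # 4, the Autonomic level
    return sum(int(level) for level in levels) / (len(levels) * max_score)


# One observed level per injected disturbance (hypothetical results).
observed = [
    AutonomicLevel.ADAPTIVE,
    AutonomicLevel.MANAGED,
    AutonomicLevel.AUTONOMIC,
]
score = normalized_autonomic_score(observed)
```

A score of 1.0 would mean every disturbance was handled at the Autonomic level; a score of 0.0 would mean every disturbance needed fully manual handling.
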
The evaluation methodology consists of three phases: Baseline, Test, and Check. The Baseline phase establishes normal operational characteristics without disturbances. The Test phase runs the same workload while injecting disturbances. The Check phase verifies system integrity after disturbance testing.
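
The three phases can be sketched as a small driver function. Everything named here is hypothetical: `run_workload` stands in for whatever harness drives the system and returns a throughput figure, `check_integrity` stands in for a post-test consistency check, and using throughput as the observed metric is an assumption rather than a requirement taken from the text above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class EvaluationResult:
    baseline_throughput: float  # measured with no disturbances
    test_throughput: float      # measured while disturbances are injected
    integrity_ok: bool          # outcome of the Check phase


def run_evaluation(
    run_workload: Callable[[Sequence[str]], float],
    disturbances: Sequence[str],
    check_integrity: Callable[[], bool],
) -> EvaluationResult:
    # Baseline phase: same workload, empty disturbance list.
    baseline = run_workload([])
    # Test phase: same workload, with disturbances injected.
    under_test = run_workload(disturbances)
    # Check phase: verify the system is still consistent afterwards.
    ok = check_integrity()
    return EvaluationResult(baseline, under_test, ok)


# Stand-in harness for illustration: each disturbance costs 10% throughput.
def fake_workload(disturbances: Sequence[str]) -> float:
    return 1000.0 * (1.0 - 0.1 * len(disturbances))


result = run_evaluation(fake_workload, ["process kill", "disk full"], lambda: True)
```

Passing the harness in as a callable keeps the phase ordering separate from the mechanics of driving load and injecting faults, which differ from system to system.
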
Each disturbance injection is organized into an injection slot with five sub-intervals: Injection Interval (steady state before fault), Detection Interval (time to detect the fault), Recovery Initiation Interval (time to begin recovery), Recovery Interval (time to perform recovery), and Keep Interval (time to re-establish steady state after recovery).
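
One way to work with an injection slot is to record a timestamp at each boundary and derive the five sub-intervals from them. The sketch below assumes timestamps in seconds from the start of the slot; the class and field names (`InjectionSlot`, `fault_injected`, and so on) are illustrative and do not come from the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InjectionSlot:
    """Boundary timestamps (seconds) for one disturbance injection slot."""
    slot_start: float
    fault_injected: float
    fault_detected: float
    recovery_started: float
    recovery_finished: float
    slot_end: float

    def __post_init__(self):
        marks = [
            self.slot_start, self.fault_injected, self.fault_detected,
            self.recovery_started, self.recovery_finished, self.slot_end,
        ]
        # Each boundary must not precede the previous one.
        if any(later < earlier for earlier, later in zip(marks, marks[1:])):
            raise ValueError("timestamps must be non-decreasing")

    def intervals(self) -> dict:
        """Return the duration of each of the five sub-intervals."""
        return {
            "injection": self.fault_injected - self.slot_start,
            "detection": self.fault_detected - self.fault_injected,
            "recovery_initiation": self.recovery_started - self.fault_detected,
            "recovery": self.recovery_finished - self.recovery_started,
            "keep": self.slot_end - self.recovery_finished,
        }


# Hypothetical slot: fault injected at 60 s, detected 15 s later, and so on.
slot = InjectionSlot(0.0, 60.0, 75.0, 80.0, 200.0, 320.0)
durations = slot.intervals()
```

Because the sub-intervals are contiguous, their durations always sum to the total slot length, which makes a convenient sanity check on recorded data.
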
The standard defines five mandatory disturbance categories for conformance testing:
| Category | Examples | Engineering Relevance |
|---|---|---|
| Unexpected Shutdown | OS shutdown, process termination, network link failure | Simulates operator errors and software crashes — the most common class of production incidents |
| Resource Contention | CPU hog, memory exhaustion, I/O saturation, DBMS deadlock, runaway query, disk full | Simulates noisy neighbor scenarios and resource leaks — increasingly important in multi-tenant cloud environments |
| Loss of Data | Database file deletion, disk loss, table corruption | Simulates storage failures and accidental data deletion — tests backup and recovery mechanisms |
| Load Resolution | 2x and 10x user surge | Simulates traffic spikes (flash crowds, DDoS, viral events) — tests auto-scaling and flow control |
| Restart Failure | Corrupted boot configuration, missing executables | Simulates failures that occur during recovery itself — tests robustness of the recovery mechanism |
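
Because all five categories are mandatory for conformance, a test plan can be checked for coverage before any fault is injected. The following is a minimal sketch: the plan format (a list of category and description pairs) and the helper name `missing_categories` are assumptions made for illustration.

```python
from enum import Enum


class DisturbanceCategory(Enum):
    """The five mandatory categories from the table above."""
    UNEXPECTED_SHUTDOWN = "Unexpected Shutdown"
    RESOURCE_CONTENTION = "Resource Contention"
    LOSS_OF_DATA = "Loss of Data"
    LOAD_RESOLUTION = "Load Resolution"
    RESTART_FAILURE = "Restart Failure"


def missing_categories(plan):
    """Return mandatory categories that the plan does not exercise.

    `plan` is a list of (DisturbanceCategory, description) pairs.
    """
    covered = {category for category, _ in plan}
    return [c for c in DisturbanceCategory if c not in covered]


# Hypothetical, incomplete test plan.
plan = [
    (DisturbanceCategory.UNEXPECTED_SHUTDOWN, "terminate the application process"),
    (DisturbanceCategory.RESOURCE_CONTENTION, "fill the data disk"),
    (DisturbanceCategory.LOAD_RESOLUTION, "10x user surge"),
]
gaps = missing_categories(plan)
```

Here `gaps` lists Loss of Data and Restart Failure, so this plan would need at least two more disturbances before it covers every mandatory category.
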

The methodology maps onto several common engineering use cases:

| Use Case | How ISO/IEC 25045 Applies | Modern Implementation |
|---|---|---|
| Pre-production validation | Run disturbance injection as part of system verification testing | Integrate chaos experiments into CI/CD pipelines |
| Production readiness assessment | Evaluate recoverability of production systems against test environments | Game days and controlled blast-radius experiments |
| Vendor comparison | Compare recoverability of different solutions using common workload | Standardized benchmark suites with fault injection |
| SLA validation | Verify that recovery time objectives (RTO) are met under disturbance | Automated SLA verification with fault injection scenarios |
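
The SLA validation row can be illustrated with a short check that compares measured recovery times against a recovery time objective. This is a sketch under stated assumptions: the function name `rto_violations`, the input format, and the sample figures are all hypothetical, and how "recovery time" is measured (for example, which slot sub-intervals it spans) is left to the team running the test.

```python
def rto_violations(measured_seconds: dict, rto_seconds: float) -> dict:
    """Return the disturbances whose measured recovery time exceeds the RTO.

    `measured_seconds` maps a disturbance name to its recovery time in seconds.
    """
    if rto_seconds <= 0:
        raise ValueError("rto_seconds must be positive")
    return {
        name: seconds
        for name, seconds in measured_seconds.items()
        if seconds > rto_seconds
    }


# Hypothetical measurements from a fault injection run, in seconds.
measured = {
    "process kill": 42.0,
    "disk full": 310.0,
    "10x user surge": 95.0,
}
violations = rto_violations(measured, rto_seconds=120.0)
```

With a 120-second objective, only the "disk full" disturbance is reported as a violation. A check like this can run as a gate at the end of an automated fault injection job.
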