Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
IEC 61907:2009 defines the dependability requirements for communication networks in terms of reliability (MTBF), availability (uptime percentage), maintainability (MTTR), and serviceability (quality of support). The standard applies to all types of communication networks including wired (Ethernet, SDH/SONET, MPLS), wireless (4G/5G, Wi-Fi, microwave links), and industrial networks (PROFIBUS, PROFINET, Modbus TCP). It addresses network dependability from both the user perspective (end-to-end service quality) and the infrastructure provider perspective (network element reliability).
A key contribution of the standard is its definition of dependability metrics specifically tailored to communication networks, recognizing that network dependability differs from conventional component-level reliability due to factors such as traffic-dependent failure modes, protocol-induced recovery behaviors, and the hierarchical nature of network architectures. The standard introduces the concept of “service-specific availability” — the probability that a specific network service (e.g., VoIP, video streaming, SCADA telemetry) meets its performance requirements at any given time — as distinct from infrastructure availability.
The standard defines several network-specific reliability metrics. Mean Time Between Service Outages (MTBSO) measures the average interval between service-affecting failures, accounting for the fact that a single network element failure may or may not cause a service outage depending on redundancy design. Mean Time To Restore Service (MTTRS) measures the average time to restore a service after an outage, including detection, diagnosis, repair, and verification time. The standard provides mathematical models for calculating these metrics for series, parallel, and mesh network topologies, with particular attention to the common-cause failure modes that affect redundant paths (e.g., shared cable ducts, common power sources, and software commonality).
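The relationship between these metrics and availability can be illustrated with the textbook steady-state formulas. The sketch below is not taken from the standard: the function names, the numeric values, and the choice to model a common-cause failure as an extra element in series with the redundant pair are all assumptions made for illustration.

```python
from math import prod

def availability(mtbso_hours: float, mttrs_hours: float) -> float:
    """Steady-state availability from mean up time and mean restore time."""
    return mtbso_hours / (mtbso_hours + mttrs_hours)

def series(avails) -> float:
    """All elements must be up: availabilities multiply."""
    return prod(avails)

def parallel(avails) -> float:
    """At least one element must be up: unavailabilities multiply."""
    return 1.0 - prod(1.0 - a for a in avails)

# Hypothetical single path: 999 h between outages, 1 h to restore.
path = availability(mtbso_hours=999.0, mttrs_hours=1.0)

# Two such paths in parallel, assuming independent failures.
redundant_pair = parallel([path, path])

# Assumed common-cause element (e.g. a shared duct) that takes down
# both paths at once, modeled in series with the redundant pair.
shared_duct = 0.9999
with_common_cause = series([redundant_pair, shared_duct])

print(f"single path:       {path:.6f}")
print(f"redundant pair:    {redundant_pair:.6f}")
print(f"with common cause: {with_common_cause:.6f}")
```

Under these assumed numbers the shared element dominates the result: the redundant pair alone reaches roughly six nines, but the combined figure cannot exceed the availability of the shared duct.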
The standard presents detailed availability calculation methods for common network redundancy architectures. For 1+1 protection (dedicated protection), availability is calculated using parallel system models with perfect switching. For 1:N protection (shared protection), the model accounts for the probability of simultaneous failures exceeding the protection capacity. For mesh-restorable networks, the standard introduces a novel metric — the “restorability ratio” — defined as the probability that a working path affected by a failure can be restored within a specified time threshold. The standard also addresses the impact of maintenance activities on availability, introducing the concept of “maintenance window availability” — the achievable availability considering planned preventive maintenance.
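As a rough illustration of the shared-protection case, the sketch below uses a simple binomial model. It assumes independent, identically unavailable channels and perfect switching, and it computes the probability that more channels are down than the protection capacity can cover. The function name and parameters are hypothetical, and the model is a common textbook simplification rather than the standard's own formula.

```python
from math import comb

def prob_failures_exceed_protection(n_working: int, u: float,
                                    n_protect: int = 1) -> float:
    """Probability that simultaneous channel failures exceed protection.

    n_working: number of working channels (the N in 1:N)
    u:         unavailability of each channel, assumed independent
    n_protect: number of protection channels (1 for 1:N and 1+1)
    """
    total = n_working + n_protect
    # Probability that at most n_protect of the channels are down,
    # in which case every working service can still be carried.
    p_covered = sum(
        comb(total, k) * u**k * (1.0 - u) ** (total - k)
        for k in range(n_protect + 1)
    )
    return 1.0 - p_covered

u = 0.001  # assumed per-channel unavailability (99.9% available)
one_plus_one = prob_failures_exceed_protection(n_working=1, u=u)
one_for_four = prob_failures_exceed_protection(n_working=4, u=u)

print(f"1+1 unprotected-failure probability: {one_plus_one:.3e}")
print(f"1:4 unprotected-failure probability: {one_for_four:.3e}")
```

With one working and one protection channel the expression reduces to u squared, which is the parallel model used for 1+1 protection. Sharing one protection channel among four working channels raises the figure by roughly a factor of ten under these assumptions.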
| Architecture | Typical Availability | Annual Downtime | Protection Switching Time |
|---|---|---|---|
| Unprotected point-to-point | 99.9% (3 nines) | ~8.76 hours/year | N/A |
| 1+1 dedicated protection | 99.999% (5 nines) | ~5.26 minutes/year | < 50 ms |
| 1:N shared protection | 99.99% (4 nines) | ~52.6 minutes/year | < 50 ms |
| Mesh restoration (dynamic) | 99.995% (4.5 nines) | ~26.3 minutes/year | 100 ms – 2 s |
| Dual-homed (diverse routing) | 99.9999% (6 nines) | ~31.5 seconds/year | < 10 ms |
| Self-healing ring (SDH) | 99.999% (5 nines) | ~5.26 minutes/year | < 60 ms |
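The per-year figures in the table follow directly from the availability percentages. A minimal sketch of the conversion, assuming a 365-day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year

def annual_downtime_minutes(availability_percent: float) -> float:
    """Expected downtime per year implied by an availability percentage."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR

for pct in (99.9, 99.99, 99.995, 99.999, 99.9999):
    minutes = annual_downtime_minutes(pct)
    print(f"{pct}% -> {minutes:.2f} min/year ({minutes * 60:.1f} s)")
```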
The standard provides a top-down dependability allocation methodology. Starting from the end-to-end service availability requirement, the designer allocates availability targets to individual network segments, subnets, and finally to individual network elements using reliability block diagrams (RBD, per IEC 61078) or fault tree analysis (FTA, per IEC 61025). The allocation must account for the criticality of each network segment: core/backbone networks are typically allocated 99.999% availability, distribution networks 99.99%, and access networks 99.9%. These segment-level targets then drive the required MTBF for routers, switches, links, and power supplies within each segment.
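To see how segment targets combine and how they translate into element-level MTBF, the sketch below treats a service path as one core, one distribution, and one access segment in series, then inverts the steady-state formula A = MTBF / (MTBF + MTTR). The single-segment-each path and the 4-hour MTTR are assumptions chosen for illustration, and the function names are hypothetical.

```python
from math import prod

# Segment targets quoted in the text (as fractions, not percent).
SEGMENT_TARGETS = {"core": 0.99999, "distribution": 0.9999, "access": 0.999}

def end_to_end_availability(segment_avails) -> float:
    """Series RBD: every segment must be up for the service to be up."""
    return prod(segment_avails)

def required_mtbf_hours(target_availability: float, mttr_hours: float) -> float:
    """Invert A = MTBF / (MTBF + MTTR) for the MTBF needed to meet a target."""
    return target_availability * mttr_hours / (1.0 - target_availability)

e2e = end_to_end_availability(SEGMENT_TARGETS.values())
print(f"end-to-end availability: {e2e:.6f}")

# Assumed 4-hour MTTR for a distribution-layer element.
mtbf = required_mtbf_hours(SEGMENT_TARGETS["distribution"], mttr_hours=4.0)
print(f"required MTBF at 99.99% with 4 h MTTR: {mtbf:.0f} h")
```

Because the segments are in series, the end-to-end figure is lower than the weakest segment target, which is why allocation has to start from the service requirement rather than from the individual segments.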
The standard requires continuous dependability monitoring during network operation, using both active measurements (synthetic transaction probes that measure service availability end-to-end) and passive measurements (network management system event correlation). A key metric is the “service degradation ratio”: the proportion of time during which service quality falls below acceptable thresholds without degrading far enough to constitute an outage. The standard recommends that operational dependability data be collected over rolling 12-month windows, with monthly and quarterly reviews against design targets. When measured dependability falls below the allocated values for two consecutive review periods, a formal corrective action process must be initiated.
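The review rule and the degradation metric can be sketched in a few lines. The data layout, function names, and sample figures below are assumptions for illustration only; a real deployment would draw these values from probe results and network management system records.

```python
def service_degradation_ratio(degraded_minutes: float,
                              period_minutes: float) -> float:
    """Share of the period spent degraded but not in outage."""
    return degraded_minutes / period_minutes

def needs_corrective_action(measured: list[float], target: float) -> bool:
    """True if availability was below target in two consecutive reviews."""
    return any(a < target and b < target
               for a, b in zip(measured, measured[1:]))

# Hypothetical monthly availability measurements against a 99.99% target.
target = 0.9999
isolated_miss = [0.99995, 0.99980, 0.99993, 0.99991]
repeated_miss = [0.99995, 0.99980, 0.99985, 0.99991]

print(needs_corrective_action(isolated_miss, target))  # single miss only
print(needs_corrective_action(repeated_miss, target))  # two misses in a row

# 90 degraded minutes in a 30-day month.
print(service_degradation_ratio(90, 30 * 24 * 60))
```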
The standard addresses network maintainability through the Mean Time To Repair (MTTR) metric, but recognizes that in communication networks, repair time is dominated by diagnosis and logistics rather than physical repair. For fiber optic cable breaks — the most common cause of extended network outages — the typical MTTR breakdown is: fault detection and localization (10-30 minutes), dispatch and travel (1-4 hours), cable repair/splicing (2-6 hours), and service verification (30 minutes). The standard recommends that network operators maintain geographically distributed spares depots and pre-negotiated access agreements to reduce the logistics component of MTTR. For equipment failures, the standard recommends a “4-hour response, 8-hour repair” target for critical network elements, with on-site spare units for core network nodes.
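Summing the quoted phases gives the overall MTTR range for a fiber break. A small sketch of that arithmetic, using the ranges from the paragraph above converted to minutes (the dictionary layout and names are illustrative):

```python
# (min, max) duration of each phase in minutes, from the breakdown above.
FIBER_BREAK_PHASES = {
    "detection_and_localization": (10, 30),
    "dispatch_and_travel": (60, 240),
    "repair_and_splicing": (120, 360),
    "service_verification": (30, 30),
}

def total_range_minutes(phases: dict) -> tuple:
    """Sum the lower and upper bounds across all phases."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high

low, high = total_range_minutes(FIBER_BREAK_PHASES)
print(f"total MTTR: {low / 60:.2f} h to {high / 60:.2f} h")
```

On these figures a fiber break takes between about 3.7 and 11 hours to clear, and physical splicing accounts for roughly half of the total at either end of the range.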
A: IEC 61907 is complementary to ITU-T standards such as G.827 (availability targets for international paths) and M.2100 (performance limits for international PDH/SDH paths). Where ITU-T standards focus on performance thresholds for specific network types, IEC 61907 provides the general methodology for dependability management applicable to any network.
A: For a single network domain with fully redundant infrastructure, 99.999% (5 nines) is achievable but requires significant investment. 99.9999% (6 nines) is considered the practical maximum for terrestrial networks, equivalent to approximately 31 seconds of downtime per year. Achieving this requires protection against power failures, fiber cuts, hardware failures, and software faults simultaneously.
A: Software failures present a unique challenge because they violate the “random failure” assumption underlying traditional reliability models. The standard recommends treating software faults as systematic failures and modeling their impact through “failure mode and effects analysis” (FMEA) rather than statistical MTBF approaches.
A: The 2009 edition predates widespread cloud network adoption, but its principles apply to virtualized networks with modifications. Virtual network functions (VNFs) introduce additional failure modes such as hypervisor faults, resource contention, and orchestration failures that must be incorporated into the dependability model.