IEC 61907-2009 – Communication Network Dependability: Reliability, Availability and Maintainability

Standard: IEC 61907-2009 | Category: Network Dependability | Published: 2009

💡 IEC 61907 provides a comprehensive framework for specifying, analyzing, and verifying the dependability of communication networks — covering everything from backbone fiber infrastructure to wireless access networks and industrial fieldbuses.

1. Scope and Fundamental Concepts

IEC 61907-2009 defines the dependability requirements for communication networks in terms of reliability (MTBF), availability (uptime percentage), maintainability (MTTR), and serviceability (quality of support). The standard applies to all types of communication networks including wired (Ethernet, SDH/SONET, MPLS), wireless (4G/5G, Wi-Fi, microwave links), and industrial networks (PROFIBUS, PROFINET, Modbus TCP). It addresses network dependability from both the user perspective (end-to-end service quality) and the infrastructure provider perspective (network element reliability).

A key contribution of the standard is its definition of dependability metrics specifically tailored to communication networks, recognizing that network dependability differs from conventional component-level reliability due to factors such as traffic-dependent failure modes, protocol-induced recovery behaviors, and the hierarchical nature of network architectures. The standard introduces the concept of “service-specific availability” — the probability that a specific network service (e.g., VoIP, video streaming, SCADA telemetry) meets its performance requirements at any given time — as distinct from infrastructure availability.

⚠ A common pitfall in network dependability analysis is equating network element availability with service availability. A network with 99.999% availability at the element level may still deliver only 99.9% end-to-end service availability due to protocol interactions, congestion-dependent failures, and maintenance window accumulations. IEC 61907 emphasizes the importance of end-to-end service availability as the true measure of network dependability.
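The gap between element and end-to-end availability can be illustrated with a simple series calculation. This is a minimal sketch, not a method taken from the standard: it assumes independent element failures, and the ten-element path and the name `series_availability` are hypothetical.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60


def series_availability(element_availabilities):
    """Availability of a chain in which every element must be up (independence assumed)."""
    return prod(element_availabilities)


# Hypothetical path: ten elements in series, each at five nines.
path = [0.99999] * 10
end_to_end = series_availability(path)
downtime_min = (1 - end_to_end) * MINUTES_PER_YEAR

print(f"end-to-end availability: {end_to_end:.6%}")
print(f"expected downtime: {downtime_min:.1f} minutes/year")
```

Ten five-nines elements in series already drop the path to roughly four nines. Protocol interactions, congestion-dependent failures, and maintenance windows are not modeled here and would lower the figure further.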

2. Dependability Metrics and Calculation Methods

2.1 Network-Specific Reliability Metrics

The standard defines several network-specific reliability metrics. Mean Time Between Service Outages (MTBSO) measures the average interval between service-affecting failures, accounting for the fact that a single network element failure may or may not cause a service outage depending on redundancy design. Mean Time To Restore Service (MTTRS) measures the average time to restore a service after an outage, including detection, diagnosis, repair, and verification time. The standard provides mathematical models for calculating these metrics for series, parallel, and mesh network topologies, with particular attention to the common-cause failure modes that affect redundant paths (e.g., shared cable ducts, common power sources, and software commonality).
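As a rough illustration of how the two metrics combine, the sketch below applies the conventional steady-state availability relation, uptime divided by uptime plus restore time. It assumes MTBSO is measured as service uptime between outages. The function name and the sample figures are hypothetical, and the formula is the textbook relation rather than one quoted from the standard.

```python
def steady_state_availability(mtbso_hours, mttrs_hours):
    """Textbook relation A = MTBSO / (MTBSO + MTTRS); assumes MTBSO counts uptime only."""
    if mtbso_hours <= 0 or mttrs_hours < 0:
        raise ValueError("MTBSO must be positive and MTTRS non-negative")
    return mtbso_hours / (mtbso_hours + mttrs_hours)


# Hypothetical service: one outage per year of uptime, four hours to restore.
a = steady_state_availability(mtbso_hours=8760.0, mttrs_hours=4.0)
print(f"service availability: {a:.5%}")
```

Halving MTTRS in this model has nearly the same effect on unavailability as doubling MTBSO, which is why restore time is tracked as a metric in its own right.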

2.2 Availability Models for Redundant Architectures

The standard presents detailed availability calculation methods for common network redundancy architectures. For 1+1 protection (dedicated protection), availability is calculated using parallel system models with perfect switching. For 1:N protection (shared protection), the model accounts for the probability of simultaneous failures exceeding the protection capacity. For mesh-restorable networks, the standard introduces a novel metric — the “restorability ratio” — defined as the probability that a working path affected by a failure can be restored within a specified time threshold. The standard also addresses the impact of maintenance activities on availability, introducing the concept of “maintenance window availability” — the achievable availability considering planned preventive maintenance.
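The 1+1 and 1:N cases can be sketched as follows. These are simplified models under stated assumptions, not the standard's own equations: paths fail independently with identical availability `a`, switching is perfect, and in the 1:N case a failed working path is restored only if the protection path is up and no other working path has failed.

```python
def one_plus_one(a):
    """1+1 dedicated protection: service is up unless both paths are down."""
    return 1 - (1 - a) ** 2


def one_to_n(a, n):
    """1:N shared protection, seen from one working path (simplified model).

    Service is up if the working path is up, or if it is down while the
    protection path and the other n - 1 working paths are all up.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return a + (1 - a) * a ** n


a = 0.999  # hypothetical per-path availability
print(f"1+1: {one_plus_one(a):.6%}")
print(f"1:4: {one_to_n(a, 4):.6%}")
```

With `n = 1` the shared model reduces to the dedicated one, and availability falls as more working paths compete for the single protection path. Common-cause failures and imperfect switching would push real figures below these idealized values.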

Architecture	Typical Availability	Annual Downtime	Protection Switching Time
Unprotected point-to-point	99.9% (3 nines)	~8.76 hours/year	N/A
1+1 dedicated protection	99.999% (5 nines)	~5.26 minutes/year	< 50 ms
1:N shared protection	99.99% (4 nines)	~52.6 minutes/year	< 50 ms
Mesh restoration (dynamic)	99.995% (4.5 nines)	~26.3 minutes/year	100 ms – 2 s
Dual-homed (diverse routing)	99.9999% (6 nines)	~31.5 seconds/year	< 10 ms
Self-healing ring (SDH)	99.999% (5 nines)	~5.26 minutes/year	< 60 ms
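The per-year figures in the table follow directly from the availability percentages. A minimal sketch of the conversion, assuming a 365-day year (the function name is illustrative):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years


def annual_downtime_seconds(availability_percent):
    """Expected downtime per year implied by an availability percentage."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


for pct in (99.9, 99.99, 99.995, 99.999, 99.9999):
    seconds = annual_downtime_seconds(pct)
    print(f"{pct}% -> {seconds / 60:.2f} minutes/year ({seconds:.1f} s)")
```

Each additional nine divides the downtime budget by ten, from 8.76 hours at three nines down to about 31.5 seconds at six nines.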

3. Dependability in the Network Lifecycle

3.1 Design Phase Dependability Allocation

The standard provides a top-down dependability allocation methodology. Starting from the end-to-end service availability requirement, the designer allocates availability targets to individual network segments, subnets, and finally to individual network elements using reliability block diagrams (RBD, per IEC 61078) or fault tree analysis (FTA, per IEC 61025). The allocation must account for the criticality of each network segment: core/backbone networks are typically allocated 99.999% availability, distribution networks 99.99%, and access networks 99.9%. These segment-level targets then drive the required MTBF for routers, switches, links, and power supplies within each segment.
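The allocation step can be checked numerically. The sketch below multiplies the typical segment targets quoted above as a series chain, then derives the MTBF an element would need to meet a target for a given MTTR. It assumes independent segments and the conventional relation A = MTBF / (MTBF + MTTR). The 4-hour MTTR and all names are hypothetical.

```python
from math import prod

# Typical segment targets quoted in the text, treated as a series chain.
segment_targets = {"core": 0.99999, "distribution": 0.9999, "access": 0.999}

end_to_end = prod(segment_targets.values())


def required_mtbf_hours(target_availability, mttr_hours):
    """MTBF needed to reach a target, from A = MTBF / (MTBF + MTTR)."""
    if not 0 < target_availability < 1:
        raise ValueError("target availability must be strictly between 0 and 1")
    return target_availability * mttr_hours / (1 - target_availability)


core_mtbf = required_mtbf_hours(segment_targets["core"], mttr_hours=4.0)

print(f"end-to-end availability: {end_to_end:.4%}")
print(f"core element MTBF needed at 4 h MTTR: {core_mtbf:,.0f} hours")
```

Under these assumptions the end-to-end result is about 99.889%, dominated by the access segment, so an end-to-end requirement tighter than that cannot be met without raising the access target or adding redundancy there.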

3.2 Operational Phase Dependability Verification

The standard requires continuous dependability monitoring during network operation, using both active measurements (synthetic transaction probes that measure service availability end-to-end) and passive measurements (network management system event correlation). A key metric is the “service degradation ratio” — the proportion of time that service quality falls below acceptable thresholds but not low enough to constitute an outage. The standard recommends that operational dependability data be collected over rolling 12-month windows, with monthly and quarterly reviews against design targets. When measured dependability falls below the allocated values for two consecutive review periods, a formal corrective action process must be initiated.
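The two operational checks described above can be sketched as follows. The probe state labels, function names, and sample values are hypothetical. The sketch assumes equally spaced probe samples and one measured availability value per review period.

```python
def service_degradation_ratio(samples):
    """Share of probe samples that are degraded but not an outage.

    `samples` is a list of states: "ok", "degraded", or "outage".
    """
    if not samples:
        raise ValueError("at least one probe sample is required")
    return samples.count("degraded") / len(samples)


def needs_corrective_action(measured, target):
    """True if availability was below target in two consecutive review periods."""
    return any(a < target and b < target for a, b in zip(measured, measured[1:]))


probes = ["ok"] * 97 + ["degraded"] * 2 + ["outage"]
print(f"degradation ratio: {service_degradation_ratio(probes):.2%}")

monthly = [0.9998, 0.9997, 0.9999]  # hypothetical monthly availability
print(needs_corrective_action(monthly, target=0.9999))
```

A single period below target does not trigger the process in this sketch; only two adjacent misses do, mirroring the rule stated in the text.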

✅ Engineering Insight: The service degradation ratio is often a more useful operational metric than strict availability, particularly for real-time applications such as voice and video. A network might achieve 99.999% availability (meaning it is “down” for only about five minutes per year), yet still deliver unacceptable quality because latency spikes or packet loss degrade the MOS (Mean Opinion Score) of VoIP traffic. Monitoring both availability and quality degradation metrics provides a complete picture of network dependability.

4. Maintainability and Repair Strategies

The standard addresses network maintainability through the Mean Time To Repair (MTTR) metric, but recognizes that in communication networks, repair time is dominated by diagnosis and logistics rather than physical repair. For fiber optic cable breaks — the most common cause of extended network outages — the typical MTTR breakdown is: fault detection and localization (10-30 minutes), dispatch and travel (1-4 hours), cable repair/splicing (2-6 hours), and service verification (30 minutes). The standard recommends that network operators maintain geographically distributed spares depots and pre-negotiated access agreements to reduce the logistics component of MTTR. For equipment failures, the standard recommends a “4-hour response, 8-hour repair” target for critical network elements, with on-site spare units for core network nodes.
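Summing the fiber-break phases quoted above gives the overall MTTR range. A minimal sketch, with each phase's bounds converted to minutes (the dictionary layout and variable names are illustrative):

```python
# (minimum, maximum) duration of each phase in minutes, as quoted in the text.
phases = {
    "fault detection and localization": (10, 30),
    "dispatch and travel": (60, 240),
    "cable repair/splicing": (120, 360),
    "service verification": (30, 30),
}

best_case = sum(low for low, _ in phases.values())
worst_case = sum(high for _, high in phases.values())

print(f"MTTR range: {best_case / 60:.2f} h to {worst_case / 60:.2f} h")
for name, (low, high) in phases.items():
    print(f"  {name}: {low}-{high} min")
```

The total runs from 3 hours 40 minutes to 11 hours. Detection plus dispatch account for 70 to 270 minutes of that, which is the portion that distributed spares depots and pre-negotiated access agreements are meant to shrink.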

5. Frequently Asked Questions

Q1: How does IEC 61907 relate to ITU-T reliability standards?

A: IEC 61907 is complementary to ITU-T standards such as G.827 (availability targets for international paths) and M.2100 (performance limits for international PDH/SDH paths). Where ITU-T standards focus on performance thresholds for specific network types, IEC 61907 provides the general methodology for dependability management applicable to any network.

Q2: What is the practical limit of network availability?

A: For a single network domain with fully redundant infrastructure, 99.999% (5 nines) is achievable but requires significant investment. 99.9999% (6 nines) is considered the practical maximum for terrestrial networks, equivalent to approximately 31 seconds of downtime per year. Achieving this requires protection against power failures, fiber cuts, hardware failures, and software faults simultaneously.

Q3: How should software failures be treated in network dependability models?

A: Software failures present a unique challenge because they violate the “random failure” assumption underlying traditional reliability models. The standard recommends treating software faults as systematic failures and modeling their impact through “failure mode and effects analysis” (FMEA) rather than statistical MTBF approaches.

Q4: Can IEC 61907 be applied to cloud and virtualized networks?

A: The 2009 edition predates widespread cloud network adoption, but its principles apply to virtualized networks with modifications. Virtual network functions (VNFs) introduce additional failure modes such as hypervisor faults, resource contention, and orchestration failures that must be incorporated into the dependability model.
