🌲 IEC 61025 Fault Tree Analysis (FTA): Systematic Top-Down Failure Reasoning from Top Event to Root Cause

📖 Standard Overview
IEC 61025:2006 “Fault tree analysis (FTA)” is the international standard for fault tree methodology, prepared by IEC Technical Committee 56 (Dependability). This second edition (2006) supersedes the original 1990 edition with substantially expanded content including detailed quantitative methods, more worked examples, updated symbols, and expanded guidance on combining FTA with other dependability techniques. FTA is a top-down, deductive failure analysis method that starts from a system-level undesirable top event and systematically traces backward through causal logic to identify all combinations of basic events that could produce that outcome. It is widely deployed across safety-critical industries—nuclear power plant probabilistic safety assessment (PSA), aerospace system certification (ARP4761), railway signalling (EN 50126/50129), automotive functional safety (ISO 26262), chemical process hazard analysis, and medical device reliability assessment.

1. FTA Fundamentals: Logic Gates, Event Symbols, and Tree Architecture

1.1 The Core Mindset: Reasoning Backward from Consequence to Cause

The analytical mindset of FTA is the mirror image of FMEA. FMEA asks: “If this component fails, what happens?” (bottom-up, inductive). FTA asks: “Given this system-level failure has occurred, what combinations of events could have caused it?” (top-down, deductive). This top-down approach makes FTA uniquely suited for early-phase design analysis—you do not need to enumerate every component failure mode upfront. You start from the one unacceptable outcome that matters most, and iteratively ask “Why?” until you reach the root causes.

IEC 61025 defines a fault tree as “an organized graphical representation of the conditions or other factors causing or contributing to the occurrence of a defined outcome, referred to as the top event.” The tree is constructed as an inverted logical structure: the top event at the apex, connected through logic gates to intermediate events and ultimately to basic events at the leaves.

1.2 Core Elements of a Fault Tree

Per IEC 61025 definitions, a fault tree comprises these essential elements:

(1) Top Event (3.2) — The undesirable final event under investigation, positioned at the apex of the tree. This is the starting point and analytical focus. Examples: “Main power bus loss causing total plant shutdown,” “Braking system fails to provide deceleration.”
(2) Intermediate Event (3.11) — An event that is neither top nor primary; typically the result of one or more input events combined through a gate, and in turn an input to a higher-level event.
(3) Basic Event (3.9) — An event or state at the bottom of the tree that cannot be developed further. It represents the analysis boundary, typically corresponding to specific component failure modes, human errors, or environmental conditions with known or estimable probabilities.
(4) Undeveloped Event (3.12) — An event not further decomposed—either because detailed information is unavailable, it is developed in another analysis, or it represents a Commercial-Off-The-Shelf (COTS) item treated as a black box.
(5) Gate (3.5) — A symbol establishing the logical relationship between output and input events.

1.3 Essential Logic Gates—The Grammar of Fault Trees

Logic gates form the syntactic backbone of every fault tree. IEC 61025 Annex A provides the full symbol library. The following are the most fundamental gates every engineer must master:

**Table 1: Core Fault Tree Logic Gates (IEC 61025 Annex A)**
Gate Type	Symbol Meaning	Output Occurs When…	Probability Formula (Independent Events)
OR Gate	Any single input event produces the output	A or B or C … any one occurs	P = 1 – ∏(1-P_i)
AND Gate	All input events must occur together	A and B and C … all occur	P = ∏ P_i
PAND Gate (Priority AND)	All inputs occur in a specific sequence	A occurs before B	Dynamic gate; requires sequential probability model
Voting Gate (K/N)	At least K out of N input events occur	K of N (e.g. 2-out-of-3 logic)	Binomial: P = ∑(j=K..N) C(N,j)·p^j·(1–p)^(N–j), for N identical inputs of probability p
XOR Gate (Exclusive OR)	Exactly one input event occurs	A or B occurs, but not both	P = P_A+P_B-2P_AP_B
INHIBIT Gate	Conditional event AND input event both satisfied	Input event + enabling condition	P = P_in × P_cond
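The formulas in Table 1 translate directly into code. The following Python sketch assumes statistically independent inputs and, for the voting gate, N identical inputs sharing one probability p; the function names are illustrative and do not come from any FTA tool. PAND is omitted because, as the table notes, it needs a sequential (time-dependent) model.

```python
from math import comb, prod


def p_or(ps):
    """OR gate: output occurs if any input occurs (independent inputs)."""
    return 1.0 - prod(1.0 - p for p in ps)


def p_and(ps):
    """AND gate: output occurs only if every input occurs (independent inputs)."""
    return prod(ps)


def p_vote(k, n, p):
    """K-out-of-N voting gate: at least k of n identical, independent inputs occur."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def p_xor(pa, pb):
    """Two-input exclusive OR: exactly one of A and B occurs (independent inputs)."""
    return pa + pb - 2.0 * pa * pb


def p_inhibit(p_in, p_cond):
    """INHIBIT gate: the input event occurs while the enabling condition holds."""
    return p_in * p_cond


if __name__ == "__main__":
    print(f"OR   {p_or([0.01, 0.02]):.6f}")     # 0.029800
    print(f"AND  {p_and([0.01, 0.02]):.6f}")    # 0.000200
    print(f"2oo3 {p_vote(2, 3, 0.01):.6f}")     # 0.000298
    print(f"XOR  {p_xor(0.01, 0.02):.6f}")      # 0.029600
    print(f"INH  {p_inhibit(0.01, 0.5):.6f}")   # 0.005000
```

A 1-out-of-N voting gate reduces to an OR gate and an N-out-of-N voting gate to an AND gate, which is a quick sanity check on the binomial formula.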

💡 Engineering Insight: OR and AND as the Yin-Yang of Reliability
An OR gate represents series logic—any single component failure causes system failure, degrading reliability. An AND gate represents parallel redundancy—multiple independent failures must coincide to cause system failure, enhancing reliability. In design practice, identifying single-point failures hidden under OR gates and converting critical OR paths to AND configurations (adding redundancy) is the most direct reliability improvement strategy derivable from FTA. This is why FTA functions not merely as an “analysis tool” but as a design optimization instrument—it reveals precisely where the architecture is vulnerable and what kind of redundancy would close the gap.

⚠️ Common Pitfall #1: Confusing Transfer Gates with Logic Gates
In large fault trees, transfer gates (triangles) are routinely used to connect one subtree to another location, avoiding redundant drawing. Novice analysts often treat the transfer symbol as a simple “event portal” rather than a “logical bridge,” which can inadvertently extend the OR logic of a subtree into an unintended scope. The IEC 61025 standard practice is: transfer symbols must be explicitly labelled with source and destination identifiers, and the transferred subtree must preserve its complete logical structure—you cannot transfer only a partial collection of events.

2. The Full FTA Methodology: From System Familiarization to Probabilistic Quantification

2.1 The Seven-Step FTA Process (IEC 61025 Clause 7)

IEC 61025 Clause 7 describes a structured seven-step FTA workflow. This is not a casual “draw a tree” exercise—it is a rigorous engineering analysis activity requiring systematic thinking:

(1) Scope Definition — Define system boundaries, analysis depth (at what level do you stop and call it a basic event?), operating conditions, and assumptions. This step determines the “resolution” of the fault tree: too shallow and you miss critical causes; too deep and the tree becomes unmanageably large.
(2) System Familiarization — Develop a thorough understanding of the system design, including functional block diagrams, interface definitions, operating modes, and boundary conditions. The analysis team must include design engineers, reliability engineers, and system engineers familiar with the product. IEC 61025 explicitly emphasizes that FTA is a team activity—relying on a single analyst risks missing cross-functional causal chains.
(3) Top Event Definition — Precisely describe the undesirable event to be analyzed. The standard mandates that the top event must be clearly bounded, not vague. “System failure” is an unacceptable top event definition; “Main cooling loop loses circulation capability within 72 hours of rated-power operation” is an acceptable one.
(4) Fault Tree Development — Starting from the top event, iterate downward: “What direct causes could produce this event? Are these causes in AND or OR relationship?” Each layer answers this question until reaching the basic event level.
(5) Fault Tree Construction — Draw the complete fault tree using standardized graphic symbols. Events below each gate must satisfy the Immediate Cause principle—they must be the direct, not indirect, causes of the output event.
(6) Qualitative Analysis — Identify all minimal cut sets (MCS) and single-point failures. A minimal cut set is the “smallest combination of basic events that, if all occur, would cause the top event”—removing any one event from the set prevents the top event.
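(7) Quantitative Analysis — When basic event probabilities are available, compute the probability of each intermediate event and the top event by propagating upward. This gate-by-gate propagation is valid only when no basic event appears more than once in the tree; otherwise the top event probability must be computed from the minimal cut sets. Perform importance analysis to rank basic events by their contribution to top event probability, guiding design improvement prioritization.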
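As a minimal sketch of step (7), the code below computes two widely used importance measures, Birnbaum and Fussell-Vesely, for a small hypothetical set of minimal cut sets. The event names, probabilities and cut sets are invented for illustration, basic events are assumed independent, and the exact top event probability is obtained by brute-force enumeration of all states, which is only practical for a handful of events.

```python
from itertools import product
from math import prod

# Hypothetical minimal cut sets and basic-event probabilities (illustrative only).
MCS = [{"A"}, {"E"}, {"B", "C"}]
P = {"A": 1e-4, "B": 1e-2, "C": 2e-2, "E": 5e-5}


def top_probability(p, mcs=MCS):
    """Exact P(top) for independent basic events, by enumerating every state."""
    names = sorted(p)
    total = 0.0
    for states in product([False, True], repeat=len(names)):
        occurred = {n for n, s in zip(names, states) if s}
        if any(cs <= occurred for cs in mcs):  # some cut set has fully occurred
            total += prod(p[n] if n in occurred else 1.0 - p[n] for n in names)
    return total


def birnbaum(p, event):
    """Birnbaum importance: P(top | event occurs) - P(top | event does not occur)."""
    return top_probability({**p, event: 1.0}) - top_probability({**p, event: 0.0})


def fussell_vesely(p, event, mcs=MCS):
    """Fussell-Vesely importance (rare-event form): share of the summed cut-set
    probability contributed by cut sets that contain `event`."""
    def cut_set_probability(cs):
        return prod(p[e] for e in cs)
    containing = sum(cut_set_probability(cs) for cs in mcs if event in cs)
    return containing / sum(cut_set_probability(cs) for cs in mcs)


if __name__ == "__main__":
    print(f"P(top) = {top_probability(P):.3e}")
    for e in sorted(P):
        print(f"{e}: Birnbaum={birnbaum(P, e):.4f}  FV={fussell_vesely(P, e):.3f}")
```

With these numbers Birnbaum ranks the single-point events A and E highest (their structural position), while Fussell-Vesely ranks B and C highest because their cut set contributes most of the top event probability. Reading both measures together helps avoid misleading improvement priorities.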

2.2 Minimal Cut Sets—The Core Output of Qualitative FTA

The minimal cut set (MCS) is arguably the most important concept in fault tree qualitative analysis. IEC 61025 defines a cut set as “a group of events that, if all occur, would cause occurrence of the top event,” while a minimal cut set is “the minimum, or the smallest set of events needed to occur to cause the top event—the non-occurrence of any one of the events in the set would prevent the occurrence of the top event.”
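Minimal cut sets can be obtained by expanding the gates into Boolean products and then applying absorption: any cut set that contains another cut set is not minimal. The classic MOCUS algorithm does this top-down; the Python sketch below instead uses a simple recursive expansion for trees built only from AND and OR gates. The tree layout and event names are hypothetical, and dedicated tools use far more efficient algorithms for large trees.

```python
from itertools import product

# Hypothetical example tree: gate name -> (gate type, list of inputs).
# Any input that is not listed as a gate is treated as a basic event.
TREE = {
    "TOP": ("OR", ["G1", "E"]),
    "G1": ("AND", ["G2", "G3"]),
    "G2": ("OR", ["A", "B"]),
    "G3": ("OR", ["A", "C"]),
}


def cut_sets(node, tree):
    """Return the (not necessarily minimal) cut sets of `node` as frozensets."""
    if node not in tree:
        return [frozenset([node])]  # basic event: a cut set of one
    gate, inputs = tree[node]
    child_sets = [cut_sets(child, tree) for child in inputs]
    if gate == "OR":
        # Any single cut set of any input is enough.
        return [cs for sets in child_sets for cs in sets]
    if gate == "AND":
        # One cut set from every input must occur together.
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unsupported gate type: {gate}")


def minimize(sets):
    """Boolean absorption: drop every cut set that strictly contains another."""
    unique = set(sets)
    return sorted(
        (s for s in unique if not any(other < s for other in unique)),
        key=lambda s: (len(s), sorted(s)),
    )


if __name__ == "__main__":
    for mcs in minimize(cut_sets("TOP", TREE)):
        print(sorted(mcs))
```

For this tree the minimal cut sets are {A}, {E} and {B, C}. Because A feeds both inputs of the AND gate, it turns out to be a single-point failure even though it sits beneath an apparent redundancy.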

Interpreting MCS order for engineering decisions:

First-order cut set (single-event MCS) = Single point failure. This event alone is sufficient to cause system failure—the most dangerous category. Identifying and eliminating all first-order MCS is a fundamental requirement for safety-critical system design.
Second-order cut set (two-event combination) = Requires two events to coincide. Represents some redundancy protection, but if both events share a common cause (common cause failure, CCF), an analysis that assumes independence will severely underestimate the actual risk.
Higher-order cut sets (three or more events) = Extremely low probability but not impossible. In nuclear safety-grade systems, third-order MCS are still mandatory review items under probabilistic safety assessment (PSA) requirements.

**Table 2: Minimal Cut Set Order and Corresponding Design Decisions**
MCS Order	Engineering Significance	Required Design Response	Typical Example
1st (single event)	Single-point failure, zero protection	Must eliminate: add redundancy or independent protection layer	Emergency stop button on a single circuit
2nd (two events)	Requires two independent coincident events	Assess CCF potential; if unacceptable, add diversity	Primary pump + backup pump both fail (but if they share a cooling water source = CCF risk)
3rd and above	Requires three or more independent events	Typically probability-acceptable; still check for shared external threats	Three independent sensors simultaneously drift beyond limit
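Table 2 can be applied mechanically as a first screening pass. The hypothetical sketch below sorts minimal cut sets by order, flags first-order sets as single-point failures, and for higher-order sets reports any attribute (cooling source, supplier and so on) recorded with the same value for every event in the set as a candidate common cause. The component names and attribute records are invented; a real CCF assessment needs engineering judgement well beyond matching records.

```python
# Hypothetical minimal cut sets and component attribute records (illustrative only).
MCS = [
    {"PUMP_A", "PUMP_B"},
    {"E_STOP_SWITCH"},
    {"SENSOR_1", "SENSOR_2", "SENSOR_3"},
]
ATTRIBUTES = {
    "PUMP_A": {"cooling": "CW-1", "supplier": "X"},
    "PUMP_B": {"cooling": "CW-1", "supplier": "Y"},
    "SENSOR_1": {"supplier": "S1"},
    "SENSOR_2": {"supplier": "S2"},
    "SENSOR_3": {"supplier": "S3"},
}


def shared_attributes(cut_set, attrs):
    """Attributes recorded with the same value for every event in the cut set."""
    records = [attrs.get(event, {}) for event in cut_set]
    common_keys = set.intersection(*(set(r) for r in records))
    return {k: records[0][k] for k in sorted(common_keys)
            if len({r[k] for r in records}) == 1}


def review(mcs_list, attrs):
    """Classify each MCS by order (as in Table 2) and flag candidate CCF sources."""
    report = []
    for cs in sorted(mcs_list, key=len):
        events = sorted(cs)
        if len(cs) == 1:
            report.append((events, 1, "single-point failure: eliminate"))
            continue
        shared = shared_attributes(cs, attrs)
        note = f"check CCF, shared {shared}" if shared else "no shared attribute recorded"
        report.append((events, len(cs), note))
    return report


if __name__ == "__main__":
    for row in review(MCS, ATTRIBUTES):
        print(row)
```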

2.3 Quantitative FTA—Probability Calculation and Importance Measures

When basic event probability data is available (from testing, field data, or industry databases such as FMD/NPRD), FTA can perform quantitative analysis. IEC 61025 provides the following core calculation framework:

OR Gate Probability (exact for independent inputs):

P(T) = 1 – ∏(1 – P_i)

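For low-probability events (P < 0.1), engineers commonly use the Rare Event Approximation: P(T) ≈ ∑ P_i, which slightly overestimates the exact value. When cut sets contain repeated events, however, they are no longer independent: the Esary-Proschan min-cut upper bound, P(T) ≤ 1 – ∏(1 – P(C_j)) over the minimal cut sets C_j, still gives a conservative estimate, but an exact result requires disjointing (or inclusion-exclusion). IEC 61025 Annex B provides a detailed disjointing procedure.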
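To see how far the approximations drift from the exact value, the sketch below computes the top event probability of a set of minimal cut sets three ways: exactly by inclusion-exclusion (a different exact technique from the disjointing procedure), by the rare-event sum, and by the Esary-Proschan min-cut upper bound. It assumes independent basic events; the 2-out-of-3 cut sets {A,B}, {A,C}, {B,C} with p = 0.1 are a hypothetical example in which every event is repeated across cut sets. Inclusion-exclusion grows exponentially with the number of cut sets, so it only suits small examples.

```python
from itertools import combinations
from math import prod


def p_all(events, p):
    """Probability that every event in `events` occurs (independent basic events)."""
    return prod(p[e] for e in events)


def exact_inclusion_exclusion(mcs, p):
    """Exact P(union of cut sets). Repeated events are handled because each
    intersection term uses the union of the basic events involved."""
    total = 0.0
    for k in range(1, len(mcs) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for combo in combinations(mcs, k):
            total += sign * p_all(frozenset().union(*combo), p)
    return total


def rare_event(mcs, p):
    """Rare-event approximation: sum of the cut-set probabilities."""
    return sum(p_all(cs, p) for cs in mcs)


def min_cut_upper_bound(mcs, p):
    """Esary-Proschan min-cut upper bound: 1 - prod(1 - P(C_j))."""
    return 1.0 - prod(1.0 - p_all(cs, p) for cs in mcs)


if __name__ == "__main__":
    cut_sets = [{"A", "B"}, {"A", "C"}, {"B", "C"}]  # hypothetical 2-out-of-3
    probs = {"A": 0.1, "B": 0.1, "C": 0.1}
    print(f"exact       {exact_inclusion_exclusion(cut_sets, probs):.6f}")
    print(f"upper bound {min_cut_upper_bound(cut_sets, probs):.6f}")
    print(f"rare event  {rare_event(cut_sets, probs):.6f}")
```

Here the exact value is 0.028 (matching 3p² – 2p³ for a 2-out-of-3 gate), the min-cut upper bound is about 0.0297 and the rare-event sum is 0.03, so both approximations err on the conservative side.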

AND Gate Probability:

P(T) = ∏ P_i (assuming independent events)

The Bridge Circuit—FTA’s Classic Complex Dependency Case: IEC 61025 Clause 7 (Figures 8-12) uses a bridge circuit as a worked example, demonstrating how a component (the middle bridge arm) that simultaneously participates in multiple failure paths makes simple cut-set multiplication either overestimate or underestimate the true probability. Exact calculation requires disjointing techniques to convert overlapping cut sets into mutually exclusive form.
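The bridge behaviour can be checked by brute force. The sketch below labels the five arms hypothetically (A: source to node a, B: source to node b, C: a to sink, D: b to sink, E: the middle arm between a and b), enumerates all 2⁵ component states, and sums the probability of every state with no working source-to-sink path. It assumes independent components that each fail with probability 0.1, and compares the exact answer with the rare-event sum over the four minimal cut sets {A,B}, {C,D}, {A,E,D}, {B,E,C}.

```python
from itertools import product
from math import prod

# Hypothetical labelling: A = source->a, B = source->b, C = a->sink,
# D = b->sink, E = middle arm a<->b.
COMPONENTS = ["A", "B", "C", "D", "E"]
PATH_SETS = [{"A", "C"}, {"B", "D"}, {"A", "E", "D"}, {"B", "E", "C"}]
CUT_SETS = [{"A", "B"}, {"C", "D"}, {"A", "E", "D"}, {"B", "E", "C"}]


def exact_failure_probability(q):
    """Sum P(state) over all 2**5 states in which no source-to-sink path works."""
    total = 0.0
    for states in product([False, True], repeat=len(COMPONENTS)):
        failed = {c for c, is_failed in zip(COMPONENTS, states) if is_failed}
        system_works = any(path.isdisjoint(failed) for path in PATH_SETS)
        if not system_works:
            total += prod(q[c] if c in failed else 1.0 - q[c] for c in COMPONENTS)
    return total


def rare_event_estimate(q):
    """Sum of minimal cut set probabilities (ignores overlaps between cut sets)."""
    return sum(prod(q[c] for c in cs) for cs in CUT_SETS)


if __name__ == "__main__":
    q = dict.fromkeys(COMPONENTS, 0.1)
    print(f"exact      {exact_failure_probability(q):.5f}")
    print(f"rare event {rare_event_estimate(q):.5f}")
```

The exact failure probability is 0.02152 (equal to 2q² + 2q³ – 5q⁴ + 2q⁵ for q = 0.1), whereas the rare-event sum gives 0.022. The gap arises because every arm, including the middle arm E, appears in two of the four cut sets, so the cut sets are not mutually exclusive.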

🛑 Common Pitfall #2: Ignoring Common Cause Failure (CCF)
Common cause failure is the single largest “hidden risk” in FTA quantitative analysis. When a fault tree contains multiple AND gates (representing redundant protections), and those “redundant” paths share a common cause, such as identical component models, the same supplier, or shared environmental stress (temperature/vibration/EMI), the events beneath the AND gate are not independent. The actual top event probability may be hundreds to thousands of times higher than the independence-assumption calculation. IEC 61025 (3.14) calls for repeated events and common cause events to be clearly identified in the fault tree, and for quantitative calculations to correct for them with a CCF model (e.g., the beta-factor model or the Multiple Greek Letter (MGL) model). Explicit CCF modelling is also expected under nuclear safety guidance (NUREG/CR-5485) and functional safety standards (ISO 26262-5, Annex D).
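A rough feel for the size of the CCF effect comes from the beta-factor model. The sketch below assumes two identical redundant channels, each with failure probability Q over the period of interest, and splits Q into an independent part (1 – β)Q and a common cause part βQ that fails both channels at once. The values Q = 10⁻³ and β = 0.1 are assumptions chosen only for illustration.

```python
def and_gate_independent(q):
    """Two redundant channels assumed independent: both fail with probability q**2."""
    return q * q


def and_gate_beta_factor(q, beta):
    """Two redundant channels under a beta-factor model.

    Each channel fails through an independent event (probability (1 - beta) * q)
    or through one shared CCF event (probability beta * q) that fails both.
    Top event = CCF OR (independent failure of channel 1 AND of channel 2).
    """
    q_ind = (1.0 - beta) * q
    q_ccf = beta * q
    both_independent = q_ind * q_ind
    return q_ccf + both_independent - q_ccf * both_independent


if __name__ == "__main__":
    q, beta = 1e-3, 0.1  # assumed illustrative values
    naive = and_gate_independent(q)
    with_ccf = and_gate_beta_factor(q, beta)
    print(f"independent : {naive:.3e}")
    print(f"beta-factor : {with_ccf:.3e}  ({with_ccf / naive:.0f}x higher)")
```

With these inputs the independence assumption gives 1.0 × 10⁻⁶, while the beta-factor model gives about 1.008 × 10⁻⁴, roughly 100 times higher; the common cause term βQ dominates the result.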

3. FTA vs. FMEA vs. ETA: Positioning Three Complementary Reliability Analysis Methods

IEC 61025 Clause 5.4 explicitly addresses the combined use of FTA with other reliability techniques. This is one of the most consequential methodological decisions in engineering practice—choosing the right analysis tool for the problem at hand.

3.1 Core Distinctions: Reasoning Direction and Analytical Focus

**Table 3: Systematic Comparison of FTA, FMEA, and ETA**
Dimension	FTA (Fault Tree Analysis)	FMEA (Failure Mode and Effects Analysis)	ETA (Event Tree Analysis)
Reasoning Direction	Top-down (Deductive)	Bottom-up (Inductive)	Left-to-right (Forward)
Starting Point	System-level top event (consequence)	Component-level failure mode (cause)	Initiating event (trigger)
Core Question	“How could this top event occur?”	“What happens if this component fails?”	“Starting from this trigger, what happens next?”
Logic Elements	AND/OR/PAND/Voting gates	No logic gates; item-by-item evaluation	Branch logic (success/failure paths)
Output	Minimal cut sets, top event probability	Failure effect list, RPN ranking	Accident sequences, sequence probabilities
Best-Suited Scenario	Safety-critical systems, multi-factor combination failures	Design review, manufacturing process analysis	Accident evolution, emergency response assessment
Key Strength	Handles logic combinations, quantitative probability	Systematic, exhaustive coverage, easy to execute	Handles time sequences, defence-in-depth analysis
Key Limitation	Complex to construct, CCF easy to miss	Cannot handle multiple failure combinations well	Dependency handling difficult, branch explosion
Governing IEC Standard	IEC 61025	IEC 60812	IEC 62502

3.2 FTA and FMEA: The “Gold Standard” Complementary Pair

IEC 61025 Clause 5.4.1 states that the combination of FTA and FMEA is “often recommended by sector-specific standards, in particular safety standards and transportation standards.” The relationship between the two can be summarized as:

FTA (Deductive) + FMEA (Inductive) = Complete Failure Analysis Closed Loop

Specific complementarities include:

Consistency Check: Any single failure identified in FMEA that leads to the fault tree top event must also appear as a single-point failure (first-order MCS) in the FTA. Conversely, every single-point failure found in FTA must be recorded in FMEA. This cross-validation dramatically improves analysis completeness.
Coverage Complementarity: FMEA excels at exhaustively enumerating individual component failure modes but struggles with multi-failure combinations; FTA excels at handling logical combinations and multiple failures but may miss certain basic events. Together they provide complementary evidence for a safety case.
Analysis Sequencing: One common sequence is to use FTA first to identify top events (unacceptable consequences) and the combinations of conditions that produce them, then use FMEA to systematically analyze all bottom-level failure modes that could contribute to those conditions. Finally, perform the consistency check to ensure no gaps.
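The consistency check itself is a simple set comparison. In the hypothetical sketch below, the FTA single-point failures are taken to be the first-order minimal cut sets, and the FMEA list is assumed to contain only single failures whose end effect is the same top event; anything appearing on one side only is a gap to resolve. The component names are invented.

```python
# Hypothetical results of two independently performed analyses.
FTA_MCS = [{"RELAY_K1"}, {"PUMP_A", "PUMP_B"}, {"SENSOR_PT1"}]
# Assumed: FMEA single failures whose end effect is the same top event.
FMEA_SINGLE_POINT = {"RELAY_K1", "FUSE_F3"}


def consistency_check(mcs_list, fmea_spf):
    """Compare FTA first-order minimal cut sets with FMEA single-point failures."""
    fta_spf = {next(iter(cs)) for cs in mcs_list if len(cs) == 1}
    return {
        "in_FTA_not_FMEA": sorted(fta_spf - fmea_spf),
        "in_FMEA_not_FTA": sorted(fmea_spf - fta_spf),
    }


if __name__ == "__main__":
    gaps = consistency_check(FTA_MCS, FMEA_SINGLE_POINT)
    for direction, events in gaps.items():
        print(direction, events or "none")
```

In this made-up case SENSOR_PT1 is missing from the FMEA and FUSE_F3 is missing from the fault tree, so both analyses would need revisiting.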

🎓 Practical Engineering Guidance: The FTA-FMEA Consistency Check
IEC 61025 emphasizes that the value of the consistency check “is increased if the analyses are performed separately and independently.” For systems at Safety Integrity Level (SIL) 3/4 or Automotive Safety Integrity Level (ASIL) C/D, it is strongly recommended that different engineers or independent teams perform the FTA and FMEA, then cross-compare results. Any discrepancy—such as an FTA-identified single-point failure missing from FMEA, or vice versa—indicates a gap in one of the analyses that must be resolved before design freeze. This independent verification principle is embedded in IEC 61508, ISO 26262, and ARP4754/4761.

4. Practical Fault Tree Construction: Engineering Techniques and Common Traps

4.1 The Immediate Cause Principle—The Litmus Test for Fault Tree Quality

One of the most common quality defects in fault tree construction is violation of the Immediate Cause Principle. IEC 61025 requires that the input events below each logic gate be direct causes of that gate's output event, not indirect or remote causes.

Unacceptable Example: Under the top event “Engine fails to start,” placing “spark plug aging,” “fuel pump failure,” and “battery discharge” directly under an OR gate. While all are possible root causes, they are not direct causes of “engine fails to start”—the direct causes are “no ignition,” “no fuel delivery,” and “no cranking power.” Spark plug aging is a sub-cause of “no ignition” and should be developed at the next level down.

Correct Approach: At each level, answer only one question: “What directly causes this event?” If your answer contains intermediate logical steps, you need to insert an intermediate event layer. This discipline produces fault trees with clear hierarchy, coherent logic, and easy reviewability.

4.2 Repeated Events—The Most Underestimated Risk in FTA

A repeated event (IEC 61025, 3.16) is an event that serves as input to more than one higher-level event. In large fault trees, this is extremely common—a single power supply module may feed the controller, communication module, and sensors simultaneously; its failure affects multiple branches at once.

Handling repeated events is one of the most complex aspects of quantitative FTA. Treating repeated events as independent for probability multiplication results in:

OR gates: overestimated risk (the same event is counted as if it were two independent occurrences, e.g., 1 – (1 – p)² instead of p)
AND gates: underestimated risk (the repeated event introduces false redundancy into the calculation, e.g., p² instead of p)

IEC 61025 Annex B provides the detailed disjointing procedure, which uses Boolean algebra to convert cut sets containing repeated events into mutually exclusive (disjoint) terms whose probabilities can simply be summed, enabling correct probability calculation. Modern FTA software tools (Isograph Reliability Workbench, ReliaSoft BlockSim, CAFTA, RiskSpectrum) automate this process, but engineers must understand the underlying principle to correctly interpret the results.
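The effect of repeated events is easy to demonstrate numerically. The sketch below compares naive gate-by-gate propagation, which treats each occurrence of event A as if it were independent, with the exact probability obtained by enumerating all basic-event states. It uses two hypothetical three-event trees in which A feeds both inputs of the top gate, and assumes every basic event has probability 0.1.

```python
from itertools import product
from math import prod

P = {"A": 0.1, "B": 0.1, "C": 0.1}  # assumed, illustrative probabilities


def p_or(x, y):
    """Two-input OR gate, valid only if the inputs are independent."""
    return 1.0 - (1.0 - x) * (1.0 - y)


def p_and(x, y):
    """Two-input AND gate, valid only if the inputs are independent."""
    return x * y


def exact(structure, p):
    """Exact P(top): sum P(state) over all basic-event states where `structure` is True."""
    names = sorted(p)
    total = 0.0
    for states in product([False, True], repeat=len(names)):
        s = dict(zip(names, states))
        if structure(s):
            total += prod(p[n] if s[n] else 1.0 - p[n] for n in names)
    return total


# Case 1: TOP = AND(OR(A, B), OR(A, C)); A is repeated under the AND gate.
and_naive = p_and(p_or(P["A"], P["B"]), p_or(P["A"], P["C"]))
and_exact = exact(lambda s: (s["A"] or s["B"]) and (s["A"] or s["C"]), P)

# Case 2: TOP = OR(AND(A, B), AND(A, C)); A is repeated under the OR gate.
or_naive = p_or(p_and(P["A"], P["B"]), p_and(P["A"], P["C"]))
or_exact = exact(lambda s: (s["A"] and s["B"]) or (s["A"] and s["C"]), P)

if __name__ == "__main__":
    print(f"AND case: naive {and_naive:.4f}  exact {and_exact:.4f}")
    print(f"OR case : naive {or_naive:.4f}  exact {or_exact:.4f}")
```

Under the AND gate the naive result is 0.0361 against an exact 0.109, a dangerous underestimate; under the OR gate the naive result is 0.0199 against an exact 0.019, a mild overestimate.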

4.3 Tree Depth—Knowing When to Stop

IEC 61025 instructs that fault tree development should proceed “to the level at which probability data for basic events is obtainable.” In practice:

Do not develop beyond the level where probability data exists—e.g., do not expand “diode short-circuit” into “PN junction metal migration” unless your organization maintains a detailed semiconductor failure mechanism database.
Do not stop before the level where probability data is available—e.g., do not treat “power module failure” as a basic event if you have independent failure rate data for its internal components (capacitor, transformer, switching transistor).
Commercial-Off-The-Shelf (COTS) items are a legitimate reason to stop developing—when a supplier provides a module-level failure rate (e.g., MTBF = 100,000 hours), treating it as an undeveloped basic event is appropriate and should be explicitly annotated.
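Converting a supplier's MTBF into a basic event probability requires a failure model and an exposure time. The sketch below assumes a constant failure rate λ = 1/MTBF (the exponential model), which is a common simplification rather than something the supplier figure guarantees, together with a hypothetical 1,000-hour mission time.

```python
import math


def failure_probability(mtbf_hours, mission_hours):
    """P(at least one failure within the mission), assuming a constant failure
    rate lambda = 1 / MTBF (exponential time-to-failure model)."""
    failure_rate = 1.0 / mtbf_hours
    return 1.0 - math.exp(-failure_rate * mission_hours)


if __name__ == "__main__":
    mtbf, mission = 100_000.0, 1_000.0  # mission time is a hypothetical value
    p = failure_probability(mtbf, mission)
    print(f"exact  1 - exp(-t/MTBF) = {p:.6f}")
    print(f"approx t/MTBF           = {mission / mtbf:.6f}")
```

For MTBF = 100,000 hours and a 1,000-hour mission, P ≈ 0.00995, close to the linear approximation λt = 0.01 that holds when λt is small.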

⚠️ Common Pitfall #3: Skipping System Familiarization and Jumping Straight to Drawing the Tree
IEC 61025 designates “System Familiarization” as a distinct and mandatory FTA step. In practice, many engineers skip this phase and start drawing the tree immediately—analogous to “beginning to solder before reading the circuit schematic.” The system familiarization phase should collect and review: functional block diagrams, interface control documents (ICDs), operational profiles, historical failure data, environmental condition specifications, maintenance strategies, and any existing FMEA/ETA results. Without this foundation, the fault tree will almost certainly miss critical causal chains. The standard recommends that system familiarization include at least one cross-functional team system walkthrough session.

5. Frequently Asked Questions (FAQ)

Q1: Which comes first, FTA or FMEA, and can I do just one? FTA and FMEA are complementary, not sequential. For safety-critical systems (e.g., SIL ≥ 2 or ASIL C/D), the major industry standards (IEC 61508, ISO 26262, ARP4761) generally require or strongly recommend both FTA and FMEA, because they address “combination failures” and “single-point failures” as distinct analytical spaces. In practice, you can run FTA first to identify high-risk top events and their minimal cut sets, then use FMEA to exhaustively enumerate possible basic events; or start with FMEA to build a complete basic event library, then use FTA to construct the causal logic chain. Doing only FTA risks missing unanticipated basic events; doing only FMEA cannot properly handle multi-failure combinations.
Q2: How deep should a fault tree go? When do I stop? IEC 61025 provides three stopping rules: (1) when the event’s probability can be directly obtained, whether through test data, field statistics, or supplier-provided failure rates; (2) when further development offers no additional analytical value, for example when the event already represents the lowest replaceable unit (LRU); (3) when the event is marked as “undeveloped”, such as COTS components or subtrees processed in another analysis. A practical heuristic: total tree depth should typically not exceed 6-8 levels. Beyond this, consider modularization using subtrees and transfer gates for manageability.
Q3: Can I perform quantitative FTA without basic event probability data? Without accurate probability data, the scientific value of quantitative FTA diminishes significantly. However, IEC 61025 suggests two alternatives: (1) qualitative ranking, using descriptive likelihood labels such as “highly probable,” “very probable,” “medium probability,” “remote probability,” and “extremely improbable” in place of numerical values, for initial critical cut-set screening; (2) sensitivity analysis, assigning hypothetical probability ranges to basic events and observing how the top event probability responds to variation in each basic event. Even if absolute values are imprecise, the sensitivity ranking is generally engineering-meaningful. For new designs without field data, industry databases (FMD-2016, NPRD-2016, MIL-HDBK-217F) provide defensible initial numerical sources.
Q4: What is the relationship between Success Tree Analysis (STA) and Fault Tree Analysis (FTA)? IEC 61025 notes that when the top event is defined as a success rather than a failure, the fault tree becomes a success tree (STA). The two are logical duals: replacing every AND gate with an OR gate (and vice versa) and replacing each event with its logical complement turns a fault tree into the corresponding success tree (De Morgan's laws). STA is used less frequently in engineering practice because most safety analyses aim to “prevent bad things from happening” rather than “ensure good things happen.” STA finds its primary application in availability analysis, starting from conditions that must all be “operational” and analyzing the logical combinations required to maintain system availability. In nuclear power plant Probabilistic Safety Assessment (PSA), event trees and fault trees are combined: the failure branch at each event tree node is quantified with a fault tree, and the corresponding success branch is its logical complement, in effect a success tree embedded in the forward event sequence.
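The duality described in Q4 can be checked mechanically. The sketch below builds the success tree of a small hypothetical fault tree (two redundant pumps in series with a valve) by swapping AND and OR gates and complementing every basic event, then confirms over all component states that the success tree holds exactly when the fault tree's top event does not occur. The tuple encoding and component names are illustrative only.

```python
from itertools import product

# Hypothetical fault tree: top = both pumps fail OR the valve fails.
# Gates are (type, [inputs]) tuples; strings are basic (failure) events.
FAULT_TREE = ("OR", [("AND", ["PUMP_A", "PUMP_B"]), "VALVE_V1"])


def to_success_tree(node):
    """Dual tree by De Morgan: swap AND/OR and complement every basic event."""
    if isinstance(node, str):
        return ("NOT", node)
    gate, inputs = node
    if gate not in ("AND", "OR"):
        raise ValueError(f"unsupported gate type: {gate}")
    dual_gate = "OR" if gate == "AND" else "AND"
    return (dual_gate, [to_success_tree(child) for child in inputs])


def evaluate(node, failed):
    """True if the event represented by `node` holds, given the failed components."""
    if isinstance(node, str):
        return node in failed
    if node[0] == "NOT":
        return node[1] not in failed
    gate, inputs = node
    results = [evaluate(child, failed) for child in inputs]
    return all(results) if gate == "AND" else any(results)


def check_duality(tree, components):
    """Confirm over every state that the success tree equals NOT(fault tree)."""
    success = to_success_tree(tree)
    for states in product([False, True], repeat=len(components)):
        failed = {c for c, f in zip(components, states) if f}
        if evaluate(success, failed) == evaluate(tree, failed):
            return False
    return True


if __name__ == "__main__":
    print(to_success_tree(FAULT_TREE))
    print(check_duality(FAULT_TREE, ["PUMP_A", "PUMP_B", "VALVE_V1"]))
```

The resulting success tree reads: at least one pump works AND the valve works, which is the availability condition an STA would start from.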

💡 Key Takeaway: IEC 61025:2006 provides a complete, internationally standardized methodology for fault tree analysis. The real value of FTA lies not in producing a visually impressive logic tree, but in compelling engineering teams to engage in structured “failure reasoning”—starting from the most unacceptable consequence, systematically tracing backward through every layer of causality until every possible combination of root causes is identified. Combined with the inductive thinking of FMEA, FTA forms an indispensable “dual engine” for safety-critical engineering design. As IEC 61025 itself states: “the use of both deductive and inductive reasoning is regarded as a good argument for providing assurance for the completeness of an analysis.” In an era of increasingly complex systems and tightening functional safety requirements, mastery of FTA is not merely a core competency for reliability engineers—it is a fundamental skill for every engineer designing safety-critical systems.
