Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
ISO/IEC 29128-1:2024 is the first part of a multi-part standard dedicated to the performance evaluation of face recognition systems. It establishes a rigorous, standardised methodology for measuring how accurately a face recognition system can match probe images against gallery references across a wide range of operational conditions.
The standard arrives as face recognition systems are being deployed in high-stakes applications such as law enforcement, airport border control, and financial know-your-customer (KYC) checks, where both accuracy and fairness are under intense scrutiny. ISO/IEC 29128-1 provides the technical framework for transparent, reproducible, and bias-aware performance evaluation.
The standard defines a comprehensive set of metrics that go beyond simple accuracy rates:
| Metric | Definition | Typical Deployment Target |
|---|---|---|
| False Accept Rate (FAR) | Proportion of impostor comparisons incorrectly accepted | 1 in 100,000 (high security) |
| False Reject Rate (FRR) | Proportion of genuine comparisons incorrectly rejected | 1 in 100 (border control) |
| True Positive Identification Rate (TPIR) | Proportion of genuine probes whose correct gallery identity is returned at rank 1 | >99% (gallery size <10^6) |
| False Positive Identification Rate (FPIR) | Proportion of impostor probes (no matching gallery entry) incorrectly matched to a gallery identity | <1% (watchlist screening) |
| Equal Error Rate (EER) | Operating point at which FAR = FRR | Reported for comparability |
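To make the verification metrics in the table concrete, here is a minimal sketch of how FAR, FRR, and an approximate EER could be computed from comparison scores. It assumes higher scores mean greater similarity and that a comparison is accepted when its score is at or above the threshold; the function names `far_frr` and `equal_error_rate` are illustrative and are not taken from the standard.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) for lists of similarity scores.

    Assumption: a comparison is accepted when score >= threshold.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold.

    Returns (eer, threshold) at the point where |FAR - FRR| is smallest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
print(far_frr(genuine_scores, impostor_scores, 0.5))
print(equal_error_rate(genuine_scores, impostor_scores))
```

With real score sets the FAR and FRR curves rarely cross exactly at an observed score, so the value returned here is the midpoint at the closest crossing rather than an interpolated EER.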
The standard also mandates that all metrics be reported with confidence intervals and stratified by demographic subgroup (including age, sex, and skin tone) to enable fairness analysis. This demographic stratification is one of the most significant advances in 29128-1 compared to earlier face recognition evaluation practices.
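As an illustration of stratified reporting with confidence intervals, the sketch below computes an error rate per demographic group together with a Wilson score interval. The choice of the Wilson interval is an assumption made here for the example; the text above does not say which interval method is required. The record layout `(group, is_error)` and both function names are hypothetical.

```python
import math
from collections import defaultdict


def wilson_interval(errors, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = errors / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def stratified_error_rates(records, z=1.96):
    """records: iterable of (group, is_error) pairs, one per comparison.

    Returns {group: (rate, ci_low, ci_high)}.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [errors, trials]
    for group, is_error in records:
        counts[group][0] += int(is_error)
        counts[group][1] += 1
    return {g: (e / n, *wilson_interval(e, n, z)) for g, (e, n) in counts.items()}


# Invented example: group labels "A" and "B" are placeholders.
records = [("A", False)] * 98 + [("A", True)] * 2 + [("B", False)] * 45 + [("B", True)] * 5
for group, (rate, low, high) in stratified_error_rates(records).items():
    print(f"{group}: rate={rate:.3f}, interval=[{low:.3f}, {high:.3f}]")
```

A binomial interval treats comparisons as independent. Multiple comparisons involving the same subject are correlated, so intervals computed this way can be narrower than the data really supports.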
The standard specifies a test protocol with strict requirements to ensure statistical validity and reproducibility:
Dataset Composition. The evaluation dataset must include at least 10,000 distinct subjects, with a minimum of 3 images per subject. The dataset must be balanced across demographic groups, with no single group representing more than 40% of the total. Cross-quality variation is mandatory: images must span a range of resolutions, lighting conditions, and angles.
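The composition rules above lend themselves to an automated pre-check. The sketch below is one possible way to do it, assuming the 40% cap is measured over subjects rather than images (the text does not say which) and that each image is described by a hypothetical `(subject_id, group)` pair.

```python
from collections import Counter


def check_dataset_composition(images, min_subjects=10_000,
                              min_images_per_subject=3, max_group_share=0.40):
    """images: iterable of (subject_id, group) pairs, one per image.

    Returns a list of human-readable violations (empty if all checks pass).
    Assumption: group share is counted over subjects, not images.
    """
    images_per_subject = Counter()
    group_of_subject = {}
    for subject_id, group in images:
        images_per_subject[subject_id] += 1
        group_of_subject[subject_id] = group

    problems = []
    n_subjects = len(images_per_subject)
    if n_subjects < min_subjects:
        problems.append(f"only {n_subjects} subjects (need {min_subjects})")

    short = sum(1 for n in images_per_subject.values() if n < min_images_per_subject)
    if short:
        problems.append(f"{short} subjects have fewer than {min_images_per_subject} images")

    subjects_per_group = Counter(group_of_subject.values())
    for group, n in sorted(subjects_per_group.items()):
        share = n / n_subjects
        if share > max_group_share:
            problems.append(f"group {group!r} is {share:.0%} of subjects (cap {max_group_share:.0%})")
    return problems


# Tiny invented dataset; min_subjects is lowered so the other checks are visible.
toy = [("s1", "g1")] * 3 + [("s2", "g1")] * 3 + [("s3", "g2")] * 2
for problem in check_dataset_composition(toy, min_subjects=3):
    print(problem)
```

The cross-quality requirement (resolution, lighting, angle) is not covered here because it depends on how image quality is labelled in a given dataset.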
Protocol Structure. The standard defines three testing regimes: verification (1:1 matching), identification (1:N search), and open-set identification (where some probes have no matching gallery entry). Each regime has distinct reporting requirements.
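The open-set regime is the least intuitive of the three, so a small sketch may help. It assumes each probe yields a single top-ranked candidate with a score, that a candidate is returned only when its score is at or above the threshold, and that probes with no gallery mate are marked with `true_id=None`. The tuple layout and the name `open_set_rates` are illustrative only.

```python
def open_set_rates(results, threshold):
    """results: iterable of (true_id, top_id, top_score) tuples, one per probe.

    true_id is None for probes that have no matching gallery entry.
    Returns (TPIR, FPIR) at rank 1 for the given threshold.
    """
    mated = mated_hits = nonmated = false_alarms = 0
    for true_id, top_id, top_score in results:
        returned = top_score >= threshold
        if true_id is None:
            nonmated += 1
            false_alarms += int(returned)  # any returned candidate is a false positive
        else:
            mated += 1
            mated_hits += int(returned and top_id == true_id)
    tpir = mated_hits / mated if mated else float("nan")
    fpir = false_alarms / nonmated if nonmated else float("nan")
    return tpir, fpir


# Invented probes: three with a gallery mate, two without.
probes = [
    ("a", "a", 0.9),   # correct identity above threshold
    ("b", "c", 0.8),   # wrong identity above threshold
    ("c", "c", 0.3),   # correct identity but below threshold
    (None, "a", 0.7),  # no mate, yet a candidate is returned
    (None, "b", 0.2),  # no mate, nothing returned
]
print(open_set_rates(probes, threshold=0.5))
```

Closed-set identification is the special case with no `None` probes, in which FPIR is undefined and only TPIR is reported.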
Gallery Size Scaling. Performance must be reported at multiple gallery sizes (10^3, 10^4, 10^5, and 10^6) to characterise how accuracy degrades as the search space grows. Deployment planning depends on this curve, because a system that meets its targets at 10^3 enrolled identities may miss them at 10^6.
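A rough feel for why gallery size matters can be had from a back-of-envelope model. The sketch below is not taken from the standard: it assumes every probe-to-gallery comparison is independent and shares the same per-comparison FAR, in which case the chance of at least one false match in a gallery of N entries is 1 - (1 - FAR)^N. Real systems violate the independence assumption, which is one reason measured results at each gallery size are needed.

```python
import math


def projected_fpir(far, gallery_size):
    """Probability of at least one false match across gallery_size comparisons.

    Assumption: comparisons are independent with identical per-comparison FAR.
    Uses log1p/expm1 to stay accurate when far is very small.
    """
    if not 0.0 <= far < 1.0:
        raise ValueError("far must be in [0, 1)")
    return -math.expm1(gallery_size * math.log1p(-far))


# FAR of 1 in 100,000 evaluated at the four gallery sizes named above.
for n in (10**3, 10**4, 10**5, 10**6):
    print(f"N={n:>9,}: projected FPIR = {projected_fpir(1e-5, n):.4f}")
```

Under this simplified model, a per-comparison FAR that looks comfortable for 1:1 verification yields a false match on most searches once the gallery reaches 10^5 entries.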
A cornerstone of ISO/IEC 29128-1 is its treatment of demographic fairness. The standard requires:
| Requirement | Specification | Purpose |
|---|---|---|
| Stratified Reporting | FAR/FRR reported for each demographic group | Detect performance disparities |
| Fairness Metrics | Maximum group-wise difference in FAR and FRR | Quantify bias magnitude |
| Cross-quality Strata | Performance reported by image quality tier | Separate bias from quality effects |
| Adversarial Testing | Evaluation on deliberately challenging subgroups | Identify failure modes |
The standard does not prescribe a specific fairness threshold (e.g., ‘FAR difference must not exceed 5%’) because acceptable thresholds depend on the application. However, it mandates transparent reporting so that deployers can make informed decisions. A system that performs well on the overall population but poorly on a specific subgroup should be flagged during evaluation.
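The maximum group-wise difference from the table, and the flagging step described above, can be sketched as follows. The `tolerance` parameter is a deployer-chosen value invented for this example, consistent with the point that no fixed threshold is prescribed; the function names and group labels are likewise hypothetical.

```python
def max_groupwise_gap(rates):
    """rates: dict mapping group -> error rate (FAR or FRR).

    Returns (gap, worst_group, best_group), where gap is the largest
    difference between any two groups.
    """
    if not rates:
        raise ValueError("rates must not be empty")
    worst = max(rates, key=rates.get)
    best = min(rates, key=rates.get)
    return rates[worst] - rates[best], worst, best


def flag_disparity(rates, tolerance):
    """Return True when the max group-wise gap exceeds a deployer-chosen tolerance."""
    gap, _, _ = max_groupwise_gap(rates)
    return gap > tolerance


# Invented per-group FAR values for illustration.
far_by_group = {"A": 0.001, "B": 0.004, "C": 0.002}
gap, worst, best = max_groupwise_gap(far_by_group)
print(f"max FAR gap = {gap:.4f} (worst: {worst}, best: {best})")
print("flagged:", flag_disparity(far_by_group, tolerance=0.002))
```

An absolute difference is used here because that is what the table describes. A ratio between the worst and best group is another way to express disparity, and it can be more informative when the rates themselves are very small.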