ISO/IEC 29128-1:2024 — Biometrics — Face Recognition Performance — Part 1

Standardised Benchmarks for Face Recognition System Evaluation

ISO/IEC 29128-1:2024 is the first part of a multi-part standard dedicated to the performance evaluation of face recognition systems. It establishes a rigorous, standardised methodology for measuring how accurately a face recognition system can match probe images against gallery references across a wide range of operational conditions.

The standard arrives at a critical moment. Face recognition systems are deployed in high-stakes applications — law enforcement, airport border control, financial know-your-customer (KYC) checks — where both accuracy and fairness are under intense scrutiny. ISO/IEC 29128-1 provides the technical framework for transparent, reproducible, and bias-aware performance evaluation.

ISO/IEC 29128-1 is designed to complement the NIST FRVT (Face Recognition Vendor Test) programme, which NIST has continued since 2023 under the names FRTE and FATE. While FRVT provides ongoing independent evaluations, this standard defines the methodology that any party can follow to conduct its own conformant evaluation.

Core Performance Metrics

The standard defines a comprehensive set of metrics that go beyond simple accuracy rates:

Metric | Definition | Typical Deployment Target
False Accept Rate (FAR) | Proportion of impostor comparisons incorrectly accepted | 1 in 100,000 (high security)
False Reject Rate (FRR) | Proportion of genuine comparisons incorrectly rejected | 1 in 100 (border control)
True Positive Identification Rate (TPIR) | Proportion of genuine probes correctly identified at rank-1 | >99% (gallery size <10^6)
False Positive Identification Rate (FPIR) | Proportion of impostor probes incorrectly identified | <1% (watchlist screening)
Equal Error Rate (EER) | Error rate at the operating point where FAR = FRR | Reported for comparability
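
The verification metrics in the table can be computed directly from comparison scores. The following is a minimal sketch, not the standard's reference procedure. It assumes similarity scores where higher means more alike and a decision rule of "accept when score >= threshold"; the function names `far_frr` and `approx_eer` and the sample scores are illustrative only.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) at one threshold; score >= threshold means accept."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def approx_eer(genuine, impostor):
    """Approximate the EER by scanning every observed score as a threshold.

    Returns (eer, threshold) for the threshold where |FAR - FRR| is smallest;
    the EER estimate is the mean of FAR and FRR at that threshold.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores for illustration only.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]
impostor = [0.12, 0.30, 0.45, 0.70, 0.22]

far, frr = far_frr(genuine, impostor, threshold=0.5)
eer, eer_threshold = approx_eer(genuine, impostor)
print(f"FAR={far:.2f} FRR={frr:.2f} EER~{eer:.2f} at threshold {eer_threshold}")
```

With real data the score lists would hold many thousands of comparisons, and a rate such as 1 in 100,000 can only be estimated meaningfully when far more than 100,000 impostor comparisons are available.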

Importantly, the standard mandates that all metrics be reported with confidence intervals and be stratified by demographic subgroups — including age, sex, and skin tone — to enable fairness analysis. This demographic stratification is one of the most significant advances in 29128-1 compared to earlier face recognition evaluation practices.
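
As an illustration of stratified reporting with confidence intervals, the sketch below computes an error rate per demographic group together with a Wilson score interval. The choice of the Wilson interval is an assumption made here for simplicity; it treats comparisons as independent, which rarely holds exactly when a subject contributes several images, and the standard may call for a different method. The names `wilson_interval` and `stratified_rates` and the group labels are hypothetical.

```python
import math
from collections import defaultdict


def wilson_interval(errors, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = errors / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def stratified_rates(records):
    """records: iterable of (group_label, is_error) pairs for one error type.

    Returns {group: (error_rate, (ci_low, ci_high))}.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [errors, trials]
    for group, is_error in records:
        counts[group][0] += int(is_error)
        counts[group][1] += 1
    return {g: (e / n, wilson_interval(e, n)) for g, (e, n) in counts.items()}


# Made-up data: 2 false rejects in 200 trials for one group, 6 in 150 for another.
records = [("group_a", i < 2) for i in range(200)]
records += [("group_b", i < 6) for i in range(150)]

report = stratified_rates(records)
for group, (rate, (low, high)) in sorted(report.items()):
    print(f"{group}: FRR={rate:.3f}  95% CI [{low:.3f}, {high:.3f}]")
```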

Test Protocol and Dataset Requirements

The standard specifies a test protocol with strict requirements to ensure statistical validity and reproducibility:

Dataset Composition. The evaluation dataset must include at least 10,000 distinct subjects, with a minimum of 3 images per subject. The dataset must be balanced across demographic groups, with no single group representing more than 40% of the total. Cross-quality variation is mandatory — images must span a range of resolutions, lighting conditions, and angles.
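
The composition rules above lend themselves to an automated pre-check. This is a rough sketch under stated assumptions: the 40% cap is interpreted as a share of subjects (not of images), and the input layout, a dict mapping subject ID to `(group_label, image_count)`, is hypothetical. The thresholds come from the paragraph above; everything else is illustrative.

```python
from collections import Counter


def check_composition(subjects, min_subjects=10_000, min_images=3,
                      max_group_share=0.40):
    """Return a list of human-readable problems; an empty list means no issue found.

    subjects: dict of subject_id -> (group_label, image_count)
    """
    problems = []
    if len(subjects) < min_subjects:
        problems.append(f"only {len(subjects)} subjects, need {min_subjects}")

    short = [sid for sid, (_, n_images) in subjects.items() if n_images < min_images]
    if short:
        problems.append(f"{len(short)} subjects have fewer than {min_images} images")

    group_counts = Counter(group for group, _ in subjects.values())
    for group, count in sorted(group_counts.items()):
        share = count / len(subjects)
        if share > max_group_share:
            problems.append(
                f"group {group!r} is {share:.0%} of subjects, limit {max_group_share:.0%}"
            )
    return problems


# Toy dataset of 10 subjects split 50/50 between two groups; min_subjects is
# lowered so that only the balance rule is exercised.
toy = {f"s{i}": ("A" if i < 5 else "B", 3) for i in range(10)}
problems = check_composition(toy, min_subjects=10)
for line in problems:
    print(line)
```

A check like this covers only the countable requirements; the mandated variation in resolution, lighting, and angle needs per-image metadata that this sketch does not model.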

Protocol Structure. The standard defines three testing regimes: verification (1:1 matching), identification (1:N search), and open-set identification (where some probes have no matching gallery entry). Each regime has distinct reporting requirements.
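
The open-set regime differs from closed-set identification in that the system may, and sometimes must, return no candidate. A minimal sketch of how TPIR and FPIR could be tallied in that setting follows. It assumes a single similarity threshold and rank-1 decisions; the helper names `top_candidate` and `open_set_rates` and the trial layout are illustrative, not taken from the standard.

```python
def top_candidate(scores, threshold):
    """scores: dict of gallery_id -> similarity.

    Return the best-scoring gallery ID if it reaches the threshold, else None.
    """
    if not scores:
        return None
    best_id = max(scores, key=scores.get)
    return best_id if scores[best_id] >= threshold else None


def open_set_rates(trials, threshold):
    """trials: list of (scores, true_id), with true_id=None for non-mated probes.

    Returns (TPIR, FPIR): the share of mated probes whose true identity is
    returned at rank 1, and the share of non-mated probes that return any
    candidate at all.
    """
    mated = [t for t in trials if t[1] is not None]
    nonmated = [t for t in trials if t[1] is None]
    hits = sum(top_candidate(s, threshold) == true_id for s, true_id in mated)
    alarms = sum(top_candidate(s, threshold) is not None for s, _ in nonmated)
    tpir = hits / len(mated) if mated else float("nan")
    fpir = alarms / len(nonmated) if nonmated else float("nan")
    return tpir, fpir


# Made-up trials against a two-entry gallery.
trials = [
    ({"a": 0.90, "b": 0.20}, "a"),   # mated, correct at rank 1
    ({"a": 0.40, "b": 0.70}, "a"),   # mated, wrong identity returned
    ({"a": 0.30, "b": 0.50}, None),  # non-mated, correctly returns nothing
    ({"a": 0.65, "b": 0.10}, None),  # non-mated, false alarm
]
tpir, fpir = open_set_rates(trials, threshold=0.6)
print(f"TPIR={tpir:.2f} FPIR={fpir:.2f}")
```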

Gallery Size Scaling. Performance must be reported at multiple gallery sizes (10^3, 10^4, 10^5, and 10^6) to characterise how accuracy degrades as the search space grows. This matters for deployment planning, because a system validated only on a small gallery may not hold its accuracy at operational scale.
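
One way to picture gallery size scaling is to hold the probes fixed and grow the gallery by adding distractors. The sketch below does this with nested galleries, so each larger gallery contains the smaller one. It is a simplified illustration with toy sizes; the function name `rank1_by_gallery_size` and the input layout (a mate score plus an ordered list of distractor scores per probe) are assumptions, and a real evaluation would use the sizes listed above.

```python
def rank1_by_gallery_size(probes, sizes):
    """Rank-1 hit rate at each gallery size.

    probes: list of (mate_score, distractor_scores). For gallery size n the
            gallery is the mate plus the first n - 1 distractors.
    sizes:  iterable of gallery sizes to report.
    """
    results = {}
    for n in sizes:
        hits = 0
        for mate_score, distractors in probes:
            if len(distractors) < n - 1:
                raise ValueError(f"not enough distractors for gallery size {n}")
            # A rank-1 hit requires the mate to outscore every distractor.
            hits += all(mate_score > d for d in distractors[: n - 1])
        results[n] = hits / len(probes)
    return results


# Made-up scores; real galleries would be orders of magnitude larger.
probes = [
    (0.90, [0.10, 0.20, 0.30, 0.95]),
    (0.80, [0.50, 0.85, 0.10, 0.20]),
    (0.70, [0.10, 0.20, 0.30, 0.40]),
]
scaling = rank1_by_gallery_size(probes, sizes=[2, 3, 5])
for n, rate in scaling.items():
    print(f"gallery size {n}: rank-1 hit rate {rate:.2f}")
```

Because the galleries are nested, the hit rate in this sketch can only stay flat or fall as the gallery grows, which mirrors the degradation the reporting requirement is meant to expose.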

When evaluating a face recognition system, always report FAR and FRR at multiple thresholds as a DET curve, not just at a single operating point. This provides a complete picture of system performance.

Beware of ‘gallery effects’: the phenomenon where performance degrades not because of algorithm quality but because of gallery composition (e.g., multiple similar-looking subjects). The standard’s protocol design explicitly accounts for this by mandating diversity metrics for the gallery.
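
To make the DET advice concrete, the following sketch sweeps every observed score as a threshold and collects the (FAR, FRR) pairs that make up the curve. DET curves are conventionally drawn on normal-deviate (probit) axes, so a small helper for that transform is included. The names `det_points` and `probit`, the clamping constant, and the sample scores are illustrative assumptions.

```python
from statistics import NormalDist


def det_points(genuine, impostor):
    """Return [(threshold, FAR, FRR), ...] sorted by increasing threshold."""
    points = []
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        points.append((t, far, frr))
    return points


def probit(rate, eps=1e-6):
    """Map an error rate to the normal-deviate scale used on DET axes.

    Rates of exactly 0 or 1 are clamped, since the transform is unbounded there.
    """
    clamped = min(max(rate, eps), 1 - eps)
    return NormalDist().inv_cdf(clamped)


# Made-up scores for illustration only.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]
impostor = [0.12, 0.30, 0.45, 0.70, 0.22]

curve = det_points(genuine, impostor)
axes = [(probit(far), probit(frr)) for _, far, frr in curve]
for t, far, frr in curve:
    print(f"threshold={t:.2f}  FAR={far:.2f}  FRR={frr:.2f}")
```

The `axes` pairs can be handed to any plotting tool; the table of raw rates is what a reader needs to pick an operating point for their own application.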

Demographic Fairness and Bias Analysis

A cornerstone of ISO/IEC 29128-1 is its treatment of demographic fairness. The standard requires:

Requirement | Specification | Purpose
Stratified Reporting | FAR/FRR reported for each demographic group | Detect performance disparities
Fairness Metrics | Maximum group-wise difference in FAR and FRR | Quantify bias magnitude
Cross-quality Strata | Performance reported by image quality tier | Separate bias from quality effects
Adversarial Testing | Evaluation on deliberately challenging subgroups | Identify failure modes

The standard does not prescribe a specific fairness threshold (e.g., ‘FAR difference must not exceed 5%’) because acceptable thresholds depend on the application. However, it mandates transparent reporting so that deployers can make informed decisions. A system that performs well on the overall population but poorly on a specific subgroup should be flagged during evaluation.
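
The "maximum group-wise difference" metric and the flagging idea can be sketched as follows. Since no fairness threshold is prescribed, the flagging rule here takes a caller-supplied tolerance; the factor of 2 in the example, the function names `max_disparity` and `flag_groups`, and all rates shown are hypothetical.

```python
def max_disparity(rates):
    """rates: dict of group -> (FAR, FRR).

    Returns (largest FAR gap, largest FRR gap) between any two groups.
    """
    fars = [far for far, _ in rates.values()]
    frrs = [frr for _, frr in rates.values()]
    return max(fars) - min(fars), max(frrs) - min(frrs)


def flag_groups(rates, overall, factor):
    """Return groups whose FAR or FRR exceeds the overall rate by `factor`.

    overall: (FAR, FRR) for the whole population.
    factor:  tolerance chosen by the deployer, not by the standard.
    """
    overall_far, overall_frr = overall
    return sorted(
        group
        for group, (far, frr) in rates.items()
        if far > factor * overall_far or frr > factor * overall_frr
    )


# Made-up per-group rates.
rates = {
    "group_1": (0.001, 0.02),
    "group_2": (0.003, 0.01),
    "group_3": (0.002, 0.05),
}
far_gap, frr_gap = max_disparity(rates)
flagged = flag_groups(rates, overall=(0.002, 0.02), factor=2.0)
print(f"max FAR gap={far_gap:.3f}, max FRR gap={frr_gap:.3f}, flagged={flagged}")
```

In this toy data the overall FRR looks acceptable, yet one group is rejected more than twice as often, which is the situation the paragraph above says should be surfaced during evaluation.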

Frequently Asked Questions

How does ISO/IEC 29128-1 relate to ISO/IEC 19795?
ISO/IEC 19795 is the general biometric performance testing and reporting standard applicable to all modalities. ISO/IEC 29128-1 specialises it for face recognition, adding modality-specific requirements for pose variation, illumination, expression, and demographic stratification.

Is compliance with 29128-1 mandatory for deployment?
While the standard itself is voluntary, several regulatory frameworks (including the EU AI Act) are increasingly referencing it as the benchmark for face recognition evaluation. Compliance is becoming a de facto requirement for high-risk applications.

Can I use unlabeled web-scraped datasets for evaluation?
The standard strongly discourages this due to consent and representativeness concerns. It recommends using curated, consent-based datasets with verified demographic labels. Use of web-scraped data may invalidate regulatory compliance.

What is the recommended approach for cross-quality evaluation?
The standard recommends reporting performance stratified by image quality tiers (high, medium, low) based on ISO/IEC 29794-1 quality scores. This allows deployers to understand system robustness to real-world image quality variation.
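
A rough sketch of tiered reporting is shown below. The tier cut-offs (70 and 40) are placeholders, since none are given here; the sketch also assumes quality scores on a 0 to 100 scale and that each genuine comparison is binned by the quality of its probe image. The names `quality_tier` and `frr_by_tier` are illustrative.

```python
def quality_tier(score, high_cut=70, medium_cut=40):
    """Map a quality score to 'high', 'medium', or 'low' (cut-offs are placeholders)."""
    if score >= high_cut:
        return "high"
    if score >= medium_cut:
        return "medium"
    return "low"


def frr_by_tier(genuine_trials, threshold, high_cut=70, medium_cut=40):
    """genuine_trials: list of (probe_quality_score, similarity) for genuine pairs.

    Returns {tier: FRR}, with None for a tier that has no trials.
    """
    counts = {"high": [0, 0], "medium": [0, 0], "low": [0, 0]}  # [rejects, trials]
    for quality, similarity in genuine_trials:
        tier = quality_tier(quality, high_cut, medium_cut)
        counts[tier][0] += int(similarity < threshold)
        counts[tier][1] += 1
    return {tier: (rej / n if n else None) for tier, (rej, n) in counts.items()}


# Made-up genuine comparisons: (probe quality, similarity score).
genuine_trials = [
    (85, 0.92), (78, 0.88),   # high quality
    (55, 0.71), (48, 0.55),   # medium quality
    (30, 0.62), (22, 0.40),   # low quality
]
tier_frr = frr_by_tier(genuine_trials, threshold=0.6)
for tier, frr in tier_frr.items():
    print(f"{tier}: FRR={frr}")
```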
