IEC TR 63038: Video Analytics Performance Testing for Digital Video Monitoring

1. Introduction to IEC TR 63038

IEC TR 63038 provides a standardized framework for evaluating the performance of video analytics systems in digital video monitoring applications. As video surveillance deployments expand, from smart city traffic management to retail footfall analytics, objective and repeatable performance metrics become essential. This technical report defines test scenarios, ground-truth annotation methodologies, and statistical reporting conventions for object detection, classification, tracking, and event recognition.

The standard introduces the concept of “operational equivalence”: a detection at 90% confidence in clear daylight is not equivalent to a detection at the same confidence level in low-light, foggy conditions. Performance must therefore be reported per environmental condition.

TR 63038 covers four core analytic tasks: (1) object detection (bounding box output), (2) object classification (label assignment), (3) multi-object tracking (ID preservation across frames), and (4) event detection (loitering, line-crossing, abandoned object). Each task has dedicated metrics, test datasets, and minimum reporting requirements.

2. Test Methodology and Key Metrics

2.1 Performance Metrics

The standard mandates reporting of the following metrics for every analytics evaluation:

Metric	Definition	Reporting Requirement
Precision	TP / (TP + FP)	Per object class, per condition
Recall	TP / (TP + FN)	Per object class, per condition
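F₁ Score	2 · (Precision · Recall) / (Precision + Recall), the harmonic mean of precision and recall	Overall and per class
MOTA	Multiple Object Tracking Accuracy: 1 − (FN + FP + ID switches) / (ground-truth objects), summed over all frames	For tracking scenarios only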
Processing Latency	Frame-in to result-out delay	P₅₀, P₉₅ in milliseconds
Throughput	Frames processed per second	At native resolution
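The precision, recall, and F₁ rows above can be computed per object class and per condition from raw match counts. The following is a minimal sketch, not a reference implementation; the function name `summarize`, the dictionary layout, and the example counts are assumptions made for illustration.

```python
def summarize(counts):
    """Compute precision, recall and F1 for each (object_class, condition) pair.

    counts maps (object_class, condition) -> (tp, fp, fn).
    A ratio with a zero denominator is reported as 0.0.
    """
    report = {}
    for key, (tp, fp, fn) in counts.items():
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[key] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Hypothetical counts for one class under two conditions.
counts = {
    ("person", "daylight"): (90, 10, 10),
    ("person", "fog"): (60, 20, 40),
}
report = summarize(counts)
for key, metrics in report.items():
    print(key, {name: round(value, 3) for name, value in metrics.items()})
```

Keeping the condition in the key makes it hard to accidentally publish a single pooled figure that hides a weak condition.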

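Engineering insight: MOTA penalizes a missed detection, a false positive, and an ID switch equally, one error each. Because ID switches are usually far less frequent than detection errors, MOTA tends to be dominated by detection quality. In crowded scenes (e.g., 50+ people on a metro platform), a tracker with high recall but frequent ID reassignments can therefore still post a respectable MOTA. For real-world deployment, report the ID-switch count alongside MOTA and weigh it against end-user tolerance for ID flicker.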
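For reference, the MOTA definition commonly used in the tracking literature (CLEAR MOT) can be sketched as below. The counts are hypothetical and chosen only to show how much of the total error comes from ID switches; whether TR 63038 prescribes exactly this formulation is an assumption here.

```python
def mota(fn, fp, id_switches, gt_objects):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if gt_objects <= 0:
        raise ValueError("gt_objects must be positive")
    return 1.0 - (fn + fp + id_switches) / gt_objects


# Hypothetical error counts for one crowded-scene sequence.
counts = {"fn": 500, "fp": 300, "id_switches": 200}
gt_objects = 10_000  # total ground-truth boxes across all frames

score = mota(gt_objects=gt_objects, **counts)
id_switch_share = counts["id_switches"] / sum(counts.values())
print(f"MOTA = {score:.3f}, ID switches = {id_switch_share:.0%} of all errors")
```

Printing the breakdown next to the score keeps identity errors visible even when detection errors make up most of the sum.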

2.2 Test Dataset Requirements

TR 63038 specifies that test datasets must include at least 10,000 annotated frames per task, with a minimum of 500 frames per environmental condition (daylight, low-light, rain, fog, night-infrared). The annotation format is based on a modified COCO JSON schema, extended with temporal fields (track_id, occlusion_flag, confidence). Ground-truth accuracy must be ≥ 99% at the pixel level for bounding boxes and ≥ 99.5% for classification labels.
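The frame-count thresholds above lend themselves to an automated check before any evaluation run. This sketch assumes each annotated frame carries a `condition` label, which is a hypothetical field; the example annotation uses standard COCO keys plus the three temporal fields named above, with value types that are assumptions.

```python
from collections import Counter

REQUIRED_CONDITIONS = ("daylight", "low-light", "rain", "fog", "night-infrared")
MIN_FRAMES_PER_TASK = 10_000
MIN_FRAMES_PER_CONDITION = 500
EXTENDED_FIELDS = ("track_id", "occlusion_flag", "confidence")

# Illustrative annotation record: COCO keys plus the temporal extensions.
example_annotation = {
    "id": 1, "image_id": 42, "category_id": 1,
    "bbox": [100.0, 50.0, 40.0, 80.0],  # COCO convention: [x, y, width, height]
    "track_id": 7, "occlusion_flag": False, "confidence": 1.0,
}


def missing_fields(annotation):
    """Return the extended temporal fields absent from one annotation."""
    return [name for name in EXTENDED_FIELDS if name not in annotation]


def check_dataset(frames):
    """Return a list of human-readable problems; an empty list means the counts pass."""
    per_condition = Counter(frame["condition"] for frame in frames)
    total = sum(per_condition.values())
    problems = []
    if total < MIN_FRAMES_PER_TASK:
        problems.append(f"only {total} frames, need {MIN_FRAMES_PER_TASK}")
    for cond in REQUIRED_CONDITIONS:
        if per_condition[cond] < MIN_FRAMES_PER_CONDITION:
            problems.append(
                f"{cond}: {per_condition[cond]} frames, need {MIN_FRAMES_PER_CONDITION}"
            )
    return problems


# Hypothetical dataset: plenty of daylight, too few frames elsewhere.
frames = [{"condition": "daylight"}] * 9_000 + [
    {"condition": cond} for cond in REQUIRED_CONDITIONS[1:] for _ in range(300)
]
problems = check_dataset(frames)
print(problems)
```

A dataset dominated by daylight footage can meet the per-task total while still failing the per-condition minimum, which is the case this example flags.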

A common pitfall is evaluating on data that overlaps with the training set. The standard explicitly requires that test datasets be independent of training datasets, with no overlap in scene geometry, camera vantage points, or subject identities. Contaminated evaluation is a frequent cause of overoptimistic performance claims in vendor datasheets.
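Where clip-level metadata is available, the independence requirement can be screened mechanically. The sketch below assumes each clip is described by hypothetical `scene_id`, `camera_id`, and `subject_ids` fields; real datasets may record this information differently or not at all, in which case a manual review is still needed.

```python
def collect(clips, key):
    """Gather every value stored under key, flattening list-valued fields."""
    values = set()
    for clip in clips:
        value = clip.get(key, ())
        if isinstance(value, (list, tuple, set)):
            values.update(value)
        else:
            values.add(value)
    return values


def find_overlap(train_clips, test_clips,
                 keys=("scene_id", "camera_id", "subject_ids")):
    """Return, per metadata key, the values present in both splits."""
    return {key: collect(train_clips, key) & collect(test_clips, key)
            for key in keys}


# Hypothetical metadata: the splits share a camera but nothing else.
train_clips = [{"scene_id": "s1", "camera_id": "c1", "subject_ids": ["p1", "p2"]}]
test_clips = [{"scene_id": "s2", "camera_id": "c1", "subject_ids": ["p3"]}]

overlap = find_overlap(train_clips, test_clips)
print(overlap)
```

Any non-empty set in the result points to a clip that should be moved out of the test split or dropped.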

3. Deployment Considerations and Future Trends

Video analytics performance under the TR 63038 framework is highly dependent on edge-device compute capability. A typical deep learning accelerator (e.g., NVIDIA Jetson Orin, Hailo-8, Intel Movidius) can achieve 30-60 FPS on lightweight object detection networks (YOLOv8n, MobileNet-SSD) at 1080p resolution. The standard recommends reporting performance at the target deployment resolution rather than at the training resolution, as downscaling artifacts significantly affect small-object recall.
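Latency percentiles and throughput are best measured on the target device with frames already at the deployment resolution. The harness below is a rough sketch: `process_frame` stands in for whatever inference call the system exposes (a hypothetical name), and the nearest-rank percentile is one common convention rather than a method confirmed by the report.

```python
import math
import time


def percentile(samples, p):
    """Nearest-rank percentile of samples, for p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def measure(process_frame, frames):
    """Time process_frame on each frame; return P50/P95 latency (ms) and FPS."""
    latencies_ms = []
    start = time.perf_counter()
    for frame in frames:
        t0 = time.perf_counter()
        process_frame(frame)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - start
    if not latencies_ms:
        raise ValueError("no frames were processed")
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "fps": len(latencies_ms) / elapsed,
    }


# Stand-in workload; replace with the real inference call and real frames.
result = measure(lambda frame: sum(range(1_000)), range(200))
print(result)
```

Note that this times only the call itself; capture, decode, and result-publishing delays would need to be included to match a frame-in to result-out definition.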

Field experience shows that camera auto-exposure adaptation (typically 1-3 seconds after a scene change) causes a burst of false positives while the image is still settling to the new exposure. For security-critical applications, implement a “settling delay” that suppresses analytics output for the first 2 seconds after an exposure change, as recommended by TR 63038 Annex C.
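One way to implement such a settling delay is a small gate placed between the analytics engine and downstream consumers. This is a sketch under stated assumptions: the class and method names are hypothetical, and how an exposure change is detected (camera event, metadata flag, brightness statistics) is left to the integrator.

```python
class SettlingDelayFilter:
    """Suppress analytics output for a fixed window after an exposure change."""

    def __init__(self, settle_seconds=2.0):
        self.settle_seconds = settle_seconds
        self._last_change = None  # timestamp of the most recent exposure change

    def notify_exposure_change(self, timestamp):
        """Record that the camera began re-adapting at the given time (seconds)."""
        self._last_change = timestamp

    def filter(self, timestamp, detections):
        """Return detections unchanged, or an empty list inside the settling window."""
        if self._last_change is not None:
            since_change = timestamp - self._last_change
            if 0 <= since_change < self.settle_seconds:
                return []
        return detections


gate = SettlingDelayFilter(settle_seconds=2.0)
gate.notify_exposure_change(10.0)
print(gate.filter(10.5, ["person"]))  # inside the window: suppressed
print(gate.filter(12.0, ["person"]))  # window elapsed: passed through
```

Suppressed detections are simply dropped here; a deployment that needs an audit trail might log them instead of discarding them.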

Looking forward, the IEC is considering a second edition that incorporates neural network robustness testing (adversarial patch attacks) and privacy-preserving analytics evaluation (on-device inference vs. cloud-based). The foundational metrics framework defined in TR 63038 will remain central to these future extensions.

4. Frequently Asked Questions

Q: Is IEC TR 63038 applicable to thermal imaging cameras?
A: Yes. The metrics and methodology apply to any imaging modality (visible, thermal, multi-spectral). The test datasets must be captured with the target sensor type.

Q: How does the standard define false positive vs. false negative in crowded scenes?
A: An IoU (Intersection over Union) threshold of 0.5 is used to match detections to ground truth. A detection whose IoU is below 0.5 with every ground-truth box is a false positive; a ground-truth box with no matching detection is a false negative. In crowded scenes with heavy occlusion, the standard permits a relaxed threshold of 0.3.
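The matching rule can be sketched as follows. Two details are assumptions not stated above: each ground-truth box may be matched at most once, and detections are matched greedily in descending confidence order, as is common in detection benchmarks. Boxes are given as (x1, y1, x2, y2) corners.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(detections, ground_truth, threshold=0.5):
    """Greedy one-to-one matching; returns (tp, fp, fn).

    detections is a list of (box, confidence); ground_truth is a list of boxes.
    """
    unmatched = set(range(len(ground_truth)))
    tp = fp = 0
    for box, _conf in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx in unmatched:
            overlap = iou(box, ground_truth[idx])
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= threshold:
            tp += 1
            unmatched.remove(best_idx)
        else:
            fp += 1
    return tp, fp, len(unmatched)


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [((0, 0, 10, 4), 0.9), ((50, 50, 60, 60), 0.8)]  # first box has IoU 0.4

print(match(detections, ground_truth, threshold=0.5))
print(match(detections, ground_truth, threshold=0.3))
```

The first detection overlaps its ground-truth box with an IoU of 0.4, so it counts as a false positive at the default threshold and as a true positive at the relaxed one.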

Q: What is the recommended annotation tool for generating TR 63038 compliant datasets?
A: The standard does not mandate a specific tool, but CVAT (Computer Vision Annotation Tool) and Labelbox are commonly used. Both support the extended COCO JSON format required for temporal annotation fields.

Q: Can TR 63038 metrics be used for facial recognition evaluation?
A: Not directly. Facial recognition falls under separate standards (ISO/IEC 19795 series). TR 63038 focuses on object-level analytics (people, vehicles) and does not address identification or verification accuracy.