ISO/IEC 29170-2 — Advanced Image Coding and Evaluation — Part 2: Evaluation Method

Perceptual Quality Assessment for Next-Generation Image Compression

Overview of ISO/IEC 29170-2

ISO/IEC 29170-2 is the second part of a multi-part standard under ISO/IEC JTC 1/SC 29 that defines evaluation methods for advanced image coding technologies. While Part 1 gives general guidelines for evaluating image coding systems, Part 2 focuses specifically on how to measure and compare the quality of coded images. It provides a rigorous methodology combining subjective visual assessment with objective metric computation, enabling fair benchmarking between conventional codecs such as JPEG and emerging neural-network-based compression schemes.

The standard introduces a double-stimulus continuous quality scale (DSCQS) methodology that reduces viewer bias and produces statistically reliable mean opinion scores (MOS) for coded image quality.

Modern image coding systems increasingly rely on learned compression models that optimize for perceptual metrics rather than traditional PSNR. ISO/IEC 29170-2 acknowledges this paradigm shift by specifying evaluation protocols that capture human visual system (HVS) characteristics, including contrast sensitivity, luminance masking, and texture masking effects. These protocols ensure that emerging neural codecs are measured by their perceptual output quality rather than purely mathematical fidelity, which aligns with how end users actually experience compressed images in consumer and professional applications.

Using the standardized evaluation framework of ISO/IEC 29170-2 ensures that codec performance claims are reproducible and comparable across different research groups and product vendors, fostering healthy competition in the image coding ecosystem.

Subjective and Objective Evaluation Procedures

The subjective evaluation procedure described in ISO/IEC 29170-2 involves carefully controlled viewing conditions: calibrated displays with a D65 white point, ambient lighting at 15 lux, a viewing distance of four times the picture height, and a standardized training session before scoring. Test material must include at least eight scenes spanning low to high spatial complexity, with each scene processed at multiple bit rates. The resulting MOS values are analyzed using confidence intervals and outlier detection to ensure statistical validity.
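The confidence-interval and outlier steps mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from the standard: the function names, the normal-approximation factor of 1.96, and the two-standard-deviation outlier rule are all assumptions chosen for brevity. Formal subject screening, such as the method described in ITU-R BT.500, is more involved.

```python
import math
import statistics


def mos_with_ci(scores, z=1.96):
    """Return (mean, half_width) for one stimulus.

    Assumption: normal approximation (z = 1.96) for a roughly 95% interval.
    For small panels a Student-t critical value would be more appropriate.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    if n < 2:
        return mean, float("nan")  # no spread estimate from a single score
    half_width = z * statistics.stdev(scores) / math.sqrt(n)
    return mean, half_width


def flag_outlier_subjects(ratings, threshold=2.0):
    """Flag subjects whose average score is far from the panel average.

    ratings: dict mapping subject id -> list of scores (same stimulus order).
    Assumption: a simple rule of `threshold` sample standard deviations,
    used here only as a stand-in for a formal screening procedure.
    """
    subject_means = {s: statistics.fmean(v) for s, v in ratings.items()}
    values = list(subject_means.values())
    panel_mean = statistics.fmean(values)
    panel_sd = statistics.stdev(values)
    if panel_sd == 0:
        return []
    return [s for s, m in subject_means.items()
            if abs(m - panel_mean) > threshold * panel_sd]
```

A typical use would be to call `flag_outlier_subjects` first, drop the flagged subjects, and then call `mos_with_ci` once per stimulus on the remaining scores.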

| Evaluation Method | Type | Key Metric | Best Use Case |
|---|---|---|---|
| DSCQS | Subjective | Mean Opinion Score (MOS) | Codec comparison and standardization |
| SSIM | Objective | Structural Similarity Index | Real-time monitoring |
| PSNR-HVS | Objective | HVS-weighted PSNR | Fine-tuning encoder parameters |
| VMAF | Objective | Video Multi-Method Assessment Fusion | Streaming quality optimization |
| LPIPS | Objective | Learned Perceptual Image Patch Similarity | Neural codec evaluation |

Objective metrics alone are insufficient for codec standardization. ISO/IEC 29170-2 mandates that any codec claiming superiority must pass a subjective validation test with statistical significance at the 95% confidence level.

For objective evaluation, the standard recommends a battery of complementary metrics. The structural similarity index (SSIM) captures luminance, contrast, and structural distortions, while newer metrics like LPIPS use deep neural network features to approximate human perceptual judgments. Engineers should compute all recommended metrics and report the full set for transparency.
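To make the idea of reporting several metrics together concrete, here is a small pure-Python sketch that computes PSNR and a simplified SSIM over flat lists of pixel values. Several things are assumptions: the function names are hypothetical, images are treated as flat 8-bit grayscale sequences, and `global_ssim` applies the SSIM formula once over the whole image instead of averaging it over local windows as the usual SSIM does. Production work would normally rely on an established image-quality library.

```python
import math


def _check(ref, test):
    if len(ref) == 0 or len(ref) != len(test):
        raise ValueError("images must be non-empty and the same size")


def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    _check(ref, test)
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def global_ssim(ref, test, peak=255.0):
    """SSIM formula evaluated once over the whole image.

    Assumption: a single global window, which is a simplification of the
    windowed SSIM that is normally reported.
    """
    _check(ref, test)
    n = len(ref)
    mu_x = sum(ref) / n
    mu_y = sum(test) / n
    var_x = sum((r - mu_x) ** 2 for r in ref) / n
    var_y = sum((t - mu_y) ** 2 for t in test) / n
    cov = sum((r - mu_x) * (t - mu_y) for r, t in zip(ref, test)) / n
    c1 = (0.01 * peak) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * peak) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def metric_report(ref, test):
    """Return every computed metric so none is silently omitted."""
    return {"psnr_db": psnr(ref, test), "ssim_global": global_ssim(ref, test)}
```

Returning all metrics in one dictionary keeps the report complete by construction; adding another metric means adding one more key.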

Engineering Implementation and Best Practices

Implementing the evaluation framework requires an automated testing pipeline that ingests reference images, applies the codec under test at specified bit rates, computes objective metrics in batch, and coordinates subjective test sessions with trained human viewers. The pipeline should store all intermediate coded images and logs for auditability.
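The objective half of such a pipeline might be organized roughly as follows. This is only a sketch under stated assumptions: `Record`, `run_pipeline`, `to_jsonl`, `toy_codec`, and `mean_abs_error` are hypothetical names, the "codec" is a plain quantizer standing in for a real encoder and decoder, and images are flat lists of pixel values. Scheduling of subjective sessions is out of scope here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class Record:
    image_id: str
    bitrate_bpp: float
    metrics: Dict[str, float]


def toy_codec(pixels, bpp):
    """Hypothetical stand-in for a codec: coarser quantization at lower rates."""
    step = max(1, round(16 / bpp))
    return [step * round(p / step) for p in pixels]


def mean_abs_error(ref, test):
    """Placeholder metric; a real pipeline would plug in SSIM, LPIPS, etc."""
    return sum(abs(r - t) for r, t in zip(ref, test)) / len(ref)


def run_pipeline(references, bitrates, codec, metrics):
    """Code every reference at every bit rate and score the result."""
    rates = list(bitrates)  # materialize so the rates can be reused per image
    records = []
    for image_id, ref in references.items():
        for bpp in rates:
            decoded = codec(ref, bpp)
            scores = {name: fn(ref, decoded) for name, fn in metrics.items()}
            records.append(Record(image_id, bpp, scores))
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines so runs can be audited later."""
    return "\n".join(json.dumps(asdict(r)) for r in records)


records = run_pipeline(
    {"scene_01": [0, 10, 20, 30]},
    [1.0, 16.0],
    toy_codec,
    {"mae": mean_abs_error},
)
print(to_jsonl(records))
```

Passing the codec and the metrics in as callables keeps the loop independent of any particular codec under test. A fuller version would also write each decoded image to disk next to the log entry, in line with the auditability requirement above.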

A common pitfall is using non-calibrated displays for subjective testing. Even minor deviations in display gamma, peak luminance, or color temperature can invalidate MOS results. Always use a spectroradiometer for display calibration before subjective sessions.

For engineering teams developing new codecs, the standard recommends a tiered approach: first, rapid screening using objective metrics (SSIM, VMAF) to prune unpromising designs; second, targeted subjective testing on the top candidates. This approach reduces the cost and time of subjective evaluations while maintaining statistical rigor. The standard also provides guidance on selecting test imagery that matches the target application domain — medical imaging requires different test content than consumer photography.
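The first, objective tier of this approach amounts to ranking candidates and keeping a shortlist. A minimal sketch follows, assuming each candidate has one objective score per test image and that higher scores are better; `screen_candidates` and its `keep` parameter are hypothetical names, and ranking by the plain mean is an illustrative choice, not something the standard prescribes.

```python
import statistics


def screen_candidates(objective_scores, keep=2):
    """Return the `keep` candidates with the highest mean objective score.

    objective_scores: dict mapping candidate name -> list of per-image scores
    (assumed higher-is-better, e.g. SSIM values).
    """
    ranked = sorted(
        objective_scores,
        key=lambda name: statistics.fmean(objective_scores[name]),
        reverse=True,
    )
    return ranked[:keep]


shortlist = screen_candidates(
    {
        "design_a": [0.90, 0.92],
        "design_b": [0.80, 0.85],
        "design_c": [0.95, 0.96],
    },
    keep=2,
)
print(shortlist)
```

Only the shortlisted designs would then go forward to the more expensive subjective sessions.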

Frequently Asked Questions

Q: Is ISO/IEC 29170-2 applicable to video coding evaluation?

A: While primarily designed for still-image coding, the DSCQS subjective methodology can be adapted for short video clips by extending the presentation time. For full video evaluation, refer to ITU-R BT.500 and ITU-T P.910.

Q: How many human subjects are needed for a valid subjective test?

A: The standard recommends a minimum of 15 subjects after screening for visual acuity and color vision. At least 25 subjects are preferred for high-confidence results in standardization contexts.

Q: Can objective metrics replace subjective testing entirely?

A: No. While objective metrics provide useful engineering guidance, they cannot fully capture the complexity of human visual perception. Subjective testing remains the gold standard for codec evaluation and is required for ISO/IEC standardization.
