Image and Video Quality Metrics: A Comprehensive Reference

Last updated: January 14, 2025
Status: 🟢 Actively maintained


Introduction

Image and video quality assessment spans multiple methodologies, from full-reference (intrusive) metrics requiring pristine originals to no-reference (blind) approaches that evaluate quality without any reference. This living reference consolidates metrics across traditional media, omnidirectional content, HDR, and specialized applications.


Intrusive (Full-Reference) Metrics

Metrics requiring access to both reference (original) and distorted signals.

Peak Signal-to-Noise Ratio (PSNR)

Description: Most widely used objective metric measuring pixel-wise difference between reference and distorted images.

How it works:

\[\text{PSNR} = 10 \log_{10} \left( \frac{\text{MAX}^2}{\text{MSE}} \right)\]

where MAX is the maximum possible pixel value (255 for 8-bit images) and MSE is the mean squared error.
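
As a sanity check, the formula reduces to a few lines of NumPy (a minimal sketch for same-shaped arrays; skimage.metrics.peak_signal_noise_ratio, listed below, computes the same quantity):

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB between two same-shaped images (e.g., uint8 arrays)."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```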

Limitations: Poor correlation with human perception.

Libraries:

  • Python: skimage.metrics.peak_signal_noise_ratio, cv2.PSNR
  • MATLAB: Built-in psnr() function

Open Source:

Datasets:

References:

Structural Similarity Index (SSIM)

Description: Perceptual metric considering luminance, contrast, and structure similarities.

How it works:

\[\text{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}\]

where μ denotes the mean, σ² the variance, σxy the covariance, and C1, C2 are small stabilizing constants.
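
A minimal usage sketch with scikit-image (listed under Libraries below); data_range must match the pixel value range or scores will be misleading:

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)
noisy = np.clip(reference + rng.normal(0, 10, reference.shape), 0, 255).astype(np.uint8)

# data_range is the span of possible pixel values (255 for 8-bit images).
score = structural_similarity(reference, noisy, data_range=255)
print(f"SSIM: {score:.3f}")  # 1.0 only for identical images
```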

Libraries:

  • Python: skimage.metrics.structural_similarity, pytorch-msssim
  • MATLAB: Built-in ssim() function

Open Source:

Datasets:

References:

  • Wang, Z., et al. (2004). “Image quality assessment: from error visibility to structural similarity”

Multi-Scale SSIM (MS-SSIM)

Description: Extension of SSIM evaluating structure at multiple scales via downsampling.

How it works: Applies SSIM at progressively downsampled resolutions and combines the per-scale scores with fixed weights.
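
A minimal sketch using the pytorch-msssim package listed below; inputs are (N, C, H, W) float tensors, and images should be at least roughly 160 px per side for the default five scales:

```python
import torch
from pytorch_msssim import ms_ssim  # pip install pytorch-msssim

reference = torch.rand(1, 3, 256, 256) * 255            # batch of RGB images
distorted = (reference + torch.randn_like(reference) * 10).clamp(0, 255)

# Five default scales with fixed weights; data_range must match the pixel range.
score = ms_ssim(reference, distorted, data_range=255, size_average=True)
print(f"MS-SSIM: {score.item():.3f}")
```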

Libraries:

  • Python: pytorch-msssim, piq
  • MATLAB: Available via File Exchange

Open Source:

References:

  • Wang, Z., et al. (2003). “Multiscale structural similarity for image quality assessment”

Video Multi-Method Assessment Fusion (VMAF)

Description: Machine learning-based video quality metric developed by Netflix, fusing multiple elementary metrics.

How it works: Fuses VIF (computed at multiple scales), DLM (Detail Loss Metric), and a temporal motion feature via Support Vector Regression (SVR) trained on subjective scores.
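
In practice VMAF is usually computed with Netflix's vmaf tool or FFmpeg's libvmaf filter rather than reimplemented; a minimal Python wrapper around the latter might look like this (a sketch assuming an ffmpeg build with libvmaf on PATH):

```python
import json
import subprocess
import tempfile

def vmaf_score(distorted_path: str, reference_path: str) -> float:
    """Pooled VMAF via FFmpeg's libvmaf filter (requires ffmpeg built with libvmaf)."""
    with tempfile.NamedTemporaryFile(suffix=".json") as log:
        # libvmaf convention: first input is the distorted clip, second the reference.
        subprocess.run(
            ["ffmpeg", "-i", distorted_path, "-i", reference_path,
             "-lavfi", f"libvmaf=log_fmt=json:log_path={log.name}",
             "-f", "null", "-"],
            check=True, capture_output=True,
        )
        with open(log.name) as f:
            result = json.load(f)
    # Key layout follows libvmaf v2.x JSON logs.
    return result["pooled_metrics"]["vmaf"]["mean"]
```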

Libraries:

Open Source:

Datasets:

References:

  • Li, Z., et al. (2016). “Toward a practical perceptual video quality metric”
  • Netflix Tech Blog

Visual Information Fidelity (VIF)

Description: Information-theoretic metric quantifying shared information between reference and distorted images.

How it works: Models image as natural scene statistics passing through distortion channel, computes mutual information.

Libraries:

Open Source:

References:

  • Sheikh, H.R., & Bovik, A.C. (2006). “Image information and visual quality”

Feature Similarity Index (FSIM)

Description: Low-level feature-based metric using phase congruency and gradient magnitude.

How it works: Extracts phase congruency (PC) and gradient magnitude (GM) as features, computes similarity.

Libraries:

Open Source:

References:

  • Zhang, L., et al. (2011). “FSIM: A feature similarity index for image quality assessment”

Semi-Intrusive Metrics

Metrics using partial reference information (e.g., extracted features).

Reduced-Reference Entropic Differencing (RRED)

Description: Uses wavelet-based entropy features transmitted as side information.

How it works: Extracts entropy of wavelet subbands from reference, compares with distorted version.

Libraries:

References:

  • Soundararajan, R., & Bovik, A.C. (2012). “RRED indices: Reduced reference entropic differencing”

SpEED-QA (Spatial Efficient Entropic Differencing for Quality Assessment)

Description: Reduced-reference metric based on spatial and spectral entropies.

How it works: Transmits entropy statistics of DCT blocks and edges as reference features.

References:

  • Bampis, C.G., et al. (2017). “SpEED-QA: Spatial efficient entropic differencing for image and video quality”

Non-Intrusive (No-Reference) Metrics

Blind quality assessment without access to reference.

Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE)

Description: No-reference metric using natural scene statistics (NSS) features and SVR.

How it works: Extracts locally normalized luminance coefficients, fits to generalized Gaussian distribution, trains SVR on features.
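
A minimal scoring sketch using the piq package (one of several implementations; OpenCV's contrib module and MATLAB also ship BRISQUE). Lower scores predict better quality:

```python
import torch
import piq  # pip install piq

# Image as an (N, C, H, W) tensor scaled to [0, 1].
image = torch.rand(1, 3, 256, 256)

score = piq.brisque(image, data_range=1.0)  # lower = better predicted quality
print(f"BRISQUE: {score.item():.2f}")
```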

Libraries:

Open Source:

Datasets:

References:

  • Mittal, A., et al. (2012). “No-reference image quality assessment in the spatial domain”

Natural Image Quality Evaluator (NIQE)

Description: Opinion-unaware (no training on human scores) metric based on NSS model.

How it works: Models pristine natural images with multivariate Gaussian (MVG) in NSS feature space, measures distance of test image.

Libraries:

  • Python: scikit-video (skvideo.measure.niqe)
  • MATLAB: Built-in niqe() function

Open Source:

References:

  • Mittal, A., et al. (2013). “Making a ‘completely blind’ image quality analyzer”

Perception-based Image Quality Evaluator (PIQE)

Description: No-reference metric analyzing blockiness, noise, and spatial activity.

How it works: Divides image into blocks, evaluates distortion using perceptual features (noticeably distorted blocks, noise, spatial activity).

Libraries:

  • MATLAB: Built-in piqe() function

References:

  • Venkatanath, N., et al. (2015). “Blind image quality evaluation using perception based features”

Deep Learning-Based: NIMA (Neural Image Assessment)

Description: CNN-based aesthetic and technical quality predictor trained on AVA dataset.

How it works: Fine-tunes pre-trained CNN (e.g., MobileNet, Inception) to predict distribution of human ratings.
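
Because NIMA predicts a distribution over the 1-10 rating scale rather than a point score, the usual summaries are the distribution's mean and standard deviation (a sketch with an illustrative, made-up softmax output):

```python
import numpy as np

# Hypothetical softmax output over rating buckets 1..10 (sums to 1).
probs = np.array([0.01, 0.02, 0.05, 0.10, 0.20, 0.25, 0.20, 0.10, 0.05, 0.02])
ratings = np.arange(1, 11)

mean_score = np.sum(ratings * probs)                              # expected rating
std_score = np.sqrt(np.sum(((ratings - mean_score) ** 2) * probs))  # rating spread
print(f"NIMA mean: {mean_score:.2f}, std: {std_score:.2f}")
```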

Libraries:

  • Python: TensorFlow/Keras, PyTorch

Open Source:

Datasets:

References:

  • Talebi, H., & Milanfar, P. (2018). “NIMA: Neural image assessment”

MUSIQ (Multi-Scale Image Quality Transformer)

Description: Vision Transformer-based no-reference IQA handling arbitrary resolutions and aspect ratios.

How it works: Uses multi-scale image representation fed to Transformer encoder, predicts quality score.

Open Source:

References:

  • Ke, J., et al. (2021). “MUSIQ: Multi-scale image quality transformer”

Overall Image Quality Metrics

General-purpose metrics for diverse distortion types.

Mean Absolute Error (MAE)

Description: Average absolute pixel-wise difference.

How it works:

\[\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |x_i - y_i|\]

Libraries:

  • Python: np.mean(np.abs(x - y)) (NumPy)

Limitations: Does not correlate well with perceived quality.

Perceptual Index (PI)

Description: No-reference metric combining Ma’s and NIQE scores to measure perceptual quality.

How it works:

\[\text{PI} = \frac{1}{2} \left( 10 - \text{Ma} + \text{NIQE} \right)\]
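
For example, Ma = 8.2 and NIQE = 3.1 give PI = ((10 - 8.2) + 3.1)/2 = 2.45; lower PI indicates better predicted perceptual quality.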

References:

  • Blau, Y., & Michaeli, T. (2018). “The perception-distortion tradeoff”

Image Attribute-Specific Metrics

Metrics targeting specific visual attributes.

Sharpness (Laplacian Variance)

Description: Measures image sharpness via Laplacian operator variance.

How it works: Applies Laplacian filter, computes variance (higher = sharper).
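
A minimal sketch with OpenCV; the image path is illustrative, and the threshold for calling an image "sharp" is content-dependent:

```python
import cv2

# Load as grayscale; the Laplacian responds to rapid intensity changes (edges).
gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)

# Higher variance of the Laplacian response indicates a sharper image.
sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
print(f"Laplacian variance: {sharpness:.1f}")
```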

Libraries:

  • Python: cv2.Laplacian() + numpy.var()

Open Source:

References:

  • Pech-Pacheco, J.L., et al. (2000). “Diatom autofocusing in brightfield microscopy”

Colorfulness Metric

Description: Quantifies perceived colorfulness based on opponent color space.

How it works: Computes standard deviation and mean of rg and yb channels in opponent space.
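
The Hasler-Suesstrunk measure is simple enough to implement directly (a minimal sketch; expects an (H, W, 3) RGB array):

```python
import numpy as np

def colorfulness(rgb: np.ndarray) -> float:
    """Hasler-Suesstrunk colorfulness for an (H, W, 3) RGB array."""
    r, g, b = [rgb[..., i].astype(np.float64) for i in range(3)]
    rg = r - g                  # red-green opponent channel
    yb = 0.5 * (r + g) - b      # yellow-blue opponent channel
    std_root = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean_root = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return std_root + 0.3 * mean_root
```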

Open Source:

References:

  • Hasler, D., & Suesstrunk, S. (2003). “Measuring colorfulness in natural images”

Contrast Metric (Michelson, RMS)

Description: Measures luminance contrast.

How it works:

Michelson: \(C = \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}}\)

RMS: Standard deviation of pixel intensities.
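
Both variants are a few lines over a grayscale array (a minimal sketch; note that Michelson contrast is driven entirely by the single brightest and darkest pixels):

```python
import numpy as np

def michelson_contrast(gray: np.ndarray) -> float:
    """(Lmax - Lmin) / (Lmax + Lmin); sensitive to single extreme pixels."""
    lmax, lmin = float(gray.max()), float(gray.min())
    return (lmax - lmin) / (lmax + lmin) if (lmax + lmin) > 0 else 0.0

def rms_contrast(gray: np.ndarray) -> float:
    """Standard deviation of intensities, often on [0, 1]-normalized values."""
    return float(gray.std())
```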

Libraries:

  • Python: Custom with numpy

References:

  • Peli, E. (1990). “Contrast in complex images”

Blockiness/Blurring Detection

Description: Detects compression artifacts like blocking and blur.

How it works: Analyzes edge discontinuities (blocking) and high-frequency attenuation (blur).
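
As a concrete illustration (a deliberately simple heuristic, not one of the published metrics): gradient energy concentrated at 8-pixel block boundaries, relative to elsewhere, suggests blocking artifacts:

```python
import numpy as np

def blockiness(gray: np.ndarray, block: int = 8) -> float:
    """Ratio of gradient energy at 8-px block boundaries vs. elsewhere (>1 suggests blocking)."""
    g = gray.astype(np.float64)
    dh = np.abs(np.diff(g, axis=1))        # horizontal neighbor differences
    cols = np.arange(dh.shape[1])
    at_boundary = (cols + 1) % block == 0  # differences straddling block edges
    return dh[:, at_boundary].mean() / (dh[:, ~at_boundary].mean() + 1e-12)
```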

Open Source:

  • Part of BRISQUE, PIQE implementations

References:

  • Wang, Z., et al. (2000). “Blind measurement of blocking artifacts in images”

Omnidirectional (360°) Image/Video Metrics

Metrics for spherical content.

Spherical PSNR (S-PSNR)

Description: PSNR computed over points sampled uniformly on the sphere, avoiding the latitude-dependent oversampling of equirectangular pixels.

How it works: Samples a fixed set of points uniformly distributed on the sphere, maps each point back into the projected frame (with interpolation) to fetch reference and distorted values, and computes PSNR over those samples.

Open Source:

References:

  • Yu, M., et al. (2015). “A framework to evaluate omnidirectional video coding schemes”

Weighted-to-Spherically-Uniform PSNR (WS-PSNR)

Description: PSNR computed directly on projection-plane pixels, with each pixel's error weighted by the spherical area it covers.

How it works: Scales each pixel's squared error by a position-dependent weight (proportional to the cosine of latitude for equirectangular projection) and normalizes by the total weight, so oversampled polar regions do not dominate the score.
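
A minimal sketch of the cos-latitude weighting for equirectangular frames, following the description above:

```python
import numpy as np

def ws_psnr(reference: np.ndarray, distorted: np.ndarray, max_val: float = 255.0) -> float:
    """WS-PSNR for equirectangular frames: cos(latitude)-weighted MSE."""
    h, w = reference.shape[:2]
    lat = (np.arange(h) + 0.5 - h / 2) * np.pi / h    # latitude of each row's center
    weights = np.cos(lat)[:, None] * np.ones((1, w))  # per-pixel spherical-area weight
    if reference.ndim == 3:
        weights = weights[..., None]                  # broadcast over color channels
    err = (reference.astype(np.float64) - distorted.astype(np.float64)) ** 2
    wmse = np.sum(weights * err) / np.sum(weights * np.ones_like(err))
    return 10.0 * np.log10(max_val ** 2 / wmse)
```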

Open Source:

References:

  • Sun, Y., et al. (2017). “Weighted-to-spherically-uniform quality evaluation for omnidirectional video”

Craster Parabolic Projection PSNR (CPP-PSNR)

Description: Uses Craster projection minimizing area distortion for quality measurement.

How it works: Converts to Craster parabolic projection before computing PSNR.

References:

  • Yu, M., et al. (2015). “A framework to evaluate omnidirectional video coding schemes”

Viewport-based Quality Assessment

Description: Evaluates quality based on user’s viewing direction/viewport.

How it works: Samples viewports according to head movement patterns, computes quality per viewport.

Datasets:

References:

  • Xu, M., et al. (2018). “Predicting head movement in panoramic video”

Natural Image Quality

Metrics for photographic/natural scenes.

NIQE (Natural Image Quality Evaluator)

Description: See “Non-Intrusive Metrics” section above.

IL-NIQE (Integrated Local NIQE)

Description: Improved NIQE with local quality assessment and integration.

How it works: Computes local NIQE scores in patches, aggregates for global score.

Open Source:

References:

  • Zhang, L., et al. (2015). “A feature-enriched completely blind image quality evaluator”

HDR Image Quality

Metrics for high dynamic range content.

HDR-VDP-2 (HDR Visual Difference Predictor)

Description: Full-reference metric modeling human visual system for HDR images.

How it works: Applies luminance masking, contrast sensitivity, spatial frequency channels.

Libraries:

Open Source:

Datasets:

References:

  • Mantiuk, R., et al. (2011). “HDR-VDP-2: A calibrated visual metric for visibility and quality predictions”

HDR-VQM (HDR Video Quality Metric)

Description: Extends HDR-VDP to video with temporal modeling.

How it works: Adds temporal contrast sensitivity and motion modeling to HDR-VDP.

References:

  • Narwaria, M., et al. (2015). “HDR-VQM: An objective quality measure for high dynamic range video”

PU21 (Perceptual Uniformity 2021)

Description: Perceptually uniform encoding for HDR pixel values, used to adapt existing metrics (e.g., PSNR, SSIM) to HDR content.

How it works: Transforms absolute linear HDR values into approximately perceptually uniform units before computing differences with a standard metric.

Open Source:

References:

  • Mantiuk, R.K., & Azimi, M. (2021). “PU21: A novel perceptually uniform encoding for adapting existing quality metrics for HDR”

Artistic Image Quality

Metrics for stylized/artistic content.

Neural Style Transfer Quality (Gatys Loss)

Description: Measures content and style preservation in neural style transfer.

How it works: Computes content loss (feature difference in deep layers) and style loss (Gram matrix difference).
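
The style term reduces to comparing Gram matrices of feature activations; a minimal PyTorch sketch for a single layer (in practice the features come from a pretrained CNN such as VGG, and losses are summed over several layers):

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Gram matrix of CNN features (N, C, H, W): channel-wise correlations."""
    n, c, h, w = features.shape
    flat = features.reshape(n, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def style_loss(feat_generated: torch.Tensor, feat_style: torch.Tensor) -> torch.Tensor:
    """MSE between Gram matrices at one layer (summed over layers in practice)."""
    return torch.mean((gram_matrix(feat_generated) - gram_matrix(feat_style)) ** 2)
```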

Open Source:

References:

  • Gatys, L.A., et al. (2016). “Image style transfer using convolutional neural networks”

Aesthetic Quality Assessment (AVA-based models)

Description: Predicts aesthetic appeal using deep learning trained on AVA dataset.

How it works: CNN extracts features correlating with aesthetic ratings (composition, color harmony, etc.).

Open Source:

Datasets:

References:

  • Murray, N., et al. (2012). “AVA: A large-scale database for aesthetic visual analysis”

Video-Specific Metrics

Movies and Streaming

Video Multimethod Assessment Fusion (VMAF)

Description: See “Intrusive Metrics” section above. Widely used for streaming video (Netflix, YouTube).

Spatial-Temporal SSIM (ST-SSIM)

Description: Extends SSIM to temporal dimension for video.

How it works: Computes SSIM across spatial and temporal patches (3D blocks).

References:

  • Wang, Z., et al. (2004). “Video quality assessment based on structural distortion measurement”

Video Quality Metric (VQM)

Description: General-purpose model developed by NTIA/ITS and standardized in ITU-T J.144 for broadcast video quality.

How it works: Extracts perceptual features (spatial/temporal activity, edge degradation, chroma spread), combines via linear model.

References:

  • Pinson, M.H., & Wolf, S. (2004). “A new standardized method for objectively measuring video quality”

Videoconferencing

ViVQM (Video-over-IP Visual Quality Metric)

Description: No-reference metric for videoconferencing and VoIP video.

How it works: Detects packet loss artifacts, blockiness, blurring specific to real-time video transmission.

References:

  • Reibman, A.R., et al. (2004). “Quality monitoring of video over a packet network”

Real-Time Video Quality Assessment (QoE models)

Description: Models predicting quality-of-experience considering bitrate switching, stalling, resolution changes.

How it works: Combines video quality metrics with buffering/stalling penalties.
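
As a toy illustration only (the penalty weights below are made up; standardized models such as ITU-T P.1203, listed under Standards, are far more detailed):

```python
from dataclasses import dataclass

@dataclass
class Session:
    mean_vmaf: float      # average per-segment quality, 0-100
    stall_seconds: float  # total rebuffering time
    num_switches: int     # quality/resolution switches

def qoe(s: Session, stall_penalty: float = 3.0, switch_penalty: float = 0.5) -> float:
    """Toy additive QoE: quality minus stalling and switching penalties."""
    return s.mean_vmaf - stall_penalty * s.stall_seconds - switch_penalty * s.num_switches

print(qoe(Session(mean_vmaf=85.0, stall_seconds=2.0, num_switches=3)))  # 77.5
```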

Open Source:

References:


Key Datasets

Image Quality Databases

Video Quality Databases

360° Video Databases


References

Standards

  • ITU-T J.144: VQM
  • ITU-T P.1203: QoE for adaptive streaming
  • ITU-R BT.500: Subjective assessment of TV pictures

Key Papers

  • Wang, Z., et al. (2004). “Image quality assessment: from error visibility to structural similarity”
  • Li, Z., et al. (2016). “Toward a practical perceptual video quality metric” (VMAF)
  • Mittal, A., et al. (2012). “No-reference image quality assessment in the spatial domain” (BRISQUE)
  • Mantiuk, R., et al. (2011). “HDR-VDP-2”
  • Talebi, H., & Milanfar, P. (2018). “NIMA: Neural image assessment”

Toolboxes & Libraries


This is a living document. Suggestions? Email me.



