HOA-SSR Dataset
Higher-Order Ambisonics Sound Scene Repository - A comprehensive 360° audiovisual quality dataset for immersive media research
The Higher-Order Ambisonics Sound Scene Repository (HOA-SSR) is a groundbreaking open-source dataset designed to advance research in perceptual quality evaluation of immersive 360° audiovisual content. It combines state-of-the-art spatial audio recording with ultra-high-definition 360° video, accompanied by subjective quality scores from trained assessors.
Research Impact
To our knowledge, this is the first recorded audiovisual dataset with Mean Opinion Scores (MOS) specifically created to support perceptual quality research in immersive audiovisual content. The dataset opens new possibilities for developing and validating quality metrics for next-generation virtual reality, augmented reality, and immersive media applications.
📊 Dataset Highlights
- 150+ audiovisual scenes captured in diverse real-world environments
- 8K 360° video (7680×3840) at 30 fps with 8-bit YUV 4:2:2 chroma subsampling
- 4th order ambisonic audio (25 channels) at 48kHz, 24-bit
- Subjective quality scores from trained assessors
- Multiple quality metrics for both audio and video domains
Technical Specifications
Recording Equipment
The dataset was captured using professional-grade equipment to ensure the highest quality baseline:
🎥 Video Capture
Insta360 Pro 2 - Professional spherical 360° camera with 6 synchronized lenses capturing every angle simultaneously
- Resolution: 8K (7680×3840)
- Frame rate: 30 fps
- Pixel format: 8-bit YUV 4:2:2
- Format: Equirectangular projection (ERP)
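All video is stored in equirectangular projection, so mapping between ERP pixel coordinates and directions on the sphere is a routine preprocessing step for viewport rendering and spherical metrics. A minimal sketch of that mapping (the helper name erp_pixel_to_sphere is illustrative, not part of any dataset tooling):

# Sketch: equirectangular (ERP) pixel -> spherical direction mapping.
# Illustrative helper, not part of the dataset tooling.
import math

W, H = 7680, 3840  # 8K ERP frame size used in the dataset

def erp_pixel_to_sphere(u, v, width=W, height=H):
    """Map an ERP pixel (u, v) to longitude/latitude in radians."""
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi   # [-pi, pi)
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi  # (pi/2, -pi/2)
    return lon, lat

# The pixel at the frame centre looks straight ahead:
print(erp_pixel_to_sphere(W // 2, H // 2))  # ~ (0.0, 0.0)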
🎙️ Audio Capture
em32 Eigenmike - Spherical microphone array with 32 omnidirectional microphones
- Order: 4th order ambisonics
- Channels: 25 (AmbiX B-format)
- Sample rate: 48 kHz
- Bit depth: 24-bit PCM
- Normalization: SN3D, ACN ordering
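The channel count follows directly from the AmbiX convention: a 4th-order stream carries (N+1)² = 25 channels, indexed by the ACN scheme with index n² + n + m for spherical-harmonic degree n and order m. A small bookkeeping sketch (purely illustrative):

# Sketch: AmbiX channel bookkeeping for 4th-order ambisonics (illustrative).
ORDER = 4
num_channels = (ORDER + 1) ** 2  # 25 channels at 4th order

def acn_index(n, m):
    """ACN channel index for spherical-harmonic degree n and order m."""
    return n * n + n + m

# Channel 0 is the omnidirectional W component (n=0, m=0);
# the last channel at 4th order is n=4, m=4 -> index 24.
assert acn_index(0, 0) == 0
assert acn_index(4, 4) == num_channels - 1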
Scene Diversity
The dataset contains audiovisual scenes spanning several contrast dimensions: nature vs. mechanical, indoor vs. outdoor, static vs. dynamic, traffic vs. quiet, impulsive vs. steady, and speech vs. music. This diversity ensures broad applicability across different use cases and research questions.
Subjective Quality Evaluation
Three comprehensive subjective experiments were conducted to assess perceptual quality:
Experimental Methodology
- Experiment 1 (audio-only): Evaluated spatial audio fidelity and clarity using a 26-channel loudspeaker setup compliant with EBU Tech 3276 and ITU-R BS.1116-3
- Experiment 2 (video-only): Assessed visual quality through a head-mounted display (Samsung Odyssey+ Mixed Reality headset) for immersive viewing
- Experiment 3 (audiovisual): Combined evaluation of multimodal perceptual quality using synchronized audio-visual presentation

Common to all three experiments:
- Protocol: Multiple Stimulus with Hidden Reference (MUSHRA-style) methodology
- Participants: 20 trained assessors
- Location: SenseLab listening test and VR facilities at FORCE Technology, Denmark
- Ethics approval: Danish Committee System on Health Research Ethics (journal no. H-20031815)
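Ratings from such MUSHRA-style sessions are typically reduced to a Mean Opinion Score per test condition with a confidence interval across assessors. A minimal aggregation sketch, assuming the ratings are stored as a conditions-by-assessors array (the array layout and rating scale are assumptions for illustration):

# Sketch: MOS and 95% confidence interval per test condition (illustrative).
import numpy as np

# ratings[i, j] = score given by assessor j to condition i (e.g. 0-100 scale)
rng = np.random.default_rng(0)
ratings = rng.uniform(20, 95, size=(10, 20))  # 10 conditions, 20 assessors

mos = ratings.mean(axis=1)                             # mean opinion score
sem = ratings.std(axis=1, ddof=1) / np.sqrt(ratings.shape[1])
ci95 = 1.96 * sem                                      # normal-approximation 95% CI

for i, (m, c) in enumerate(zip(mos, ci95)):
    print(f"condition {i}: MOS = {m:.1f} +/- {c:.1f}")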
Quality Metrics & Analysis
Objective Quality Metrics Evaluated
Audio Metrics:
- PEAQ (Perceptual Evaluation of Audio Quality)
- ViSQOL (Virtual Speech Quality Objective Listener)
- AMBIQUAL (Ambisonic quality metric)
Video Metrics:
- PSNR and variants (WS-PSNR, CPP-PSNR, S-PSNR)
- SSIM and MS-SSIM
- VMAF (2K and 4K variants)
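Of the video metrics, WS-PSNR adapts plain PSNR to ERP content by weighting each pixel with the cosine of its latitude, so that the oversampled polar regions do not dominate the error. A minimal sketch of the computation (illustrative; not the reference 360Lib implementation):

# Sketch: WS-PSNR for 8-bit equirectangular luma planes (illustrative).
import numpy as np

def ws_psnr(ref, dist):
    """ref, dist: (H, W) uint8 luma planes in equirectangular projection."""
    h, w = ref.shape
    # Per-row weight: cosine of the pixel-centre latitude.
    j = np.arange(h)
    w_row = np.cos((j + 0.5 - h / 2.0) * np.pi / h)
    weights = np.tile(w_row[:, None], (1, w))
    err = (ref.astype(np.float64) - dist.astype(np.float64)) ** 2
    wmse = np.sum(weights * err) / np.sum(weights)
    return 10.0 * np.log10(255.0 ** 2 / wmse)

# Tiny demo on synthetic planes:
ref = np.full((8, 16), 128, dtype=np.uint8)
dist = ref.copy()
dist[0, 0] = 120
print(f"WS-PSNR: {ws_psnr(ref, dist):.2f} dB")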
Encoding Parameters
Audio was encoded at 16, 32, and 64 kbps per channel using the AAC-LC encoder. Video was encoded with H.265/HEVC at three resolutions (1920×1080, 3840×1920, 6144×3072) and four quantization parameters (QP 0, 22, 28, 34).
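These parameters define a full condition grid; a quick sketch of its enumeration (the tuple layout is an assumption for illustration):

# Sketch: enumerate the encoding condition grid described above (illustrative).
from itertools import product

audio_bitrates_kbps = [16, 32, 64]                      # per channel, AAC-LC
video_resolutions = ["1920x1080", "3840x1920", "6144x3072"]
video_qps = [0, 22, 28, 34]                             # H.265/HEVC

conditions = list(product(audio_bitrates_kbps, video_resolutions, video_qps))
print(len(conditions))  # 3 * 3 * 4 = 36 audio-video conditions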
Machine Learning Predictions
Building on the subjective data, we developed predictive models for audiovisual quality assessment:
Modeling Approach
Four regression-based machine learning models were trained and tested: multiple linear regression, decision tree, random forest, and support vector machine. Each model was constructed using combinations of audio and video quality metrics, producing 312 predictive models through cross-validation.
Key Findings:
- The combination of VMAF and AMBIQUAL metrics proved most effective for audiovisual quality prediction
- Support vector machine achieved the highest performance under k-fold cross-validation (PCC = 0.909, SROCC = 0.914, RMSE = 0.416)
- Machine learning approaches significantly outperformed simple linear models
# Example model configuration (hyperparameters shown are illustrative
# defaults, not necessarily the exact values of the published models)
from sklearn.svm import SVR
from sklearn.model_selection import KFold

# Best performing configuration: SVR on fused audio-video metrics
model = SVR(kernel='rbf', C=1.0, epsilon=0.1)
features = ['VMAF', 'AMBIQUAL']  # audio-video metric fusion
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # k-fold cross-validation

# Reported performance of the best model:
# - Pearson correlation (PCC): 0.909
# - Spearman rank (SROCC): 0.914
# - RMSE: 0.416
Research Applications
The HOA-SSR dataset enables research across multiple domains:
🎧 Audio Product Development
- Hearing aids and assistive devices testing
- True wireless stereo (TWS) earbuds evaluation
- Telecom headset quality assessment
- Spatial audio algorithm development
🤖 AI & Machine Learning
- Training perceptual quality models
- Audio-visual fusion algorithms
- Scene understanding and classification
- Quality metric development and validation
🎮 Virtual Reality
- Immersive experience quality evaluation
- Compression artifact assessment
- Codec performance benchmarking
- User experience optimization
📊 Quality of Experience Research
- Multimodal perception studies
- Cross-modal interaction analysis
- Standardization and benchmarking
- Quality metric correlation studies
Experimental Design Optimization
In follow-up research, we investigated efficient experimental design strategies:
Full Factorial Design (FFD) vs. Optimal Experimental Design (OED):
- D-optimal design for factor screening
- I-optimal design for prediction accuracy
- Significant reduction in required test conditions while maintaining statistical power
- Applications in large-scale perceptual studies where FFD becomes impractical
This work demonstrates how smart experimental design can reduce participant burden and testing time without compromising research validity.
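As a rough illustration of the D-optimal idea, a design can be selected from a full-factorial candidate set by greedily exchanging runs to maximize det(XᵀX). The sketch below is a toy version under assumed coded factor levels, not the procedure used in the paper:

# Sketch: greedy selection of a D-optimal subset from a full factorial
# candidate set (toy illustration, not the procedure used in the paper).
import numpy as np
from itertools import product

# Full factorial candidates: 3 bitrates x 3 resolutions x 4 QPs, coded levels.
candidates = np.array(list(product([-1, 0, 1], [-1, 0, 1], [-1.0, -0.33, 0.33, 1.0])))
X_full = np.hstack([np.ones((len(candidates), 1)), candidates])  # intercept + main effects

n_runs = 12  # budget: a third of the 36 full-factorial conditions
rng = np.random.default_rng(0)
chosen = list(rng.choice(len(candidates), size=n_runs, replace=False))

def d_criterion(idx):
    X = X_full[idx]
    return np.linalg.det(X.T @ X)

# Greedy exchange: swap a chosen run for an unchosen one whenever det(X'X) grows.
improved = True
while improved:
    improved = False
    for i in range(n_runs):
        for cand in range(len(candidates)):
            if cand in chosen:
                continue
            trial = chosen.copy()
            trial[i] = cand
            if d_criterion(trial) > d_criterion(chosen):
                chosen = trial
                improved = True

print(f"selected {n_runs} of {len(candidates)} runs, det(X'X) = {d_criterion(chosen):.1f}")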
Collaboration & Partners
The HOA-SSR dataset is the result of a collaborative effort between leading research institutions and industry partners:
Research Partners:
- Technical University of Denmark (DTU)
- FORCE Technology - SenseLab
- Nantes Université (France)
Industry Partners:
- Bang & Olufsen
- Demant
- GN Store Nord (Jabra)
- Sonova
- WSA
- Additional industrial collaborators
Funding:
- European Union Horizon 2020 Marie Skłodowska-Curie Actions (Grant No. 765911 - RealVision)
- Danish Ministry of Higher Education and Science
Access & Citation
Dataset Availability
The complete HOA-SSR dataset containing 150 audiovisual scenes is available for research and commercial use. Full access can be purchased; partial access is available for specific scenarios.
Contact: FORCE Technology SenseLab
Publications
If you use this dataset in your research, please cite the relevant publications:
Primary Dataset Paper:
@article{fela2022perceptual,
  title={Perceptual Evaluation on Audio-visual Dataset of 360 Content},
  author={Fela, Randy Frans and Pastor, Andr{\'e}as and Le Callet, Patrick and Zacharov, Nick and Vigier, Toinon and Forchhammer, S{\o}ren},
  journal={arXiv preprint arXiv:2205.08007},
  year={2022}
}
Machine Learning Predictions:
@article{fela2021perceptual,
  title={Perceptual Evaluation of 360 Audiovisual Quality and Machine Learning Predictions},
  author={Fela, Randy Frans and Zacharov, Nick and Forchhammer, S{\o}ren},
  journal={arXiv preprint arXiv:2112.12273},
  year={2021}
}
Experimental Design Optimization:
@article{fela2023comparison,
  title={Comparison of Full Factorial and Optimal Experimental Design for Perceptual Evaluation of Audiovisual Quality},
  author={Fela, Randy Frans and Zacharov, Nick and Forchhammer, S{\o}ren},
  journal={Journal of the Audio Engineering Society},
  volume={71},
  number={1/2},
  pages={4--19},
  year={2023}
}
Assessor Selection Methodology:
@article{fela2022assessor,
  title={Assessor Selection Process for Perceptual Quality Evaluation of 360 Audiovisual Content},
  author={Fela, Randy Frans and Zacharov, Nick and Forchhammer, S{\o}ren},
  journal={Journal of the Audio Engineering Society},
  volume={70},
  number={10},
  pages={824--842},
  year={2022}
}
Technical Documentation
For researchers implementing models or reproducing results:
- Audio Decoding: 4th order ambisonics decoded to 26-channel configuration
- Video Rendering: Equirectangular to viewport projection with proper field of view
- Synchronization: Audio-visual temporal alignment critical for multimodal evaluation
- Metric Computation: Frame-level metrics aggregated using temporal pooling
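The pooling step mentioned above can be as simple as averaging per-frame scores, though percentile pooling is sometimes used to emphasize the worst segments. A minimal sketch (illustrative; the accompanying papers describe the exact procedure used):

# Sketch: temporal pooling of frame-level quality scores (illustrative).
import numpy as np

frame_scores = np.random.default_rng(0).uniform(60, 95, size=300)  # e.g. per-frame VMAF

mean_pooled = frame_scores.mean()              # simple temporal average
p10_pooled = np.percentile(frame_scores, 10)   # emphasize worst segments

print(f"mean pooling: {mean_pooled:.2f}, 10th-percentile pooling: {p10_pooled:.2f}")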
Detailed specifications and processing pipelines are available in the accompanying technical papers.
Status: Dataset Available
Version: 2.0
Last Updated: 2022
Related Papers:
- arXiv:2205.08007 - Primary Dataset Paper
- JAES 2023 - Experimental Design Comparison
- JAES 2022 - Assessor Selection Process

Dataset URL: FORCE Technology
The HOA-SSR dataset represents a significant contribution to the field of immersive media quality assessment, enabling researchers worldwide to develop and validate next-generation quality metrics and perceptual models for 360° audiovisual content.