Perceptual Audio Evaluation - Theory, Method and Application
Author: Søren Bech
Publisher: Wiley
Total Pages: 0
Release: 2006-06-16
Genre: Technology & Engineering
ISBN: 9780470869239


As audio and telecommunication technologies develop, there is an increasing need to evaluate their technical and perceptual performance. A growing number of new technologies (e.g. low bit-rate coding) are based on specific properties of the auditory system, which are often highly non-linear. This means that the auditory quality of such systems cannot be measured by traditional physical measures (such as distortion or frequency response), but only by perceptual evaluations in the form of listening tests. Perceptual Audio Evaluation provides a comprehensive guide to the many variables that need to be considered before, during and after experiments, including the selection of the programme material to be reproduced, technical aspects of its production, the experimental set-up including calibration, and the statistical planning of the experiment and the subsequent analysis of the data.

Perceptual Audio Evaluation:
- Provides a complete and accessible guide to the motives, theory and practical application of perceptual evaluation of reproduced sound.
- Discusses all the variables of perceptual evaluation, their control and their possible influence on the results.
- Covers in detail all international standards on the topic.
- Is illustrated throughout with tables, figures and worked solutions.

Perceptual Audio Evaluation will appeal to audio and speech engineers as well as researchers in audio and speech laboratories. Postgraduate students in engineering or acoustics and undergraduate students studying psychoacoustics, speech and audio processing, and signal processing will also find this an essential reference.

Parametric Time-Frequency Domain Spatial Audio
Author: Ville Pulkki
Publisher: John Wiley & Sons
Total Pages: 498
Release: 2017-10-11
Genre: Technology & Engineering
ISBN: 111925261X


A comprehensive guide that addresses the theory and practice of spatial audio. This book provides readers with the principles and best practices in spatial audio signal processing. It describes how sound fields and their perceptual attributes are captured and analyzed within the time-frequency domain, how essential representation parameters are coded, and how such signals are efficiently reproduced for practical applications. The book is split into four parts, starting with an overview of the fundamentals. It then goes on to explain the reproduction of spatial sound before offering an examination of signal-dependent spatial filtering. The book finishes with coverage of both current and future applications and the direction in which spatial audio research is heading. Parametric Time-Frequency Domain Spatial Audio focuses on applications in entertainment audio, including music, home cinema, and gaming, covering the capturing and reproduction of spatial sound as well as its generation, transduction, representation, transmission, and perception. This book teaches readers the tools needed for such processing and provides an overview of existing research. It also presents recent projects and commercial applications built on these systems.

Parametric Time-Frequency Domain Spatial Audio:
- Provides an in-depth presentation of the principles, past developments, state-of-the-art methods, and future research directions of spatial audio technologies.
- Includes contributions from leading researchers in the field.
- Offers MATLAB code with selected chapters.

An advanced book aimed at readers who are comfortable with mathematical expressions in digital signal processing and sound field analysis, Parametric Time-Frequency Domain Spatial Audio is best suited for researchers in academia and in the audio industry.

Spatial Sound Design and Perceptual Evaluation of Moving Sound Sources in Amplitude Panning
Author: David Caro Moreno
Publisher:
Total Pages:
Release: 2015
Genre:
ISBN:


Music is organized sound. Usually we think of organizing sound in frequency and time, but sound may also be organized in space. Today, musicians and scientists face a new artistic and scientific challenge: the creation of spatial music, that is, music which involves the organization of sound in three-dimensional space. So far, the availability of sound reproduction systems with height is very limited. Most commonly we find stereo and surround systems, which use panning techniques to reproduce sound. From stereo to surround, a very important step forward is the use of a scene-based paradigm, which opens possibilities for music composition beyond the classical stereo image. In this research, a spatial sound system is proposed to create complex virtual sound scenes consisting of several moving sound sources. In addition, we study the perceptual characteristics of moving sound sources in Vector Base Amplitude Panning (VBAP), one of the most widespread spatial audio technologies for loudspeaker-based setups. Finally, we perform listening tests with a group of subjects to investigate the perceived attributes of moving sound sources reproduced using amplitude panning.
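The amplitude panning described above has a simple closed form in the pairwise 2D case: the source direction is expressed in the vector base formed by the two active loudspeakers, and the resulting gains are normalized for constant power. A minimal sketch, assuming a standard ±30° stereo pair (the function name and default angles are illustrative, not taken from the thesis):

```python
import math

def vbap_2d(source_az_deg, spk_az_deg=(30.0, -30.0)):
    """Pairwise 2D VBAP: solve g1*l1 + g2*l2 = p for the gain pair,
    then normalize for constant power."""
    a1, a2 = (math.radians(a) for a in spk_az_deg)
    s = math.radians(source_az_deg)
    # Loudspeaker unit vectors (x = front, y = left) and source direction
    l1 = (math.cos(a1), math.sin(a1))
    l2 = (math.cos(a2), math.sin(a2))
    p = (math.cos(s), math.sin(s))
    # Invert the 2x2 base matrix whose columns are l1 and l2
    det = l1[0] * l2[1] - l1[1] * l2[0]
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (p[1] * l1[0] - p[0] * l1[1]) / det
    g1, g2 = max(g1, 0.0), max(g2, 0.0)  # keep the source on the active arc
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

A source panned to the centre (0°) yields equal gains of 1/sqrt(2) per channel, while a source placed exactly at a loudspeaker drives that loudspeaker alone.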

Perceptual and Quantitative Comparison of Headphone-Based Spatial Audio Techniques With Applications in Spatial Enhancement of Music and Binaural Panning
Author: Zane Rusk
Publisher:
Total Pages: 0
Release: 2024
Genre:
ISBN:


Headphone-based spatial audio finds many consumer, industrial, and academic applications: anywhere a realistic sound field needs to be presented to a human listener over headphones. However, the difficulty of faithfully reproducing spatial auditory cues leaves many research questions unanswered. Broadly speaking, the goal of the current work was to contribute to the understanding, engineering, and application of headphone-based spatial audio techniques.

First, work focused on a specific application area of binaural audio. Three methods for the spatial enhancement of stereophonic music over headphones were proposed and evaluated in a listening study. Two of these essentially replicated the spatial sound field arising from two loudspeakers and a subwoofer playing in a living room, while a third added no reverberation. Multiple music excerpts and options for spatial resolution were tested for each of the three methods, and four perceptual attributes were rated (Naturalness, Envelopment, Bass-Treble Balance, and Preference). Results indicated that the addition of spatial reverberation was generally neither judged more natural than, nor preferred over, the no-reverberation option. Perceptions of envelopment increased with the addition of spatial reverberation regardless of spatial resolution. Lastly, the choice of musical excerpt modulated the Naturalness and Envelopment ratings for each enhancement method, but only impacted the Preference and Bass-Treble Balance ratings of the reverberant enhancements.

Fundamental aspects of binaural techniques were also investigated in this work. The second study compared several methods of rendering individual sound sources, called binaural panning techniques. These methods were used to pan free-field sound sources, which is analogous to the "reconstruction" of a head-related impulse response (HRIR).
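In the simplest non-parametric case, rendering a single free-field source over headphones amounts to convolving the source signal with the measured HRIR pair for that direction. A minimal sketch, using a hand-written direct convolution for clarity (real systems use FFT-based filtering; the function names here are illustrative):

```python
def render_binaural(mono, hrir_left, hrir_right):
    """Render a mono signal at the HRIR's measured direction by
    convolving it with the left- and right-ear impulse responses."""
    def conv(x, h):
        # Direct-form convolution: y[n] = sum_k x[k] * h[n - k]
        y = [0.0] * (len(x) + len(h) - 1)
        for i, xi in enumerate(x):
            for j, hj in enumerate(h):
                y[i + j] += xi * hj
        return y
    return conv(mono, hrir_left), conv(mono, hrir_right)
```

Binaural panning techniques such as those compared in this thesis aim to approximate the result of this per-direction convolution using a reduced, fixed set of filters.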
The studied techniques included higher-order Ambisonics (HOA) panning (Zotter and Frank 2019), vector-base amplitude panning (VBAP; Pulkki 1997), the least-squares (Rafaely and Avni 2010) and magnitude least-squares (MagLS; Schörkhuber et al. 2018) methods of binaural rendering, and the recently proposed principal component-base amplitude panning (PCBAP; Neal and Zahorik 2022). Overall perceptual similarity between the original HRIRs and their reconstructions was evaluated in a listening test. It was confirmed that time-aligning the HRIRs prior to panning is effective in reducing the number of filters required to achieve high fidelity. PCBAP, which is based on a principal components analysis of the HRIR set, was also found to be particularly efficient at retaining fidelity.

Models for predicting the perceptual fidelity of binaural panning techniques from quantitative metrics were also developed, complementing the listening-test data from the second study. Eight metrics were calculated for each HRIR reconstruction with respect to the original HRIRs, including the differences in interaural time delay (ITD), interaural level difference (ILD), and modeled binaural loudness. Spectral error metrics were also used: the Perceptual Spectral Difference (PSD) for each ear (Armstrong et al. 2018), the normalized mean square error (NMSE) for each ear, and the difference in the energetic HRTF magnitude sum across ears. The error-metric values for every HRIR reconstruction were used to develop logistic regression models. When assessing each of the two tested source directions individually, these models proved quite accurate in identifying whether an HRIR reconstruction received a high rating for perceptual similarity to the original in the second listening study.
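Two of the quantitative metrics named above have simple energy-ratio forms. A minimal sketch of a per-ear NMSE and a broadband ILD, assuming time-domain impulse responses as Python lists; the thesis's exact definitions (e.g. any frequency weighting or windowing) may differ:

```python
import math

def nmse_db(h_ref, h_rec):
    """Normalized mean square error of a reconstructed HRIR, in dB
    (0 dB means the error energy equals the reference energy)."""
    err = sum((a - b) ** 2 for a, b in zip(h_ref, h_rec))
    ref = sum(a ** 2 for a in h_ref)
    return 10.0 * math.log10(err / ref)

def ild_db(h_left, h_right):
    """Broadband interaural level difference of a binaural pair, in dB."""
    e_left = sum(a ** 2 for a in h_left)
    e_right = sum(a ** 2 for a in h_right)
    return 10.0 * math.log10(e_left / e_right)
```

Metrics like these, computed between each reconstruction and the original HRIR, are the kind of features a logistic regression model can use to predict whether a reconstruction will be rated perceptually similar.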

Look and Listen
Author: Ruohan Gao (Ph. D.)
Publisher:
Total Pages: 442
Release: 2021
Genre:
ISBN:


Understanding scenes and events is inherently a multi-modal experience. We perceive the world by both looking and listening (and touching, smelling, and tasting). In particular, the sounds made by objects, whether actively generated or incidentally emitted, offer valuable signals about their physical properties and spatial locations: the cymbals crash on stage, the bird tweets up in the tree, the truck revs down the block, the silverware clinks in the drawer. However, while recognition has made significant progress by "looking" (detecting objects, actions, or people based on their appearance), it often does not listen. In this thesis, I show that the audio that accompanies visual scenes and events can be used as a rich source of training signal for learning (audio-)visual models. In particular, I have developed computational models that leverage both the semantic and spatial signals in audio to understand people, places, and things from continuous multi-modal observations. Below, I summarize my key contributions along these two themes.

Audio as a semantic signal: First, I develop methods that learn how different objects sound by both looking at and listening to unlabeled video containing multiple sounding objects. I propose an unsupervised approach to separate mixed audio into its component sound sources by disentangling the audio frequency bases for detected visual objects. Next, I propose a new approach that trains audio-visual source separation models on pairs of training videos. This co-separation framework permits both end-to-end training and learning object-level sounds from unlabeled videos of multiple sound sources. As an extension of the co-separation approach, I then study the classic cocktail-party problem of separating voices from a speech mixture by leveraging the consistency between a speaker's facial appearance and their voice. The two modalities, vision and audition, are mutually beneficial.
While visual objects are indicative of the sounds they make, which enhances audio source separation, audio can also be informative of the visual events in videos. Finally, I propose a framework that uses audio as a semantic signal to aid the classification of visual events. I design a preview mechanism that uses audio to eliminate both short-term and long-term visual redundancies for efficient action recognition in untrimmed video.

Audio as a spatial signal: Both audio and visual data convey significant spatial information, and the two senses naturally work in concert to interpret spatial signals. In particular, the human auditory system uses two ears to extract individual sound sources from a complex mixture. Leveraging the spatial signal in videos, I devise an approach to lift a flat monaural audio signal to binaural audio by injecting the spatial cues embedded in the accompanying visual frames. When listening to the predicted binaural audio (the 2.5D visual sound), listeners can feel the locations of the sound sources as they are displayed in the video. Beyond learning from passively captured video, I next explore the spatial signal in audio by deploying an agent that actively interacts with its environment using audio. I propose a novel representation-learning framework that learns useful visual features via echolocation, by capturing echo responses in photo-realistic 3D indoor scene environments. Experimental results demonstrate that the image features learned from echoes are comparable to, or even outperform, heavily supervised pre-training methods on multiple fundamental spatial tasks: monocular depth prediction, surface normal estimation, and visual navigation. Our results serve as an exciting prompt for future work leveraging both the visual and audio modalities.

Motivated by how we humans perceive and act in the world by making use of all our senses, the long-term goal of my research is to build systems that perceive as well as we do by combining all the multisensory inputs. In the last chapter of my thesis, I outline potential future research directions that I want to pursue beyond my Ph.D. dissertation.

Sweet [re]production
Author: Nils Peters
Publisher:
Total Pages:
Release: 2010
Genre:
ISBN:


Perceptual and Modeling Studies on Spatial Sound
Author: Toni Hirvonen
Publisher:
Total Pages: 74
Release: 2007
Genre:
ISBN: 9789512290512


Abstract: Studies on the perception and modeling of spatial sound.