Eigen-Images of Head-Related Transfer Functions [AES 143]

The individualization of head-related transfer functions (HRTFs) leads to perceptually enhanced virtual environments. In particular, the peak-notch structure of HRTF spectra, which depends on the listener’s specific head and pinna anthropometry, carries crucial auditory cues, e.g. for the perception of sound source elevation. Inspired by the eigenfaces approach, we decomposed image representations of individual full-spherical HRTF data sets into linear combinations of orthogonal eigen-images by principal component analysis (PCA). These eigen-images reveal regions of inter-subject variability across sets of HRTFs as a function of direction and frequency. Results show common features as well as spectral variation within the individual HRTFs. Moreover, the measured HRTFs can be statistically de-noised by dimensionality reduction.
PDF Github
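The eigen-image idea can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the subject count, grid sizes, and random toy data are invented, and each subject's HRTF magnitude "image" is simply flattened into one row of a data matrix before the SVD-based PCA.

```python
import numpy as np

# Toy stand-in for measured HRTF magnitude images (illustrative shapes only):
# each subject's full-spherical set is flattened to one row (directions * freq bins).
rng = np.random.default_rng(0)
n_subjects, n_directions, n_bins = 12, 50, 64
hrtf_images = rng.normal(size=(n_subjects, n_directions * n_bins))

mean_image = hrtf_images.mean(axis=0)
centered = hrtf_images - mean_image

# PCA via SVD; the rows of vt are the orthogonal eigen-images.
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Dimensionality reduction: keep k components and reconstruct,
# which acts as statistical de-noising of the measurements.
k = 5
weights = centered @ vt[:k].T              # per-subject coefficients
denoised = mean_image + weights @ vt[:k]   # low-rank reconstruction

print(denoised.shape)  # → (12, 3200)
```

Inspecting the retained rows of `vt`, reshaped back to the direction-frequency grid, is what reveals the regions of inter-subject variability described above.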

Bass Line Transcription for Polyphonic Music

This paper presents an algorithm to detect bass lines in polyphonic music pieces. Several analysis methods from research in music information retrieval and signal processing were assembled into an overall system. A database of bass-line-annotated tracks covering various genres, tempos, and keys was used for evaluation. The algorithm achieves note tracking recall and precision scores of around 60-75% for 50/100 ms note onset tolerances, frame-level evaluation scores of 80/78% for voiced/total pitch detection, and 85/82% for voiced/total chroma detection.
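The onset-based note tracking scores quoted above follow a standard scheme: an estimated note counts as a hit if its onset lies within a tolerance window (e.g. 50 or 100 ms) of an unmatched reference onset. A hedged sketch of this evaluation, with greedy one-to-one matching and invented toy onsets:

```python
def evaluate_note_onsets(ref_onsets, est_onsets, tolerance=0.05):
    """Onset-based precision/recall with one-to-one greedy matching.
    tolerance is in seconds (0.05 or 0.1 for the 50/100 ms ranges)."""
    ref = sorted(ref_onsets)
    matched = set()
    hits = 0
    for onset in sorted(est_onsets):
        for i, r in enumerate(ref):
            if i not in matched and abs(onset - r) <= tolerance:
                matched.add(i)
                hits += 1
                break
    precision = hits / len(est_onsets) if est_onsets else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall

# Toy example: 4 reference notes, 3 estimates, two within 50 ms.
p, r = evaluate_note_onsets([0.0, 0.5, 1.0, 1.5], [0.02, 0.51, 1.2])
print(p, r)  # → 0.6666666666666666 0.5
```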

Nearest neighbors similarity search for drum sounds

Research in instrument recognition commonly concentrates on classifying instruments. This paper instead develops a method for finding perceptually similar alternatives to a drum sample within a dataset, without the need for prior categorization or labeling. The proposed similarity search algorithm analyzes and weights both temporal and spectral features. A listening test with 18 subjects assessed its performance. The results confirm that the search algorithm produces perceptually similar outcomes in most cases.
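The core of such a system is a weighted nearest-neighbor search over feature vectors. The sketch below is illustrative only: the feature set, the weights, and the toy data are invented, and a weighted Euclidean distance over standardized features stands in for the paper's actual similarity measure.

```python
import numpy as np

def nearest_drums(query, dataset, weights, k=3):
    """Return indices of the k most similar samples by weighted
    Euclidean distance over z-score-standardized feature vectors."""
    feats = np.asarray(dataset, dtype=float)
    mu, sigma = feats.mean(axis=0), feats.std(axis=0) + 1e-12
    z = (feats - mu) / sigma
    q = (np.asarray(query, dtype=float) - mu) / sigma
    d = np.sqrt(((z - q) ** 2 * np.asarray(weights)).sum(axis=1))
    return np.argsort(d)[:k]

# Toy features: [attack time (s), spectral centroid (Hz), decay time (s)]
dataset = [[0.01, 5000, 0.20],   # e.g. a closed hi-hat
           [0.02, 4800, 0.25],   # e.g. a similar hi-hat
           [0.05, 200, 0.80]]    # e.g. a kick drum
idx = nearest_drums([0.012, 4900, 0.22], dataset, weights=[1.0, 2.0, 1.0], k=2)
print(list(idx))  # → [0, 1]  (the two hi-hat-like samples)
```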

Binaural Wall Impedance Modeling with Image Source Models and Stochastic Decay

This paper presents the results of a listening test comparing three approaches to modeling boundary layers in binaural acoustic simulations of cubic rooms. The simulations combined an image source model with a stochastic decay. Results show higher preference ratings for renderings that used Butterworth low-pass filtering or porous layer modeling than for constant, frequency-independent wall absorption. The porous layer model is the most realistic and reached the highest preference score.
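For context, an image source model mirrors the sound source across the room boundaries so that each reflection can be rendered as a direct path from a virtual source. A minimal sketch for the first-order reflections of a shoebox room (dimensions and source position are invented; the paper's boundary models are not reproduced here):

```python
def first_order_image_sources(src, room):
    """Mirror a source across each of the six walls of an axis-aligned
    shoebox room with one corner at the origin. src and room are (x, y, z)."""
    sx, sy, sz = src
    lx, ly, lz = room
    return [
        (-sx, sy, sz), (2 * lx - sx, sy, sz),   # x = 0 and x = lx walls
        (sx, -sy, sz), (sx, 2 * ly - sy, sz),   # y = 0 and y = ly walls
        (sx, sy, -sz), (sx, sy, 2 * lz - sz),   # z = 0 and z = lz walls
    ]

images = first_order_image_sources(src=(1.0, 2.0, 1.5), room=(4.0, 5.0, 3.0))
print(images[1])  # mirror across the x = 4 m wall → (7.0, 2.0, 1.5)
```

Each image source's contribution is then attenuated by the wall model under test (constant absorption, low-pass filter, or porous layer) before being convolved with the corresponding HRTF.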

Machine Learning Approaches for the Individualization of Head Related Transfer Functions with Anthropometric Measures

This paper presents machine learning approaches that aim to determine whether anthropometric measurements can be used to predict features of the corresponding Head-Related Transfer Functions (HRTFs). Obtaining anthropometric data takes far less effort than the time-consuming and expensive process of measuring full HRTFs. High-definition scans of head and torso as well as high-resolution HRTFs of 40 subjects were condensed into a dataset of 15 anthropometric measures and four HRTF features. Machine learning algorithms were trained and optimized to provide a satisfactory prediction of those HRTF features from the anthropometric measures.
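The basic regression setup can be sketched as follows. This is a toy illustration, not the paper's method: the data is random, and closed-form ridge regression stands in for whatever models the paper actually optimized; only the dataset dimensions (40 subjects, 15 measures) echo the abstract.

```python
import numpy as np

# Toy data mimicking the dataset shape: 40 subjects x 15 anthropometric measures,
# and one synthetic HRTF feature per subject (random, for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 15))
true_w = rng.normal(size=15)
y = X @ true_w + 0.1 * rng.normal(size=40)

# Closed-form ridge regression with a bias term.
lam = 1.0
Xb = np.hstack([X, np.ones((40, 1))])
w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(16), Xb.T @ y)

pred = Xb @ w
print(np.corrcoef(pred, y)[0, 1] > 0.9)  # → True (fit on training data)
```

In practice one HRTF feature would be predicted per model, with the fit assessed by cross-validation rather than on the training data as in this sketch.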

Review of Binaural Technology

Binaural techniques simulate the hearing cues created by the acoustic interaction between our bodies and the environment around us. Audio signals are filtered to introduce these cues and give the impression that a sound source is located outside of the head at a given location in space. Because our hearing system is sensitive to inaccurate cues, binaural filters commonly create an unconvincing spatial impression as well as poor sound quality. Every person has an individual pattern of hearing cues created by their unique body shape, and these cues change as the listener moves. Natural binaural reverberation is also important for a convincing effect. Achieving high-quality binaural sound currently requires careful measurement and specialist equipment.

X&M – A digital musical instrument using three-dimensional gesture recognition

X&M is a hand-crafted digital musical instrument that allows touch-sensitive and three-dimensional gestural sound control. The instrument tracks hand movements on and in front of a glass plate – installed to physically constrain the user’s motion range – and turns this movement into MIDI data to control synthesis parameters. Touching the surface triggers sound output. A Leap Motion infrared tracking camera and two piezo surface microphones serve as input devices. The gesture interpretation and the gesture-to-sound mapping are programmed in Max/MSP and Ableton Live.
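The gesture-to-MIDI mapping idea can be illustrated outside Max/MSP with a short Python sketch. Everything here is hypothetical: the axis ranges, the chosen CC numbers, and the parameter assignments are invented, not taken from the instrument.

```python
def position_to_cc(x, y, z,
                   x_range=(-0.2, 0.2), y_range=(0.0, 0.4), z_range=(0.0, 0.3)):
    """Map a 3D hand position (metres, ranges invented) to MIDI CC values (0-127)."""
    def scale(v, lo, hi):
        t = min(max((v - lo) / (hi - lo), 0.0), 1.0)  # clamp to the tracked range
        return round(t * 127)
    return {74: scale(x, *x_range),   # e.g. filter cutoff (hypothetical assignment)
            71: scale(y, *y_range),   # e.g. resonance
            1: scale(z, *z_range)}    # e.g. mod wheel

print(position_to_cc(0.0, 0.2, 0.15))  # centre of each range → {74: 64, 71: 64, 1: 64}
```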