Resource material

In those two recordings I used the microphone setup of FIG. 4. The microphones are close to the pinna but not inside it. I have used this method of recording to generate source material that still bears a strong resemblance to natural hearing and has not been equalized or mixed down from multiple tracks. I needed sounds that I had heard live for evaluating loudspeakers during their development. Today, with ORION or PLUTO, I feel well equipped to do the reverse: to evaluate recordings. Many times I have wished I could ask the recording engineer of a particular CD what he had in mind when he chose the microphone setup and the mix. This should be part of the CD liner information. Next to the musicians, the conductor and the venue, it is the recording engineer who determines the sound of the recording. "Tonmeister" is an apt description of this role.

I have tried placing the microphones near the ear canal entrance, but found that this colors the image on loudspeaker playback, because the sound then passes a pinna a second time before reaching the eardrum. Artificial head recordings, FIG. 2, must be equalized for headphone or loudspeaker playback for the same reason. The sphere microphone, FIG. 3, retains two essential features of our hearing apparatus: the spacing of the ears and the sound blockage by the head, which produce the ILD and ITD cues [8, 9]. It gives up the acoustic details of head, shoulders and ears. My former colleague from HP, Lyman Miller, has used sphere microphones in various forms for decades while recording for Stanford University, initially with modified Panasonic back-electret capsules, then Schoeps omnis and now Sennheiser omnis. Sphere microphones used to be in the Schoeps and Neumann catalogues.

FIG. 14  Schoeps KFM6 free-field frequency responses of the left and right transducers for 20° and 60° sound incidence from the left, relative to the stereo main axis. The main-axis free-field response (not shown) is slightly boosted towards higher frequencies to obtain a flat diffuse-field response [9].

FIG. 15  KFM6 frequency response of the level difference between left and right transducers for three angles of sound incidence, relative to the stereo main axis, for which the difference is 0 dB [9].

The three microphone setups cannot deliver the eardrum signals of the binaural setup in FIG. 8, whether played back over headphones or loudspeakers. In either case they only provide auditory cues that strongly resemble what a person's ears would have picked up at the sphere's location in the live situation. Precise recording of the sound field at that location would have required four microphones, sensing the sound pressure W and the sound particle velocity in the X, Y and Z directions. Such an ambisonic microphone has the functionality of three orthogonal bidirectional microphones and one omni microphone, all coincident. The four microphone signals must be matrixed to derive the loudspeaker drive signals. Four or more loudspeakers are needed to reproduce the original sound field at one point in anechoic space, and within a usable-size sphere around that point [10]. If the loudspeakers are in a reverberant room, then the streams of room reflections mix with the streams of direct sound, and the listener's brain must find cues to move the room's response beyond an acoustic horizon of attention in order to experience the original sound field.
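As an illustration of the matrixing step, here is a minimal Python sketch of a basic first-order ambisonic decode for a horizontal square of four loudspeakers. Each speaker at azimuth theta receives W + X·cos(theta) + Y·sin(theta); the Z (height) channel and the frequency-dependent weighting of practical decoders are left out, and the function name and speaker layout are my own choices for the example.

    import numpy as np

    def decode_bformat_horizontal(w, x, y, speaker_azimuths_deg):
        """Basic first-order ambisonic decode: the loudspeaker at
        azimuth theta receives W + X*cos(theta) + Y*sin(theta).
        Height (Z) and shelf filtering are omitted for clarity."""
        feeds = []
        for az_deg in speaker_azimuths_deg:
            az = np.radians(az_deg)
            feeds.append(0.5 * (w + x * np.cos(az) + y * np.sin(az)))
        return feeds

    # Encode a 440 Hz tone arriving from 45 degrees left of front,
    # then derive drive signals for a square of four loudspeakers.
    t = np.arange(0, 0.01, 1 / 48000)     # 10 ms at 48 kHz
    s = np.sin(2 * np.pi * 440 * t)
    theta = np.radians(45)
    w = s / np.sqrt(2)                    # omni component (FuMa convention)
    x = s * np.cos(theta)                 # front-back figure of eight
    y = s * np.sin(theta)                 # left-right figure of eight
    feeds = decode_bformat_horizontal(w, x, y, [45, 135, -135, -45])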

 

============================================================================================

Binaural sound samples from dummy head with blocked ear canal:
Track 01-LL-Chimes Binaural.flac
Track 02-LL-Percussions Binaural.flac

Spatial distortion of the AS occurs especially if the playback HRTF does not match the recording HRTF. One form of it, in-head localization, is common in headphone listening. Binaural recordings are usually made with the microphone blocking the ear canal entrance, so that the canal's resonance is eliminated. Even playback over headphones that have not been specially equalized results in a very high degree of realism of the AS, though distances are strongly foreshortened, especially towards the front. But someone can whisper into your ears with scary realism.

 

        0 - Introduction
        1 - Hearing Processes
            1.1  Binaural
            1.2  Multiple sources, reflections, reverberation, space
            1.3  Real sources and phantom sources
                1.3.1 The sound of a single loudspeaker in a room
                1.3.2 The sound of two loudspeakers in a room
                1.3.3 The sound of two pairs of loudspeakers in a room
        2 - Recording Processes
        3 - Loudspeaker and Room
        4 - Sound Test Files
        5 - References


 

0 - INTRODUCTION

Recording music in one venue, like a concert hall, FIG. 1, a jazz club or a studio, and then playing it back over two loudspeakers in another venue, like a living room, FIG. 2, must be treated as a system problem. What we hear in playback depends completely upon the types of microphones used, their setup, the venue, and the equalization and mix-down of the microphone signals to two channels, if more than two microphones were used. It further depends upon the two loudspeakers, where they are placed, how they illuminate the living room with sound, and how the mix of direct and reflected sounds reaches the listener's eardrums. All this determines what we hear when we close our eyes. Our brain constructs an Auditory Scene from the two streams of air pressure variations at the eardrums. The streams evoke emotions and memories; they influence our state of being. Hearing involves physics, but in the end it is a completely subjective, perceptual, cognitive process that is subject to the focus of our attention. Thus we are able to ignore what is not of interest at the moment: we move it beyond our Auditory Horizon, but immediately extend the horizon again if a new sound startles or irritates us.

FIG. 1  Symphony Hall from row 24

FIG. 2  My living room with ORION and PLUTO loudspeakers.
'A' is the place for critical listening at the apex of an equilateral triangle formed by the tweeters and my head. More casual listening is from 'B'. 

I have talked at length about loudspeakers and sound reproduction in a living room elsewhere on this website. Here I want to investigate the influence of the recording process upon the Gestalt of the Auditory Scene, AS, that I hear in my living room, or, to be accurate, in my mind. The size of my room, FIG. 4, is almost an order of magnitude less than that of Symphony Hall, FIG. 3. The same physics apply in both venues, but the calculated numbers for Reverberation Time, Schroeder Frequency or Critical Distance are different, and therefore so is their effect upon the sound. Angular relationships, such as the angle subtended by the source, can be preserved, though, if the source width is scaled in proportion to the linear dimensions of the venues. Thus location A in my living room would correspond to a very good seat X in Symphony Hall, if the orchestra were scaled down. Since acoustic size tends to follow sound volume, the phantom orchestra must play at a lower level in my living room than in the Hall. But at what playback level do I perceive the Auditory Distance from A to the phantom orchestra as equal to the Auditory Distance from X to the real orchestra?
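To make the venue scaling concrete, the following Python sketch evaluates the two standard formulas, Schroeder frequency f_S = 2000·sqrt(T60/V) and critical distance d_c = 0.057·sqrt(V/T60) for an omnidirectional source. The volumes and reverberation times are round, assumed values for illustration, not measurements of Symphony Hall or of my room.

    from math import sqrt

    def schroeder_frequency_hz(rt60_s, volume_m3):
        """Above f_S = 2000*sqrt(T60/V) the room modes overlap densely
        and statistical room acoustics applies."""
        return 2000 * sqrt(rt60_s / volume_m3)

    def critical_distance_m(rt60_s, volume_m3):
        """At d_c = 0.057*sqrt(V/T60) the direct and reverberant field
        levels of an omnidirectional source are equal."""
        return 0.057 * sqrt(volume_m3 / rt60_s)

    # Assumed, illustrative values:
    for name, volume, rt60 in (("concert hall", 20000, 2.0),
                               ("living room", 100, 0.5)):
        print(name,
              round(schroeder_frequency_hz(rt60, volume)), "Hz,",
              round(critical_distance_m(rt60, volume), 1), "m")
    # concert hall: f_S = 20 Hz,  d_c = 5.7 m
    # living room:  f_S = 141 Hz, d_c = 0.8 m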

FIG. 3  Symphony Hall

FIG. 4  My living room

Auditory Distance is what you hear as distance when your eyes are closed. It is an important attribute of realism in sound reproduction, but it is usually distorted in the recording and playback process. How we determine auditory distance and other details of the Auditory Scene is an amazing brain activity that must have evolved for survival. There are still gaps in our understanding of hearing, because experiments tend to focus on one or a few parameters for practical reasons, whereas hearing usually operates in an environment where a multitude of parameters interact. There can be real surprises, like the seemingly disproportionate auditory effect of a few tenths of a dB change to the frequency response of the ORION over certain parts of its frequency range: audible changes that would be difficult to measure acoustically. They are difficult to explain by the hearing processes that we use to recognize sound sources and their surrounding space, where the space and its content are heard through reflections and reverberation. Those processes, though, can lead to an understanding of stereo presentation and its phantom sources.

 

1 - Hearing Processes

The fact that we have two ears, and not just one, indicates that the most important function of our hearing apparatus is to find the DIRECTION and DISTANCE of a sound source that is out of view, while subconsciously scanning the multitude of sounds in the ENVIRONMENT for potentially dangerous or unknown sources. If the source is seen, then vision takes over, because it is more precise. Decisions about what to pay immediate attention to are sometimes made in milliseconds, and we jump or run before having seen the threat. Normally sounds occur in the spatial context of other sounds, reflections and reverberation. Recording and reproduction should preserve the given context to sound natural.

FIG. 5  Zeroing in on one of five sound sources in the 
presence of their environmental reflections

Hearing, the formation of an Auditory Scene, happens in our brain, based on the streams of air pressure variations at the eardrums, possibly combined with bone-conducted mechanical vibrations in the body. We determine the direction of a source from arrival-time and level differences between the streams at the left and right ears, FIG. 5. The distance between the ears is about 17 cm. The timing difference depends upon the orientation α of the head relative to the source and upon the source's distance. With the speed of sound c = 340 m/s we have for a distant source:

Δt = 500 · sin(α) microseconds, e.g. for α = 30°, Δt = 250 µs.
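The same calculation in a few lines of Python, using the 17 cm ear spacing and c = 340 m/s from the text (a far-field approximation that ignores the extra path around the head):

    from math import sin, radians

    EAR_SPACING_M = 0.17     # approximate distance between the ears
    SPEED_OF_SOUND = 340.0   # m/s

    def itd_microseconds(azimuth_deg):
        """Far-field interaural time difference: dt = (d/c)*sin(alpha)."""
        return EAR_SPACING_M / SPEED_OF_SOUND * sin(radians(azimuth_deg)) * 1e6

    print(itd_microseconds(30))   # 250 us, the example above
    print(itd_microseconds(90))   # 500 us, source directly to one side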

The head also presents a blockage and diffracts arriving sounds, which causes differences in the sound pressure levels at the two ears. At higher frequencies and shorter wavelengths these differences are magnified by the cup-shaped outer ear, whose finer anatomical details impart detail and variation to the frequency response at the eardrum.

 

Interaural time differences, ITD, dominate directional hearing below about 1 kHz, while interaural level differences, ILD, are used perceptually above that frequency. The frequency responses of these interaural differences are difficult to interpret as to their specific meaning, FIG. 6. When the head-related transfer function, HRTF, is modeled and applied in combination with head tracking, so that the ear signals change according to the direction the listener is facing, it leads to the perception of natural direction, distance and spatial spread of a recorded sound source.

For example, I listened to a mono recording over headphones. The AS was located right between my ears, inside my head. With head tracking activated and after a few left-right turns of my head the AS was perceived as coming from a source clear across the room. It stayed in that location when I turned 360°. When head tracking was turned off, the source still remained across the room. It took several left-right turns before the source started sliding towards me with increasing speed and located itself inside my head again.

A similar perception occurs in an anechoic chamber when the listener's head is clamped and the source is not visible. The AS is perceived as being located inside the head. 

A pair of highly directional loudspeakers in a dead room can create a center phantom source of a singer that is located in front of the plane of the loudspeakers and closer to the listener. But it is not perceived inside the head, because the ear signals change with head movement and usually there are visual distance cues. 

 

 

   FIG. 6  Typical frequency responses for interaural level and time
   differences in the horizontal plane [1]

 

 

We have an amazing hard-wired ability to automatically compensate for the changes in ear signals that occur with head turning: the tonal character, or timbre, is perceived as independent of the direction to the source. In Theile's Association Model of hearing [2], FIG. 7, the outer ear and head shape the frequency response M at the left and right eardrums. Inside the brain an inverse filter M⁻¹ corrects for the response changes and assists in recognizing the source timbre independent of its location relative to the direction of view. Clearly this must be a survival mechanism.

You can easily demonstrate this invariability of timbre to yourself in various situations. We do not even notice it, because it is a normal part of life. But when you realize that the eardrum signals change significantly all along, there is good reason to marvel.
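Conceptually the Association Model is a linear filter M followed by its inverse. The Python toy below makes that concrete; the short FIR stand-in for the ear response and the regularized inverse are my own illustrative assumptions, not Theile's data or a measured HRTF.

    import numpy as np

    rng = np.random.default_rng(0)
    fs = 48000
    source = rng.standard_normal(fs)      # 1 s of noise as a stand-in "timbre"

    # Stand-in for a direction-dependent ear response M: a short FIR
    # that colors the spectrum.
    M = np.array([1.0, 0.0, -0.6, 0.0, 0.25])
    at_eardrum = np.convolve(source, M)

    # Regularized frequency-domain inverse, playing the role of M^-1.
    n = len(at_eardrum)
    Mf = np.fft.rfft(M, n)
    inv = np.conj(Mf) / (np.abs(Mf) ** 2 + 1e-3)    # avoid division by ~0
    restored = np.fft.irfft(np.fft.rfft(at_eardrum) * inv, n)

    # The coloration is removed and the source "timbre" recovered.
    print(np.max(np.abs(restored[:fs] - source)))   # small residual error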

 

   FIG. 7  Principal function of the Association Model [2]

1.2 Multiple Sources, Reflections, Reverberation, Space

In FIG. 5 above I have indicated five separate sources and their reflections, which might represent a real situation like a cocktail party in a room, or an outdoor acoustic scene in nature with birds, dogs and people. We are well adapted to localize different sources even in the presence of strong reflections, and to differentiate between them. However, the auditory processes involved are difficult to study, and therefore the bulk of the investigations has dealt with a single source and a single reflection in an otherwise anechoic environment. One must be careful drawing conclusions from these about multiple sources and the effects of reflections. We have the ability to focus attention on specific sources and to move what is static or not of interest beyond our momentary acoustic horizon.

Haas found that a single reflection, delayed by 6-23 ms, is perceived to be just as loud as the direct sound if its relative sound pressure level is +10.4 dB, FIG. 9 [4, 5]. He focused on equal loudness, not on directional drift, spread or separation of the image(s). Also, the two sources were visible.
The threshold of detection of the single reflection is about 25 dB lower [6]. Spreading of the source image and audibility of the reflection as a separate source occur before the equal-loudness level has been reached.

Note that the threshold curves have a characteristic downward slope with increasing delay time. A reflection in a natural environment that has been delayed by 50 ms has traveled 17 m farther than the direct sound. If the source is at 3 m distance, then the reflection is at least 15 dB down. I speculate that for survival reasons the reflected sound should still reside within our auditory horizon, in order to differentiate it from a new source.
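The 50 ms example can be checked with a few lines of Python, assuming only 1/r spreading loss and a lossless reflecting surface (my simplifying assumptions):

    from math import log10

    SPEED_OF_SOUND = 340.0  # m/s

    def reflection_level_db(source_distance_m, delay_ms):
        """Level of a reflection relative to the direct sound, for 1/r
        spreading only and no absorption at the reflecting surface."""
        extra_path_m = SPEED_OF_SOUND * delay_ms / 1000.0
        return -20 * log10((source_distance_m + extra_path_m) / source_distance_m)

    print(reflection_level_db(3, 50))   # about -16.5 dB: "at least 15 dB down"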

 

FIG. 9  Threshold curves for detecting different effects of a single reflection.

Damaske investigated the effect of up to six reflections at various angles upon the lateral drift threshold for a single frontal source of continuous speech [4]. He found no relevant drift of the center of the voice and no dependence upon the direction of the reflections! The six reflections had threshold levels of:
R1 = +1.1 dB, 6 ms
R2 = +5.3 dB, 12 ms
R3 = +12 dB, 18 ms
R4 = +15.8 dB, 24 ms
R5 = +20.3 dB, 31 ms
R6 = +23.4 dB, 37 ms
Though the voice stayed centered in front, it spread increasingly with more reflections, regardless of their angle of incidence. A certain degree of diffuseness is given automatically when reflections arrive from whatever direction. With 6 reflections the voice spread nearly 270° around the listener. There were 28 test subjects.

Diffuseness is not the same as spatiality, which derives from the reflection patterns and reverberation of a specific space. The 6 reflections might say something about the onset of reverberation, but in a concert hall the direct sound and a few early reflections are immediately followed by a multitude of reflections and their decaying sound. Early reflections and reverberation together give us a sense of the size and nature of a space as well as our distance from the sound source.

FIG. 10 shows the drift threshold curve for a single reflection R1 at 36°. The curve for R1 does not change if the reflection comes from a different angle. With R1 set at +1.1 dB and 6 ms, a second reflection R2 is added. It has to be at a higher level than R1 to cause a drift, which again is independent of the angle of R2.
Note how in these tests the brain holds on to the direction of the first arriving sound and is not confused even by these very high reflection levels, though it becomes more sensitive the longer their delay.

FIG. 10 Drift threshold for one and two reflections [4]. 

 

1.3 Real Sources and Phantom Sources

While anechoic conditions are necessary to isolate and study hearing phenomena, they are extremely uncommon in normal life. Our hearing has adapted to work in spaces with very different reflective properties, forests and grasslands, indoors and outdoors, with few or many sources of sound coming and going in different locations. I have experienced the quietness of sitting in a kayak on a lake, in dense fog and far from shore, or of standing alone in a pine forest after a heavy winter snowfall, and I have noticed how highly alert I am in such situations to pick up the slightest sound.

Unlike in an anechoic chamber, we live and listen with a reflective plane underneath our feet. As we grow up we change our height above this plane, and thus the delay of the floor reflection from a source that has not changed (quantified in the sketch below). The spectrum at the ears changes, but does the source sound different? We certainly perceive when there is no floor in front of us, as when we come to the edge of a cliff. I suspect that we use the floor reflection relative to the direct sound to judge the elevation of a source, but I am not aware of a formal study.

Not long ago I heard the demonstration of a WFS-like system using numerous, tightly packed loudspeakers in a level ring around the walls of a large room. I was surrounded by a fairly realistic AS. The AS kept a minimum fixed distance from me, a quiet space; nothing was 'in my face'. Surprisingly, everything seemed to happen above the level of my ears, even above my head. Lions were up in the trees, heavy rain did not hit the ground. Upon questioning I was told afterwards that the floor reflection had been taken out of the synthesized loudspeaker drive signals. I had experienced the same quiet space of being inside a ring of sound, like being inside an empty arena, when I heard the demo of a new 11.1 surround system, which ended with a recording from inside a concert hall audience where everyone was clapping hands. No one stood near me; they were all at least 6 m away. Surround sound is not 3D sound unless the distances from the various recorded sound sources to the listener are realistically perceived in azimuth and elevation.
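The floor-reflection delay mentioned above is easy to quantify with the image-source construction: mirror the source below the floor and compare path lengths. The heights and distance below are assumed example values, chosen only to show how the delay grows with ear height.

    from math import hypot

    SPEED_OF_SOUND = 340.0  # m/s

    def floor_reflection_delay_ms(source_height_m, ear_height_m, horizontal_m):
        """Extra delay of the floor bounce over the direct path, from the
        image-source construction (source mirrored below the floor)."""
        direct = hypot(horizontal_m, source_height_m - ear_height_m)
        reflected = hypot(horizontal_m, source_height_m + ear_height_m)
        return (reflected - direct) / SPEED_OF_SOUND * 1000.0

    # The same source heard by a child (ears at 1.0 m) and an adult (1.7 m):
    for ear_h in (1.0, 1.7):
        print(ear_h, floor_reflection_delay_ms(1.2, ear_h, 3.0))
    # about 2.1 ms at 1.0 m, 3.3 ms at 1.7 m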

Naturally occurring sounds are produced in a spatial context. We never hear just a voice, a cello or a bird in isolation from its environmental space; that space participates in what we hear. "Spaces speak, are you listening?" is the title of a book by Barry Blesser and Linda-Ruth Salter [7]. Blesser was instrumental in the development of DSP products for the recording industry, such as reverberators. He knows from direct experience how difficult it is to fool the human brain in its auditory processes. The book is for anyone interested in hearing and its cultural aspects. It should be required reading for architects, who are by default not only structural but also aural architects. Many, though, do not seem to be aware of this fact, or do not care, judging by their creations.

My Sound Pictures demo disk provides examples of familiar sounds in different spaces. Here are two more:
Track 03-LL-audience preconcert.mp3 - How did you recognize the loudspeaker?
Track 04-LL-hall-street.mp3 - Voices and sounds as I walk through five different surroundings.



 

4 - Sound Test Files

01-04-LL-Sound_Test_Files.zip (13.3 MB)

        01-LL-Chimes Binaural.flac

        02-LL-Percussions Binaural.flac

        03-LL-audience preconcert.mp3

        04-LL-hall-street.mp3

05-12-LL-Sound_Test_Files.zip (5.2 MB)

        05-LL-1kHz_bursts_500ms.wav

        06-LL-1kburst-orion-pluto.mp3

        07-LL-anechoic-female-male-music.wav

        08-LL-left_orion.mp3

        09-LL-left_pluto.mp3

        10-LL-left_SL.mp3

        11-LL-left_orion-SL_replay.mp3

        12-LL-left_SL_EL.mp3

 


 

 

[2] Guenther Theile, On the Standardization of the Frequency Response of High-Quality Studio Headphones, JAES, 1986

[5] H. Haas, Einfluss eines Einfachechos auf die Hoersamkeit von Sprache (The influence of a single echo on the audibility of speech), Acustica 1, 49 (1951)

[7] Barry Blesser, Linda-Ruth Salter, Spaces speak, are you listening? Experiencing aural architecture, The MIT Press, 2007

[8] Guenther Theile, On the Naturalness of Two-Channel Stereo Sound, JAES 1991, https://www.hauptmikrofon.de/theile/Theile_Naturelness_JAES1991.PDF

[11] Siegfried Linkwitz, Room Reflections Misunderstood?, 123rd AES Convention, New York, October 2007. Abstract. MS PowerPoint Presentation, Preprint 7162.

[12] Barry Blesser, An Interdisciplinary Synthesis of Reverberation Viewpoints, JAES, Vol. 49, No. 10, October 2001

[13] Albert S. Bregman, Auditory Scene Analysis - The perceptual organization of sound, The MIT Press, 1990

[14] David Griesinger, The Importance of the Direct to Reverberant Ratio in the Perception of Distance, Localization, Clarity and Envelopment, 126th AES Convention, Munich 2009, Preprint 7724, www.davidgriesinger.com

[15] Yost, W. A., The Cocktail Party Problem: Forty Years Later, Chapter 17 in "Binaural and Spatial Hearing in Real and Virtual Environments" R. Gilkey and T. Anderson (Eds.), Lawrence Erlbaum Associates, Hillsdale, NJ, 1997

[16] Hartmann, W. M., Listening in a Room and the Precedence Effect, Chapter 10 in “Binaural and Spatial Hearing in Real and Virtual Environments”, R. Gilkey and T. Anderson (Eds.), Lawrence Erlbaum Associates, Hillsdale, NJ, 1997

[17] Yost, W. A., Perceptual Models for Auditory Localization, in “The Perception of Reproduced Sound”, The Proceedings of the AES 12th International Conference, Copenhagen, Denmark, 1993

[18] Hartmann, W. M., Auditory Localization in Rooms, in “The Perception of Reproduced Sound”, The Proceedings of the AES 12th International Conference, Copenhagen, Denmark, 1993

[19] Hilmar Lehnert, Auditory Spatial Impression, in “The Perception of Reproduced Sound”, The Proceedings of the AES 12th International Conference, Copenhagen, Denmark, 1993

[20] A. S. Bregman and P. A. Ahad, Demonstrations of Auditory Scene Analysis Audio CD, https://mitpress.mit.edu



 

 


 

 

 

What you hear is not the air pressure variation in itself 
but what has drawn your attention
in the streams of superimposed air pressure variations 
at your eardrums

An acoustic event has dimensions of Time, Tone, Loudness and Space
Have they been recorded and rendered sensibly?
