Resource material

In those two recordings I used the microphone setup of FIG. 4. The microphones are close to the pinna but not inside it. I have used this method of recording to generate source material that still bears a strong resemblance to natural hearing and has not been equalized or mixed down from multiple tracks. I needed sounds that I had heard live for loudspeaker evaluation during their development. Today, with ORION or PLUTO, I feel well equipped to do the reverse: to evaluate recordings. Many times I have wished that I could ask the recording engineer of a particular CD what he had in mind when he chose the microphone setup and mix. This should be CD liner information. Next to the musicians, conductor and venue, it is the recording engineer who determines the sound of the recording. "Tonmeister" is their apt description.

I have tried placing the microphones near the ear canal entrance but found that this colors the image on loudspeaker playback, because the sound then passes the pinna a second time before reaching the eardrum. Artificial head recordings, FIG. 2, need to be equalized for headphone or loudspeaker playback for the same reason.

The sphere microphone, FIG. 3, retains two essential features of our hearing apparatus: the spacing of the ears and the sound blockage of the head, which produce the ILD and ITD cues [8, 9]. It gives up the acoustic details of head, shoulders and ears. My former colleague from HP, Lyman Miller, has been using a sphere microphone in various forms for decades while recording for Stanford University, initially with modified Panasonic back-electret capsules, then Schoeps omnis and now Sennheiser omnis. Sphere microphones used to be in the Schoeps and Neumann catalogues.
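The ITD cue that the sphere preserves is easy to quantify. The sketch below uses the classic Woodworth rigid-sphere approximation; the head radius and the function name are my own illustrative assumptions, not values from this article.

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate interaural time difference in seconds for a rigid
    sphere (Woodworth model). Azimuth 0 deg = straight ahead,
    90 deg = directly to one side."""
    theta = math.radians(azimuth_deg)
    return head_radius_m / c * (theta + math.sin(theta))

# A source directly to one side yields roughly 0.66 ms of delay
# between the two "ears" of the sphere; a frontal source yields none.
print(round(itd_woodworth(90.0) * 1e3, 2), "ms")
print(round(itd_woodworth(0.0) * 1e3, 2), "ms")
```

The ~0.66 ms maximum is consistent with the commonly quoted upper bound of interaural delay for an adult-sized head.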
The three microphone setups cannot deliver the eardrum signals of the binaural setup in FIG. 8, whether played back over headphones or loudspeakers. In either case they only provide auditory cues that strongly resemble what a person's ears would have picked up in the live situation at the location of the sphere. Precise recording of the sound field that existed at that location would have required four microphones, which sense the sound pressure W and the sound particle velocity in the X, Y and Z directions. Such an ambisonic microphone has the functionality of three orthogonal bidirectional microphones and one omni microphone, all coincident. The four microphone signals must be matrixed to derive loudspeaker drive signals. Four or more loudspeakers are needed to reproduce the original sound field at one point in anechoic space and within a sphere of usable size around that point [10]. If the loudspeakers are in a reverberant room, then the streams of room reflections are mixed with the streams of direct sound. The listener's brain must find cues to move the room's response beyond an acoustic horizon of attention in order to experience the original sound field.
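As a rough sketch of the matrixing step, a first-order horizontal decode forms each loudspeaker feed from W plus the velocity components projected onto that loudspeaker's direction. This is a deliberately simplified illustration under assumed conventions; real decoders also apply channel normalization (FuMa, SN3D), height channels and psychoacoustic shelf filters.

```python
import math

def decode_square(W, X, Y):
    """Very simplified first-order horizontal decode of B-format
    signals (W = pressure, X/Y = velocity components) to four
    loudspeakers at 45, 135, 225 and 315 degrees."""
    feeds = []
    for a in (45, 135, 225, 315):
        rad = math.radians(a)
        # Each feed combines the omni and the two figure-of-eight
        # components projected onto the loudspeaker direction.
        feeds.append(0.5 * (W + X * math.cos(rad) + Y * math.sin(rad)))
    return feeds

# Plane wave arriving from 45 degrees: the loudspeaker at 45 degrees
# receives the full signal, the opposite one almost nothing.
W, X, Y = 1.0, math.cos(math.radians(45)), math.sin(math.radians(45))
print([round(f, 2) for f in decode_square(W, X, Y)])
```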
Binaural sound samples from dummy head with blocked ear canal

Binaural recordings are usually made with the microphone blocking the ear canal entrance so that its resonance is eliminated. Even playback over headphones that have not been specially equalized results in a very high degree of realism of the AS, though distances are strongly foreshortened, especially towards the front. In-head localization, a form of spatial distortion of the AS that is common to headphone listening, occurs especially if the playback HRTF does not match the recording HRTF. But someone can whisper scarily enough into your ears.
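The ear canal resonance that the blocking microphone removes behaves roughly like that of a quarter-wave tube, closed at the eardrum and open at the entrance. With an assumed canal length of about 25 mm (illustrative, individual canals vary), the resonance lands near 3.4 kHz, in the region where our hearing is most sensitive.

```python
def quarter_wave_resonance(length_m, c=343.0):
    """First resonance in Hz of a tube closed at one end (the eardrum)
    and open at the other; the ear canal is roughly such a tube."""
    return c / (4.0 * length_m)

# With a 25 mm canal the resonance falls near 3.4 kHz.
print(round(quarter_wave_resonance(0.025)), "Hz")
```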
0 - INTRODUCTION

Recording music in one venue, like a concert hall, FIG. 1, jazz club or studio, and then playing it back over two loudspeakers in another venue, like a living room, FIG. 2, must be treated as a system problem. What we hear in playback depends completely upon the types of microphones used, their setup, the venue, and the equalization and mix-down of the microphone output signals to two channels if more than two microphones were used. Furthermore, the two loudspeakers, where they are placed, how they illuminate the living room with sound, and how the mix of direct and reflected sounds reaches the listener's eardrums: all this determines what we hear when we close our eyes. Our brain constructs an Auditory Scene from the two streams of air pressure variations at the eardrums. The streams evoke emotions and memories. They influence our state of being. Hearing involves physics, but in the end it is a completely subjective, perceptual, cognitive process that is subject to the focus of our attention. Thus we are able to ignore what is not of interest at the moment. We move it beyond our Auditory Horizon, but immediately extend the horizon again if a new sound startles or irritates us.
I have talked at length about loudspeakers and sound reproduction in a living room elsewhere on this website. Here I want to investigate the influence of the recording process upon the Gestalt of the Auditory Scene, AS, that I hear in my living room, or in my mind, to be accurate. The size of my room, FIG. 4, is almost an order of magnitude smaller than that of Symphony Hall, FIG. 3. The same physics apply in both venues, but the calculated numbers for Reverberation Time, Schroeder Frequency or Critical Distance will be different, and therefore also their effect upon the sound. Angular relationships, such as the angle subtended by the source, though, can be preserved if the source width is scaled in proportion to the linear dimensions of the venues. Thus location A in my living room would correspond to a very good seat X in Symphony Hall, if the orchestra were scaled down. Since acoustic size tends to follow sound volume, the phantom orchestra must play at a lower level in my living room than in the Hall. But at what playback level do I perceive the Auditory Distance from A to the phantom orchestra as equal to the Auditory Distance from X to the real orchestra?
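How these numbers shift with room size can be sketched with the standard Sabine formula and the usual critical-distance approximation. The volumes and absorption areas below are assumed for illustration only, not measurements of the rooms mentioned above.

```python
import math

def rt60_sabine(volume_m3, absorption_m2):
    """Sabine reverberation time in seconds:
    RT60 = 0.161 * V / A (V in m^3, A in m^2 of equivalent absorption)."""
    return 0.161 * volume_m3 / absorption_m2

def critical_distance(volume_m3, rt60_s):
    """Distance in meters at which direct and reverberant sound levels
    are equal, for an omnidirectional source."""
    return 0.057 * math.sqrt(volume_m3 / rt60_s)

# Illustrative, assumed numbers for a living room and a concert hall.
for name, V, A in [("living room", 100.0, 40.0),
                   ("concert hall", 20000.0, 1600.0)]:
    rt = rt60_sabine(V, A)
    print(f"{name}: RT60 = {rt:.2f} s, Dc = {critical_distance(V, rt):.1f} m")
```

With these assumed values the critical distance grows from under a meter in the living room to several meters in the hall, which is why the balance of direct and reverberant sound at the listening seat differs so strongly between the two venues.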
Auditory Distance is what you hear as distance when your eyes are closed. It is an important attribute of realism in sound reproduction, but it is usually distorted in the recording and playback process. How we determine auditory distance and other details of the Auditory Scene is an amazing brain activity that must have evolved for survival. There are still gaps in our understanding of hearing, because experiments tend to focus on a single parameter or a limited number of parameters for practical reasons, whereas hearing usually operates in an environment where a multitude of parameters interact. There can be real surprises, like the seemingly disproportionate auditory effect of a few tenths of a dB of change to the frequency response of the ORION over certain parts of its frequency range: audible changes that would be difficult to measure acoustically. They are difficult to explain by the hearing processes that we use to recognize sound sources and their surrounding space, where space and its contents are heard through reflections and reverberation. Those processes, though, can lead to an understanding of stereo presentation and its phantom sources.
1 - Hearing Processes

The fact that we have two ears, and not just one, indicates that the most important function of our hearing apparatus is to find the DIRECTION and DISTANCE of a sound source that is out of view, while subconsciously scanning the multitude of sounds in the ENVIRONMENT for potentially dangerous or unknown sources. If the source is seen, then vision takes over, because it is more precise. Decisions about what to pay immediate attention to are sometimes made in milliseconds, and we jump or run before having seen the threat. Normally sounds occur in the spatial context of other sounds, reflections and reverberation. Recording and reproduction should preserve this context to sound natural.
1.2 Multiple Sources, Reflections, Reverberation, Space

In FIG. 5 above I have indicated five separate sources and their reflections, which might represent a real situation like a cocktail party in a room, or an outdoor acoustic scene in nature with birds, dogs and people. We are well adapted to localize different sources even in the presence of strong reflections and to differentiate between them. However, the auditory processes involved are difficult to study, and therefore the bulk of the investigations has dealt with a single source and a single reflection in an otherwise anechoic environment. One must be careful when drawing conclusions from such studies about multiple sources and the effects of reflections. We have the ability to focus attention on specific sources and to move what is static or not of interest beyond our momentary acoustic horizon.
1.3 Real Sources and Phantom Sources

While anechoic conditions are necessary to isolate and study hearing phenomena, they are extremely uncommon in normal life. Our hearing has adapted to work in spaces with very different reflective properties, forests and grasslands, indoors and outdoors, with few or many sources of sound in different locations, coming and going. I have experienced the quietness of sitting in a kayak on a lake in dense fog, far from shore, or of standing alone in a pine forest after a heavy winter snowfall. I have noticed that I am highly alert in such situations to pick up the slightest sound.

Unlike in an anechoic chamber, we live and listen with a reflective plane underneath our feet. As we grow up, our height above this plane changes, and with it the delay of the floor reflection from a source that itself has not changed. The spectrum changes, but does the source sound different? We certainly perceive when there is no floor in front of us, as when we come to the edge of a cliff. I suspect that we use the floor reflection relative to the direct sound to judge the elevation of the source, but I am not aware of a formal study.

Not long ago I heard the demonstration of a WFS-like system using numerous, tightly packed loudspeakers in a level ring around the walls of a large room. I was surrounded by a fairly realistic AS. The AS had a minimum fixed distance from me, a quiet space. Nothing was 'in my face'. Surprisingly, everything seemed to happen above the level of my ears, even above my head. Lions were up in the trees; heavy rain did not hit the ground. Upon questioning I was told afterwards that the floor reflection had been taken out of the synthesized loudspeaker drive signals.
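The geometry behind the floor reflection is simple to sketch with an image source mirrored below the floor. The distances and heights in this example are assumed for illustration, not taken from any of the situations described above.

```python
import math

def floor_reflection_delay(dist_m, src_h_m, ear_h_m, c=343.0):
    """Extra delay in seconds of the floor bounce relative to the
    direct sound. The reflected path is modeled via an image source
    mirrored below the floor plane."""
    direct = math.hypot(dist_m, src_h_m - ear_h_m)        # source to ear
    reflected = math.hypot(dist_m, src_h_m + ear_h_m)     # image source to ear
    return (reflected - direct) / c

# Same source (1.2 m high, 3 m away), heard first at a child's and
# then at an adult's assumed ear height: the delay grows as we grow.
for ears in (1.0, 1.7):
    ms = floor_reflection_delay(3.0, 1.2, ears) * 1e3
    print(f"ear height {ears} m: {ms:.2f} ms")
```

The delay changing with listener height, while the source itself stays the same, is exactly the situation described in the paragraph above.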
I had experienced that quiet space, being inside a ring of sound like being inside an empty arena, before, when I heard the demo of a new 11.1 surround system. It ended with a recording from inside a concert hall audience where everyone was clapping hands. No one stood near me; they were at least 6 m away. Surround sound is not 3D sound unless the distances from the various recorded sound sources to the listener are realistically perceived in azimuth and elevation.

Naturally occurring sounds are produced in a spatial context. We never hear a voice, a cello or a bird in isolation from its environmental space. That space participates in what we hear. "Spaces speak, are you listening?" is the title of a book by Barry Blesser and Linda-Ruth Salter [7]. Blesser was instrumental in the development of DSP products, such as reverberators, for the recording industry. He knows from direct experience how difficult it is to fool the human brain in its auditory processes. This book is for anyone interested in hearing and its cultural aspects. It should be required reading for architects, as they are by default not only structural architects but also aural architects. Many, though, do not seem to be aware of this fact or do not care, judging by their creations.

My Sound Pictures demo disk provides examples of familiar sounds in different spaces. Here are two more:
4 - Sound Test Files

01-04-LL-Sound_Test_Files.zip (13.3 MB)
02-LL-Percussions Binaural.flac
05-12-LL-Sound_Test_Files.zip (5.2 MB)
07-LL-anechoic-female-male-music.wav
11-LL-left_orion-SL_replay.mp3
References

[2] Guenther Theile, "On the Standardization of the Frequency Response of High-Quality Studio Headphones", JAES, 1986
[5] H. Haas, "Einfluss eines Einfachechos auf die Hoersamkeit von Sprache" (The influence of a single echo on the audibility of speech), Acustica 1, 49 (1951)
[7] Barry Blesser, Linda-Ruth Salter, "Spaces speak, are you listening? Experiencing aural architecture", The MIT Press, 2007
[8] Guenther Theile, "On the Naturalness of Two-Channel Stereo Sound", JAES, 1991, https://www.hauptmikrofon.de/theile/Theile_Naturelness_JAES1991.PDF
[11] Siegfried Linkwitz, "Room Reflections Misunderstood?", 123rd AES Convention, New York, October 2007, Preprint 7162
[12] Barry Blesser, "An Interdisciplinary Synthesis of Reverberation Viewpoints", JAES, Vol. 49, No. 10, October 2001
[13] Albert S. Bregman, "Auditory Scene Analysis - The perceptual organization of sound", The MIT Press, 1990
[14] David Griesinger, "The Importance of the Direct to Reverberant Ratio in the Perception of Distance, Localization, Clarity and Envelopment", 126th AES Convention, Munich 2009, Preprint 7724, www.davidgriesinger.com
[15] W. A. Yost, "The Cocktail Party Problem: Forty Years Later", Chapter 17 in "Binaural and Spatial Hearing in Real and Virtual Environments", R. Gilkey and T. Anderson (Eds.), Lawrence Erlbaum Associates, Hillsdale, NJ, 1997
[16] W. M. Hartmann, "Listening in a Room and the Precedence Effect", Chapter 10 in "Binaural and Spatial Hearing in Real and Virtual Environments", R. Gilkey and T. Anderson (Eds.), Lawrence Erlbaum Associates, Hillsdale, NJ, 1997
[17] W. A. Yost, "Perceptual Models for Auditory Localization", in "The Perception of Reproduced Sound", Proceedings of the AES 12th International Conference, Copenhagen, Denmark, 1993
[18] W. M. Hartmann, "Auditory Localization in Rooms", in "The Perception of Reproduced Sound", Proceedings of the AES 12th International Conference, Copenhagen, Denmark, 1993
[19] Hilmar Lehnert, "Auditory Spatial Impression", in "The Perception of Reproduced Sound", Proceedings of the AES 12th International Conference, Copenhagen, Denmark, 1993
[20] A. S. Bregman and P. A. Ahad, "Demonstrations of Auditory Scene Analysis", Audio CD, https://mitpress.mit.edu