----- Experience in your own room the magical nature of stereo sound -----

 

 


Sensible Stereo Recording

-- Auditory Scene creation -- Recording what we hear -- Experimental results -- Theory -- SRA --

 

Under construction:
3/10/12 - On hold while I am working on a paper for the San Francisco AES Convention with the title:
"Spatial rendering of the reverberated stereo signal in the ITD range of hearing." 
or 
"Recordings, Rooms and Rendering" for short.

Listening to WATSON-SEL gave me new insights.

 

Creating the Gestalt of the Auditory Scene

Page 1 - Page 2 - Page 3 - Page 4

 

1.3.1 The sound of a single loudspeaker in a room 

Let's look at the simple case of a single loudspeaker in a reverberant room. Wall reflections, FIG. 16, can be modeled by image sources, which replace the walls. FIG. 17 shows the intersection of three orthogonal mirror surfaces as in a room corner. It can be seen that there are seven images of the single loudspeaker, all radiating sound but to varying degrees. If the ceiling were added to the three surfaces, then there would be images added in the ceiling corner, and their images in the floor corner, and so on. If a second side wall and a front wall are added so that the room is closed and has eight corners, then the total number of image sources becomes very large. But as the image distances from the corner increase from 1st order, to 2nd order, to 3rd order reflections and so on, their strength also decreases due to absorption and diffusion by the wall surfaces. Still, as long as there are reflections there is never just a single loudspeaker in the room, but many copies, seen in different orientations.

The situation is not unlike having another person in the room who is talking to you. She too has many acoustic images and if the room is very reverberant she may be hard to understand. In an anechoic room there is no issue with intelligibility, but we feel uneasy. Between those two extremes of reverberation lies a comfort zone for normal living spaces. Cathedrals, concert halls, auditoria or office spaces have their unique and different requirements for sound reverberation. What applies to acoustically large spaces does not necessarily translate to living rooms, which are considered acoustically small below their Schroeder frequency, i.e. typically below 200 Hz. What constitutes an acoustically comfortable living room? To me it is a room with RT60 around 500 ms, which is much more live than a German recording studio built to the RT60 standard of 250 ms.
Even Tonmeisters agree that this is not an environment to listen for pleasure, but a working space to analyze the recording. They speculated that the PLUTO loudspeaker would not be suitable in the studio, but they would like it in a living room for enjoyment and checking how their recording translates.
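
To put rough numbers on "acoustically small", here is a minimal sketch using the common approximation f_s ≈ 2000 * sqrt(RT60 / V) for the Schroeder frequency. The room volumes are assumed example values, not measurements of any room described here; only the two RT60 values come from the text above.

```python
import math


def schroeder_frequency(rt60_s: float, volume_m3: float) -> float:
    """Approximate Schroeder frequency in Hz: f_s ~ 2000 * sqrt(RT60 / V)."""
    return 2000.0 * math.sqrt(rt60_s / volume_m3)


if __name__ == "__main__":
    # Volumes are assumptions for illustration; RT60 values are from the text.
    for volume_m3 in (50.0, 100.0):
        for rt60_s in (0.25, 0.5):
            f_s = schroeder_frequency(rt60_s, volume_m3)
            print(f"V = {volume_m3:5.0f} m^3, RT60 = {rt60_s:.2f} s -> f_s = {f_s:5.0f} Hz")
```

For these assumed volumes the result stays at or below 200 Hz, in line with the typical figure quoted above.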

FIG. 16  Direct sound and first reflections from a loudspeaker near a room corner. No floor or ceiling reflections are shown.

FIG. 17  Image sources corresponding to floor, rear and side wall reflections for a dipole loudspeaker near a room corner. Actual surfaces scatter, diffuse and absorb sound to varying degrees. Thus the mirror images should look foggy, fainter and differently colored in comparison to the direct source. Initial coloring of the images is caused by changes in the polar frequency response of the loudspeaker. Reflections are also weaker by (Ddirect / Dreflected) due to greater distance and are delayed by (Dreflected - Ddirect) / c, where c is the speed of sound, for an observer at Ddirect from the loudspeaker. [11]
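
The geometry in FIG. 17 can be sketched numerically. The following is only an illustration under stated assumptions: the floor, rear wall and side wall are taken as the planes z = 0, x = 0 and y = 0, the source and listener coordinates are made up, and absorption, diffusion and loudspeaker directivity are ignored. It lists the seven corner images with the level ratio Ddirect / Dreflected in dB and the delay from the path difference.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def corner_images(source):
    """Mirror a source across the planes x=0, y=0, z=0 and their combinations."""
    x, y, z = source
    images = []
    for sx, sy, sz in itertools.product((1, -1), repeat=3):
        if (sx, sy, sz) != (1, 1, 1):  # skip the real source itself
            images.append((sx * x, sy * y, sz * z))
    return images


def reflection_table(source, listener):
    """Return (image, level in dB re direct, delay in ms), sorted by delay."""
    d_direct = math.dist(source, listener)
    rows = []
    for image in corner_images(source):
        d_reflected = math.dist(image, listener)
        level_db = 20.0 * math.log10(d_direct / d_reflected)
        delay_ms = 1000.0 * (d_reflected - d_direct) / SPEED_OF_SOUND
        rows.append((image, level_db, delay_ms))
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    source = (1.0, 1.2, 0.9)    # hypothetical loudspeaker position in metres
    listener = (3.0, 2.5, 1.0)  # hypothetical listening position in metres
    for image, level_db, delay_ms in reflection_table(source, listener):
        print(f"image {image}: {level_db:6.1f} dB, {delay_ms:5.1f} ms")
```

Real reflections will be weaker and spectrally different from these distance-only figures, as the caption notes.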

As an experiment to show the interaction of a single loudspeaker with a room I will use a 1 kHz toneburst, FIG. 18, applied to the left ORION, FIG. 19, and then applied to the left PLUTO after it has been moved into the previous location of the ORION. The loudspeaker outputs are recorded with a stereo microphone from the listening position A in FIG. 4.

 FIG. 18  Narrowband test signal to show 1 kHz room reflections with approximately 10 ms time resolution

FIG. 19  Recording of the sound output from the left located ORION, PLUTO or my voice with a SONY PCM-M10 at the listening position

FIG. 20  Direct and reflected signal response during the first 200ms as recorded from listening position A for ORION and PLUTO.

Though ORION and PLUTO reproduce the same 1 kHz burst, the response of the room is stronger for PLUTO than for ORION, and different in character. PLUTO radiates omni-directionally while ORION radiates bi-directionally in dipole fashion. Reflected signals that arrive within 10 ms of the direct sound from nearby surfaces or objects add to the burst received by the microphone and cannot be resolved visually. Resolving them would require increasing the burst frequency, e.g. to 3 kHz for a 3.3 ms duration test signal. The room response at the higher burst frequency would be indicative of the radiation pattern of the loudspeakers at the higher frequencies and the reflectivity of the room around 3 kHz. In general it should be noted that the reflection pattern and the reverberant field at the listening position are a direct function of the 3-D polar response of the loudspeaker and its variation with frequency. A corresponding situation occurs in the concert hall, where any musical instrument that is being played determines the gestalt of its direct and reverberated sound at the recording microphone location due to its directivity. Thus, when a musician is recorded in an isolation booth and reverberation added later, it is highly unlikely that the reverberation will have the natural characteristics of the same musical instrument in a particular venue. 

The reflections in FIG. 20 are above the threshold of detection in FIG. 9. They are below the drift threshold in FIG. 10 for the first 50 ms or so but not for greater delays. Since we are dealing with a large number of reflections and a situation that is quite different from the cases that were investigated in those two figures above, one must be careful when drawing conclusions about how we are likely to perceive the acoustic event. You can apply the 1 kHz burst test signal to your own loudspeaker and hear for yourself. I generated the burst using the f(x) Expression Evaluator in Goldwave under Tools. The expression is: (0.5-0.5*cos(2*pi*t*f/x))*sin(2*pi*t*f) with x=10 and f=1000. It creates a 1000 Hz sinewave, which is 100% amplitude modulated by a 100 Hz raised-cosine wave. I copied one modulation cycle and pasted it three times at 500 ms intervals into a New wave of 2 s duration to generate Track 05-LL-1kHz_bursts_500ms.wav. You can use this technique to generate tonebursts of different frequency, number of cycles or repetition rate for testing purposes, particularly for high level stress testing of devices that might otherwise be damaged by overheating from continuous signals.
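
For readers without Goldwave, the same expression can be evaluated in a few lines of Python. This is a minimal sketch, not the procedure used for Track 05: the 44.1 kHz sample rate, the mono 16-bit file format, the burst start times (0, 0.5 and 1.0 s), the playback peak level and the output file name are all assumptions.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # Hz, assumed


def toneburst(f=1000.0, x=10, sample_rate=SAMPLE_RATE):
    """One modulation cycle: x cycles of f under a raised-cosine envelope.

    Implements (0.5 - 0.5*cos(2*pi*t*f/x)) * sin(2*pi*t*f) for 0 <= t < x/f.
    """
    n = round(sample_rate * x / f)
    burst = []
    for i in range(n):
        t = i / sample_rate
        envelope = 0.5 - 0.5 * math.cos(2 * math.pi * t * f / x)
        burst.append(envelope * math.sin(2 * math.pi * t * f))
    return burst


def burst_track(burst, count=3, interval_s=0.5, total_s=2.0, sample_rate=SAMPLE_RATE):
    """Place `count` copies of the burst at `interval_s` spacing in silence."""
    track = [0.0] * round(total_s * sample_rate)
    for k in range(count):
        start = round(k * interval_s * sample_rate)
        track[start:start + len(burst)] = burst
    return track


def write_wav(path, samples, sample_rate=SAMPLE_RATE, peak=0.5):
    """Write samples (range -1..1) as a mono 16-bit WAV file, scaled by `peak`."""
    frames = b"".join(struct.pack("<h", round(peak * 32767 * s)) for s in samples)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


if __name__ == "__main__":
    write_wav("toneburst_1kHz.wav", burst_track(toneburst()))
```

Calling toneburst(f=3000.0) keeps the ten cycles but shortens the burst to about 3.3 ms, which is the higher-resolution test signal mentioned earlier.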

a - Track 05-LL-1kHz_bursts_500ms.wav
Here are the three tonebursts that are fed to the left loudspeaker. The right preamp output is disconnected. The loudspeaker location and distance are recognized. This is a real sound source, which creates an Auditory Scene  when I close my eyes, of a pinging sound coming from a specific place in my room. The sound does not characterize the real nature of the source, what generated the sound, only its location. The pinging tends to sound distorted at high levels even though the drivers should have little problem around 1 kHz. I assume this is due to reflections of the burst.

b - Track 06-LL-1kburst-orion-pluto.mp3
Here we have the three bursts of Track 05 followed by a recording from listening position A of the left ORION and PLUTO outputs. All three sets of bursts sound different. More room reflections are added to PLUTO's bursts than to ORION's. But neither of the recorded bursts sounds like what I hear from location A when Track 05 is reproduced by ORION or PLUTO.

c - Track 07-LL-anechoic-female-male-music.wav
Short clips of reflection-free, anechoic recordings of female and male speech and music. None of the clips creates the perception of a real person or a music band being in my room, though I can readily locate them at the left loudspeaker. What is wrong? I suspect the acoustic size and thus the 3-D radiation pattern of the loudspeaker is too different from that of a real person. Therefore the reflections in the room and the reverberation have a different character than those of a real person in the loudspeaker location. The test should give more realistic results in an anechoic chamber, but how often do we converse with someone in an anechoic chamber? The music section is even less convincing because it should extend over a much wider spatial arc to be believable.

d - Track 08-LL-left_orion.mp3
Here Track 07 was played through the left loudspeaker and recorded. Thus the response of my room became part of the recording and so it is no longer anechoic. It is always surprising to me how strongly I can hear the room in the recording, when I did not notice it as much on playback of the anechoic track. It seems that the brain can tune out the room excitation, but not if it is already embedded in the recording. 

e - Track 09-LL-left_pluto.mp3
Similar to Track 08, except that PLUTO was recorded. The room response is stronger than for ORION.

f - Track 10-LL-left_SL.mp3
Instead of recording a loudspeaker reproduction I recorded my voice from the location of the left loudspeaker. The recording contains, of course, the response of the room as seen from the microphone location A. Note how much more realistic my voice sounds than the previous anechoic voice over loudspeaker.

g - Track 11-LL-left_orion-SL_replay.mp3
Here is my voice as played back by the left ORION loudspeaker. 

h - Track 12-LL-left_SL_EL.mp3
The previous recordings were at a distance of 2.4 m from the loudspeaker or from me. For this track I am standing at the left loudspeaker with the microphone in my hand. My wife sits in a chair at A. Because of the close proximity of the microphone there is little room reverberation heard. My wife's voice is more distant and clearly in a reverberant environment. The volume of her voice did not seem as low to me as it is on loudspeaker playback. Her distance from the microphone was about eight times that of mine. 

Note that my voice appears at the loudspeaker, not 30 cm in front of you, as was the distance from my mouth to the microphone. If this were a true 3-D reproduction, then my voice would have been this close to you. Instead it is located at the loudspeaker. We process directional and distance cues in the direct and reflected sounds of any source of sound. In this case we localize the voice at the loudspeaker. My voice has little recorded reverberation, but the voice of my wife does and is at lower volume. In the AS she is placed at some distance from me. The real distance to the loudspeaker from wherever I am in the room is also the minimum distance between me and the AS when I close my eyes.
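
As a rough check on the level difference in Track 12, the factor of eight in distance can be converted to decibels. This is a sketch that assumes free-field 1/r spreading of the direct sound only, two talkers of equal vocal output, and the 0.3 m and 2.4 m distances quoted above. The reverberant sound picked up from the room is not included, so the recorded difference will be smaller.

```python
import math


def direct_level_change_db(near_m: float, far_m: float) -> float:
    """Level of the far source relative to the near one, direct sound only."""
    return 20.0 * math.log10(near_m / far_m)


if __name__ == "__main__":
    # 0.3 m mouth-to-microphone versus 2.4 m from position A to the microphone.
    print(f"{direct_level_change_db(0.3, 2.4):.1f} dB")
```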

Last night I attended a concert, FIG.1, and made some related observations about single loudspeaker sound reproduction. The lecturer of the pre-concert music talk had a microphone. A loudspeaker high above the stage amplified his voice but the volume was kept low and the precedence effect [2] prevented me from localizing the loudspeaker as a source separate from the speaking person. Nevertheless it was obvious to me that a loudspeaker was involved. His voice had an unnatural overlay due to the strong reverberation of the loudspeaker's output by the hall. Furthermore the reverberation was colored by the polar response of the loudspeaker.
Just before the concert started a female voice announced over loudspeaker(s) to turn off electronic devices. Her voice was quite loud and so there was no question that it was reproduced. Her acoustic size was unrealistically large compared to what a speaking voice sounds like in the large hall. This was demonstrated a few minutes later when the conductor turned to the audience to introduce the upcoming modern violin concerto. His voice did not suffer from unnatural reverberation. His voice actually showed little reverberation though I was seated at a great distance. It sounded like a distant voice in a large space and though not loud, was easy to understand as the audience was quiet. The concert illustrated again how different solo instruments reverberate with different strength and character due to output volume and directional aim of their sound. The bass drum sound rolls through the hall even when only slightly tapped. The solo violin does not seem to excite any reverberation whether played softly or stridently. The massed orchestra illuminates the hall with sound and enveloping reverberation. 

I have often wondered how we recognize whether a sound was made by a real instrument or by a loudspeaker, especially when the sound comes from some distance, like through the hallway in a building or from an open window. It must be the reverberant sound, which is characteristic for every instrument or source. The physical construction, size and geometry of a source determine its radiation pattern. Since we live in reflective environments each source of sound assumes a gestalt. We recognize the source by its gestalt even when the environment changes. We can differentiate a piano from the loudspeaker reproduction of a piano even when the sound that we hear has lost high or low frequencies or suffered envelope distortion in its acoustic transmission from a distant place. 

I draw several conclusions from the listening tests for sound reproduction from a single loudspeaker in a reverberant space:

  1. The most believable AS is created by Track 10 and Track 12, which are recordings of live voice in my living room. I recognize the acoustic event and there is no illusion about the nature of the loudspeaker reproduction in my room. They are there, not here. These are live, not anechoic recordings and I hear the spatial context and relationships even with single loudspeaker playback. The AS has a familiar connection to reality. My physical distance to the loudspeaker is the minimum auditory distance of the AS. 

    Each source of sound, a person, a musical instrument, a tool or loudspeaker has its own acoustic gestalt, which is not hidden by reflections and reverberation. Reflections and reverberation may actually help in recognizing the gestalt and thus the identity of the source. Together with finding the direction and distance of an unknown source of sound, namely its spatial location, recognizing its identity, the nature of the beast, could be important for survival. We are naturally motivated to learn to recognize the acoustic signature of sound sources in spaces with different reflective and reverberant properties. We stay especially alert when in an environment where this source might reside. The brain uses cognitive processes all the time even when we are not conscious of them. They allow us to relax when we are in a familiar and friendly space. There we pull in our auditory horizon and pay attention to the AS of the present acoustic event. This might be a live concert or its phantom image produced by a pair of loudspeakers, where the loudspeakers have disappeared. But spatial awareness remains and might even be heightened for enjoyment. The auditory apparatus is always working. Unlike the visual apparatus it has no lids. Auditory events do not occlude each other. When multiple mixed streams of sound arrive at the ears, the brain knows which one to attend to and which ones to ignore as background, even if only a hint of the source's identity is provided [12, 13]. 

  2. Track 07, the set of anechoic recordings, whether played back over ORION or PLUTO, creates an AS that is not mistaken for a real female, male or music band in my room. I recognize the absence of reverberation in the recording. The AS is that of the left ORION or PLUTO in my room. If we listened only in mono, then the perfect loudspeaker would have a flat on-axis response and the polar response of the source that it is reproducing. It would be an interesting exercise to build such a loudspeaker for human voice, but it would then be of no use for anything else. The typical single loudspeaker can at best create the illusion of an acoustic event. It cannot recreate the acoustic event.

    As an aside, when I design a loudspeaker I first build only one, measure and listen to it in my room, and then ask whether it is worth building an identical second unit. A single loudspeaker is more revealing than a stereo setup. 

  3. Tracks 08, 09 and 11 illustrate the perceptual distortion of loudspeaker and room, which was created by an imperfect loudspeaker interacting with the room and then recorded from a single point in space. These tracks highlight limitations in recording and reproduction of acoustic events. They are specific magnifications of the acoustic behavior of loudspeaker and room. I do not believe that they could be the starting point to change either the loudspeaker polar response or the room acoustics. The ratio of direct to reverberant sound, D/R, has been decreased in the playback of the room recording when the playback level for Track 08 and Track 09 is the same as for Track 07. The direct to reverberant ratio is a critical parameter for the quality of speech transmission and intelligibility but also for concert hall acoustics [13]. D/R depends upon the radiation pattern of the source and its location on the stage. Thus it is nearly impossible to provide artificial reverberation that sounds natural in all cases.   

The polar response of the loudspeaker, the spatial context in the recording and the listening room are important contributors to creating the Gestalt of the AS.

With currently used methods of sound recording and reproduction we are merely trying to fool the mind. Therefore it should be important to understand the cues or tricks that we fall for and to avoid those that destroy the illusion. Physical reconstruction of the wave field that existed at some location in space during the original acoustic event is still at the experimental stage. If realized it could give us a taste of time travel. Stereo loudspeakers in a room are not up to that task, only binaural is, FIG. 8.

 

1.3.2 The sound of two loudspeakers in a room 

FIG. 21  Direct signals and room reflected signals from left and right loudspeakers superimpose at both ears of the observer

Two loudspeakers in a room, FIG. 21, are perceptually treated as two independent sources of sound to the degree that the sounds are uncorrelated. The left speaker may carry the interview of a male in a church, the right speaker the interview of a female in the same church and location, but recorded at a later time. When played back simultaneously we hear a male voice at the left speaker, a female voice at the right speaker, and that both talk in a similarly reverberant space. They may be difficult to understand when they talk at the same time. They would be even more difficult to understand if both tracks were played back simultaneously over only one of the two loudspeakers. The distance between left and right loudspeakers helps us localize the male or female voice by turning the head towards the one of interest. Localization makes it easier to follow the sound stream and to suppress sound from other locations to some degree. 

We have an amazing ability to hold a conversation with the person in front of us when surrounded by a multitude of people, conversations and background noise. The so called "Cocktail Party Effect" and related phenomena have been intensively studied [15, 16, 17, 18]. People with hearing aids often have difficulty coping in such situations.  

I am in awe of how the brain has evolved to perceive, locate and recognize multiple, individual sources of sound in environments of greatly different reflective properties. To quote from Bregman [20]:

"Sound is a pattern of pressure waves moving through the air, each sound producing event creating its own wave pattern. The human brain recognizes these patterns as indicative of the events that give rise to them: a car going by, a violin playing, a woman speaking, and so on. Unfortunately by the time the sound has reached the ear, the wave patterns arriving from the individual events have been added together in the air so that the pressure wave that reaches the eardrum is the sum of the pressure patterns coming from the individual events. The summed pressure wave need not resemble the wave patterns of the individual sounds.
As listeners, we are not interested in this summed pattern, but in the individual wave patterns arising from the separate events. Therefore our brains have to solve the problem of creating separate descriptions of the individual happenings, but it doesn't even know, at the outset, how many sounds there are, never mind what their wave patterns are: so the discovery of the number and nature of sound sources is analogous to the following mathematical problem: - The number 837 is the sum of an unknown number of other numbers; what are they? There is a unique answer. - "  

Physics Today carried in 2011 an article, "Listening in on the listening brain", which indicates that hearing the eardrum signals involves a bi-directional stream of information - from inner ear to brain and from brain to inner ear. A short 0.1 ms click stimulates approximately 7 ms of brain activity.

How to make sense of two loudspeakers emitting identical or nearly identical wave patterns in a reflective environment, as with stereo, would seem to be a particularly difficult and confusing problem in hearing. In evolutionary terms, there is no natural precedent for such a situation. Maybe a wolf pack can create a similar acoustic event. Hearing phantom sources is like an escape from reality, from hearing the two loudspeakers, which are readily recognized, and the familiar room. Hiding loudspeakers and room should therefore be the challenge for every loudspeaker designer and recording engineer in order to create a believable auditory illusion, an Auditory Scene of a different reality, that of listening to musicians performing Copland's Organ Symphony in Davies Symphony Hall, for example. 

 

i - Track 05-LL-1kHz_bursts_500ms.wav

j - Track 06-LL-1kburst-orion-pluto.mp3

k - Track 07-LL-anechoic-female-male-music.wav

l - Track 08-LL-left_orion.mp3

m - Track 09-LL-left_pluto.mp3

n - Track 10-LL-left_SL.mp3

o - Track 11-LL-left_orion-SL_replay.mp3

p - Track 12-LL-left_SL_EL.mp3

 

----------------------------------------------------------------------------------------------------------------------


60s of Beethoven's 9th


50ms of Beethoven's 9th

 

impulse response inside the brain

 

   

 

   

 

 

 

 

 

 

 

 

 

 

 

 

 

 


 

 

What you hear is not the air pressure variation in itself 
but what has drawn your attention
in the streams of superimposed air pressure variations 
at your eardrums

An acoustic event has dimensions of Time, Tone, Loudness and Space
Have they been recorded and rendered sensibly?

___________________________________________________________
Last revised: 02/15/2023   -  © 1999-2019 LINKWITZ LAB, All Rights Reserved