Accurate performance of a stereo sound system
The acoustic performance of a stereo sound system can vary significantly depending upon the loudspeakers used, upon their setup in the listening room and upon the size and acoustic properties of the room. I have observed that the two loudspeakers should be set up symmetrically with respect to adjacent large surfaces. Their distance from these surfaces should be at least 1 m (3 feet). Traditionally, the loudspeakers and listener should form an equilateral triangle with a base width W of at least 2.4 m (8 feet). In that case the distance D from the listener to each loudspeaker is the same as the distance W between the loudspeakers, taking the tweeters and the center of the head as reference points. A distance D of 0.7*W, i.e. a listening angle of about 90 degrees rather than 60 degrees, provides a wider auditory scene and can be desirable on well recorded material. A distance D of 2*W or more brings a significant loss of imaging detail, because the direct sound from the loudspeakers is much lower than the reverberant sound in the room. The overall presentation, though, can be more pleasing and is less sensitive to sitting off-axis.
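The listening angle for a given ratio of D to W follows from simple trigonometry. The sketch below assumes the listener sits on the axis of symmetry, with D and W measured between the tweeters and the center of the head as described above; the function name is illustrative.

```python
import math

def listening_angle_deg(D, W):
    """Angle (degrees) between the two loudspeakers as seen from the listener.

    D: distance from the center of the head to each tweeter (m)
    W: distance between the two tweeters (m)
    The listener is assumed to sit on the axis of symmetry.
    """
    if D < W / 2:
        raise ValueError("D must be at least W/2")
    # Half the base width and D form a right triangle with half the listening angle
    return math.degrees(2 * math.asin((W / 2) / D))

W = 2.4  # base width in meters, the minimum suggested above
for ratio in (1.0, 0.7, 2.0):
    print(f"D = {ratio}*W -> {listening_angle_deg(ratio * W, W):.1f} degrees")
```

With these assumptions D = W gives 60 degrees, D = 0.7*W gives about 91 degrees (exactly 90 degrees would need D = W/√2, about 0.707*W), and D = 2*W narrows the angle to about 29 degrees.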
With the listener in the "sweet spot", a virtual sound scene should open up in front of him. The minimum perceived distance to this scene is the distance D to the loudspeakers. Well recorded program material will evoke a realistic auditory scene, like a picture, in the listener's brain. The scene has a width of at least W and a depth beyond D, frequently extending beyond the wall behind the loudspeakers. The scene reaches the height of the loudspeakers and possibly beyond.
While the auditory scene is essentially bounded in width by the loudspeakers, the loudspeakers themselves are not part of the scene. Only when they distort severely is attention drawn to their location; otherwise they disappear from the scene. With this loudspeaker setup the listening room also disappears from the auditory scene, regardless of a recording's quality, but with one exception: recordings with excessive bass may excite a few strong low frequency room resonance modes that decay slowly and become dominant in the auditory scene. Such modes, when below 150 Hz, can be attenuated by equalization.
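To get a feel for where such modes fall, the sketch below lists only the axial modes, f_n = n·c/(2L), between one pair of parallel surfaces below 150 Hz. The room dimensions are hypothetical, the speed of sound is taken as 343 m/s (air at about 20 °C), and tangential and oblique modes are ignored, so this is a rough guide rather than a room model.

```python
def axial_modes(length_m, f_max=150.0, c=343.0):
    """Axial room-mode frequencies f_n = n*c/(2*L) in Hz, up to f_max."""
    modes = []
    n = 1
    while n * c / (2 * length_m) <= f_max:
        modes.append(n * c / (2 * length_m))
        n += 1
    return modes

# Hypothetical room dimensions in meters
room = {"length": 6.0, "width": 4.5, "height": 2.5}
for name, dim in room.items():
    print(name, [round(f, 1) for f in axial_modes(dim)])
```

For the assumed 6.0 m length this gives five axial modes below 150 Hz, starting near 28.6 Hz.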
The room participates in the creation of the auditory scene via low frequency modes, mid frequency reverberation and high frequency reflections. The extent to which modes, reverberation and reflections occur depends upon the radiation pattern of the loudspeaker and the room's acoustic properties. A directional loudspeaker, like a dipole, interacts less with the room. A directional loudspeaker with the same polar pattern at all frequencies illuminates the room uniformly. Our ear-brain-awareness hearing apparatus has been well trained by evolution to sort out reverberation and reflections from the direct sound that created them, provided it receives consistent physical acoustic cues. Thus it adapts within seconds to a new environment and acoustic scene. It evokes an auditory scene in the mind and pays attention to what is relevant at the moment and might hold surprises. Everything else becomes background and is withdrawn from attention until it changes. We experience this whenever we enter a different acoustic space, be it outdoors or a room, and whenever a steady sound such as the ticking of a clock fades from attention.
A properly set up and functioning stereo system allows a listener to make judgments about the plausibility and artificiality of the evoked auditory scene. In addition to timbre and loudness accuracy, it is the spatial distribution of virtual sources that becomes apparent. Are they spread between the loudspeakers like laundry on a clothes line? Are instruments stacked up on top of the center vocalist? Are different instruments contained in their own acoustic sub-spaces, together forming an auditory collage? Are they part of a common acoustic space evoking the auditory image of a jazz club or concert hall? Is the soloist sliding to the center, or does he stand out unnaturally when it is his turn? Is it a multi-miked studio recording lacking natural space? Is it a live recording as one might have heard it from the audience?
A sequence of tests is presented below that should reveal to what degree a given stereo system achieves the potential that is inherent in the 2-loudspeaker reproduction format. (See also the more recent Accuracy, spatial distortion and plausibility of the auditory scene article)
A - Pink Noise
Pink noise is a random process with a power spectrum that decreases at a rate of 10 dB/decade, or 3 dB/octave, with increasing frequency. When measured with a 1/3rd octave analyzer, or constant Q filter bank, it has a flat frequency response. Since the critical bandwidth in hearing is approximately 1/3rd octave wide, pink noise tends to give an equal representation of all frequencies in the audio spectrum, from lows to highs. Thus it would seem to be a good auditory test signal, except that we have no reference for what it should sound like in an absolute sense. This limits the usefulness of pink noise to A versus B comparison tests. Pink noise can reveal small physical differences between two sound sources, but it can be difficult to find the cause of those differences or to predict their consequences. Pink noise can drive you nuts, so be careful. Still, pink noise will point to flaws and errors in a sound system.
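The two descriptions of pink noise above are two views of the same 1/f spectrum. A power spectral density proportional to 1/f loses 10·log10(2) ≈ 3.01 dB per doubling of frequency and 10 dB per decade, while the power integrated over a band from f1 to f2 is proportional to ln(f2/f1), which is the same for every band of constant relative width. The short worked example below shows this, assuming nominal 1/3rd octave bands with edges at fc·2^(±1/6); the function name is illustrative.

```python
import math

def pink_band_power(f_lo, f_hi, k=1.0):
    """Power of a 1/f (pink) spectrum S(f) = k/f integrated from f_lo to f_hi."""
    return k * math.log(f_hi / f_lo)

# Slope of the pink power spectral density per octave and per decade
print(f"{10 * math.log10(2):.2f} dB/octave, {10 * math.log10(10):.1f} dB/decade")

# 1/3rd octave bands: each band spans a frequency ratio of 2**(1/3),
# so every band collects the same power, whatever its center frequency
for fc in (31.5, 250.0, 4000.0):
    f_lo, f_hi = fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6)
    print(fc, "Hz band:", round(10 * math.log10(pink_band_power(f_lo, f_hi)), 3), "dB")
```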
The tests use various 5 second combinations of L and R streams of uncorrelated pink noise. What I call Stereo here is actually fuzzy stereo and has no solid image, but is spatial like a cloud. In Mono the left and right tracks are identical. Left or Right means that there is sound only in one or the other track.
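For readers who want to build similar segments themselves, the sketch below generates the four kinds of 5 second segments described above. It is not how pink-alternating3.wav was made, and it does not reproduce that file's segment order. It assumes a 48 kHz sample rate, uses the Voss-McCartney algorithm as a swapped-in way to approximate pink noise (roughly 3 dB/octave over a range set by the number of rows), and the helpers make_segment and write_wav are hypothetical.

```python
import random
import struct
import wave

FS = 48000  # sample rate in Hz (an assumption of this sketch)

def pink_noise(n_samples, rng, n_rows=12):
    """Approximate pink noise with the Voss-McCartney algorithm.

    Row k holds a random value that is refreshed every 2**(k+1) samples,
    so slower rows add progressively lower-frequency content. One extra
    white-noise term is added per sample. Output stays within [-1, 1].
    """
    rows = [rng.uniform(-1, 1) for _ in range(n_rows)]
    total = sum(rows)
    out = []
    for i in range(1, n_samples + 1):
        k = (i & -i).bit_length() - 1  # index of the lowest set bit of i
        if k < n_rows:
            total -= rows[k]
            rows[k] = rng.uniform(-1, 1)
            total += rows[k]
        out.append((total + rng.uniform(-1, 1)) / (n_rows + 1))
    return out

def make_segment(kind, seconds=5.0, seed=0):
    """Return (left, right) sample lists for one hypothetical test segment.

    kind: "stereo" (independent L and R noise), "mono" (identical L and R),
          "left" or "right" (noise in one channel only).
    """
    rng = random.Random(seed)
    n = int(seconds * FS)
    a = pink_noise(n, rng)
    silence = [0.0] * n
    if kind == "stereo":
        return a, pink_noise(n, rng)  # second, uncorrelated noise stream
    if kind == "mono":
        return a, list(a)
    if kind == "left":
        return a, silence
    if kind == "right":
        return silence, a
    raise ValueError(f"unknown segment kind: {kind}")

def write_wav(path, left, right, gain=0.5):
    """Write a 16-bit stereo WAV file; gain leaves headroom below full scale."""
    frames = bytearray()
    for l, r in zip(left, right):
        frames += struct.pack("<hh", int(32767 * gain * l), int(32767 * gain * r))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)
        wf.setframerate(FS)
        wf.writeframes(bytes(frames))
```

For example, write_wav("mono.wav", *make_segment("mono")) writes one 5 second Mono segment; concatenating the sample lists of several segments before writing gives an alternating sequence.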
Download and save pink-alternating3.wav (12 MB). Then burn the file to a CD-R for convenient access and repetition of the 1 minute sound file.
A1 - Timbre of loudspeakers next to each other - (2, 3, 10, 11)
The two loudspeakers are placed next to each other, as close as possible, in an open area of the room. Ideally they would sound identical, though it is rare not to hear differences. There could be slight differences in the frequency response due to component variability or driver level adjustments. There could also be differences in hearing between the left and right ears that show up when listening from a relatively close distance to the two loudspeakers. Test your hearing response to pink noise with headphones: swap the left and right earpieces to hear whether a particular timbre stays consistently with the same ear. The test should also reveal any driver polarity differences between left and right loudspeakers due to wiring errors.
Note also whether the Stereo segments 5, 7, 9 sound spatially different from segment 1. I have observed with Pluto 2.1 that the diffuse sound cloud can stretch to the left or right depending on the placement of the two loudspeakers in the room. It appears that I am hearing a strong room reflection, since I can find the direction and greater distance of its origin by turning my head. This effect does not show up in the Mono segments or in segment 1, which calls for an explanation. Apparently learning is involved. Also, the radiation pattern of this 2-pole source is independent of time in the Mono case of identical radiation from each loudspeaker. The pattern fluctuates in Stereo, where the two loudspeakers radiate incoherently from each other. The room is illuminated consistently in the Mono case and this may suppress the perception of the reflection. The Stereo case is processed differently.
A2 - Timbre of loudspeakers in their intended locations - (2, 3, 10, 11)
This test and the following tests A3 to A9 are done with the left and right loudspeakers in their intended room placement and separation, and with the listener on the axis of symmetry at the intended distance. Different listening distances should also be tried to note their effect upon what is heard.
Thus A2 is a repeat of the previous test A1 but for the setup under investigation. The previously observed timbre difference should not have increased.
A3 - Overall timbre - (1 to 12)
This is a difficult test, because one does not know with certainty what pink noise is supposed to sound like and what Neutral means in terms of hearing. It is not too difficult to hear colorations due to peaks in the spectrum, but they can also easily be imagined. Bass is likely to sound soft unless the pink noise is played back at a high volume level. This is a consequence of the Equal Loudness Contours (Fletcher-Munson). Levels and woofer excursions should be watched to avoid mechanical clipping and damage to the drivers. Cut very low frequencies if necessary.
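One way to cut very low frequencies is a second-order Butterworth high-pass filter ahead of the power amplifier. The sketch below uses the high-pass biquad formulas from Robert Bristow-Johnson's Audio EQ Cookbook; the 30 Hz cutoff and 48 kHz sample rate are assumptions to be chosen for the woofer's excursion limits, not values from this article, and the function names are illustrative.

```python
import cmath
import math

def highpass_biquad(fc, fs=48000.0, q=1 / math.sqrt(2)):
    """2nd-order high-pass coefficients (Audio EQ Cookbook), normalized by a0."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    cw = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 + cw) / 2 / a0, -(1 + cw) / a0, (1 + cw) / 2 / a0]
    a = [1.0, -2 * cw / a0, (1 - alpha) / a0]
    return b, a

def magnitude(b, a, f, fs=48000.0):
    """|H| of the biquad at frequency f (Hz), evaluated on the unit circle."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 at this frequency
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)

def filter_samples(x, b, a):
    """Apply the biquad to a list of samples (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in x:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return y

b, a = highpass_biquad(30.0)  # hypothetical 30 Hz cutoff
for f in (10, 30, 100, 1000):
    print(f, "Hz:", round(20 * math.log10(magnitude(b, a, f)), 1), "dB")
```

At the cutoff frequency the response is -3 dB, as expected for Q = 0.707; well above it the filter is essentially transparent, while below it the response falls at 12 dB/octave.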
A4 - Loudspeaker localization in Stereo - (1)
This test determines whether the loudspeakers stand out from the diffuseness and uniformity of the Stereo sound cloud. While it is seen and known that the left and right loudspeakers softly bound the cloud in width, the question is whether the loudspeakers add lumpiness to the width of the cloud, like objects emerging from fog. The test should be done with repeats of segment 1, because alternating between Stereo and Mono can lead to hearing a left and a right image in Stereo.
A5 - Stereo to Mono comparison - (4, 5, 6, 7, 8, 9)
This is a crucial test and there should be a clear difference between Stereo and Mono. Loudspeaker type, placement, listening distance and off-axis angle can significantly change the test result. The Mono presentation should also be louder (+3 dB).
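The expected +3 dB can be checked numerically. Identical signals from the two loudspeakers add in pressure (a factor of 2, or +6 dB), whereas uncorrelated signals add in power (+3 dB), leaving a 3 dB difference. The sketch below uses Gaussian white noise as a stand-in for the pink noise and an idealized listening position where both signals arrive with equal level and delay; room reflections and the frequency dependent acoustic sum at the ears are ignored.

```python
import math
import random

rng = random.Random(42)
n = 200_000
left = [rng.gauss(0.0, 1.0) for _ in range(n)]
right = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent of left

def power(x):
    """Mean-square value of a signal."""
    return sum(v * v for v in x) / len(x)

# Idealized listener position: both loudspeaker signals simply add
mono_sum = [l + l for l in left]                   # identical signals add in pressure
stereo_sum = [l + r for l, r in zip(left, right)]  # uncorrelated signals add in power

gain_db = 10 * math.log10(power(mono_sum) / power(stereo_sum))
print(f"Mono is {gain_db:.2f} dB louder than Stereo")
```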
A6 - Phantom center width - (1 to 14)
This test investigates the diffuseness of the virtual center source.
A7 - Phantom center timbre change with slow lateral head movement - (4, 6, 8, 12)
This is a crucial test to confirm that the direct loudspeaker signals are symmetrical and not masked by room reflections! The test makes the interference notch in the spectrum around 2 kHz audible, and hearing it predicts accurate imaging for program material.
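A simplified model shows why the notch lands near 2 kHz. For a phantom center source each ear receives the same signal from the near loudspeaker and, slightly later, from the far loudspeaker; two equal copies separated by a delay Δt cancel first at f = 1/(2Δt). The sketch below estimates Δt with Woodworth's spherical-head formula, assuming a head radius of 8.75 cm, c = 343 m/s and loudspeakers at ±30 degrees. It ignores head shadowing, which makes the far copy weaker, so the real notch is partial rather than complete.

```python
import math

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Interaural time difference (s) from Woodworth's spherical-head formula."""
    theta = math.radians(azimuth_deg)
    return head_radius / c * (theta + math.sin(theta))

def first_notch_hz(delay_s):
    """First cancellation frequency of a signal summed with a copy delayed by delay_s."""
    return 1.0 / (2.0 * delay_s)

# Each loudspeaker at +/-30 degrees, as in the equilateral triangle setup
dt = woodworth_itd(30.0)
print(f"ITD = {dt * 1e3:.3f} ms, first notch near {first_notch_hz(dt):.0f} Hz")
```

With these assumptions the delay comes out at about 0.26 ms and the first notch near 1.9 kHz, close to the 2 kHz quoted above.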
Note that pink noise is an artificial signal whereas program material benefits from pattern recognition in the brain. Thus the spectral notch is not heard for a phantom center source on program material. For a different take on stereo see Floyd Toole, Sound Reproduction, page 151.
A8 - Image location of 3 kHz burst - (13)
Pink noise is a broadband signal. The 10-cycle sinewave burst at 3 kHz with a raised cosine envelope has a -6 dB bandwidth of 600 Hz; it is a narrowband test signal. It should image precisely halfway between the left and right loudspeakers and be far less diffuse than the Mono pink noise test signal. 3 kHz lies in a frequency range where strong specular reflections can diffuse or shift the image. Directional hearing relies on interaural level differences (ILD) between the left and right ears at these frequencies. Turning the head to face the left or right loudspeaker will make the burst appear to come from that loudspeaker, which is surprising to me.
A9 - Image location of 300 Hz burst - (14)
Pink noise is a broadband signal. The 10-cycle sinewave burst at 300 Hz with a raised cosine envelope has a -6 dB bandwidth of 60 Hz; it is a narrowband test signal. It should image precisely halfway between the left and right loudspeakers and be far less diffuse than the Mono pink noise test signal. 300 Hz lies in a frequency range where interaural time differences (ITD) at the ears determine the perceived direction to the sound source. Turning the head to face the left or right loudspeaker will leave the 300 Hz burst between the loudspeakers, which is surprising to me.
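Both burst signals can be generated and their stated bandwidths checked in a few lines. The sketch below assumes a 48 kHz sample rate and a raised cosine (Hann) envelope spanning exactly 10 cycles, and it measures the full width between the half-amplitude points (-6.02 dB) by evaluating the spectrum directly at single frequencies, which is slow but needs no libraries. The exact construction of the bursts in the test file may differ, and the function names are illustrative.

```python
import cmath
import math

def raised_cosine_burst(f0, cycles=10, fs=48000):
    """Sine burst of `cycles` periods at f0 (Hz) shaped by a raised cosine (Hann) envelope."""
    n = round(cycles * fs / f0)
    return [0.5 * (1 - math.cos(2 * math.pi * k / n)) * math.sin(2 * math.pi * f0 * k / fs)
            for k in range(n)]

def spectrum_magnitude(x, f, fs=48000):
    """|X(f)| from a direct evaluation of the discrete-time Fourier transform at f (Hz)."""
    w = -2j * math.pi * f / fs
    return abs(sum(v * cmath.exp(w * k) for k, v in enumerate(x)))

def minus6db_bandwidth(f0, cycles=10, fs=48000):
    """Full width (Hz) between the half-amplitude points of the burst spectrum."""
    x = raised_cosine_burst(f0, cycles, fs)
    peak = spectrum_magnitude(x, f0, fs)
    bin_hz = f0 / cycles  # reciprocal of the burst duration
    edges = []
    for sign in (+1, -1):
        lo, hi = 0.0, 1.5 * bin_hz  # the half-amplitude point lies in this offset range
        for _ in range(40):  # bisection on the frequency offset from f0
            mid = (lo + hi) / 2
            if spectrum_magnitude(x, f0 + sign * mid, fs) > 0.5 * peak:
                lo = mid
            else:
                hi = mid
        edges.append((lo + hi) / 2)
    return sum(edges)

for f0 in (3000.0, 300.0):
    duration_ms = 1000 * len(raised_cosine_burst(f0)) / 48000
    print(f"{f0:.0f} Hz burst: {duration_ms:.1f} ms long, "
          f"-6 dB bandwidth about {minus6db_bandwidth(f0):.0f} Hz")
```

For a Hann envelope the -6 dB width equals twice the reciprocal of the burst duration, so a 10-cycle burst has a bandwidth of 2·f0/10: 600 Hz for the 3.3 ms burst at 3 kHz and 60 Hz for the 33 ms burst at 300 Hz.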
It seems that the six burst signals after the pink noise segments clear the brain of the noise patterns and make for a fresh start to repeat the noise tests.
As a first step towards an accurate stereo system, the loudspeakers and listener must be placed in the room in such a way that the above pink noise and burst tests are met to a high degree. The tests represent basic requirements for accuracy, but they cannot reveal the accuracy of those parameters for which we have memory and pattern recognition, like human voice, music and certain sounds. The tests are "necessary, but not sufficient". Program material is the final arbiter of system accuracy.
B - Program material
The loudspeaker system will be considered accurate if it can create plausible Auditory Scenes (AS) in a normal living room. Various recordings will be used for evaluation. It should be possible with an accurate sound system to recognize the consequences of different recording techniques. Also, if the system has systematic errors in its frequency response, room interaction, or too much distortion, then this should become apparent from listening to a variety of recordings and calling upon one's memory of live, unamplified acoustic events.
B1 - Where are elements of the auditory scene perceived?
Normally the distance from the listener to the closest element of the auditory scene (AS) is approximately equal to the distance to the loudspeakers. Sometimes a centered solo voice may appear closer than that distance. It appears to be caused by high directivity of the loudspeakers and/or dead room acoustics. The loudspeakers are acting to some degree like headphones. For headphones the AS is normally placed between the ears and inside the head.
B2 - What is the width of the auditory scene?
The width of the AS is determined by closing the eyes and pointing with outstretched arms to the outside edges of the scene. Normally it is at least as wide as the loudspeaker separation and has a soft edge. With some recordings it can be considerably wider.
B3 - How deep is the auditory scene?
The depth of the AS depends strongly upon the recording technique used. On pan-potted and closely miked studio recordings - as is common for pop music - it tends to be very shallow. The ear/brain does not receive adequate cues for perceiving depth of a natural space within which the instruments were played. On classical music recordings - and again depending on the microphone technique used - there can be a strong sense of depth developing as different instruments of the orchestra illuminate the venue in which the performance took place. The perceived depth is not bounded by the distance of the loudspeakers to the wall behind them if that distance is at least 1 m.
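One way to put a number on the 1 m condition is the extra delay of the reflection from the wall behind a loudspeaker. The sketch below uses a simple image-source model, assuming the equilateral setup (listener 30 degrees off the wall normal as seen from each loudspeaker), D = 2.4 m and c = 343 m/s; the function name is illustrative, and the calculation says nothing by itself about perceptual thresholds.

```python
import math

def front_wall_reflection_delay_ms(D, d, phi_deg=30.0, c=343.0):
    """Extra delay (ms) of the reflection off the wall behind a loudspeaker.

    Image-source model: the wall is parallel to the line between the loudspeakers,
    d meters behind the loudspeaker; the listener is D meters away and phi_deg
    off the wall normal (30 degrees in the equilateral-triangle setup).
    """
    phi = math.radians(phi_deg)
    x, y = D * math.sin(phi), D * math.cos(phi)  # listener relative to the loudspeaker
    reflected = math.hypot(x, y + 2 * d)          # path length from the mirror-image source
    return (reflected - D) / c * 1e3

for d in (0.5, 1.0, 1.5):
    print(f"d = {d} m -> reflection delayed by {front_wall_reflection_delay_ms(2.4, d):.1f} ms")
```

At d = 1 m the reflection arrives about 5.4 ms after the direct sound; on the loudspeaker axis the extra delay would simply be 2d/c.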
B4 - Does the listening room disappear aurally?
The perception of the AS appears to override the perception of the listening room if the loudspeakers are optimally placed, have a frequency-independent radiation pattern and do not excite attention-getting low frequency modes. To avoid misleading visual cues the listener should close his eyes to concentrate only on the AS. The choice of program material does not seem to have any influence upon the effect.
B5 - Do the loudspeakers disappear aurally?
The aural disappearance of the loudspeakers is to some extent a function of the program material. It is also a function of the loudspeaker's non-linear and linear distortion performance and polar response. Program material with left and right channel monaural sources gives hard boundaries to the AS. The loudspeakers stand out as left and right sources of sound.
B6 - What is the lateral distribution of phantom sources?
The lateral placement of sources in the AS is a function of the recording technique used. Sources may be lumped together and on top of each other, may be distributed like laundry on a clothes line between the loudspeakers, may be pin-point localized or may be smoothly distributed as in a live performance. An accurate stereo system will reveal all of this. Pin-point localization is an artifact of stereo and should not by itself be taken as a measure of accuracy in sound reproduction.
B7 - Are individual phantom sources contained in a coherent phantom space?
All acoustic events contain direct sounds and reflections of the direct sound, or the response of the venue in which the event took place. To sound natural the response of a venue must also be included in a recording. But recordings vary greatly and I assume that many recording engineers do not hear spatial aspects of the AS correctly over their monitor loudspeakers. Otherwise I do not understand why an assembly of individual instruments in their own acoustic space would be preferred over a coherent space for all instruments. An accurate stereo system will reveal this practice.
B8 - How tall is the auditory scene?
Height information is not intentionally recorded in stereo, but the ear/brain perceptual apparatus can find cues in some recordings from which it forms an impression of the height of the AS. The height is perceived at a greater distance than the loudspeakers and not as above the listener, because he "looks" into the AS in front of him.
B9 - What is the tonal balance and where is the emphasis?
When comparing loudspeakers, much attention is paid to the frequency response, its extension and aberrations. For B9, though, the focus is on the realism of the tonal balance of the AS that is evoked by a particular recording. Thus attention is paid to boomy bass, lack of bass, lack of highs, turned up highs, lack of warmth, etc. as sounding artificial rather than live. Any problem might be due to the recording, the loudspeakers or the room placement. It requires listening to many different recordings to find systematic errors and to sort out the origin of the problem. This is also the playground for loudspeaker designers seeking to differentiate their products from the competition. Beware of exciting loudspeakers; they do not wear well over time. The easier it is to describe specific characteristics of a loudspeaker, the less likely the loudspeaker is a neutral transducer from electrical signal in to acoustical signal out. The loudspeaker should just reproduce what is in the recording, neither adding to it nor subtracting from it. Thus, what you hear should still hold surprises and not be tinted by familiar sameness.
B10 - How clear/articulate is the presentation?
This depends strongly on the program material and the non-linear and linear distortions of the loudspeakers. Fortunately there are many recordings that avoid either extreme. They are mostly found in earlier CDs from before 1990 or so. This then leaves the loudspeaker as the source of distortion and primarily in the frequency range above 1 kHz for this test.
B11 - Can you follow a single instrument amongst many?
It should be possible to follow the sound stream of a single instrument in a densely scored piece of music and even during a loud passage by focusing attention upon it. This is akin to the cocktail-party-effect where you have a conversation with one person and overhear the conversation between other people in the distance.
B12 - How sensitive are the phantom image locations to head movement?
The sweet spot should have a comfortable size, within which the AS does not change as you rotate your head to face the left or right loudspeaker or move it laterally or back and forth. The spot size is a function of the loudspeakers used and their setup; the program material has little influence. Increasing the listening distance also enlarges the sweet spot, but the AS becomes more diffuse because the reflected sound in the listening room masks imaging detail. This is less pronounced for a directional loudspeaker like ORION than for an omnidirectional loudspeaker like PLUTO.
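The effect of directivity on how quickly reflected sound takes over can be illustrated with the textbook diffuse-field model. The critical distance r_c, where direct and reverberant levels are equal, grows with the square root of the directivity factor Q. The sketch below assumes a hypothetical 60 m³ room with RT60 = 0.4 s, approximates the room constant by Sabine absorption, and treats ORION as an ideal dipole (Q = 3) and PLUTO as omnidirectional (Q = 1). Real loudspeakers and small rooms only roughly follow this model, and the function names are illustrative.

```python
import math

def critical_distance(Q, volume_m3, rt60_s):
    """Distance (m) at which direct and reverberant sound levels are equal.

    Uses Sabine absorption A = 0.161*V/RT60 (metric units) as the room constant
    and r_c = sqrt(Q*A/(16*pi)). Q is the directivity factor of the source:
    1 for an omnidirectional source, 3 for an ideal dipole.
    """
    absorption = 0.161 * volume_m3 / rt60_s
    return math.sqrt(Q * absorption / (16 * math.pi))

def direct_to_reverberant_db(r, r_c):
    """Direct-to-reverberant level ratio (dB) at listening distance r."""
    return 20 * math.log10(r_c / r)

V, T = 60.0, 0.4  # hypothetical living room: 60 m^3, RT60 = 0.4 s
for name, Q in (("omni (PLUTO-like)", 1.0), ("dipole (ORION-like)", 3.0)):
    rc = critical_distance(Q, V, T)
    print(f"{name}: r_c = {rc:.2f} m, "
          f"direct/reverberant at 2.4 m = {direct_to_reverberant_db(2.4, rc):.1f} dB")
```

Under these assumptions the dipole's critical distance is √3 times larger, so at the same listening distance its direct sound stands about 4.8 dB higher relative to the reverberant field.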
B13 - How sensitive is the auditory scene location to off-axis listening?
The sensitivity is due to the radiation pattern of the loudspeaker. For ORION and extreme off-axis angles the AS tends to collapse into the near loudspeaker, while for PLUTO it is as if listening to the AS from an angle and with the AS still between the loudspeakers. Extreme toe-in of the ORIONs makes them behave more like PLUTO.
The tests described above make use of pink noise and a variety of recordings to evaluate the evoked Auditory Scene. A sound system is considered accurate if it allows the analysis of such scenes for plausibility in comparison to live acoustic events for which one has some memory.
See also: Spatial distortion in audio