What We Can and Can't Tell from Measurements of Headphones

January 2016

I’ve noticed that the proliferation of headphone information online has inspired some enthusiasts to judge headphones by their frequency-response measurements alone. One anonymous commenter went so far as to proclaim that the HiFiMan HE400S headphones -- a model praised by many reviewers as the best you can buy for $300 -- are “terrible.” His or her evidence? Someone on an enthusiast forum had measured the HE400S and the measurements didn’t look good, even though the highly experienced headphone expert Tyll Hertsens had published measurements on the website Inner|Fidelity that conflicted with the ones from the forum.

Brent Butterworth

Some of the blame for this might fall on me and my colleagues at the SoundStage! Network. For years, we’ve been writing about how important speaker measurements are because they so closely correlate with listener impressions. Decades of research at Canada’s National Research Council, in Ottawa, and, later, at Harman International, in California, have shown that speakers with flat measured response on axis, and gradually rolled-off response off axis, will sound good to most listeners in an average room. Measurements provide a useful “second opinion” in reviews: If a reviewer trashes a speaker that measures great, or raves about a speaker that measures poorly, you might want to gather more opinions before you come to final judgment.

I’m not yet so confident about headphone measurements.

As someone who owns headphone-measuring equipment (no small investment, at about $5000 for an ear/cheek simulator plus the cost of an audio analyzer), has measured more than 200 headphones, and has been able to compare the subjective results of many experienced listeners with those measurements, I’m reluctant to judge a headphone on the basis of its frequency-response measurements alone.

Unlike speakers, headphones haven’t received much attention from researchers over the last few decades. The existing headphone-measuring gear and procedures were primarily developed with hearing aids, not headphones, in mind. As a result, the measurements have little validity above 8kHz. Someday, this may change. Researchers at Harman and elsewhere have devoted more attention to headphones. G.R.A.S. Sound and Vibration, the company that makes the Model 43AG ear/cheek simulator that I and many other technicians use, recently demonstrated a revised measurement system that seems to improve on the existing equipment, potentially allowing more consistent and meaningful measurements. But we’re still a ways off from the time when we can judge headphones by their measurements.

It’s easy to demonstrate this. The chart below shows the measured frequency response of four headphones. Three are high-end models that have received overwhelmingly positive reviews. The fourth is a model that is marketed to late teens, is readily available for under $60, and, to my knowledge, has received no positive reviews from professional reviewers. Can you tell from the chart which trace represents the “bad” headphone model?

[Chart: Four headphone measurements]

A couple of factors complicate matters here. First, the practices of headphone measurement vary. Unless you know what test gear and procedures the technician used, and are familiar with their characteristics, you can’t come to many useful conclusions about how the headphones might sound. The makers of headphone-measuring gear provide calibration curves that compensate for the slight variations from unit to unit, but most technicians don’t seem to use them. There are also compensation curves that simulate the response at the ear reference point (ERP: roughly the point in space where, when you press your hand against your ear, your palm meets the center axis of your ear canal) or ear entrance point (EEP: the point where the central axis of the ear canal intersects with the entrance to the canal) -- as opposed to the drum reference point (DRP: the center of the eardrum). Some technicians use a diffuse-field compensation curve, which compares the headphone’s measurement to a response curve thought to simulate the sound of real speakers in a typical listening room.
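To see why the choice of compensation curve matters so much, here is a minimal Python sketch of how such a curve might be applied: the compensation level is subtracted, in decibels, from the raw measurement at each frequency. Everything in it is an assumption made for illustration. The function names are hypothetical, the numbers are invented rather than taken from any real headphone or published curve, and a real workflow would use far more data points.

```python
import math

def interpolate_db(curve, freq_hz):
    """Linearly interpolate a {frequency_hz: level_db} curve on a log-frequency axis."""
    points = sorted(curve.items())
    # Outside the curve's range, hold the nearest end value.
    if freq_hz <= points[0][0]:
        return points[0][1]
    if freq_hz >= points[-1][0]:
        return points[-1][1]
    for (f_lo, db_lo), (f_hi, db_hi) in zip(points, points[1:]):
        if f_lo <= freq_hz <= f_hi:
            span = math.log10(f_hi) - math.log10(f_lo)
            t = (math.log10(freq_hz) - math.log10(f_lo)) / span
            return db_lo + t * (db_hi - db_lo)

def apply_compensation(raw_db, compensation_db):
    """Subtract the compensation curve from the raw measurement, point by point."""
    return {f: level - interpolate_db(compensation_db, f)
            for f, level in raw_db.items()}

# Invented values for illustration only; not measurements of any real product.
raw = {100: 95.0, 1000: 94.0, 3000: 104.0, 8000: 96.0}
compensation = {20: 0.0, 1000: 0.0, 3000: 10.0, 10000: 4.0}

compensated = apply_compensation(raw, compensation)
for freq in sorted(compensated):
    print(f"{freq:>6} Hz  raw {raw[freq]:6.1f} dB  compensated {compensated[freq]:6.1f} dB")
```

In this made-up case, a raw trace with a 10dB peak at 3kHz comes out flat through that region once the curve is subtracted. The same headphone can look peaky or smooth depending on which curve, if any, the technician applied, which is why you need to know the procedure before reading anything into a chart.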

While most technicians use industry-standard measuring gear from G.R.A.S. or Brüel & Kjær, I’ve seen some enthusiasts -- and two headphone manufacturers -- do their measurements with a standard measuring microphone installed in the center of a flat metal plate.

Another factor: Judging from my experience with listening panels, and from the wide disparities we sometimes see in user comments and professional reviews, subjective reactions to headphones vary more from listener to listener than they do with speakers. One of the explanations for this phenomenon is that fit is so critical to headphone sound: Even a slight leak in the acoustical seal of the headphones around or in the ear can completely change a listener’s perception of those headphones’ sound. The sizes of human ears and ear canals vary considerably, as do the shapes and sizes of human heads. Whether or not the listener wears glasses, and if so, what sort of glasses are worn, can also make a big difference in fit. Any of these factors might change a listener’s impression of a given pair of headphones from “great” to “terrible,” or vice versa.

Another issue is that the shape of the ear canal varies greatly from person to person. This is one reason there’s no standardization of headphone measurements above 8kHz -- creating an acoustical model of a “standard human ear canal” is about as vexing as trying to design a single sweater that will fit any dog. It’s also why rave reviews of earphones -- which are especially susceptible to variances in ear-canal shape -- are often accompanied by hostile user comments insisting that the product sounds as if the music is coming out of a tin can, or a speaker under a pillow.

I do believe that headphone measurements are interesting, which is why I do them. It’s particularly useful to see how the frequency response of the headphones tested compares to the responses of competing models. Recently, I did some research (to be expanded, if and when time permits) that shows that the isolation measurements I’ve published closely correlate with user perceptions. And measurements that show sharp swings in impedance are a reliable sign that the headphones’ sound will vary noticeably with the output impedance of headphone amplifiers -- a specification that itself isn’t standardized, and can vary from less than 1 ohm to more than 100 ohms.
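The impedance point can be illustrated with simple voltage-divider arithmetic. The amplifier's output impedance and the headphone's impedance form a divider, so the level reaching the headphone at a given frequency is roughly 20·log10(Z_headphone / (Z_headphone + Z_output)) dB relative to an ideal zero-ohm source. The Python sketch below makes two assumptions that should be read as such: it treats both impedances as purely resistive magnitudes, which is a simplification, and the impedance curve is hypothetical, not measured data for any model mentioned here.

```python
import math

def level_change_db(headphone_impedance_ohms, amp_output_impedance_ohms):
    """Level at the headphone relative to an ideal 0-ohm source (voltage divider)."""
    z = headphone_impedance_ohms
    return 20.0 * math.log10(z / (z + amp_output_impedance_ohms))

def response_shift_db(impedance_by_freq, amp_output_impedance_ohms, reference_hz):
    """Per-frequency level change, normalized to the level at reference_hz."""
    ref = level_change_db(impedance_by_freq[reference_hz], amp_output_impedance_ohms)
    return {f: level_change_db(z, amp_output_impedance_ohms) - ref
            for f, z in impedance_by_freq.items()}

# Hypothetical impedance curve (ohms) with a sharp swing near 100 Hz.
hypothetical_impedance = {20: 32.0, 100: 64.0, 1000: 32.0, 10000: 32.0}

# Compare a low-output-impedance amp with a high-output-impedance one.
for z_out in (1.0, 100.0):
    shift = response_shift_db(hypothetical_impedance, z_out, reference_hz=1000)
    summary = ", ".join(f"{f} Hz: {db:+.2f} dB" for f, db in sorted(shift.items()))
    print(f"Output impedance {z_out:g} ohms -> {summary}")
```

With the 1-ohm source, the impedance bump changes the response at 100Hz by a small fraction of a decibel. With the 100-ohm source, the same bump produces a boost of several decibels. A headphone with a flat impedance curve would show no such shift with either amplifier.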

I know, I know: You want to know the identities of the headphones in the chart. By color of trace, they are:

Green: Audeze LCD-X ($1699)
Red: Bowers & Wilkins P7 ($399)
Orange: Sennheiser Momentum over-ear ($379, readily available for much less)
Blue: Skullcandy Hesh 2 ($69)

You might conclude from these measurements that the Hesh 2s have a thin sound, because they have the least measured bass output. But if you look at the other side of the chart, you can see that the Hesh 2s have less of the typical response peak between 2 and 3kHz than do the other models. Even though the Hesh 2s output much more treble energy than the Audeze LCD-Xes above 3.5kHz, the Skullcandys sound subjectively dull because they don’t have as much of that lower-treble peak. How much energy would the Hesh 2s need between 2 and 3kHz to perfectly counterbalance their measured bass response, and to allow me to judge them as “good” headphones based on their measurements alone? I don’t know. Anonymous Internet commenters might think they know, but they’re just guessing -- and when their guesses are wrong, they suffer no shame or repercussions.

Complicated, huh? Now you see why I don’t judge headphones entirely by their frequency-response measurements.

. . . Brent Butterworth