The Illusion of Measuring What Customers Want

Be suspicious of any method that claims to count (quantify) what goes on in someone’s head

Alan Klement
Jobs to be Done

--

Innovation is fraught with uncertainty. As a result, anything that promises certainty is attractive to designers and innovators. One such promise is that customers’ preferences and desired outcomes can be quantified in a reliable and valid way. Can it be done? The answer from statistical theory and psychology is clear: no.

This article will equip you with the knowledge and understanding to avoid the trap of fooling yourself into believing you can quantify what customers want.

The most important figures that one needs for management are unknown or unknowable, but successful management must nevertheless take account of them.

-W. Edwards Deming

Few have changed the world more than Dr. Deming. His knowledge of data and statistical theory helped transform the United States from an agricultural backwater into the world’s breadbasket. He was pivotal in reshaping Japan’s economy from post-WW2 disaster into an economic powerhouse. Quantitative, data-based methodologies such as TQM, QFD, Six Sigma, and Lean can all be traced back to him.

Yet what made Dr. Deming both a master and a pioneer of statistical theory wasn’t only his proficiency in measuring things; it was also his understanding of what couldn’t and shouldn’t be measured. For example, do you think it’s possible to quantify how much you love someone? Or to rank the importance of each of your friends?

Not everything that can be counted counts, and not everything that counts can be counted.

-William Bruce Cameron

Unfortunately, few people heed lessons from those like Deming. As a result, we’ve picked up a bad habit on our journey from the industrial age through the information age: quantitative data has been fetishized. Many now suggest that “what customers want” can and should be measured.

However, people like Dr. Deming, and anyone else with a basic understanding of statistical theory and psychology, know that any measure of customer preference (desired outcomes) will always be invalid and unreliable. You simply cannot count (i.e., quantify) what’s inside someone’s head. Those who claim otherwise either don’t know any better or are trying to sell you something.

This article covers the three biggest reasons why what customers want cannot be measured:

  • Putting a number on something doesn’t make it quantitative
  • Measures of customers’ desired outcomes (preferences) change continually and are easily manipulated
  • Value is non-linear

Equipped with this understanding, you will not only become a better innovator, designer, and entrepreneur but also increase your likelihood of innovation success.

The simple fact is that all of the time and money and skill poured into consumer research on a new Coca-Cola could not measure or reveal the depth and abiding emotional attachment to original Coca-Cola felt by so many people.

-Former Coca-Cola CEO Donald Keough

Putting a number on something doesn’t make it quantitative

We see these surveys all the time. In fact, I saw the one in figure 1 just the other day while passing through Helsinki Airport.

Figure 1. A Likert survey links a preference with a number, but this isn’t quantitative.

Surveys such as these, and the Likert scales they use, form the basis for various innovation and customer-satisfaction methodologies. However, there’s something to know about such surveys and the data they collect, something those well versed in statistics and measurement theory know that others don’t:

Those numbers are not quantitative values. Rather, they are categorical descriptions of quality.

It’s the same thing with how we determine the winners of a race. We attach categorical and ordinal (ordered) descriptions to the people who cross the finish line (figure 2). If no one came before me, I came in 1st place. If two people came before me, I came in 3rd place. The fact that these are categorical data is why it’s entirely appropriate to exchange “1” with “Gold” and “3” with “Bronze”.

Figure 2. Ordinal data are frequently used to describe winners of a competition.

The other point to be aware of with categorical and ordinal data is that the distance between values is not part of the data. For example, you may know who came in 1st, 2nd, and 3rd place — but you don’t know the distance in between them (figure 3). The person who got 1st place may have finished the race in 60 minutes, while the 2nd and 3rd place finishers came in at 90 and 91 minutes, respectively.

Figure 3. Ordinal data tell you the arrangement of data points in relation to one another. They don’t tell you what distance, if any, exists between them.
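To make this concrete, here’s a minimal sketch in Python (the runners and their finishing times are invented for illustration). The ranks survive; the gaps between finishers do not.

```python
# Hypothetical finishing times, in minutes, for three runners.
finish_times_min = {"Ana": 60, "Ben": 90, "Caro": 91}

# Derive the ordinal data: 1st, 2nd, and 3rd place.
ranked = sorted(finish_times_min, key=finish_times_min.get)
places = {runner: place for place, runner in enumerate(ranked, start=1)}
print(places)  # {'Ana': 1, 'Ben': 2, 'Caro': 3}

# The places are evenly spaced (1, 2, 3), but the underlying times are not:
# 30 minutes separate 1st from 2nd, while only 1 minute separates 2nd from 3rd.
print(finish_times_min["Ben"] - finish_times_min["Ana"])   # 30
print(finish_times_min["Caro"] - finish_times_min["Ben"])  # 1
```

Once the times are thrown away, no arithmetic on the places can recover them.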

The fact that these numbers are not quantitative measures and don’t denote any distance between categories is why some surveys skip numbers altogether. Instead, they use facial expressions as categories (figure 4). This approach is closer to how we actually think about the products we use.

Figure 4. Some researchers understand the limits of quantifying preference. This Likert scale uses faces instead of numbers.

Now, there’s nothing wrong with using numbers to describe customer preference. It’s just necessary to know that these data are qualitative, not quantitative. They are not values themselves; they are descriptions of value. However, too many people either forget this or never knew it, so they end up doing things with these data that they shouldn’t.
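If you do summarize such data, here’s a hedged sketch of what holds up (the survey responses below are made up). Order-based summaries such as the mode and median rely only on ranking and counting; a mean quietly assumes the gaps between categories are equal, which the data can’t support.

```python
from collections import Counter
from statistics import median_low

# Hypothetical responses on a 1-5 satisfaction scale. The numbers are ordered
# category labels, not measurements.
responses = [5, 4, 4, 3, 5, 2, 4, 5, 1, 4]

# Mode and median depend only on order and counts, so they are defensible.
print("mode:", Counter(responses).most_common(1)[0][0])  # 4
print("median:", median_low(responses))                  # 4

# A mean (or a difference of means) assumes the distances between 1-2, 2-3,
# 3-4, and 4-5 are all equal -- an assumption ordinal data cannot justify.
print("mean:", sum(responses) / len(responses))          # 3.7, but 3.7 of what?
```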

A perfect example of doing statistics wrong comes from a method developed by Anthony Ulwick of Strategyn called Outcome-Driven Innovation. Part of this method is an “Opportunity Algorithm”:

Importance + (Importance - Satisfaction) = Opportunity

One big error this formula commits is subtracting one category of data from a different one. This cannot be done. It’s equivalent to this:

X = percent of people who paid cash - people who ordered salad

X = Kilometers - Gallons
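To make the arithmetic concrete, here’s a minimal sketch of what the formula asks you to compute. The outcome statements and the ratings are invented for illustration; the point is that the subtraction treats importance and satisfaction as though they were measured in the same, equally spaced units.

```python
# A sketch of the Opportunity Algorithm's arithmetic, using made-up data.
def opportunity(importance: float, satisfaction: float) -> float:
    # Opportunity = Importance + (Importance - Satisfaction)
    # The subtraction assumes "importance units" and "satisfaction units" are
    # the same kind of thing and equally spaced -- the step the critics object to.
    return importance + (importance - satisfaction)

# Hypothetical desired outcomes with made-up importance/satisfaction ratings.
outcomes = {
    "minimize the time it takes to find a file": (9, 3),
    "minimize the risk of losing unsaved work": (8, 7),
}

for statement, (imp, sat) in outcomes.items():
    print(f"{statement}: opportunity = {opportunity(imp, sat)}")
# minimize the time it takes to find a file: opportunity = 15
# minimize the risk of losing unsaved work: opportunity = 9
```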

In the Journal of Product Innovation Management, Jeffery Pinegar commented on this formula:

Technically, there are two problems with this formula. First, satisfaction is subtracted from importance; this is like subtracting apples from broccoli.

Ulwick casually dismisses the criticism, saying that it doesn’t hold up when talking about jobs, outcomes, and constraints (p. 47), but he offers nothing to support his position other than anecdotes.

In the paper A Critique of Outcome-Driven Innovation, author Gerry Katz sums it up nicely with a quote from MIT professor John Hauser:

[The Opportunity Algorithm] is pseudo-scientific. It mixes units of measure. No self-respecting engineer would ever do such a thing.

This criticism is echoed by Sarah Boslaugh, author of Statistics in a Nutshell:

Problem: What is the argument against analyzing Likert and similar attitude scales as interval data?

Solution: There is no natural metric for constructs such as attitudes and opinions. We can devise scales that are ordinal (the responses can be ranked in order of strength of agreement, for instance) to measure such constructs, but it is impossible to determine whether the intervals among points on such scales are equally spaced.

In other words, because there’s no way to create objective, countable units from attitudes and opinions, you cannot subtract or add them. It’s like thinking you can do this:

how much I love my wife (5) + how much I love my daughter (5) = total love of my family (10)

Measures of customers’ desired outcomes (preferences) change continually and are easily manipulated

Some people may still insist that quantitative measures can be attached to customers’ desired outcomes. Or that, since these data are categorical and ordinal, you can at least find the median and mode of a data set. However, even if you go down this route, you still must contend with issue #2:

Measures of customers’ desired outcomes (preferences) change continually and are easily manipulated

A good researcher and statistician knows that it’s not enough to just take a measurement of something. You must understand the system that is generating those data. How you collect data from a system depends on the type of system it is. For example:

  • Doctors don’t take a patient’s heart rate and blood pressure only once before surgery. Instead, they constantly measure those metrics during surgery.
  • Workers on oil rigs constantly monitor the status of their drilling.
  • Manufacturers use control charts to measure and improve their production processes (see the sketch after this list).
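As a rough illustration of the control-chart idea, here’s a simplified sketch with invented measurements (a real Shewhart chart estimates its limits a bit differently). The point is that the process is monitored continually, and each sample is judged against limits derived from the process itself, not summarized by a one-off snapshot.

```python
from statistics import mean, stdev

# Hypothetical hourly measurements from a production line.
measurements = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.4, 9.7, 10.1, 10.0]

# Center line and control limits derived from the process's own variation.
center = mean(measurements)
upper = center + 3 * stdev(measurements)
lower = center - 3 * stdev(measurements)

for i, x in enumerate(measurements, start=1):
    status = "out of control" if x > upper or x < lower else "in control"
    print(f"sample {i}: {x:.1f} ({status})")
print(f"center line {center:.2f}, limits [{lower:.2f}, {upper:.2f}]")
```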

The people in these environments understand that taking a snapshot of their data won’t work. Why? Because the things they are measuring are always in flux and are susceptible to outside influences.

This is true of customers’ desired outcomes and preferences. Any metrics associated with either are always in flux and susceptible to outside influences.

For example, you’re at a restaurant with some friends. The waitress comes over to you. “I’ll take the steak with mashed potatoes” you say. Then your friend orders, “I’ll have the steak with grilled vegetables”. Upon hearing your friend, you decide to change your order. “Actually, instead of the mashed potatoes I’ll have the grilled vegetables as well.” Even though you knew grilled vegetables were on the menu, you first chose mashed potatoes. But for some reason, after hearing your friend order vegetables, you switched.

This happens all the time. So often, in fact, that behavioral economists have a name for it: Preference Reversal.

Another example is grocery shopping. Many studies have shown time and again that people who shop while hungry make poor shopping decisions and almost always buy food they don’t need or want. Such phenomena are called Projection Bias and Hot-Cold Empathy Gaps. They happen because what we want today is highly influenced by what we feel at the moment, which makes predicting what we will want in the future extremely unreliable.

A recurring example of this in action is the outrage customers direct at Apple from time to time. When Apple removed the floppy drive from its computers, people called the company crazy. The same was true when it removed optical drives and, more recently, the headphone jack from the iPhone. Customers are immediately indignant about the change, but over time they forget about it and even begin to appreciate the new way.

The fact that we are so terrible at predicting what we will like was outlined in Kahneman and Snell’s article Predicting a changing taste: Do people know what they will like? Their conclusion was simply:

People are just not good at guessing how their tastes in particular will change over a period of time.

Moreover, measures of customer preference can vary depending on what options you provide. This is well known among those involved in pricing products. Rarely do you see just one or two price options presented at once. Often there are low-, middle-, and high-priced options presented together. The idea is that you make the middle-priced option more attractive simply by placing a high-priced option next to it. This well-documented phenomenon is called Context-Dependent Preferences.

And if that weren’t tricky enough, data gathered about “what customers want” can be shaped by the survey itself. As Norbert Schwarz points out in his paper Cognitive Aspects of Survey Methodology:

Since the early days of opinion polls (Cantril, 1944; Payne, 1951), survey researchers observed that minor variations in question wording, format and order can profoundly affect the obtained answers.

This happens because people don’t have a firm opinion on what they do and don’t like. Schwarz continues:

Respondents first need to interpret the question to understand what is meant and to determine which information they ought to provide. If the question is an attitude question, they may either retrieve a previously formed attitude judgment from memory, or they may form a judgment on the spot, based on whatever relevant information is accessible at that point in time. While survey researchers have typically hoped for the former, the latter is far more likely.

And even if you could tap into a customer’s previously formed judgment, it would be unreliable. As Nobel Prize winner Daniel Kahneman has pointed out:

Kahneman believes the most direct way to evaluate experienced utility is to ask people how they feel at a certain moment, a notion he calls “moment utility.” This is the concept, Kahneman said, Bentham really had in mind. But because researchers are more interested in extended outcomes, more often the question they ask is memory-based: “How was it?” Kahneman said this is a different question that reflects the individual’s global evaluation of an entire episode in the past and it may not be a direct assessment of the individual’s real-time state. This “remembered utility,” said Kahneman, is not a very good guide when predicting outcomes. The “total utility” of a state is derived from the moment-based approach of measuring the real time pleasure or pain experienced by the individual.

This raises a question: when filling out a survey or responding in an interview, is the person retrieving a previously formed opinion about the outcome you’re asking about (remembered utility), or forming a judgment on the spot (moment utility)?

What does all this mean? Consumer preference, including the importance and satisfaction of desired outcomes, is always changing and highly malleable. This makes it difficult, and perhaps impossible, to measure it reliably.

Value is non-linear

OK. Suppose you do choose to believe that you can attach a quantitative value to preference, and that such a measure would be reliable. You’d still have one hurdle to clear:

Value is non-linear

For hundreds of years, value was believed to be linear. It’s a tempting assumption because it makes the math much easier. An example of linear thinking would be believing that if I double my wealth, I double my happiness. But as we all know, that isn’t even close to being true. Why? Because value is distinctly non-linear. Moreover, gains are valued differently than losses.

Figure 5. Value as linear (left) vs non-linear (right).

Figure 5 shows value as linear and as non-linear. The right image shows a model known as Cumulative Prospect Theory, which combines phenomena such as Decision Weights, Diminishing Marginal Utility, and Loss Aversion. To humans, value is non-linear.

Figure 6. Losses and gains are non-linear. Losing $100 can make me feel 2x worse, while winning $100 makes me feel only about 1.5x better.

This article won’t go into all the reasons why this is true; there are countless books and academic articles written on the topic. What will be pointed out is that anyone who tries to quantify customers’ desired outcomes must account for these phenomena.
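For readers who want the curve in figure 5 made concrete, here’s a minimal sketch of the value function from Tversky and Kahneman’s cumulative prospect theory paper (referenced below), using their commonly cited parameter estimates. The exact numbers are illustrative; the shape, diminishing sensitivity plus loss aversion, is the point.

```python
# Prospect-theory value function with the Tversky & Kahneman (1992) estimates:
# alpha = beta = 0.88 (diminishing sensitivity), lambda = 2.25 (loss aversion).
def value(x: float, alpha: float = 0.88, beta: float = 0.88, lam: float = 2.25) -> float:
    if x >= 0:
        return x ** alpha          # gains: concave, diminishing returns
    return -lam * ((-x) ** beta)   # losses: steeper, amplified by loss aversion

gain, loss = value(100), value(-100)
print(f"subjective value of +$100: {gain:.1f}")    # about  57.5
print(f"subjective value of -$100: {loss:.1f}")    # about -129.5
print(f"loss/gain ratio: {abs(loss) / gain:.2f}")  # 2.25
```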

I’ll illustrate what’s going on here with an example. Suppose I hand you a ruler where the distance between each pair of numbers is different (figure 7).

Figure 7. A ruler you can’t trust. The distance between each number is inconsistent.

My question is this: how useful is this ruler for measuring something?

When you use something like a Likert scale to measure customers’ preferences, you’re using this unreliable ruler, whether you know it or not. In the customer’s mind, the distance between 1 and 2 is different from the distance between 4 and 5. A visual representation of a Likert scale within the context of measuring customer preference is depicted in figure 8.

Figure 8. Methods that attach numbers to desired outcomes and preferences assume they’re getting data like those on the left, but they’re really getting data closer to those on the right.

What does this mean? Even if you could assign a quantitative value to customer preference, and if those values didn’t change, you’d still have to account for where on the value spectrum the measurement falls, and then compensate for that.
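Here’s a minimal sketch of that unreliable ruler in code, with invented numbers. The same ordinal responses, mapped onto two different but equally plausible spacings of the underlying feeling, produce different averages, and the ordinal data alone can’t tell you which ruler is the right one.

```python
from statistics import mean

responses = [2, 3, 3, 4, 5, 5]  # hypothetical Likert answers on a 1-5 scale

# Two candidate "rulers": mappings from category label to underlying intensity.
equal_spacing = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}
uneven_spacing = {1: 1.0, 2: 1.5, 3: 2.0, 4: 4.5, 5: 9.0}

print(mean(equal_spacing[r] for r in responses))   # about 3.67
print(mean(uneven_spacing[r] for r in responses))  # about 4.67

# The responses are identical in both cases; only the assumed spacing changed.
```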

Discussion and conclusion

It would be amazing if we could quantify customers’ desired outcomes reliably. Design would be simple: we’d just send a survey to prospective customers, get back the results, and build what they want. However, this isn’t even remotely the case.

Realizations like these are why, for example, Facebook offers simple options for expressing preference: a “Like” button as well as a collection of faces (figure 9).

Figure 9. Facebook uses a collection of emoticons to represent preference.

Google’s YouTube also realized the futility of quantifying preference several years ago:

Seems like when it comes to ratings it’s pretty much all or nothing. Great videos prompt action; anything less prompts indifference.

As a result, YouTube moved from a 5-star rating system to a thumbs up/down model. The data told them that people didn’t rate their preference for a video along a spectrum. Instead, they just rounded it up to a 5, gave it a 1, or didn’t even care enough to vote (figure 10).

Figure 10. Google’s YouTube used to use a Likert scale to rate customers’ preferences. They learned such data were unreliable.

Lastly, Netflix has just killed off its 5-star rating system, also in favor of a thumbs up/down model. Netflix Vice President of Product Todd Yellin commented:

Five stars feels very yesterday now. We’re spending many billions of dollars on the titles we’re producing and licensing, and with these big catalogs, that just adds a challenge. Bubbling up the stuff people actually want to watch is super important.

So, what should we do about quantifying what customers want? The answer is simple: don’t even try. Customers are humans, not robots. You can’t take measurements from them and build a product as if you were building kitchen cabinets.

What you can do, and what I have done for my own products, is model customer behavior using qualitative data, and then use quantitative data to verify and adjust those models. And if you do use a self-report rating system, do what Google and Facebook did: offer an up/down option or offer responses as emotions.

Market research and customer surveys can become proxies for customers — something that’s especially dangerous when you’re inventing and designing products. “Fifty-five percent of beta testers report being satisfied with this feature. That is up from 47% in the first survey.” That’s hard to interpret and could unintentionally mislead.

Good inventors and designers deeply understand their customer. They spend tremendous energy developing that intuition. They study and understand many anecdotes rather than only the averages you’ll find on surveys. They live with the design.

-Jeff Bezos’ shareholder letter from 2017

Learn more

In October 2016, I released the first book dedicated to Customer Jobs theory. Get a deeper understanding of what a customer Job to be Done is from my book When Coffee and Kale Compete.

Learn more about JTBD in When Coffee and Kale Compete

You can download it as a free PDF, or buy it in paperback and Kindle right here. You can also read it online here.

If you have more questions about Jobs to be Done, or want help applying JTBD concepts to your business or startup, contact me.

[1] The phrase “desired outcome” is used to mean many different things. This article uses it to mean the result of an action, activity, or event, similar to how the seminal paper Reversals of preference between bids and choices in gambling decisions described the desirability of, or preference for, one type of bet over another.

Updated December 11 2017

  • Added footnote to explain what is meant by “desired outcome”
  • An NPS example has been added
  • An example of the problems with top-two-box proportion were added
  • A previous version of this draft did not acknowledge that Ulwick’s Opportunity Algorithm used the top-two-box proportion method and that populations of satisfaction and importance were being subtracted. Critiques of this method by Jeff Sauro and Jerry W. Thomas were also added
  • Added Jeffery Pinegar’s comment about Ulwick’s defense of the opportunity algorithm

References

A Critique of Outcome-Driven Innovation by Gerry Katz, Executive Vice President, Applied Marketing Science, Inc.

Pinegar, J. S. (2006). What Customers Want: Using Outcome‐Driven Innovation to Create Breakthrough Products and Services by Anthony W. Ulwick. Journal of Product Innovation Management, 23(5), 464–466.

Deming’s quotes are from his book, The New Economics for Industry, Government, Education, Second Edition.

Info on Tony Ulwick’s Outcome Driven Innovation and the Opportunity Algorithm can be found from his book, What Customers Want.

Ulwick’s claim that importance and satisfaction are dimensionless quantities can be found here.

Find the quote from CEO of Coca-Cola in the article 30 years ago today, Coca-Cola made its worst mistake, here.

Learn about Netflix’s switch away from a Likert-scale here.

Learn about Facebook’s reactions here.

YouTube’s reason for switching away from Likert scales is found here.

A simple overview of Likert scales, and what they can and can’t do, by I. Elaine Allen and Christopher A. Seaman, can be found here.

More good info on what Likert scales are, and are not, is found here and here.

Read Jeff Bezos’ shareholder letter here.

Boslaugh, S. (2012). Statistics in a Nutshell: A Desktop Quick Reference. O’Reilly Media, Inc.

Hsee, Christopher K., and Yuval Rottenstreich. “Music, pandas, and muggers: on the affective psychology of value.” Journal of Experimental Psychology: General 133.1 (2004): 23.

Jamieson, Susan. “Likert scales: how to (ab) use them.” Medical education 38.12 (2004): 1217–1218.

Kahneman, Daniel, and Jackie Snell. “Predicting a changing taste: Do people know what they will like?.” Journal of Behavioral Decision Making 5.3 (1992): 187–200.

Kahneman, Daniel, and Richard H. Thaler. “Anomalies: Utility maximization and experienced utility.” The Journal of Economic Perspectives 20.1 (2006): 221–234.

Lichtenstein, S., & Slovic, P. (1971). Reversals of preference between bids and choices in gambling decisions. Journal of experimental psychology, 89(1), 46.

Loewenstein, George. “Hot-cold empathy gaps and medical decision making.” Health Psychology 24.4S (2005): S49.

Loewenstein, George, Ted O’Donoghue, and Matthew Rabin. “Projection bias in predicting future utility.” The Quarterly Journal of Economics 118.4 (2003): 1209–1248.

Riquelme, Hernan. “Do consumers know what they want?.” Journal of consumer marketing 18.5 (2001): 437–448.

Schwarz, Norbert. “Cognitive aspects of survey methodology.” (2007).

Stevens, Stanley Smith. “On the theory of scales of measurement.” (1946): 677–680.

Tversky, Amos. “Intransitivity of Preferences.” Preference, Belief, and Similarity (1969): 433.

Tversky, Amos, Paul Slovic, and Daniel Kahneman. “The causes of preference reversal.” The American Economic Review (1990): 204–217.

Tversky, Amos, and Itamar Simonson. “Context-dependent preferences.” Management science 39.10 (1993): 1179–1189.

Tversky, Amos, and Daniel Kahneman. “Advances in prospect theory: Cumulative representation of uncertainty.” Journal of Risk and uncertainty 5.4 (1992): 297–323.
