Coding Open-Ended Questions

Verbatim coding is used in market research to classify open-end responses for quantitative analysis.

Survey research is an important branch of market research.  It poses questions to a constituency to gain insight into their thoughts and preferences from their responses.  Researchers use surveys and the resulting data for many purposes: customer satisfaction, employee satisfaction, purchasing propensity, drug efficacy, and many more.

In market research, you will encounter terms and concepts specific to the industry.  This post will get you started on the basics.  When you run into a term you don’t understand, you can check this resource:

MRA Marketing Research Glossary

Questions and Answers

Every company in the world has the same goal: they want to increase their sales and make a good profit. For most companies, this means they need to make their customers happier — both the customers they have and the customers they want to have.

Companies work toward this goal in many ways, but for our purposes the most important is to ask questions and plan action based on the responses gathered.  By “ask questions” we mean asking customers or potential customers what they care about and then acting on what they say.

One way to go about this is to literally ask your customers (or potential customers) open-end questions and gather the responses:

Q: What do you think about the new package for our laundry detergent?
A: It is too slippery to open when my hands are wet.

This technique is the basis of survey research.  A company can conduct a survey to get actionable information by asking open-end questions.

In other cases, there may be an implied question and response.  For example, a company may have a help desk for their product.  When a customer calls the help desk there is an implied question:

Q: What problem are you having with our offering?

The answers to this implied question can be as valuable as the answers to survey questions, or even more so.

Thinking more broadly, the “customer” does not necessarily have to be the person who buys the company’s product or service. For example, if you are the manager of the Human Resources department, your “customers” are the employees of the company. Still, the goal is the same: based on feedback from employees, you want to act to improve their satisfaction.

Open, Closed, and Other Specify

There are two basic types of questions used to gather responses in survey research: open and closed.  We also call these open-end and closed-end questions.

A closed-end question is one where the set of possible responses is known in advance.  These are typically presented to the survey respondent, who chooses among them.  For example:

Open-end questions ask for an “in your own words” response:

The response to this question will be whatever text the respondent types.

We can also create a hybrid type of question that has a fixed set of possible responses, but lets the respondent give an answer that is not in the list:

We call these Other Specify questions (O/S for short).  If the user types a response to an O/S question it is typically short, often one or two words.

Just as we apply the terms Open, Closed, and O/S to questions, we can apply them to the responses.  So, we can say “Male” is a closed response, and “The barista was rude” is an open response.
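The three question types can be sketched as a small data model. This is only an illustration, not how any particular survey tool stores questions: the Question class, its field names, and the example question texts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Question:
    """Hypothetical model of a survey question."""
    text: str
    choices: Tuple[str, ...] = ()   # empty tuple: no fixed list of responses
    allow_other: bool = False       # True: respondent may type an answer not in the list

    @property
    def kind(self) -> str:
        if not self.choices:
            return "open"
        return "other-specify" if self.allow_other else "closed"


def response_kind(question: Question, response: str) -> str:
    """Classify one response as 'closed' (picked from the list) or 'open' (typed)."""
    if response in question.choices:
        return "closed"
    if question.choices and not question.allow_other:
        raise ValueError(f"{response!r} is not an allowed choice")
    return "open"


# Example questions (texts are made up for illustration).
gender = Question("What is your gender?", ("Male", "Female"))
visit = Question("Tell us about your visit.")
brand = Question("Which brand do you buy most often?",
                 ("Brand A", "Brand B"), allow_other=True)
```

With these definitions, response_kind(gender, "Male") is "closed", while a typed-in response to the O/S question, such as response_kind(brand, "Store brand"), is "open".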

What is an answer?

If you are conducting a survey, the meaning of the term answer is clear.  It is the response given by the respondent to the question posed.  But as we have said, we can also get “answers” to implied questions, such as what a customer tells the help desk.  For this reason, we will use the more generic term comment to refer to any text that we want to examine for actionable insight.

In most cases comments are electronic text, but they can also be images (handwriting) and voice recordings.

You need to be aware of some terminology that varies by industry.  In the marketing research industry, a response to a question is called either a response or a verbatim.  So, in survey research we can use the terms response, verbatim, and comment interchangeably.  They all refer to responses to open-end questions.  As we will see later, we don’t call the responses to an open-end question answers.  We will find that these verbatims are effectively turned into answers by the process of verbatim coding.

In survey research the word verbatim is used as a noun, meaning the actual text given in response to a question.  Outside of survey research the term verbatim is rarely used.  There the term comment is much more prevalent.

Data collection

In the survey research world, verbatims are collected by fielding the survey. Fielding a survey means putting it in front of a set of respondents and asking them to read it and fill it out.

Surveys can be fielded in all sorts of ways. Here are some of the different categories of surveys marketing research companies might be using:

  • Paper surveys
    • Mailed to respondents
    • Distributed in a retail store
    • Given to a customer in a service department
  • In person interviews
    • In kiosks in shopping malls
    • Political exit polling
    • Door to door polling
  • Telephone interviews
    • Outbound calling to households
    • Quality review questions after making an airline reservation
    • Survey by voice robot with either keypad or voice responses
  • Mobile device surveys
    • Using an app that pays rewards for completed surveys
    • In-store surveys during the shopping experience
    • Asking shoppers to photograph their favorite items in a store
  • Web surveys
    • Respondents directed to the survey while visiting a site
    • Customers directed to the survey by the sales receipt

There are many more categories. The number of ways to field surveys the ingenious market research industry has come up with is almost endless.

As you can see, the form of the data collected can vary considerably.  It might be:

  • Handwriting on paper
  • Electronic text
  • Voice recordings
  • Electronic data like telephone keypad button presses
  • Photographs or other images
  • Video recordings

And so on.  In the end, all surveys require:

  1. A willing respondent
  2. A way of capturing the responses

The second, capturing the responses, is the easy part.  The first takes us to the topic of sample, which we will consider soon.

Looping and branch logic

Data collection tools can be very sophisticated.  Many have logic built in to change the way the survey is presented to the respondent based on the responses given.

Suppose for example you want to get the political opinions of Republican voters.  The first question might ask the respondent for his political party affiliation.  If he responds with an answer other than “Republican” the survey ends.  We say the survey has been terminated for the respondent, or that the respondent is termed.  This is a simple example of branch logic.  A more sophisticated example would be to direct the respondent to question Q11 if she answers A, or to question Q32 if she answers B.

Another common bit of data collection logic is looping.  Suppose we ask our respondents to evaluate five household cleaning products.  We might have four questions we want to ask the respondents about each product, the same four for each product.  We can set up a loop in our data collection tool.  It loops through the same four questions five times, once for each product.
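Both kinds of branching described above can be sketched in a few lines. This is a minimal illustration, not the way any real data collection tool is configured: the route function, the TERMINATE sentinel, and the question ids Q1, Q2, and Q10 are assumptions made for the example (only Q11 and Q32 come from the text).

```python
TERMINATE = "TERMINATE"  # sentinel meaning the respondent is termed


def route(question_id: str, answer: str) -> str:
    """Return the id of the next question to show, or TERMINATE."""
    if question_id == "Q1":
        # Screener: only respondents who answer "Republican" continue.
        return "Q2" if answer == "Republican" else TERMINATE
    if question_id == "Q10":
        # Branch: answer A goes to Q11, answer B goes to Q32.
        if answer == "A":
            return "Q11"
        if answer == "B":
            return "Q32"
    raise ValueError(f"no routing rule for {question_id!r} / {answer!r}")
```

For example, route("Q1", "Democrat") returns TERMINATE, while route("Q10", "B") returns "Q32".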
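The loop described above can be sketched as follows. The product names and question wordings are placeholders invented for the example, and build_loop is a hypothetical helper, not a feature of any specific tool.

```python
# Five placeholder products and four placeholder questions.
PRODUCTS = [f"Product {n}" for n in range(1, 6)]
QUESTIONS = [
    "How often do you use {product}?",
    "How satisfied are you with {product}?",
    "How likely are you to buy {product} again?",
    "What do you dislike about {product}?",
]


def build_loop(products, questions):
    """Ask the same questions once per product, in product order."""
    return [q.format(product=p) for p in products for q in questions]


loop_items = build_loop(PRODUCTS, QUESTIONS)
```

Here loop_items holds 20 question texts: the same four questions repeated for each of the five products.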

There are many more logic features of data collection tools, such as randomizing the order of questions and responses to remove possible bias toward the first question or answer presented.
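Randomization can be sketched with Python's standard random module. Seeding the shuffle with a respondent id, so that each respondent gets a stable order of their own, is an assumption made for this example, as are the function and variable names.

```python
import random

ANSWER_CHOICES = ["Brand A", "Brand B", "Brand C", "Brand D"]  # placeholder choices


def randomized_order(items, seed):
    """Return a shuffled copy of items; the same seed always gives the same order."""
    rng = random.Random(seed)   # private generator, does not touch global state
    shuffled = list(items)      # copy so the original order is preserved
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical usage: use the respondent id as the seed.
order_for_respondent_17 = randomized_order(ANSWER_CHOICES, seed=17)
```

Across many respondents each choice appears first about equally often, which is what reduces the bias toward the first item presented.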

Sample

Sample can be described simply as a set of willing respondents. There is a sizable industry around providing samples to survey researchers. These sample providers organize collections of willing respondents and provide access to these respondents to survey researchers for a fee.

A panel is a set of willing respondents selected by some criteria.  We might have a panel of homeowners, a panel of airline travelers, or a panel of hematologists.  Panelists almost always receive a reward for completing a survey.  Often this is money, which may range from cents to hundreds of dollars.  This reward is a major component of the cost per complete of a survey: the cost to get a completed survey.  The reward for a completed survey is not always cash.  It may be coupons or vouchers for consumer goods, credits for video purchases, or anything that will attract the desired panelists.

Sample providers spend a lot of time maintaining their panels.  The survey researcher wants assurance that the sample she purchases is truly representative of the market segment she is researching.  Sample providers build their reputation on the quality of sample they provide.  They use statistical tools, trial surveys, and other techniques to measure and document the sample quality.

Trackers and Waves

Many surveys are fielded only once, a one-off survey.  Some surveys are fielded repeatedly.  These are commonly used to examine the change in the attitude of the respondents over time.  Researching the change in attitude over time is called longitudinal analysis.  A survey that is fielded repeatedly is called a tracker.  A tracker might be fielded monthly, quarterly, yearly, or at other intervals.  The intervals are normally evenly spaced in time.  Each fielding of a tracker is called a wave.

Verbatim Coding in Market Research

In the survey research industry responses to open-end questions are called verbatims.  In a closed-end question the set of possible responses from the respondent is known in advance.  With an open-end question the respondent can say anything.  For example, suppose a company that sells laundry detergent has designed a new bottle for their product.  The company sends a sample to 5,000 households and conducts a survey after the consumers have tried the product.  The survey will probably have some closed-end questions to get a profile of the consumer, but to get an honest assessment of what the consumer thinks of the new package the survey might have an open-end question:

What do you dislike about the new package?

So, what does the survey researcher do with the responses to this question?  Well, she could just read each verbatim.  While that could provide a general understanding of the consumers’ attitudes, it’s really not what the company that is testing the package wants.  The researcher would like to provide more specific and actionable advice to the company.  Things like:

22% of women over 60 thought the screw cap was too slippery.
8% of respondents said the bottle was too wide for their shelves.

This is where verbatim coding, or simply coding, comes in. Codes are answers, just like for closed-end questions. The difference is that the codes are typically created after the survey is conducted and responses are gathered. Coders are people trained in the art of verbatim coding. Coders read the verbatims collected in the survey and invent a set of codes that capture the key points in the verbatims. The set of codes is called a codebook or code frame. For our question the codebook might contain these codes:

  • Screw cap too slippery
  • Bottle too wide
  • Not sufficiently child-proof
  • Tends to drip after pouring

The coders read each verbatim and assign one or more codes to it. Once coding is complete, the researcher can easily tally the coded responses and see what percentage of respondents thought the cap was too slippery. Armed with information from the closed-end responses, the researcher can then make the statement:

22% of women over 60 thought the screw cap was too slippery.

Now you can see why the responses to open-end questions are called verbatims, not answers.  The answers are the codes, and the coding process turns verbatims into answers.  Put another way, coding turns qualitative information into quantitative information.
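The step from coded verbatims to percentages can be sketched like this. The four coded respondents are made-up data for illustration only, and code_percentages is a hypothetical helper; the code labels are the ones from the codebook above.

```python
from collections import Counter

# Made-up coding results: respondent id -> codes assigned by a coder.
coded = {
    1: ["Screw cap too slippery"],
    2: ["Bottle too wide", "Tends to drip after pouring"],
    3: ["Screw cap too slippery", "Not sufficiently child-proof"],
    4: ["Bottle too wide"],
}


def code_percentages(coded):
    """Percentage of respondents who were assigned each code."""
    total = len(coded)
    if total == 0:
        return {}
    # set() ensures a respondent counts at most once per code.
    counts = Counter(code for codes in coded.values() for code in set(codes))
    return {code: 100.0 * n / total for code, n in counts.items()}


percentages = code_percentages(coded)
```

Because a verbatim can receive more than one code, the percentages can add up to more than 100. Restricting the input to a subgroup taken from the closed-end responses (for example, women over 60) gives statements like the one above.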

Codes and Nets

Let’s look at a real codebook. The question posed to the respondent is:

In addition to the varieties already offered by this product, are there any other old time Snapple favorites that you would want to see included as new varieties of this product?

And here is the codebook:


  • VARIETY OF FLAVORS
    • like apple variety
    • like peach variety
    • like cherry variety
    • like peach tea variety (unspecified)
    • like peach iced tea variety
    • like raspberry tea variety
    • like lemon iced tea variety
    • other variety of flavors comments
  • HEALTH/ NUTRITION
    • good for dieting/ weight management
    • natural/ not contain artificial ingredients
    • sugar free
    • other health/ nutrition comments
  • MISCELLANEOUS
    • other miscellaneous comments
  • NOTHING
  • DON’T KNOW

Notice that the codebook is not a simple list.  It is indented.  The items that have other items indented beneath them (VARIETY OF FLAVORS, HEALTH/ NUTRITION, and MISCELLANEOUS) are called nets, and the other items are codes.  Nets are used to organize the codebook.  Here the codebook has two major categories, one for respondents who say they like specific flavors and the other for respondents who mention health or nutrition.

In this example there is only one level of nets, but nets can be nested in other nets.  You can think of it like a document in outline form, where the nets are the headers of the various sections.

Nets cannot be used to code responses.  They are not themselves answers or responses to questions.  They are used to organize the answers (codes).
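One way to picture the distinction between nets and codes is as a small tree. This is only a sketch: the Net class and the assignable_codes function are hypothetical names, and representing codes as plain strings is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Net:
    """A heading that groups codes (or other nets); never assigned to a response."""
    label: str
    children: List[Union["Net", str]] = field(default_factory=list)


CODEBOOK = [
    Net("VARIETY OF FLAVORS", [
        "like apple variety",
        "like peach variety",
        "like cherry variety",
        "like peach tea variety (unspecified)",
        "like peach iced tea variety",
        "like raspberry tea variety",
        "like lemon iced tea variety",
        "other variety of flavors comments",
    ]),
    Net("HEALTH/ NUTRITION", [
        "good for dieting/ weight management",
        "natural/ not contain artificial ingredients",
        "sugar free",
        "other health/ nutrition comments",
    ]),
    Net("MISCELLANEOUS", ["other miscellaneous comments"]),
    "NOTHING",
    "DON’T KNOW",
]


def assignable_codes(items):
    """Collect the codes a coder may assign, skipping the nets themselves."""
    codes = []
    for item in items:
        if isinstance(item, Net):
            codes.extend(assignable_codes(item.children))  # nets may nest
        else:
            codes.append(item)
    return codes
```

Calling assignable_codes(CODEBOOK) returns the codes only, so a net label such as "HEALTH/ NUTRITION" can never be picked for a response.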

Downstream data processing

Once the questions in a study are coded they are ready to be used by the downstream data processing department in the survey research company.  This department may be called data processing, or tabulation, or simply tab.  In tab, the results of the survey are prepared for review by the market researcher and then for the end client.

The tab department uses software tools to analyze and organize the results of the study.  These tools include statistical analysis which can be very sophisticated.  Normally, this software is not interested in the text of the code.  For example, if a response is coded “like apple variety” the tab software is not interested in that text but wants a number like 002.  From the tab software point of view the respondent said 002, not “like apple variety”.  The text “like apple variety” is used by the tab software only when it is printing a report for a human to read.  At that time, it will replace 002 with “like apple variety” to make it human readable.  Before the data are sent to the tab department each code must be given a number.  The codebook then looks like this:


  • 001   VARIETY OF FLAVORS
    • 002   like apple variety
    • 003   like peach variety
    • 004   like cherry variety
    • 021   like peach tea variety (unspecified)
    • 022   like peach iced tea variety
    • 023   like raspberry tea variety
    • 024   like lemon iced tea variety
    • 025   other variety of flavors comments
  • 026   HEALTH/ NUTRITION
    • 027   good for dieting/ weight management
    • 028   natural/ not contain artificial ingredients
    • 029   sugar free
    • 030   other health/ nutrition comments
  • 031   MISCELLANEOUS
    • 032   other miscellaneous comments
  • 998   NOTHING
  • 999   DON’T KNOW

The tab department may impose some rules on how codes are numbered.  In this example the code 999 always means “don’t know”.
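The way tab software treats code numbers and labels can be sketched with a simple lookup table. This is an illustration only, not the behavior of any specific tab package: CODE_LABELS holds a few entries from the numbered codebook above, and render_row is a hypothetical reporting helper.

```python
# A few entries from the numbered codebook: code number -> label.
CODE_LABELS = {
    "002": "like apple variety",
    "003": "like peach variety",
    "029": "sugar free",
    "998": "NOTHING",
    "999": "DON’T KNOW",
}


def render_row(respondent_id, code_numbers):
    """Replace code numbers with their labels when printing for a human reader."""
    labels = ", ".join(CODE_LABELS[number] for number in code_numbers)
    return f"{respondent_id}: {labels}"


# Internally the data is just numbers; labels appear only in the report.
row = render_row(17, ["002", "029"])
```

Here the stored data for respondent 17 is the pair 002 and 029, and row is the human-readable line "17: like apple variety, sugar free". Keeping the numbers as strings preserves the leading zeros shown in the codebook.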