Coding an Open-Ended Question
Verbatim coding is used in survey research to classify open-end responses for quantitative analysis.
Survey research is an important branch of market research. It poses questions to a constituency to gain insight into their thoughts and preferences. Researchers use surveys for many purposes, including customer satisfaction, employee satisfaction, purchasing propensity, and drug efficacy.
In survey research you will encounter terms and concepts specific to the industry. This post will get you started on the basics. When you run into a term you don’t understand, you can check this resource:
Questions and Answers
Nearly every company has the same goal: to increase sales and profits. For most companies, this means making their customers happier, both the customers they have and the customers they want to have.
Companies work toward this goal in many ways, but for our purposes the most important is to ask questions and act on the responses: ask customers and potential customers what they care about, then act on what they say.
One way to do this is to literally ask your customers (or potential customers) open-ended questions:
Q: What do you think about the new package for our laundry detergent?
A: It is too slippery to open when my hands are wet.
This technique is the basis of survey research. A company can conduct a survey and ask open-ended questions to get actionable information.
In other cases, there may be an implied question. For example, a company may have a help desk for its product. When a customer calls the help desk, there is an implied question:
Q: What problem are you having with our offering?
The answers to this implied question can be as valuable as answers to survey questions, or even more valuable.
Thinking more broadly, the “customer” does not have to be the person who buys the company’s product or service. If you manage the Human Resources department, your “customers” are the company’s employees. Still, the goal is the same: you want to act on feedback from employees to improve their satisfaction.
Open, Closed, and Other Specify
There are two basic types of questions in survey research: open and closed. We also call these open-end and closed-end questions.
A closed-end question is one where the set of possible answers is known in advance. These are typically presented to the survey respondent, who chooses among them. For example:
Open-end questions ask for an “in your own words” response:
The response to this question will be whatever text the respondent types.
We can also create a hybrid type of question that has a fixed set of possible answers but also lets the respondent enter an answer that is not in the list:
We call these Other Specify questions (O/S for short). When a respondent types a response to an O/S question, it is typically short, often one or two words.
Just as we apply the terms open, closed, and O/S to questions, we can apply them to answers. So we can say that “Male” is a closed answer and “The barista was rude” is an open answer.
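To make the three kinds concrete, here is a minimal Python sketch that models a question as a small data type and labels individual answers as closed, open, or O/S. The names `QuestionType`, `Question`, and `classify_answer`, and the example questions, are hypothetical and chosen only for illustration. Real data collection tools store questions in their own formats.

```python
from dataclasses import dataclass, field
from enum import Enum


class QuestionType(Enum):
    CLOSED = "closed"         # answers come from a fixed list
    OPEN = "open"             # answers are free text "in your own words"
    OTHER_SPECIFY = "o/s"     # fixed list plus a free-text "other" option


@dataclass
class Question:
    text: str
    kind: QuestionType
    choices: list = field(default_factory=list)  # empty for open questions


def classify_answer(question: Question, response: str) -> QuestionType:
    """Label a single response as a closed, open, or O/S answer."""
    if question.kind is QuestionType.OPEN:
        return QuestionType.OPEN
    if response in question.choices:
        return QuestionType.CLOSED
    if question.kind is QuestionType.OTHER_SPECIFY:
        return QuestionType.OTHER_SPECIFY  # typed-in "other" text
    raise ValueError(f"{response!r} is not one of the choices for {question.text!r}")


# Hypothetical questions for illustration.
gender = Question("What is your gender?", QuestionType.CLOSED, ["Male", "Female"])
visit = Question("Tell us about your visit in your own words.", QuestionType.OPEN)
drink = Question("Which drink did you order?", QuestionType.OTHER_SPECIFY,
                 ["Latte", "Espresso"])

print(classify_answer(gender, "Male"))                 # QuestionType.CLOSED
print(classify_answer(visit, "The barista was rude"))  # QuestionType.OPEN
print(classify_answer(drink, "Chai"))                  # QuestionType.OTHER_SPECIFY
```

A response to an O/S question that matches one of the listed choices is treated as a closed answer. Only the typed-in text counts as O/S.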
What is an answer?
If you are conducting a survey, the meaning of the term answer is clear. It is the response given by the respondent to the question posed. But as we have said, we can also get “answers” to implied questions, such as what a customer tells the help desk. For this reason, we will use the more generic term comment to refer to some text that we want to examine for actionable insight.
In most cases comments are electronic text, but they can also be images (such as handwriting) or voice recordings.
You need to be aware of some terminology that varies by industry. In the survey research industry, a response to an open-end question is called either a response or a verbatim, so in survey research we can use the terms response, verbatim, and comment interchangeably. As we will see later, we don’t call the responses to an open-end question answers: these verbatims are effectively turned into answers by the process of verbatim coding.
Outside of survey research the term verbatim is rarely used; there, comment is far more common. In survey research, verbatim is used as a noun meaning the actual text given in response to a question.
In the survey research world responses are collected by fielding the survey. Fielding a survey means putting it in front of a set of respondents and asking them to fill it out.
Surveys can be fielded in all sorts of ways. Here are some examples:
- Paper surveys
  - Mailed to respondents
  - Distributed in a retail store
  - Given to a customer in a service department
- In-person interviews
  - In kiosks in shopping malls
  - Political exit polling
  - Door-to-door polling
- Telephone interviews
  - Outbound calling to households
  - Quality review questions after making an airline reservation
  - Surveys by voice robot with either keypad or voice responses
- Mobile device surveys
  - Using an app that pays rewards for completed surveys
  - In-store surveys during the shopping experience
  - Asking shoppers to photograph their favorite items in a store
- Web surveys
  - Respondents directed to the survey while visiting a site
  - Customers directed to the survey by their sales receipt
There are many more. The ingenious market research industry has come up with an almost endless variety of ways to field surveys.
As you can see, the form of the data collected can vary considerably. It might be:
- Handwriting on paper
- Electronic text
- Voice recordings
- Electronic data such as telephone keypad presses
- Photographs or other images
- Video recordings
And so on. In the end, all surveys require:
- A willing respondent
- A way of capturing the responses
The second is easy. The first brings us to the topic of sample, which we will consider shortly.
Looping and branch logic
Data collection tools can be very sophisticated. Many data collection tools have logic built in to change the way that the survey is presented to the respondent based on the answers given.
Suppose, for example, that you want to learn the political opinions of Republican voters. The first question might ask respondents for their political party affiliation. If a respondent gives any answer other than “Republican”, the survey ends: it has been terminated for that respondent, or, in industry shorthand, the respondent is termed. This is a simple example of branch logic. A more sophisticated example would direct the respondent to question Q11 if she answers A, or to question Q32 if she answers B.
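Branch logic can be thought of as a routing table that the tool consults after each answer. The Python sketch below encodes both examples from the paragraph. The question IDs (including Q10 as the question that branches to Q11 or Q32), the `ROUTES` table, and the `next_question` function are hypothetical. They illustrate the idea and do not describe how any particular data collection tool stores its logic.

```python
TERMINATE = "TERMINATE"

# Hypothetical routing rules: (question ID, answer) -> next question ID.
ROUTES = {
    ("Q1", "Republican"): "Q2",  # screener passed
    ("Q10", "A"): "Q11",         # branch on answer A
    ("Q10", "B"): "Q32",         # branch on answer B
}

# Screener questions: any answer without a route terminates the respondent.
SCREENERS = {"Q1"}


def next_question(question_id: str, answer: str, default_next: str) -> str:
    """Return the next question ID, TERMINATE, or the default successor."""
    route = ROUTES.get((question_id, answer))
    if route is not None:
        return route
    if question_id in SCREENERS:
        return TERMINATE  # the respondent is "termed"
    return default_next  # no rule applies: continue in order


print(next_question("Q1", "Democrat", "Q2"))    # TERMINATE
print(next_question("Q1", "Republican", "Q2"))  # Q2
print(next_question("Q10", "B", "Q11"))         # Q32
```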
Another common bit of data collection logic is looping. Suppose our respondents have participated in an evaluation of five household cleaning products. We might have four questions we want to ask the respondents about each product, the same four for each product. We can set up a loop in our data collection tool. It loops through the same four questions five times, once for each product.
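In effect, a loop unrolls into one copy of the question block per product. The Python sketch below illustrates this for five products and four questions. The product names, question wording, and the `expand_loop` helper are hypothetical and are used only to show the structure.

```python
# Hypothetical product names and question wording, for illustration only.
PRODUCTS = ["Product A", "Product B", "Product C", "Product D", "Product E"]
LOOP_QUESTIONS = [
    "How well did {product} clean?",
    "How did you like the scent of {product}?",
    "How easy was {product} to use?",
    "How likely are you to buy {product}?",
]


def expand_loop(products, templates):
    """Unroll the loop: ask every template question once per product."""
    return [
        (product, template.format(product=product))
        for product in products    # outer loop: one pass per product
        for template in templates  # inner loop: the same four questions
    ]


questions = expand_loop(PRODUCTS, LOOP_QUESTIONS)
print(len(questions))   # 20
print(questions[4][1])  # How well did Product B clean?
```

The respondent sees 20 questions, but the survey designer writes only four.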
Data collection tools offer many more logic features. One example is randomizing the order of questions and answers, which reduces the bias toward whichever question or answer is presented first.
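Here is a minimal sketch of answer-order randomization in Python. It makes two assumptions that the text does not state. First, it seeds the shuffle with the respondent ID, so that the same respondent always sees the same order. Second, it keeps a “Don’t know” option anchored at the end of the list, which is a common convention. The `randomized_choices` function and the option labels are hypothetical.

```python
import random


def randomized_choices(choices, respondent_id, anchored=("Don't know",)):
    """Shuffle answer choices for one respondent, keeping anchored items last."""
    rng = random.Random(respondent_id)  # seeded per respondent: reproducible order
    movable = [c for c in choices if c not in anchored]
    fixed = [c for c in choices if c in anchored]
    rng.shuffle(movable)
    return movable + fixed


# Hypothetical answer options; each respondent gets their own order.
options = ["Price", "Scent", "Packaging", "Cleaning power", "Don't know"]
for respondent_id in (101, 102):
    print(respondent_id, randomized_choices(options, respondent_id))
```

Using a separate `random.Random` instance per respondent avoids disturbing any global random state that other parts of a survey tool might rely on.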
Sample
Sample can be described simply as a set of willing respondents. There is a sizable industry that provides sample to survey researchers. These sample providers organize collections of willing respondents and give survey researchers access to them for a fee.
A panel is a set of willing respondents selected by some criteria. We might have a panel of homeowners, a panel of airline travelers, or a panel of hematologists. Panelists almost always receive a reward for completing a survey. Often this is money, ranging from cents to hundreds of dollars. This reward is a major component of a survey’s cost per complete: the cost of obtaining one completed survey. The reward is not always cash; it may be coupons or vouchers for consumer goods, credits for video purchases, or anything else that will attract the desired panelists.
Sample providers spend a lot of time maintaining their panels. The survey researcher wants assurance that the sample she purchases is truly representative of the market segment she is researching. Sample providers build their reputation on the quality of sample they provide. They use statistical tools, trial surveys, and other techniques to measure and document the sample quality.
Trackers and Waves
Many surveys are fielded only once; these are one-off surveys. Other surveys are fielded repeatedly, commonly to examine how respondents’ attitudes change over time. Researching the change in attitude over time is called longitudinal analysis. A survey that is fielded repeatedly is called a tracker. A tracker might be fielded monthly, quarterly, yearly, or at other intervals, which are normally evenly spaced in time. Each fielding of a tracker is called a wave.
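A basic tracker analysis compares the same measure wave by wave. The Python sketch below computes, for each wave, the percentage of respondents who gave a particular answer. The data and the `share_by_wave` helper are invented purely for illustration.

```python
from collections import defaultdict


def share_by_wave(responses, answer):
    """Percentage of respondents in each wave who gave `answer`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for wave, response in responses:
        totals[wave] += 1
        if response == answer:
            hits[wave] += 1
    return {wave: 100 * hits[wave] / totals[wave] for wave in sorted(totals)}


# Invented data: (wave, answer to a closed-end satisfaction question).
responses = [
    ("Wave 1", "Satisfied"), ("Wave 1", "Not satisfied"),
    ("Wave 1", "Not satisfied"), ("Wave 1", "Satisfied"),
    ("Wave 2", "Satisfied"), ("Wave 2", "Satisfied"),
    ("Wave 2", "Not satisfied"), ("Wave 2", "Satisfied"),
]
print(share_by_wave(responses, "Satisfied"))  # {'Wave 1': 50.0, 'Wave 2': 75.0}
```

`sorted()` orders wave labels as strings, so a tracker with ten or more waves would be better keyed by numeric wave IDs or dates.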
Verbatim Coding in Market Research
In the survey research industry, responses to open-end questions are called verbatims. In a closed-end question the set of possible answers from the respondent is known in advance. With an open-end question the respondent can say anything. For example, suppose a company that sells laundry detergent has designed a new bottle for its product. The company sends samples of the product in the new bottle to 5,000 households and conducts a survey after the consumers have tried it. The survey will probably include some closed-end questions to build a profile of each consumer, but to get an honest assessment of what consumers think of the new package, the survey might have an open-end question:
What do you dislike about the new package?
So, what does the survey researcher do with the responses to this question? Well, she could just read each verbatim. While that could provide a general understanding of the consumers’ attitudes, it’s really not what the company testing the package wants. The researcher would like to provide more specific and actionable advice to the company. For example:
22% of women over 60 thought the screw cap was too slippery.
8% of respondents said the bottle was too wide for their shelves.
This is where verbatim coding, or simply coding, comes in. Codes are answers, just like the answer choices of a closed-end question. The difference is that codes are typically created after the survey is conducted. Coders, people trained in the art of verbatim coding, look at the responses collected in the survey and invent a set of codes that capture the key points in the verbatims. This set of codes is called a codebook or codeframe. For our question the codebook might contain these codes:
- Screw cap too slippery
- Bottle too wide
- Not sufficiently child-proof
- Tends to drip after pouring
The coders read each verbatim and assign one or more codes to it. Once coding is complete, the researcher can easily see what percentage of respondents thought the cap was too slippery. By combining this with information from the closed-end questions, the researcher could then make the statement:
22% of women over 60 thought the screw cap was too slippery.
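The Python sketch below shows how a statement like this could be derived. It stores the codes assigned to each verbatim next to the respondent’s closed-end profile answers, filters to a segment, and computes a percentage. The four codes come from the codebook above. The respondents are invented, so the result does not reproduce the 22% figure. The `percent_with_code` helper is hypothetical.

```python
# The four codes from the detergent codebook; the respondents are invented.
CODEBOOK = {
    1: "Screw cap too slippery",
    2: "Bottle too wide",
    3: "Not sufficiently child-proof",
    4: "Tends to drip after pouring",
}

# Closed-end profile answers plus the set of codes a coder assigned to each
# respondent's verbatim. One verbatim can receive several codes.
respondents = [
    {"gender": "Female", "age": 67, "codes": {1, 4}},
    {"gender": "Female", "age": 72, "codes": {2}},
    {"gender": "Male", "age": 45, "codes": {1}},
    {"gender": "Female", "age": 61, "codes": {1}},
    {"gender": "Female", "age": 34, "codes": {3}},
]


def percent_with_code(people, code):
    """Share of `people` whose verbatim was assigned `code`, as a percentage."""
    if not people:
        return 0.0
    return 100 * sum(code in p["codes"] for p in people) / len(people)


women_over_60 = [p for p in respondents if p["gender"] == "Female" and p["age"] > 60]
pct = percent_with_code(women_over_60, 1)
# Prints: 67% of women over 60: Screw cap too slippery
print(f"{pct:.0f}% of women over 60: {CODEBOOK[1]}")
```

A single verbatim can carry several codes, so the percentages across all codes in a codebook can add up to more than 100%.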
Now you can see why the responses to open-end questions are called verbatims, not answers. The answers are the codes, and the coding process turns verbatims into answers. Put another way, coding turns qualitative information into quantitative information.
Codes and Nets
Let’s look at a real codebook. The question posed to the respondent is:
In addition to the varieties already offered by this product, are there any other old time Snapple favorites that you would want to see included as new varieties of this product?
And here is the codebook:
- VARIETY OF FLAVORS
  - like apple variety
  - like peach variety
  - like cherry variety
  - like peach tea variety (unspecified)
  - like peach iced tea variety
  - like raspberry tea variety
  - like lemon iced tea variety
  - other variety of flavors comments
- HEALTH/ NUTRITION
  - good for dieting/ weight management
  - natural/ not contain artificial ingredients
  - sugar free
  - other health/ nutrition comments
- other miscellaneous comments
- DON’T KNOW
Notice that the codebook is not a simple list; it is indented. The top-level headings VARIETY OF FLAVORS and HEALTH/ NUTRITION are called nets, and the other items are codes. Nets are used to organize the codebook. Here the codebook has two major categories: one for people who mention specific flavors they would like, and the other for people who mention health or nutrition.
In this example there is only one level of nets, but nets can be nested in other nets. You can think of it like a document in outline form, where the nets are the headers of the various sections.
Nets cannot be used to code responses. They are not themselves answers to questions. They are used to organize the answers (codes).
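A codebook with nets is naturally a tree: nets are interior nodes that can contain codes or other nets, and codes are leaves. The Python sketch below models that tree, prints it as an outline, and collects only the codes as valid labels for coding a response. The `Code` and `Net` classes and the helper functions are hypothetical. The FRUIT and TEA sub-nets are also hypothetical and were added only to show nesting, since the real codebook has just one level.

```python
from dataclasses import dataclass, field


@dataclass
class Code:
    label: str


@dataclass
class Net:
    label: str
    children: list = field(default_factory=list)  # Codes and/or nested Nets


def outline(node, depth=0):
    """Yield the codebook as indented lines, like a document outline."""
    yield "  " * depth + node.label
    if isinstance(node, Net):
        for child in node.children:
            yield from outline(child, depth + 1)


def assignable_labels(node):
    """Collect the labels of Codes only; Nets cannot code a response."""
    if isinstance(node, Code):
        return {node.label}
    labels = set()
    for child in node.children:
        labels |= assignable_labels(child)
    return labels


# Part of the Snapple codebook, with hypothetical FRUIT and TEA sub-nets.
top_level = [
    Net("VARIETY OF FLAVORS", [
        Net("FRUIT", [Code("like apple variety"), Code("like cherry variety")]),
        Net("TEA", [Code("like raspberry tea variety")]),
    ]),
    Net("HEALTH/ NUTRITION", [Code("sugar free")]),
    Code("DON'T KNOW"),
]

for item in top_level:
    print("\n".join(outline(item)))

valid = set().union(*(assignable_labels(item) for item in top_level))
print("like apple variety" in valid)  # True
print("VARIETY OF FLAVORS" in valid)  # False
```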
Downstream data processing
Once the questions in a study are coded, they are ready for the downstream data processing department of the survey research company. This department may be called data processing, tabulation, or simply tab. In tab, the results of the survey are prepared for review, first by the market researcher and then by the end client.
The tab department uses software tools to analyze and organize the results of the study, including statistical analyses that can be very sophisticated. This software normally works with code numbers, not code text. For example, if a response is coded “like apple variety”, the tab software does not want that text; it wants a number such as 002. From the tab software’s point of view, the respondent said 002, not “like apple variety”. The software uses the text “like apple variety” only when it prints a report for a human to read, and at that point it replaces 002 with the text. Before the data are sent to the tab department, therefore, each code must be given a number. The codebook then looks like this:
- 001 VARIETY OF FLAVORS
  - 002 like apple variety
  - 003 like peach variety
  - 004 like cherry variety
  - 021 like peach tea variety (unspecified)
  - 022 like peach iced tea variety
  - 023 like raspberry tea variety
  - 024 like lemon iced tea variety
  - 025 other variety of flavors comments
- 026 HEALTH/ NUTRITION
  - 027 good for dieting/ weight management
  - 028 natural/ not contain artificial ingredients
  - 029 sugar free
  - 030 other health/ nutrition comments
- 031 MISCELLANEOUS
  - 032 other miscellaneous comments
- 998 NOTHING
- 999 DON’T KNOW
The tab department may impose some rules on how codes are numbered. In this example the code 999 always means “don’t know”.
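The Python sketch below illustrates the division of labor described above. The coded data carry only numbers, the counts are computed on those numbers, and labels are looked up from the numbered codebook only when a human-readable report is printed. It uses a subset of the codebook above. The respondent data and the `tabulate` and `print_report` helpers are invented for illustration and do not represent any real tab package.

```python
from collections import Counter

# A subset of the numbered codebook above: code number -> label.
CODE_LABELS = {
    2: "like apple variety",
    3: "like peach variety",
    29: "sugar free",
    998: "NOTHING",
    999: "DON'T KNOW",  # house rule in this example: 999 always means "don't know"
}

# Invented coded data as the tab software sees it: numbers, not text.
coded = {
    "R001": [2, 29],
    "R002": [3],
    "R003": [999],
    "R004": [2],
}


def tabulate(coded_data):
    """Count respondents per code number (each respondent once per code)."""
    counts = Counter()
    for codes in coded_data.values():
        counts.update(set(codes))
    return counts


def print_report(counts, labels, base):
    """Only when printing for humans are numbers replaced by their labels."""
    for number in sorted(counts):
        pct = 100 * counts[number] / base
        print(f"{number:03d}  {labels[number]:<20} {counts[number]:>3}  {pct:5.1f}%")


counts = tabulate(coded)
print_report(counts, CODE_LABELS, base=len(coded))
```

The `:03d` format specifier zero-pads each code number to three digits, matching the numbering style used in the codebook.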