Questionnaire Design and Testing

Joe Ripberger

Questionnaire Design (Wording)

Survey Question Commandments

  1. Thou shalt not write a question that thou thyself cannot answer.
    • Write only questions you could confidently and accurately answer yourself.
  2. Thou shalt ask no question for which thou knowest not the purpose of the answer.
    • Include only questions whose answers will serve a defined analytic or decision-making purpose.

Nonsensitive Behavioral Questions (Groves et al. 2009)

  1. With closed questions, include all reasonable possibilities as explicit response options.
  2. Make the questions as specific as possible.
  3. Use words that virtually all respondents will understand.
  4. Lengthen the questions by adding memory cues to improve recall.
  5. When forgetting is likely, use aided recall.
  6. When the events of interest are frequent but not very involving, have respondents keep a diary.
  7. When long recall periods must be used, use a life event calendar to improve reporting.
  8. To reduce telescoping errors, ask respondents to use household records or use bounded recall (or do both).
  9. If cost is a factor, consider whether proxies might be able to provide accurate information.

Sensitive Behavioral Questions (Groves et al. 2009)

  1. Use open rather than closed questions for eliciting the frequency of sensitive behaviors.
  2. Use long rather than short questions.
  3. Use familiar words in describing sensitive behaviors.
  4. Deliberately load the question to reduce misreporting.
  5. Ask first about long reference periods (such as one’s entire lifetime) or about the distant past when asking about sensitive behaviors.
  6. Embed the sensitive question among other sensitive items to make it stand out less.
  7. Use self-administration or some similar method to improve reporting.
  8. Consider collecting the data in a diary.
  9. At the end of the questionnaire, include some items to assess how sensitive the key behavioral questions were.
  10. Collect validation data.

Attitude Questions (Groves et al. 2009)

  1. Specify the attitude object clearly.
  2. Avoid double-barreled questions.
  3. Measure the strength of the attitude, if necessary using separate items for this purpose.
  4. Use bipolar items except when they might miss key information.
  5. The alternatives mentioned in the question have a big impact on the answers; carefully consider which alternatives to include.
  6. In measuring change over time, ask the same questions each time.
  7. When asking general and specific questions about a topic, ask the general question first.
  8. When asking questions about multiple items, start with the least popular one.
  9. Use closed questions for measuring attitudes.
  10. Use five- to seven-point response scales and label every scale point.
  11. Start with the end of the scale that is the least popular.
  12. Use analogue devices (such as thermometers) to collect more detailed scale information.
  13. Use ranking only if the respondents can see all the alternatives; otherwise, use paired comparisons.
  14. Get ratings for every item of interest; do not use check-all-that-apply items.

Conventional Wisdom (Krosnick and Presser 2010)

  1. Use simple, familiar words (avoid technical terms, jargon, and slang).
  2. Use simple syntax.
  3. Avoid words with ambiguous meanings, i.e., aim for wording that all respondents will interpret in the same way.
  4. Strive for wording that is specific and concrete (as opposed to general and abstract).
  5. Make response options exhaustive and mutually exclusive.
  6. Avoid leading or loaded questions that push respondents toward an answer.
  7. Ask about one thing at a time (avoid double-barreled questions).
  8. Avoid questions with single or double negations.

Question Order (Krosnick and Presser 2010)

  1. Early questions should be easy and pleasant to answer, and should build rapport between the respondent and the researcher.
  2. Questions at the very beginning of a questionnaire should explicitly address the topic of the survey, as it was described to the respondent prior to the interview.
  3. Questions on the same topic should be grouped together.
  4. Questions on the same topic should proceed from general to specific.
  5. Questions on sensitive topics that might make respondents uncomfortable should be placed at the end of the questionnaire.
  6. Filter questions should be included, to avoid asking respondents questions that do not apply to them.

Open-Ended Questions (Dillman et al. 2014)

  1. Specify the type of response desired in the question stem.
  2. Avoid making respondents (or interviewers) calculate sums; when possible, have the computer do it.
  3. Provide extra motivation to respond.
  4. Use nondirective probes to obtain more information on open-ended items.

All Closed-Ended Questions (Dillman et al. 2014)

  1. When asking either/or types of questions, state both the positive and negative side in the question stem.
  2. Develop lists of answer categories that include all reasonable possible answers.
  3. Develop lists of answer categories that are mutually exclusive.
  4. Consider what types of answer spaces are most appropriate for the measurement intent.

Nominal Closed-Ended Questions (Dillman et al. 2014)

  1. Ask respondents to rank only a few items at once rather than a long list.
  2. Avoid bias from unequal comparisons.
  3. Randomize response options if there is concern about order effects.
  4. Use forced-choice questions instead of check-all-that-apply questions.
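Guideline 3 above (randomizing response options) is straightforward to implement in a web or CATI instrument. The sketch below, assuming a hypothetical "most important problem" item with nominal categories, shuffles the display order per respondent while keeping a fixed canonical code for each option, so analysts can recover the intended answer and later test for order effects:

```python
import random

# Canonical codes for each response option (hypothetical nominal item).
OPTIONS = {
    1: "The economy",
    2: "Health care",
    3: "The environment",
    4: "Education",
}

def randomized_presentation(rng):
    """Return the option codes in a random display order for one respondent.

    Storing the fixed code alongside the displayed position means the
    randomization never changes how an answer is recorded, only how it
    is shown.
    """
    codes = list(OPTIONS)
    rng.shuffle(codes)
    return codes

rng = random.Random(42)  # seeded here only to make the example reproducible
order = randomized_presentation(rng)
for position, code in enumerate(order, start=1):
    print(f"{position}. {OPTIONS[code]}  (code {code})")
```

Note that randomization is appropriate for nominal categories like these; ordinal scales should keep their natural order.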

Ordinal Closed-Ended Questions (Dillman et al. 2014)

  1. Choose between a unipolar or a bipolar scale.
  2. Choose an appropriate scale length—in general, limit scales to four or five categories.
  3. Choose direct or construct-specific labels to improve cognition.
  4. If there is a natural metric (e.g., frequencies, amounts, sizes, etc.), use it instead of vague quantifiers.
  5. Provide balanced scales where categories are relatively equal distances apart conceptually.
  6. Verbally label all categories.
  7. Remove numeric labels from vague quantifier scales whenever possible.
  8. Consider branching (or decomposing) bipolar scales to ease respondent burden and improve data quality.
  9. Provide scales that approximate the actual distribution of the characteristic in the population, or ask the question in an open-ended format to avoid biasing responses.
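Guideline 8's branching (or decomposing) approach replaces one seven-point bipolar item with two easier questions: a direction question (favor / oppose / neither) followed by an intensity question. A minimal sketch of how the two answers recombine into a single seven-point code, using a hypothetical favor–oppose item:

```python
def branched_bipolar_score(direction, intensity=None):
    """Combine a direction item and an intensity follow-up into a 7-point code.

    direction: "favor", "oppose", or "neither"
    intensity: "strongly", "somewhat", or "slightly" (ignored for "neither")
    Returns 1 (strongly oppose) ... 4 (neither) ... 7 (strongly favor).
    """
    if direction == "neither":
        return 4
    steps = {"slightly": 1, "somewhat": 2, "strongly": 3}
    offset = steps[intensity]
    return 4 + offset if direction == "favor" else 4 - offset

# Respondents answer two short questions instead of scanning seven labels.
print(branched_bipolar_score("favor", "strongly"))   # maps to 7
print(branched_bipolar_score("oppose", "slightly"))  # maps to 3
```

The design trade-off is two question administrations per item in exchange for lower respondent burden on each and, per the literature cited above, better data quality.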

Summary Points (Schaeffer and Dykema 2020)

  1. Given the continuing confirmations that verbal labels increase reliability and the need for consistency across modes and presentations, five categories for unipolar and possibly bipolar scales is both justifiable and practical at this time.
  2. Studies that scale adverbs and phrases support the categories “not at all, slightly/a little, somewhat/moderately, very, extremely” for unipolar intensity questions; “never, rarely, sometimes, very often, extremely often/always” for relative frequency questions; and “none, a little/a little bit, some/somewhat, quite a bit, a great deal” for quantity-based questions. The slashes indicate alternatives with similar scale values.
  3. For measures of subjective concepts, the current recommendation to use item-specific response formats rather than agree-disagree response formats is based on strong theoretical reasons, the ease with which respondents understand the questions, and available evidence on data quality.
  4. Yes–no checklists in which items are presented as a grid (or battery) that requests a “yes” or “no” answer for each item are recommended over check-all-that-apply (CATA) formats.
  5. Questions should be written so that the response format projected by the question stem matches the answer format actually offered to the respondent.

Questionnaire Design (Design)

Self-Administered Questions (Groves et al. 2009)

  1. Use visual elements in a consistent way to define the desired path through the questionnaire.
  2. When the questionnaire must change its conventions part way through, prominent visual guides should alert respondents to the switch.
  3. Place directions where they are to be used and where they can be seen.
  4. Present information that needs to be used together in the same location.
  5. Ask one question at a time.

General Guidelines (Dillman et al. 2014)

  1. Use darker and/or larger print for the question stem and lighter and/or smaller print for answer choices and answer spaces.
  2. Use spacing to help create subgrouping within a question.
  3. Visually standardize all answer spaces or response options.
  4. Use visual design properties to emphasize elements that are important to the respondent and to deemphasize those that are not.
  5. Choose font, font size, and line length to ensure the legibility of the text.
  6. Integrate special instructions into the question where they will be used, rather than including them as free-standing entities.
  7. Separate optional or occasionally needed instructions from the question stem by font or symbol variation.

Open-Ended Questions (Dillman et al. 2014)

  1. Provide a single answer box if only one answer is needed and multiple answer boxes if multiple answers are needed.
  2. Provide answer spaces that are sized appropriately for the response task.
  3. To encourage the use of proper units or a desired response format, provide labels and templates with answer spaces.

Closed-Ended Questions (Dillman et al. 2014)

  1. Align response options vertically in one column or horizontally in one row, and provide equal distance between categories.
  2. Place nonsubstantive options after and separate from substantive options.
  3. Consider using differently shaped answer spaces (circles and squares) to help respondents distinguish between single- and multiple-answer questions.

Pages or Screens (Dillman et al. 2014)

  1. Establish grouping and subgrouping within and across questions in the questionnaire.
  2. Establish consistency in the visual presentation of questions, and use alignment and vertical spacing to help respondents organize the information on the page.
  3. Use color and contrast to help respondents recognize the components of the questions and the navigational path through the questionnaire.
  4. Visually group related information in regions through the use of contrast and enclosure.
  5. Consistently identify the beginning of each question and/or section.
  6. Use visual elements and properties consistently across questions and pages/screens to visually emphasize or deemphasize certain types of information.
  7. Avoid visual clutter.
  8. Avoid placing questions side by side on a page so that respondents are not asked to answer two questions at once.
  9. Minimize the use of matrices and grids, and when they cannot be avoided, minimize their complexity.

Questionnaire Testing (Evaluation)

  • The objective of questionnaire testing is to ensure that survey questions measure what they are intended to measure, that respondents understand and can answer them as intended, and that they can be administered effectively under real field conditions.

  • Common methods include:

    • Expert reviews
    • Focus groups
    • Cognitive testing (cognitive interviews)
    • Field pretests
    • Split-ballot experiments

Expert Reviews

  • A structured review of draft questions by subject-matter or questionnaire design experts
  • Used to assess whether the questions:
    • Accurately measure the intended concepts
    • Are appropriate for the target population
    • Meet standards for clarity, neutrality, and completeness
  • Usually conducted early in questionnaire development
  • Helps identify conceptual gaps, poor wording, or leading phrasing before field testing

Focus Groups

  • Semi-structured discussions with members of the target population
  • Used to understand how respondents:
    • Think and talk about key issues
    • Interpret terms and concepts
    • Frame their opinions or experiences in natural language
  • Conducted before finalizing question wording
  • Informs both content validity and the vocabulary used in survey items

Cognitive Testing (Cognitive Interviews)

  • One-on-one interviews where respondents answer draft survey questions while explaining their thought process
  • Used to understand how respondents:
    • Interpret the question
    • Retrieve relevant information from memory
    • Formulate and report an answer
  • Reveals cognitive difficulties, ambiguous wording, and potential response errors
  • Provides direct insight into how respondents actually process the question

Field Pretests

  • Small-scale version of the survey, usually fewer than 100 interviews, conducted under realistic field conditions
  • Used to test:
    • Interviewer performance and instructions
    • Sampling and administration procedures
    • Question comprehension and flow
  • Data may be tabulated to identify problematic items
  • Interviewers may be debriefed to discuss respondent difficulties
  • Recordings can be behavior-coded to quantify comprehension problems

Split-Ballot Experiments

  • Experimental design where different subgroups of the pretest sample receive alternative versions of the same question
  • Used to estimate how different wordings or formats affect responses
  • Allows researchers to identify measurement bias or context effects
  • Provides empirical evidence about which question version performs best
  • Often used after pretesting to confirm the most valid and reliable item
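The logic of a split-ballot experiment can be sketched in a few lines: randomly assign pretest respondents to one of two question versions, then compare the response distributions. The example below uses entirely hypothetical counts (62 of 100 agreeing under wording A, 45 of 100 under wording B) and a simple two-proportion z-test built from the standard library:

```python
import math
import random

def assign_ballot(respondent_ids, rng):
    """Randomly split respondents into two ballot groups (versions A and B)."""
    ids = list(respondent_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Z statistic for the difference between two proportions (pooled SE)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical pretest result: 62/100 agree under wording A, 45/100 under B.
z = two_proportion_z(62, 100, 45, 100)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests the two wordings elicit different answers
```

In practice researchers would use a chi-square or regression framework with survey weights, but the core idea is the same: random assignment makes any difference between the ballots attributable to the question version rather than to the respondents.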