Sampling II: Nonprobability Samples

Joe Ripberger

Probability vs. Nonprobability Sampling

Probability Sampling

  • Frame coverage: A list that approximates the target population
  • Selection: Units drawn with a known, nonzero probability
  • Inference: Design-based; sampling error can be quantified
  • Bias control: Achieved through randomization; adjust for nonresponse

Nonprobability Sampling

  • Frame coverage: No complete list; relies on opt-in or convenience pools
  • Selection: Probability of inclusion is unknown or zero for some units
  • Inference: Model-based; validity rests on assumptions about the model
  • Bias control: Post-stratification or calibration to reduce nonrepresentativeness (see the sketch below)
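
To make the bias-control point concrete, here is a minimal post-stratification sketch in Python. The age-group cells, population shares, and outcome variable are hypothetical values chosen for illustration, not real benchmarks:

    import pandas as pd

    # Hypothetical respondents from an opt-in sample
    sample = pd.DataFrame({
        "age_group": ["18-34", "18-34", "35-64", "65+", "35-64"],
        "y": [1, 0, 1, 1, 0],   # some survey outcome of interest
    })

    # Assumed population shares for the same cells (e.g., from a census benchmark)
    population_shares = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

    # Observed sample shares within each cell
    sample_shares = sample["age_group"].value_counts(normalize=True)

    # Post-stratification weight = population share / sample share
    sample["weight"] = sample["age_group"].map(
        lambda g: population_shares[g] / sample_shares[g]
    )

    # Weighted mean corrects the cell composition of the sample
    weighted_mean = (sample["y"] * sample["weight"]).sum() / sample["weight"].sum()
    print(weighted_mean)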

Probability vs. Nonprobability Sampling

  • Total Survey Error (TSE)
    • Probability sampling: errors are quantifiable because inclusion probabilities are known
    • Nonprobability sampling: errors are not quantifiable; validity rests on modeling assumptions
  • Total Survey Quality (TSQ)
    • Probability sampling: quality judged by accuracy of estimates
    • Nonprobability sampling: quality judged by fitness for use (cost, speed, analytic value)

Common Types of Nonprobability Samples

  • Convenience (Availability): select units that are easiest to reach
  • Opt-in / Volunteer: participants self-select (e.g., open web polls)
  • Purposive (Judgment): researcher chooses “typical” or key cases
  • Snowball (Chain-referral): participants recruit others in their networks
  • Opt-in Online Panels: large pools of self-selected respondents used for repeated surveys

Common Types of Nonprobability Samples

Most academic surveys rely on opt-in online panels

How Opt-in Online Panels Work

  1. Recruitment: vendors recruit panelists who opt in through ads, email lists, or referrals
  2. Panel maintenance: vendors keep databases of participants, but often supplement with river/intercept samples
  3. Sample marketplace: to meet demand, many vendors buy, sell, or exchange participants across firms; researchers may also blend samples from multiple vendors
  4. Survey fielding: vendors invite selected panelists to a study (quota controls often used to approximate population distributions)
  5. Participation: panelists complete the survey online, usually for incentives (points, gift cards, cash)
  6. Adjustment: vendors or researchers apply post-survey weighting (e.g., raking) to reduce bias (see the sketch after this list)
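
To make the adjustment step concrete, here is a minimal raking (iterative proportional fitting) sketch in Python. The raking variables, categories, and target margins are assumptions for illustration only:

    import pandas as pd

    # Hypothetical panel respondents
    sample = pd.DataFrame({
        "gender": ["F", "M", "F", "M", "F", "M"],
        "educ":   ["HS", "BA", "BA", "HS", "BA", "BA"],
    })

    # Assumed population margins (proportions) for each raking variable
    margins = {
        "gender": {"F": 0.52, "M": 0.48},
        "educ":   {"HS": 0.60, "BA": 0.40},
    }

    weights = pd.Series(1.0, index=sample.index)

    # Iteratively scale weights so each weighted margin matches its target
    for _ in range(50):
        for var, target in margins.items():
            current = weights.groupby(sample[var]).sum() / weights.sum()
            weights *= sample[var].map(lambda c: target[c] / current[c])

    # Weighted margins now approximate the assumed population margins
    print(weights.groupby(sample["gender"]).sum() / weights.sum())
    print(weights.groupby(sample["educ"]).sum() / weights.sum())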

Inference from Opt-in Online Panels

  • Descriptive inference
    • Online opt-in panels often produce biased population estimates compared to gold-standard probability surveys
    • Example: an online panel might overestimate overall support for climate policy compared to benchmarks
  • Multivariate inference
    • Associations between variables are usually more stable than simple descriptive estimates, but still vulnerable to hidden biases
    • Example: an online panel might overestimate overall support for climate policy, but still correctly capture the correlation between ideology and support
  • Causal inference (survey experiments)
    • Random assignment ensures unbiased treatment-effect estimates within the sample, so experiments often generalize better across sample types (see the sketch after this list)
    • Example: a survey experiment testing the effect of different climate policy messages may replicate treatment effects across both online panels and probability samples
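
A minimal simulation sketch of why random assignment protects within-sample estimates; the sample size, outcome scale, and +0.05 treatment effect are assumptions chosen for illustration:

    import numpy as np

    rng = np.random.default_rng(42)
    n = 1000

    # Randomly assign respondents to a treatment message (1) or control (0)
    treat = rng.integers(0, 2, size=n)

    # Simulated policy support with an assumed +0.05 treatment effect
    support = rng.normal(0.55, 0.20, size=n) + 0.05 * treat

    # Difference in means estimates the average treatment effect within the sample
    ate = support[treat == 1].mean() - support[treat == 0].mean()
    print(round(ate, 3))

Because assignment is random within the sample, the difference in means recovers the sample's average treatment effect even when the sample itself is unrepresentative.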

Inference from Opt-in Online Panels

Nonprobability samples are generally better at capturing relationships than at producing accurate point estimates

Describers vs. Modelers

Problematic Participants in Nonprobability Samples

  • Professional participants: people who participate authentically in a large number of online studies as a way to earn money
  • Incapable participants: people who lack the cognitive or linguistic skills needed to complete the study
  • Inattentive participants: people whose ability to provide valid responses is compromised by their lack of attention to the survey
  • Inauthentic participants: people who are ineligible to take the survey (e.g., scammers) or who are not human at all (e.g., bots)

Methods of Detecting Problematic Participants

  • Response time: flag unusually fast or inconsistent completions (see the sketch after this list)
  • Response patterns: spot straight-lining or random answers
  • Open-ended checks: identify gibberish, irrelevant, or copy-pasted text
  • Captchas: block bots at survey entry
  • Attention / comprehension checks: test whether participants read and understand instructions
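
A minimal screening sketch in Python, assuming a hypothetical response file with a completion-time column and a four-item grid; the column names and the 30%-of-median speed threshold are illustrative judgment calls, not fixed rules:

    import pandas as pd

    # Hypothetical response file
    responses = pd.DataFrame({
        "duration_sec": [40, 410, 62, 380, 395],
        "q1": [3, 4, 1, 5, 2],
        "q2": [3, 2, 1, 4, 5],
        "q3": [3, 5, 1, 2, 3],
        "q4": [3, 1, 1, 4, 2],
    })
    grid_items = ["q1", "q2", "q3", "q4"]

    # Speeders: completion time far below the median duration
    responses["speeder"] = responses["duration_sec"] < 0.3 * responses["duration_sec"].median()

    # Straight-liners: identical answers on every item in the grid
    responses["straightliner"] = responses[grid_items].nunique(axis=1) == 1

    print(responses[["speeder", "straightliner"]])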

Common Vendors in Academic Research