Survey Data Cleaning and Weighting

Joe Ripberger

Survey Data Cleaning

What Is Careless Responding?

  • Occurs when respondents fail to read or attend to item content
  • Responses do not reflect true attitudes or traits
  • Differs from:
    • Socially desirable responding (intentional distortion)
    • Faking or malingering (deliberate misrepresentation)

Why Care About Careless Responding?

  • Prevalence: Appears in 1–50% of survey responses; typical ≈ 10–12%
  • Consequences:
    • Adds random error → lowers reliability
    • Attenuates or inflates correlations
    • Distorts factor structures and construct validity
    • Reduces power; increases Type I/II errors
    • Leads to misleading conclusions

Causes of Careless Responding

  • Survey factors: length, complexity, poor instructions
  • Person factors: low interest, boredom, low self-control, low conscientiousness
  • Context factors: online settings, distractions, weak social norms

Data Patterns That Indicate Careless Responding

  • Invariability: straight-lining (same answer repeatedly)
  • Fast responding: unrealistically short completion times
  • Inconsistency: contradictory answers on similar items
  • Respondents may shift between careful and careless phases within a survey

How to Identify Careless Responding

A Priori Methods (Direct)

  • Instructed response items: “To make sure you’re paying attention, please select Strongly disagree for this statement.”
  • Bogus items: “I have visited every country in the world.”
  • Self-report checks: “Should we use your responses in our analysis?”
  • Response-time recording: Flag unrealistically fast response times (≈ 2 sec / page)
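
A minimal R sketch of how these direct checks might be scored. The data frame and column names (attention_check, bogus_item, self_report_use, duration_sec, n_pages) are hypothetical, and the thresholds are illustrative, so adapt both to your own instrument.

    # A priori (direct) screening flags -- data frame and column names are hypothetical
    svy <- data.frame(
      attention_check = c(1, 1, 4, 1),     # instructed item: "select Strongly disagree" (= 1)
      bogus_item      = c(1, 2, 5, 1),     # "I have visited every country in the world" (5 = agree)
      self_report_use = c("yes", "yes", "no", "yes"),
      duration_sec    = c(480, 610, 35, 520),
      n_pages         = 20
    )

    svy$flag_instructed <- svy$attention_check != 1            # failed instructed-response item
    svy$flag_bogus      <- svy$bogus_item >= 4                 # endorsed an implausible statement
    svy$flag_selfreport <- svy$self_report_use == "no"         # asked us not to use their responses
    svy$flag_speed      <- svy$duration_sec / svy$n_pages < 2  # faster than ~2 seconds per page

    svy$n_flags <- rowSums(svy[, c("flag_instructed", "flag_bogus",
                                   "flag_selfreport", "flag_speed")])
    svy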

How to Identify Careless Responding

Post Hoc Methods (Indirect)

  • Invariability indices:
    • Longstring index: longest string of identical responses
    • Intra-individual Response Variability (IRV): within-person standard deviation of responses
  • Consistency indices:
    • Psychometric Synonyms: positive correlations between similar items
    • Psychometric Antonyms: negative correlations between opposite items
  • More advanced:
    • Mahalanobis distance (MD): multivariate outliers
    • Resampled Personal Reliability (RPR): within-person split-half consistency, averaged over repeated random splits of the items
    • R package: careless implements several of these indices (see the sketch below)
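
A sketch of computing post hoc indices with the careless R package (longstring(), irv(), psychsyn(), psychant(), and mahad() are the package's documented functions). The simulated item block and the cut-offs used for flagging are illustrative assumptions, not recommendations.

    # install.packages("careless")  # post hoc indices; see the package documentation
    library(careless)

    # Hypothetical block of 10 Likert-type items (rows = respondents, columns = items)
    set.seed(1)
    likert_items <- as.data.frame(matrix(sample(1:5, 50 * 10, replace = TRUE),
                                         nrow = 50, ncol = 10))

    indices <- data.frame(
      longstring = longstring(likert_items),          # longest run of identical responses
      irv        = irv(likert_items),                 # within-person SD of responses
      mahad      = mahad(likert_items, plot = FALSE)  # Mahalanobis distance
    )
    # psychsyn(likert_items) and psychant(likert_items) compute synonym/antonym
    # consistency, but they need real item pairs that correlate strongly

    # Illustrative flags -- there are no universal cut-offs
    indices$flag <- indices$longstring >= 8 | indices$irv < 0.5
    head(indices)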

Deciding Who Is Careless

  • No universal cut-offs; researcher judgment is necessary
  • Combine multiple indicators for a more robust assessment
  • Run analyses with and without potentially careless respondents to assess the impact on your results
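
One way to run the with-and-without comparison, sketched in R; svy, flagged, and trust_science are hypothetical names standing in for your data, your combined carelessness flag, and an outcome of interest.

    # Sensitivity check: do key estimates change when flagged cases are dropped?
    set.seed(2)
    svy <- data.frame(
      trust_science = sample(1:5, 200, replace = TRUE),
      flagged       = sample(c(TRUE, FALSE), 200, replace = TRUE, prob = c(.12, .88))
    )

    c(all_respondents  = mean(svy$trust_science),
      careless_removed = mean(svy$trust_science[!svy$flagged]),
      n_dropped        = sum(svy$flagged))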

Preventing Careless Responding

  • Clear, explicit instructions emphasizing data quality
  • Commitment cues: e.g., sign initials pledging carefulness
  • Warnings: data will be checked
  • Proctoring cues: in-person or virtual monitoring reduces carelessness
  • Survey design tweaks: shorter blocks, engaging visuals, reduced monotony

Survey Data Weighting

What Is Survey Weighting?

  • Survey weighting is a statistical technique used to adjust the results of a survey to make them more representative of the target population
    • Weights may reduce bias but increase variance
  • Types of survey weights:
    • Design weights: correct for unequal selection probabilities due to the sampling design
    • Nonresponse weights: correct for differential response rates across groups (e.g., lower response rates among young adults)
    • Post-stratification weights: correct for demographic discrepancies between the sample and the population (e.g., gender, age, race)
  • With probability samples, we often need to incorporate all three types of weights
    • Final Weight = Design Weight × Nonresponse Adjustment × Post-Stratification Adjustment
  • With nonprobability samples, we can usually apply only post-stratification (or calibration) weights
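
A small R sketch of how the three components might be combined into a final weight. The component values are made up for illustration, and the rescaling step (so weights average 1) is a common convention rather than a requirement.

    # Final weight = design weight x nonresponse adjustment x post-stratification adjustment
    svy <- data.frame(
      design_wt       = c(1.0, 2.0, 1.0, 0.5),   # inverse probability of selection
      nonresponse_adj = c(1.2, 0.9, 1.1, 1.0),   # adjustment for differential response
      poststrat_adj   = c(0.8, 1.3, 1.0, 1.1)    # demographic calibration factor
    )

    svy$final_wt <- svy$design_wt * svy$nonresponse_adj * svy$poststrat_adj
    svy$final_wt <- svy$final_wt / mean(svy$final_wt)  # rescale so weights average 1
    svy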

Primary Ways to Calculate Post-Stratification Weights

  1. Cell-based adjustment (traditional post-stratification)
    • Aligns sample to joint distributions (e.g., gender × age)
    • Requires at least one respondent in every cell
    • Works best when you have a large sample and simple stratification
  2. Raking (Iterative Proportional Fitting)
    • Aligns sample to marginal distributions (e.g., gender, age)
    • Does not require a respondent in every cell
  • Both approaches start from the same data; the choice depends on whether you have reliable joint targets and coverage in every cell (cell-based) or only marginal targets / sparse cells (raking)

Cell-based adjustment (traditional post-stratification)

  • Divide the sample into post-strata based on key variables (e.g., age × gender)
  • Compute weights as the ratio of population proportion to sample proportion within each cell:
    • \(w_j = \frac{P_j}{S_j}\), where \(P_j\) = population proportion and \(S_j\) = sample proportion in cell \(j\); every respondent in cell \(j\) receives weight \(w_j\)
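
In R, the calculation is just this ratio applied cell by cell. The sketch below uses the population proportions and sample counts from the worked example that follows, so the resulting weights should match the table on the next slides.

    # Cell-based post-stratification: w_j = P_j / S_j
    cells <- data.frame(
      gender   = rep(c("Male", "Female"), each = 3),
      age      = rep(c("18-34", "35-54", "55+"), times = 2),
      pop_prop = c(0.18, 0.20, 0.10, 0.17, 0.20, 0.15),  # P_j from the population table
      sample_n = c(12, 6, 8, 10, 37, 27)                 # cell counts from a sample of n = 100
    )

    cells$sample_prop <- cells$sample_n / sum(cells$sample_n)  # S_j
    cells$weight      <- cells$pop_prop / cells$sample_prop    # w_j = P_j / S_j
    cells$weighted_n  <- cells$sample_n * cells$weight         # reproduces the population shares
    cells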

Example: Cell-Based Adjustment

Gender 18–34 35–54 55+ Total
Male 0.18 0.20 0.10 0.48
Female 0.17 0.20 0.15 0.52
Total (Population) 0.35 0.40 0.25 1.00
  • Calculate a weight for each cell based on a sample of n = 100 respondents with the following distribution:
    • Male, 18–34 (n = 12)
      • Weight = 0.18 / 0.12 = 1.50
    • Female, 55+ (n = 27)
      • Weight = ____
    • Male, 35–54 (n = 6)
      • Weight = ____

Example: Cell-Based Adjustment

Gender Age Group Sample n Sample % Population % Weight (P/S) Weighted n Weighted %
Male 18–34 12 0.12 0.18 1.50 18.0 0.18
Male 35–54 6 0.06 0.20 3.33 20.0 0.20
Male 55+ 8 0.08 0.10 1.25 10.0 0.10
Female 18–34 10 0.10 0.17 1.70 17.0 0.17
Female 35–54 37 0.37 0.20 0.54 20.0 0.20
Female 55+ 27 0.27 0.15 0.56 15.0 0.15
Total 100 1.00 1.00 100.0 1.00
  • Weighted proportions match population proportions
  • Weighted proportions sum to 1.0

Example: Cell-Based Adjustment

Gender Age Group Sample n Sample % Population % Weight (P/S) Weighted n Weighted %
Male 18–34 12 0.12 0.18 1.50 18.0 0.18
Male 35–54 0 0.00 0.20 — — —
Male 55+ 8 0.08 0.10 1.25 10.0 0.10
Female 18–34 10 0.10 0.17 1.70 17.0 0.17
Female 35–54 43 0.43 0.20 0.47 20.0 0.20
Female 55+ 27 0.27 0.15 0.56 15.0 0.15
Total 100 1.00 1.00 80.0 0.80*
  • Weighted proportions match population proportions for the cells that are represented in the sample
  • Weighted proportions DO NOT sum to 1.0 because a population cell (Male 35–54) is unrepresented in the sample
    • No sample cases exist for Male 35–54, so a weight cannot be computed
    • The population proportion (0.20) for that cell remains uncaptured, meaning weighting alone cannot correct the coverage gap

Raking (Iterative Proportional Fitting)

  • Adjusts the sample to match marginal distributions (e.g., age, gender, race, education) instead of joint distributions (e.g., age × gender × race × education)
  • Iteratively modifies weights until all weighted margins align with population benchmarks
  • Commonly used in modern survey weighting because it handles sparse cells better than traditional post-stratification
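
In practice you rarely code the iterations by hand; the survey package in R provides rake(). The sketch below follows that package's documented interface, with simulated respondent data and gender/age targets expressed as population counts out of 100; treat it as a template and check the current documentation for details.

    # Raking with the survey package
    library(survey)

    set.seed(3)
    svy <- data.frame(
      gender = sample(c("Male", "Female"), 100, replace = TRUE, prob = c(.20, .80)),
      age    = sample(c("18-34", "35-54", "55+"), 100, replace = TRUE, prob = c(.22, .43, .35)),
      wt0    = 1
    )

    des <- svydesign(ids = ~1, weights = ~wt0, data = svy)

    pop_gender <- data.frame(gender = c("Male", "Female"), Freq = c(48, 52))
    pop_age    <- data.frame(age = c("18-34", "35-54", "55+"), Freq = c(35, 40, 25))

    raked <- rake(des,
                  sample.margins     = list(~gender, ~age),
                  population.margins = list(pop_gender, pop_age))

    prop.table(svytable(~gender, raked))  # weighted margins should (approximately) match targets
    prop.table(svytable(~age, raked))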

Example: Raking

Your sample of 100 respondents has no Male 35–54, so you can’t do cell-based post-stratification.

Gender Age Group Sample % Population %
Male 18–34 12 18
Male 35–54 0 20
Male 55+ 8 10
Female 18–34 10 17
Female 35–54 43 20
Female 55+ 27 15
Total 100 100

Example: Raking

Step 1: Compute Marginal Targets

  • Gender Margins
    • Population: 48% Male, 52% Female
    • Sample: 20% Male, 80% Female
  • Age Margins
    • Population: 18–34 = 35%, 35–54 = 40%, 55+ = 25%
    • Sample: 18–34 = 22%, 35–54 = 43%, 55+ = 35%
  • These margins are all you need for raking
  • The joint distribution (each cell) doesn’t have to be fully populated

Example: Raking

Step 2: Iteration 1 – Adjust by Gender

  • Start everyone with weight = 1; then adjust weights iteratively to hit population margins
Gender Sample % Target % Adjustment Factor
Male 0.20 0.48 0.48 / 0.20 = 2.40
Female 0.80 0.52 0.52 / 0.80 = 0.65
  • Now every male gets weight 2.40 and every female gets weight 0.65
  • Weighted totals now match the gender margins exactly, but not the age margins
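
The same arithmetic in a few lines of base R, using the example's cell counts; it also prints the weighted age shares after this gender step, which are the "Weighted %" values carried into the next iteration.

    # Iteration 1: adjust by gender (example data; the Male 35-54 cell is empty)
    cells <- data.frame(
      gender   = rep(c("Male", "Female"), each = 3),
      age      = rep(c("18-34", "35-54", "55+"), times = 2),
      sample_n = c(12, 0, 8, 10, 43, 27),
      weight   = 1                          # start everyone at weight = 1
    )
    gender_target <- c(Male = 0.48, Female = 0.52)

    gender_share <- tapply(cells$weight * cells$sample_n, cells$gender, sum) /
                    sum(cells$weight * cells$sample_n)
    adj <- gender_target[names(gender_share)] / gender_share  # 2.40 for males, 0.65 for females
    cells$weight <- cells$weight * adj[cells$gender]

    # Weighted age shares after the gender step (feed the next iteration)
    round(tapply(cells$weight * cells$sample_n, cells$age, sum) /
          sum(cells$weight * cells$sample_n), 3)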

Example: Raking

Step 3: Iteration 2 – Adjust by Age

Age Group Weighted % (after Step 2) Target % Adjustment Factor
18–34 ≈ 0.353 0.35 0.35 / 0.353 ≈ 0.99
35–54 ≈ 0.280 0.40 0.40 / 0.280 ≈ 1.43
55+ ≈ 0.368 0.25 0.25 / 0.368 ≈ 0.68
  • Multiply each case’s current weight by the factor for their age group
  • Now age margins align; gender margins drift again slightly

Example: Raking

Step 4: Iterate

  • Alternate adjustments for gender and age several times until both sets of margins converge
Margin Weighted % (Final) Population %
Male 48 48
Female 52 52
18–34 35 35
35–54 40 40
55+ 25 25
  • Even though Male 35–54 = 0, raking redistributes weights among existing males and 35–54-year-olds to match both margins

Raking

  • Once you apply the age correction, the age distribution matches the targets, but the gender distribution has drifted slightly out of alignment again

  • So, you’ll go back and forth—adjust gender → adjust age → adjust gender → adjust age—until both sets of margins are simultaneously aligned (typically within a tiny tolerance)

  • Raking is a balancing act:

    • Each round fixes one set of margins but perturbs the other slightly
    • After several iterations, the two sets of adjustments stabilize, and the weighted sample simultaneously matches the age and gender population margins
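
A compact base-R sketch of the whole back-and-forth on the example data. The cell counts and population targets come from the slides; the variable names, the fixed number of iterations, and the absence of an explicit convergence test are simplifications for illustration.

    # Raking (iterative proportional fitting) by hand on the example data
    cells <- data.frame(
      gender = rep(c("Male", "Female"), each = 3),
      age    = rep(c("18-34", "35-54", "55+"), times = 2),
      n      = c(12, 0, 8, 10, 43, 27),   # sample counts (Male 35-54 is empty)
      weight = 1
    )
    targets <- list(
      gender = c(Male = 0.48, Female = 0.52),
      age    = c("18-34" = 0.35, "35-54" = 0.40, "55+" = 0.25)
    )

    for (iter in 1:100) {                       # alternate gender and age adjustments
      for (v in names(targets)) {
        share <- tapply(cells$weight * cells$n, cells[[v]], sum) /
                 sum(cells$weight * cells$n)    # current weighted share of each category
        adj   <- targets[[v]][names(share)] / share
        cells$weight <- cells$weight * adj[cells[[v]]]
      }
    }

    # Both sets of weighted margins now match the population targets
    round(tapply(cells$weight * cells$n, cells$gender, sum) / sum(cells$weight * cells$n), 2)
    round(tapply(cells$weight * cells$n, cells$age,    sum) / sum(cells$weight * cells$n), 2)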

More on Survey Weighting

  • The most basic weighting method (raking) performs nearly as well as more elaborate techniques based on matching.
  • When it comes to accuracy, choosing the right variables for weighting is more important than choosing the right statistical method.

Common Variables and Benchmark Sources for Survey Weighting

Variable Example Benchmark Source(s)
Age American Community Survey (ACS); Current Population Survey (CPS)
Gender / Sex ACS; CPS
Race / Ethnicity ACS; CPS
Education ACS; CPS March Supplement
Income ACS; CPS; Small Area Income & Poverty Estimates (SAIPE)
Region / State ACS; Census; USPS Crosswalks
Urbanicity ACS; National Center for Health Statistics (NCHS)
Party Identification Pew Research Center’s National Public Opinion Reference Survey (NPORS)
Ideology (Liberal–Conservative) Pew NPORS; American National Election Studies (ANES)
Voter Registration / Turnout CPS Voting & Registration Supplement; Catalist / L2 voter files
Religious Affiliation Pew NPORS; General Social Survey (GSS)
Internet / Device Ownership / Social Media Use Pew NPORS; NTIA Internet Use Survey; ACS (broadband access)

Exercise: Data Detective!

  1. Review the survey instrument: skim the questionnaire to identify items that might reveal careless responding (e.g., instructed-response items, psychometric synonyms)
  2. Explore the dataset: calculate one or more indicators of careless responding
    • Instructed response items
    • Bogus items
    • Self-report checks
    • Response times
    • Invariability indices
    • Consistency indices
    • More advanced indices (e.g., Mahalanobis distance)
  3. Flag potential cases of careless responding based on your chosen indicators
  4. Report out: share your findings with the group
    • What indicators did you use?
    • How many cases did you flag as potentially careless?
    • How might removing these cases impact the survey results?