February 10, 2026 12:34 pm

When randomized controlled trials (RCTs) aren’t feasible, observational studies step in. But they come with a risk: treatment assignment bias. For example, healthier individuals may be more likely to choose a treatment, skewing the observed effects.

Propensity Score Matching (PSM) helps solve this by mimicking randomization. It matches treated and untreated individuals with similar characteristics (covariates), reducing confounding and balancing groups.

In this post, we’ll explore:

  1. What is Propensity Score Matching?
  2. Steps: Estimation, Matching, and Balance Checking
  3. Python Code Example (using simulated data)
  4. Caveats and Best Practices

What is Propensity Score Matching?

Propensity score = the probability of receiving treatment given observed covariates: e(x) = P(T = 1 | X = x).

Developed by Rosenbaum and Rubin (1983), the idea is simple: match treated and untreated units with similar propensity scores.

If we can match units well, we can estimate causal effects like:

ATT = E[Y₁ − Y₀ | T = 1]

where Y₁ is the outcome if treated, Y₀ if not, and T=1 for treated units.

Step-by-Step Guide to PSM

Step 1: Estimate Propensity Scores

Use logistic regression or machine learning to estimate the probability of treatment based on observed covariates.

Step 2: Match Treated and Untreated Units

Options:

  • Nearest neighbor matching
  • Caliper matching (within a threshold)
  • Kernel or Mahalanobis matching
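Caliper matching can be layered on top of nearest-neighbor matching: compute the nearest-neighbor distances, then discard any pair farther apart than the threshold. A minimal sketch using hypothetical propensity scores (in practice these would come from a fitted model):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)

# Hypothetical propensity scores for treated and control units
# (stand-ins for estimates from a fitted logistic regression)
ps_treated = rng.uniform(0.2, 0.8, 50).reshape(-1, 1)
ps_control = rng.uniform(0.0, 1.0, 200).reshape(-1, 1)

# Nearest-neighbor matching on the propensity score
nn = NearestNeighbors(n_neighbors=1).fit(ps_control)
distances, indices = nn.kneighbors(ps_treated)

# Caliper: keep only matches within the distance threshold
caliper = 0.05
within = distances.flatten() <= caliper
matched_idx = indices.flatten()[within]

print(f"Matched {within.sum()} of {len(ps_treated)} treated units")
```

Treated units with no control inside the caliper are simply dropped, which trades sample size for match quality.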

Step 3: Check Covariate Balance

Compare standardized mean differences (SMDs) — the absolute difference in group means divided by the pooled standard deviation — before and after matching. Well-matched samples should have SMDs below 0.1.

Step 4: Estimate Treatment Effects

Use difference in means, regression, or weighted analysis on the matched sample.


🧑‍💻 Python Code Example

Let’s walk through a full code example using synthetic data.

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
import matplotlib.pyplot as plt
import seaborn as sns

# Simulate the data
np.random.seed(42)
n = 1000

# Covariates
age = np.random.normal(50, 10, n)
income = np.random.normal(60000, 15000, n)

# Treatment assignment depends on covariates (coefficients kept small
# so propensity scores overlap rather than saturating at 0 or 1)
p = 1 / (1 + np.exp(-0.0001*(income - 60000) + 0.05*(age - 50)))
treatment = np.random.binomial(1, p)

# Outcome depends on treatment and covariates
outcome = 5*treatment + 0.1*income - 0.3*age + np.random.normal(0, 1000, n)

df = pd.DataFrame({'age': age, 'income': income, 'treatment': treatment, 'outcome': outcome})

# Estimate propensity scores via logistic regression

X = df[['age', 'income']]
y = df['treatment']

model = LogisticRegression(max_iter=1000)  # raw-scale covariates can need extra iterations
model.fit(X, y)

df['propensity_score'] = model.predict_proba(X)[:, 1]

# Nearest-neighbor matching on the propensity score (with replacement)
treated = df[df['treatment'] == 1]
control = df[df['treatment'] == 0]

# Fit Nearest Neighbors on control units
nn = NearestNeighbors(n_neighbors=1)
nn.fit(control[['propensity_score']])

# Find nearest neighbor for each treated unit
distances, indices = nn.kneighbors(treated[['propensity_score']])
matched_control = control.iloc[indices.flatten()].copy()
matched_treated = treated.reset_index(drop=True).copy()

matched_df = pd.concat([matched_treated, matched_control])

# Check covariate balance via standardized mean differences
def standardized_mean_diff(var):
    treated_mean = matched_treated[var].mean()
    control_mean = matched_control[var].mean()
    pooled_std = np.sqrt((matched_treated[var].var() + matched_control[var].var()) / 2)
    return np.abs(treated_mean - control_mean) / pooled_std

for col in ['age', 'income', 'propensity_score']:
    print(f"SMD for {col}: {standardized_mean_diff(col):.3f}")

# Estimate the ATT as the difference in mean outcomes
att = matched_treated['outcome'].mean() - matched_control['outcome'].mean()
print(f"Estimated ATT: {att:.2f}")

# Visualize propensity score overlap before matching
sns.kdeplot(df[df['treatment']==1]['propensity_score'], label='Treated (All)', color='blue')
sns.kdeplot(df[df['treatment']==0]['propensity_score'], label='Control (All)', color='red')
plt.title("Propensity Score Distribution Before Matching")
plt.legend()
plt.show()
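The script above prints a single point estimate for the ATT. One simple way to attach uncertainty to it (not covered above) is a pairs bootstrap over the matched differences. A minimal sketch on hypothetical matched outcomes — stand-ins for `matched_treated['outcome']` and `matched_control['outcome']` from the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical matched outcomes with a true effect of about 2
treated_outcomes = rng.normal(10, 2, 500)
control_outcomes = rng.normal(8, 2, 500)

# Pairs bootstrap: resample matched-pair differences with
# replacement and recompute the ATT each time
diffs = treated_outcomes - control_outcomes
boot = np.array([
    rng.choice(diffs, size=len(diffs), replace=True).mean()
    for _ in range(2000)
])

att = diffs.mean()
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(f"ATT: {att:.2f}, 95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```

Note that this naive bootstrap ignores the uncertainty from estimating the propensity scores themselves, so it is a rough diagnostic rather than a rigorous standard error.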

Best Practices & Caveats

  1. Unobserved Confounding: PSM only controls for observed variables.
  2. Overlap Assumption: Treated and control groups must have similar propensity score ranges.
  3. Diagnostics Are Key: Always check balance using SMD or visual plots.
  4. Multiple Matches or Calipers: Try matching with replacement or setting a caliper (e.g., 0.05) to improve quality.
  5. Combine With Regression: You can still regress outcomes on covariates post-matching for bias correction.
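Point 5 can be sketched as follows: regress the outcome on treatment plus covariates within the matched sample, so the treatment coefficient absorbs any residual imbalance. A minimal illustration on hypothetical matched data with a known true effect of 5:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 400

# Hypothetical matched sample: one covariate, a treatment
# indicator, and an outcome with a true treatment effect of 5
age = rng.normal(50, 10, n)
treatment = rng.binomial(1, 0.5, n)
outcome = 5 * treatment - 0.3 * age + rng.normal(0, 2, n)

# Regress outcome on treatment and covariates; the treatment
# coefficient is the bias-adjusted effect estimate
X = np.column_stack([treatment, age])
reg = LinearRegression().fit(X, outcome)
print(f"Adjusted treatment effect: {reg.coef_[0]:.2f}")
```

This "doubly robust"-flavored combination tends to be more forgiving than either matching or regression alone.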

Conclusion

Propensity Score Matching (PSM) is a vital technique in the data scientist’s and economist’s toolbox for drawing causal inferences from observational data. By estimating the probability of treatment based on observed covariates and matching similar individuals across treatment groups, PSM reduces bias and simulates the conditions of a randomized experiment.

However, it’s important to remember that PSM is only as strong as the covariates you include. If important confounders are omitted, matching will not correct for the hidden bias. That’s why thoughtful model specification, thorough balance checks, and transparent reporting are essential.

When done properly, PSM can uncover meaningful treatment effects from real-world data—offering powerful insights in healthcare, public policy, economics, and beyond.

In short: Propensity Score Matching won’t give you perfect answers, but it will give you better ones—especially when randomization isn’t an option.

Academic References

  1. Rosenbaum, P. R., & Rubin, D. B. (1983).
    The central role of the propensity score in observational studies for causal effects.
    Biometrika, 70(1), 41–55.
    https://doi.org/10.1093/biomet/70.1.41
  2. Austin, P. C. (2011).
    An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies.
    Multivariate Behavioral Research, 46(3), 399–424.
    https://doi.org/10.1080/00273171.2011.568786
  3. Stuart, E. A. (2010).
    Matching methods for causal inference: A review and a look forward.
    Statistical Science, 25(1), 1–21.
    https://doi.org/10.1214/09-STS313
  4. Guo, S., & Fraser, M. W. (2014).
    Propensity Score Analysis: Statistical Methods and Applications (2nd ed.).
    SAGE Publications.

Online Resources and Tutorials

  1. Harvard T.H. Chan School of Public Health – Causal Inference Book:
    https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
    (Free textbook by Hernán and Robins on causal inference in health research.)
  2. The Methodology Center at Penn State – PSM Tutorials:
    https://www.methodology.psu.edu/ra/most/psm/
  3. StatsModels Documentation (Python):
    https://www.statsmodels.org/stable/index.html

  • econml – causal inference tools from Microsoft Research
  • causalml – Uber's library for uplift and causal modeling
  • DoWhy – causal inference using explicit structural assumptions
