Causal inference is the process of determining whether a treatment, intervention, or exposure causes a change in an outcome, rather than merely observing that the two are correlated. I have already discussed this in detail in a separate post.

Key Concepts in Causal Inference:

  1. Association vs. Causation
    • Association: Two variables are statistically related (e.g., ice cream sales and drowning incidents both rise in summer).
    • Causation: One variable directly affects another (e.g., smoking → lung cancer).
  2. Counterfactual Framework
    • The core idea is comparing what happened with the treatment vs. what would have happened without it (the “counterfactual”).
    • Since we can’t observe both, we use statistical methods to estimate the causal effect.
  3. Confounding & Bias
    • A confounder is a variable that affects both treatment and outcome (e.g., wealth influences both health habits and heart disease risk).
    • Without accounting for confounders, we might mistake correlation for causation.
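
To make confounding concrete, here is a minimal simulated sketch. Everything in it is an assumption made for illustration: a hypothetical `wealth` variable raises both the chance of treatment and the outcome, while the treatment itself has no effect at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder: standardized wealth
wealth = rng.normal(0, 1, size=n)

# Wealthier people are more likely to take the treatment (assumed logistic link)
p_treat = 1 / (1 + np.exp(-wealth))
z = rng.binomial(1, p_treat)

# The outcome depends on wealth only; the treatment has NO causal effect
y = 2.0 * wealth + rng.normal(0, 1, size=n)

# Naive comparison of treated vs. untreated
naive_diff = y[z == 1].mean() - y[z == 0].mean()

# Crude adjustment: compare only people with nearly identical wealth
band = np.abs(wealth) < 0.1
adjusted_diff = y[band & (z == 1)].mean() - y[band & (z == 0)].mean()

print(f"Naive difference:       {naive_diff:.2f}")
print(f"Within-band difference: {adjusted_diff:.2f}")
```

The naive difference comes out far from zero even though the treatment does nothing, while the comparison among people with similar wealth is close to zero. That gap is the bias introduced by the confounder.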

Mathematical Notation:


• Let Y be the outcome.
• Let Z be the treatment or exposure. Z is binary: Z = 1 denotes treatment and Z = 0 denotes control.
• Let X be a set of baseline (pre-treatment) covariates.
• We will use potential outcomes notation:
– Y(1) is the outcome when Z = 1, i.e., under treatment
– Y(0) is the outcome when Z = 0, i.e., under control

Everyone in the study has both a potential Y(1) and a potential Y(0), regardless of which treatment they actually received. However, only one of Y(1) or Y(0) is ever observed for a given person. The individual causal effect is the difference between these two potential outcomes, i.e., Y(1) – Y(0).
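
The link between the observed outcome and the two potential outcomes can be written compactly. This is the usual consistency relation, stated with the symbols defined above:

```latex
Y = Z \, Y(1) + (1 - Z) \, Y(0)
\quad\Longleftrightarrow\quad
Y = \begin{cases}
  Y(1) & \text{if } Z = 1 \\
  Y(0) & \text{if } Z = 0
\end{cases}
```

The potential outcome that is not observed is the counterfactual, which is why the individual effect Y(1) – Y(0) can never be read directly off the data.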

Average Treatment Effect (ATE)

It measures the average causal effect of a treatment (or intervention) across the whole population.

In plain words:

If everyone in the population received the treatment, compared with no one receiving it, how much would the outcome change on average?

Example:

Say we’re studying a job training program.

  • Treatment group: People who did the program
  • Control group: People who didn’t

Let’s say:

  • Average income with training = $40,000
  • Average income without training = $35,000

ATE = $40,000 – $35,000 = $5,000
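So, the average treatment effect is a $5,000 increase in income due to the training. Note that this simple difference in group averages equals the ATE only when the two groups are otherwise comparable, as they would be under random assignment.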

Key formula:

ATE = E[Y(1) – Y(0)]

Where:

  • Y(1) = outcome with treatment
  • Y(0) = outcome without treatment
  • E[⋅] = expected value (average)
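
It is worth spelling out why a difference in group averages can stand in for this formula. A sketch of the standard argument, assuming treatment Z is randomly assigned so that it is independent of the potential outcomes (an assumption that generally fails in observational data):

```latex
\begin{aligned}
E[Y \mid Z = 1] - E[Y \mid Z = 0]
  &= E[Y(1) \mid Z = 1] - E[Y(0) \mid Z = 0] \\
  &= E[Y(1)] - E[Y(0)]
     && \text{(because } Z \perp (Y(1), Y(0)) \text{)} \\
  &= E[Y(1) - Y(0)] = \text{ATE}
\end{aligned}
```

The first line uses only observed data; the second line is where randomization does the work.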

Let’s dig a bit deeper into the variations of ATE and how they’re used.


1. Conditional Average Treatment Effect (CATE)

This is the ATE for a specific subgroup of the population, defined by the covariates X: CATE(x) = E[Y(1) – Y(0) | X = x].

Example: Suppose the job training program has a bigger impact on young people than on older folks.

Then:

  • CATE (age < 25) might be $7,000
  • CATE (age ≥ 25) might be $3,000

So CATE helps personalize or target interventions.
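
As a rough sketch of how subgroup effects could be estimated, the snippet below simulates a randomized version of the job training example and takes the treated-minus-control difference within each age group. The income model, the noise level, and the helper `diff_in_means` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

age = rng.integers(18, 60, size=n)      # ages 18..59
z = rng.binomial(1, 0.5, size=n)        # randomized treatment assignment
young = age < 25

# Assumed effects: training adds $7,000 for the young, $3,000 otherwise
effect = np.where(young, 7_000, 3_000)
income = 35_000 + rng.normal(0, 5_000, size=n) + z * effect

def diff_in_means(y, z, mask):
    """Treated-minus-control mean outcome among the rows selected by mask."""
    return y[mask & (z == 1)].mean() - y[mask & (z == 0)].mean()

cate_young = diff_in_means(income, z, young)
cate_older = diff_in_means(income, z, ~young)

print(f"Estimated CATE (age < 25):  {cate_young:,.0f}")
print(f"Estimated CATE (age >= 25): {cate_older:,.0f}")
```

Because treatment is randomized within every age group, each within-group difference should land near the effect assumed for that group.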


2. Average Treatment Effect on the Treated (ATT or TOT)

This measures the average effect of the treatment on those who actually received it.

Why it matters: Sometimes only a select group chooses or qualifies for treatment, and we want to know how well it worked for them, not the whole population.

Example: If only motivated people signed up for training, their ATT might be $6,000, even if ATE is $5,000.

ATT = E[Y(1) – Y(0) | Z = 1]

So it’s conditional on actually getting treated.
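
Here is a small simulated sketch of how self-selection pulls the ATT away from the ATE. The share of motivated people, their gains, and the sign-up rates are invented so that the numbers line up with the example above; they are not estimates of anything real.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical population: 40% of people are "motivated"
motivated = rng.binomial(1, 0.4, size=n) == 1

# Assumed individual gains from training, Y(1) - Y(0), in dollars
gain = np.where(motivated, 8_000, 3_000)

# Self-selection: motivated people sign up more often (assumed rates)
z = rng.binomial(1, np.where(motivated, 0.45, 0.20))

ate = gain.mean()            # average gain over everyone
att = gain[z == 1].mean()    # average gain among those who signed up

print(f"ATE: {ate:,.0f}")
print(f"ATT: {att:,.0f}")
```

With these assumed rates, the expected ATE is $5,000 and the expected ATT is $6,000: the people who sign up are disproportionately the ones who benefit most.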


3. Average Treatment Effect on the Untreated (ATU)

This flips the idea: what would the average effect be if the untreated people had received treatment?

This is useful for anticipating what would happen if you expanded a program to new people.
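
In the notation used above, the ATU and its relationship to the ATE and ATT can be written as follows. The second line is the law of total expectation applied to the two treatment groups:

```latex
\begin{aligned}
\text{ATU} &= E[Y(1) - Y(0) \mid Z = 0] \\
\text{ATE} &= P(Z = 1)\,\text{ATT} + P(Z = 0)\,\text{ATU}
\end{aligned}
```

As a purely hypothetical illustration, if 30% of people are treated, the ATT is $6,000, and the ATE is $5,000, then the ATU must be ($5,000 – 0.3 × $6,000) / 0.7 ≈ $4,571.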


Estimating ATE, ATT, etc.

In practice, we can’t observe both Y(1) and Y(0) for the same person, so we use methods like:

  • Randomized Controlled Trials (RCTs) → Gold standard
  • Matching → Compare similar treated and untreated individuals
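  • Regression → Adjust for measured confounding variables
  • Instrumental Variables → Use a variable that shifts treatment but affects the outcome only through treatment; helpful when unmeasured confounding is a concern
  • Difference-in-Differences → Compare changes over time between treated and untreated groups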
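
To give a feel for one of these methods, here is a minimal sketch of regression adjustment on simulated data. It assumes a single observed confounder `x`, a linear outcome model, and a true treatment effect of 1.0; all of these are illustrative choices, and real analyses rarely satisfy them this cleanly.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

x = rng.normal(0, 1, size=n)                       # observed confounder
z = rng.binomial(1, 1 / (1 + np.exp(-x)))          # treatment depends on x
y = 1.0 * z + 2.0 * x + rng.normal(0, 1, size=n)   # true effect of z is 1.0

# Naive estimate: ignores the confounder
naive = y[z == 1].mean() - y[z == 0].mean()

# Regression adjustment: ordinary least squares of y on [1, z, x]
design = np.column_stack([np.ones(n), z, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted = coef[1]                                 # coefficient on z

print(f"Naive difference in means: {naive:.2f}")
print(f"Regression-adjusted:       {adjusted:.2f}")
```

The naive difference overstates the effect because treated people tend to have higher `x`; including `x` in the regression brings the estimate back toward the assumed value of 1.0. This only works here because the confounder is measured and the model is correctly specified.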

Python Code:

import numpy as np
import pandas as pd

np.random.seed(42)

# 1. Simulate 1000 people with a feature (e.g., age)
n = 1000
age = np.random.randint(18, 60, size=n)
treated = np.random.binomial(1, 0.5, size=n)  # Randomly assign treatment

# 2. Define potential outcomes
# Let's say treatment effect is stronger for younger people
true_effect = 5 - 0.05 * age  # CATE: effect decreases with age

# Base outcome without treatment
y0 = 20 + 0.1 * age + np.random.normal(0, 2, size=n)
# Outcome with treatment
y1 = y0 + true_effect

# 3. Reveal only one outcome based on treatment status
observed_y = treated * y1 + (1 - treated) * y0

# Create DataFrame
df = pd.DataFrame({
    'age': age,
    'treated': treated,
    'y0': y0,
    'y1': y1,
    'observed_y': observed_y,
    'true_effect': true_effect
})

# 4. Compute the true effects (possible only because the simulation
#    exposes both potential outcomes for every person)
ate = np.mean(df['y1'] - df['y0'])
att = np.mean((df['y1'] - df['y0'])[df['treated'] == 1])
cate_young = np.mean((df['y1'] - df['y0'])[df['age'] < 25])

print(f"True ATE:  {ate:.2f}")
print(f"True ATT:  {att:.2f}")
print(f"CATE (age < 25): {cate_young:.2f}")

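The exact numbers depend on the random draw, but they should land close to the values implied by the simulation. Ages are uniform on 18 to 59 (mean 38.5), so the expected ATE is 5 – 0.05 × 38.5 ≈ 3.08, and for ages 18 to 24 (mean 21) the expected CATE is 5 – 0.05 × 21 = 3.95. Approximately:

True ATE:  ≈ 3.08
True ATT:  ≈ 3.08
CATE (age < 25): ≈ 3.95

This shows:

  • The average treatment effect is around +3.1 units
  • The effect is stronger for younger people (CATE)
  • Because treatment is assigned at random, the ATT is close to the ATE; any small gap between them is sampling noise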
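
In real data only `observed_y` and `treated` would be available, not `y0` and `y1`. The sketch below rebuilds the same simulation and estimates the ATE from the observed columns alone with a difference in means, then compares it to the true value. The standard-error formula is the usual two-sample one and is added here for illustration.

```python
import numpy as np

np.random.seed(42)

# Same simulation as above
n = 1000
age = np.random.randint(18, 60, size=n)
treated = np.random.binomial(1, 0.5, size=n)
true_effect = 5 - 0.05 * age
y0 = 20 + 0.1 * age + np.random.normal(0, 2, size=n)
y1 = y0 + true_effect
observed_y = treated * y1 + (1 - treated) * y0

# True ATE (uses both potential outcomes, unavailable in practice)
true_ate = np.mean(y1 - y0)

# Estimated ATE from observed data only
y_t = observed_y[treated == 1]
y_c = observed_y[treated == 0]
est_ate = y_t.mean() - y_c.mean()

# Two-sample standard error of the difference in means
se = np.sqrt(y_t.var(ddof=1) / len(y_t) + y_c.var(ddof=1) / len(y_c))

print(f"True ATE:      {true_ate:.2f}")
print(f"Estimated ATE: {est_ate:.2f} (SE {se:.2f})")
```

Because treatment was assigned at random here, the estimate should fall within a few standard errors of the true ATE. With non-random assignment the same calculation could be badly biased.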
