A Receiver Operating Characteristic (ROC) curve is a graphical tool used to evaluate the performance of a binary classification test (i.e., one that classifies outcomes into two categories like positive/negative, yes/no, disease/no disease).

A Brief History

The term “Receiver Operating Characteristic” originates from signal detection theory, developed during World War II. It was initially used to assess radar operators’ abilities to differentiate enemy targets from friendly units or background noise. Their effectiveness was referred to as the “receiver operating characteristics.” This methodology wasn’t applied to medical diagnostics until the 1970s.

What does it show?

The ROC curve plots:

  • True Positive Rate (TPR) — also known as sensitivity — on the y-axis
  • False Positive Rate (FPR) — which is 1 – specificity — on the x-axis

Each point on the curve represents the TPR and FPR at a different threshold (or “cutpoint”) used to decide whether a prediction is positive or negative.
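
As a quick illustration of these definitions, here is a minimal R sketch that computes the TPR and FPR at a single threshold; the labels and scores vectors are hypothetical, purely for illustration:

# Hypothetical 0/1 labels and predicted scores, purely for illustration
labels <- c(0, 0, 1, 1, 1, 0, 1, 0)
scores <- c(0.2, 0.4, 0.8, 0.6, 0.9, 0.5, 0.3, 0.1)

threshold <- 0.5
pred <- as.integer(scores >= threshold)  # call it positive at or above the cutpoint

tpr <- sum(pred == 1 & labels == 1) / sum(labels == 1)  # sensitivity = 0.75
fpr <- sum(pred == 1 & labels == 0) / sum(labels == 0)  # 1 - specificity = 0.25

Sweeping the threshold from 1 down to 0 and plotting each (FPR, TPR) pair traces out the ROC curve.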

[Figure: ROC curve for predicting the Versicolor species, plotting True Positive Rate (Sensitivity) against False Positive Rate (1 - Specificity); AUC = 0.669.]

This kind of graph is known as a Receiver Operating Characteristic (ROC) curve. It displays the relationship between the true positive rate and the false positive rate across various threshold values of a diagnostic test.

An ROC curve illustrates several important concepts:

  1. It highlights the balance between sensitivity and specificity—improving sensitivity usually leads to a reduction in specificity.
  2. A curve that closely hugs the left edge and then the top edge of the ROC space indicates a highly accurate test.
  3. A curve that lies near the 45-degree diagonal suggests the test is less accurate.
  4. The slope of the tangent at any given point on the curve represents the likelihood ratio (LR) for the corresponding test value. For example, in the classic example of using serum T4 levels to diagnose hypothyroidism, the LR for T4 < 5 is 52, corresponding to the steep leftmost section of the curve. In contrast, the LR for T4 > 9 is 0.2, which aligns with the flatter, rightmost part of the curve. (A hedged sketch of tabulating likelihood ratios in R appears after the code example below.)
  5. The area under the ROC curve (AUC) provides a summary measure of the test’s overall accuracy.

If you want to generate an ROC curve like the one in the plot above using R, you can try the following code:

# Load necessary libraries (install the packages once, if needed)
# install.packages("pROC")
# install.packages("ggplot2")
library(pROC)
library(ggplot2)

# Load iris dataset
data(iris)

# Convert Species to binary: 1 = versicolor, 0 = others
iris$binary_species <- ifelse(iris$Species == "versicolor", 1, 0)

# Fit a logistic regression model to predict versicolor
model <- glm(binary_species ~ Sepal.Length + Petal.Length,
             data = iris, family = binomial)

# Get predicted probabilities
predicted_probs <- predict(model, type = "response")

# Create ROC object
roc_obj <- roc(iris$binary_species, predicted_probs)

# Plot with ggroc; legacy.axes = TRUE puts 1 - specificity on the x-axis,
# matching the axis labels and the dashed diagonal reference line below
ggroc(roc_obj, legacy.axes = TRUE, color = "#16A085", size = 1.5) +
  ggtitle("ROC Curve: Predicting Versicolor Species") +
  xlab("False Positive Rate (1 - Specificity)") +
  ylab("True Positive Rate (Sensitivity)") +
  theme_minimal(base_size = 14) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray") +
  annotate("text", x = 0.6, y = 0.2,
           label = paste("AUC =", round(auc(roc_obj), 3)),
           size = 5, color = "#16A085")
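
Following up on point 4 above, here is a hedged sketch of tabulating likelihood ratios from the fitted roc_obj. Note that it computes sensitivity / (1 - specificity), the LR for a positive result at each cutpoint, which is closely related to (but not identical with) the tangent-slope LR of a specific test value:

# Positive likelihood ratio, LR+ = sensitivity / (1 - specificity),
# at every candidate threshold stored in roc_obj
cc <- coords(roc_obj, x = "all",
             ret = c("threshold", "sensitivity", "specificity"))
cc$lr_pos <- cc$sensitivity / (1 - cc$specificity)
head(cc)
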
[Figure: Comparing ROC curves of 'Worthless,' 'Good,' and 'Excellent' tests, with True Positive Rate on the y-axis and False Positive Rate on the x-axis. Source: Google Images.]

The graph above displays three ROC curves that represent diagnostic tests of varying quality—excellent, good, and worthless—on the same plot. The accuracy of each test is determined by its ability to distinguish between individuals with and without the condition being evaluated. This accuracy is quantified by the area under the ROC curve (AUC). An AUC of 1 indicates a perfect test, while an AUC of 0.5 suggests the test performs no better than chance.

A general grading scale for interpreting AUC values is as follows:

  • 0.90–1.00: Excellent (A)
  • 0.80–0.90: Good (B)
  • 0.70–0.80: Fair (C)
  • 0.60–0.70: Poor (D)
  • 0.50–0.60: Fail (F)
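
If you find yourself applying this scale repeatedly, a small helper (purely illustrative, not part of any package) can encode it:

# Hypothetical helper mapping an AUC value onto the grading scale above
grade_auc <- function(auc_value) {
  cut(auc_value,
      breaks = c(0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
      labels = c("Fail (F)", "Poor (D)", "Fair (C)", "Good (B)", "Excellent (A)"),
      right = FALSE, include.lowest = TRUE)
}
grade_auc(0.669)  # the iris example's AUC lands in the "Poor (D)" band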

You might now wonder what the AUC truly represents. Essentially, it measures the discrimination ability of the test—that is, how well it can differentiate between those with and without the disease. Imagine randomly selecting one individual with the disease and one without, and performing the test on both. If the test result for the diseased individual is more “abnormal” than for the healthy one, that counts as a correct classification. The AUC reflects the proportion of such correctly classified pairs.
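
This pairwise interpretation is easy to verify numerically. Here is a minimal sketch, reusing predicted_probs and binary_species from the earlier example, that counts the proportion of correctly ranked pairs (with ties counted as half):

# Scores for the "diseased" (versicolor) and "healthy" (other) groups
pos <- predicted_probs[iris$binary_species == 1]
neg <- predicted_probs[iris$binary_species == 0]

# For every (pos, neg) pair: 1 if ranked correctly, 0.5 for a tie
pairs <- outer(pos, neg, function(p, n) (p > n) + 0.5 * (p == n))
mean(pairs)   # proportion of correctly ranked pairs
auc(roc_obj)  # agrees with pROC's AUC estimate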

The actual calculation of the AUC involves more technical methods. Two common approaches are:

  1. A non-parametric method, which approximates the area using trapezoids under the curve.
  2. A parametric method, which uses maximum likelihood to fit a smooth curve to the data.

Statistical software can compute both estimates, along with standard errors that are useful for comparing different tests or the same test across different populations.
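
For a concrete sense of the first approach, here is a minimal sketch of the trapezoidal estimate, computed directly from the coordinates stored in roc_obj from the earlier example:

# Non-parametric (trapezoidal) AUC estimate from the ROC coordinates
fpr <- 1 - roc_obj$specificities
tpr <- roc_obj$sensitivities
ord <- order(fpr, tpr)  # arrange the curve's points from left to right
fpr <- fpr[ord]; tpr <- tpr[ord]

# Sum the areas of the trapezoids between consecutive points
sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)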

You can read about the AUC, the Gini coefficient, and CAP curves in further detail in this blog post.
