02 ; Equivalence of Finite-Sample Quantile and Counting Predicates for Split Conformal Prediction

Let’s now apply conformal prediction to regression problems.

There is a naive approach where we calculate residuals directly on the training dataset.

However, if you train a machine learning model $f(x)$ on a dataset $D_{train}$ and then compute the residuals $R_{i} = |y_{i} - f(x_{i})|$ on that same dataset, exchangeability no longer holds. The model is optimized on $D_{train}$, which drives the training residuals down and can overfit them. The test residual $R_{n+1}$ is not exchangeable with them because the test point took no part in that optimization, so it tends to be larger.
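To make the failure of the naive approach concrete, here is a minimal sketch using an extreme case of overfitting: a 1-nearest-neighbour predictor that memorizes its training data. The data-generating process, the sample sizes, and the helper names `make_data` and `fit_1nn` are assumptions chosen for illustration, not part of these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical data: y = sin(x) + Gaussian noise.
    x = rng.uniform(-3, 3, size=n)
    y = np.sin(x) + 0.3 * rng.standard_normal(n)
    return x, y

def fit_1nn(x_train, y_train):
    # 1-nearest-neighbour regression: predict the label of the closest training point.
    def predict(x):
        x = np.atleast_1d(x)
        idx = np.abs(x[:, None] - x_train[None, :]).argmin(axis=1)
        return y_train[idx]
    return predict

x_train, y_train = make_data(200)
x_test, y_test = make_data(200)
f = fit_1nn(x_train, y_train)

# Residuals on the data the model was fit to versus residuals on fresh data.
train_residuals = np.abs(y_train - f(x_train))
test_residuals = np.abs(y_test - f(x_test))
print(train_residuals.max(), np.median(test_residuals))
```

Every training point is its own nearest neighbour, so the training residuals are all zero and any quantile of them gives a zero-width interval, while the test residuals are strictly positive. Less extreme models show the same effect to a smaller degree.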

Split Conformal Prediction

Let $D$ be partitioned into two non-overlapping sets:

Training Set $D_{1}$: Used to optimize the model parameters to get $f(x)$.
Calibration Set $D_{2}$: Consisting of $n_{2}$ independent points that are not used for fitting; the residuals are computed on these.

Compute calibration residuals:

$$ R_{i} = V(X_{i}, Y_{i}) = |Y_{i} - f(X_{i})| \quad \text{for } i \in D_{2} $$

$$ \text{Data points } (X_{i}, Y_{i}) \text{ i.i.d.} \implies \text{residuals } R_{i} \text{ exchangeable} $$

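Sort the calibration residuals ( $R_{(1)} < R_{(2)} < \dots < R_{(n_{2})}$ ) and choose the index $k = \lceil (1 - \alpha)(n_{2} + 1) \rceil$ . The test residual $V(X_{n+1}, Y_{n+1})$ is exchangeable with the calibration residuals, so its rank among all $n_{2} + 1$ residuals is uniformly distributed, which guarantees: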

$$ P\left( V(X_{n+1}, Y_{n+1}) \leq R_{(k)} \right) \geq 1 - \alpha $$
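The guarantee can be checked empirically. The sketch below draws calibration and test residuals directly as i.i.d. (hence exchangeable) values, which is an assumption standing in for a real model and dataset; the values of `alpha`, `n2`, and `trials` are likewise illustrative.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
alpha, n2, trials = 0.1, 99, 20_000

# Index of the order statistic used as the threshold.
k = math.ceil((1 - alpha) * (n2 + 1))

# Each row is one independent calibration set of n2 residuals, plus one test residual.
cal = np.abs(rng.standard_normal((trials, n2)))
test = np.abs(rng.standard_normal(trials))

# k-th smallest calibration residual in each trial (k is 1-indexed).
threshold = np.sort(cal, axis=1)[:, k - 1]

# Fraction of trials in which the test residual falls at or below the threshold.
coverage = float(np.mean(test <= threshold))
print(k, round(coverage, 3))
```

Under these assumptions the printed coverage should land close to $1 - \alpha$, up to Monte Carlo noise.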

The prediction set built from this out-of-sample threshold $R_{(k)}$ is:

$$ \hat{C}_{n}(X_{n+1}) = \left[ f(X_{n+1}) - R_{(k)},\ f(X_{n+1}) + R_{(k)} \right] $$
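The construction above can be sketched in a few lines. The function names `conformal_threshold` and `split_conformal_interval` are hypothetical, and returning an infinite threshold when $k > n_{2}$ is a common convention assumed here rather than something stated in these notes.

```python
import math
import numpy as np

def conformal_threshold(cal_residuals, alpha):
    # k-th smallest calibration residual with k = ceil((1 - alpha) * (n2 + 1)).
    r = np.sort(np.asarray(cal_residuals, dtype=float))
    n2 = r.size
    k = math.ceil((1 - alpha) * (n2 + 1))
    if k > n2:
        # Assumed convention: too few calibration points, so the interval is unbounded.
        return math.inf
    return float(r[k - 1])

def split_conformal_interval(pred, cal_residuals, alpha):
    # Symmetric interval around the point prediction f(x).
    q = conformal_threshold(cal_residuals, alpha)
    return pred - q, pred + q

# Seven calibration residuals, alpha = 0.25, so k = ceil(0.75 * 8) = 6.
cal_residuals = [0.5, 0.125, 1.0, 0.25, 0.75, 2.0, 1.5]
lo, hi = split_conformal_interval(2.0, cal_residuals, alpha=0.25)
print(lo, hi)  # 0.5 3.5
```

The sixth smallest residual is 1.5, so a prediction of 2.0 yields the interval $[0.5, 3.5]$.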

The upper bound of the coverage only holds if there are no ties (the no-ties condition):

$$ P\left( Y_{n+1} \in \hat{C}_{n}(X_{n+1}) \mid (X_{i}, Y_{i}),\ i \in D_{1} \right) \in \left[ 1 - \alpha,\ 1 - \alpha + \frac{1}{n_{2} + 1} \right) $$
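As a quick sanity check of these bounds, take the assumed values $n_{2} = 50$ and $\alpha = 0.1$ (chosen for illustration only). With no ties, the coverage equals $k / (n_{2} + 1)$, so:

$$ k = \lceil 0.9 \cdot 51 \rceil = \lceil 45.9 \rceil = 46, \qquad \frac{k}{n_{2} + 1} = \frac{46}{51} \approx 0.902 \in \left[ 0.9,\ 0.9 + \tfrac{1}{51} \right) \approx [0.9,\ 0.920) $$

The slack above $1 - \alpha$ comes only from rounding $k$ up to an integer, which is why it shrinks at rate $1 / (n_{2} + 1)$.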

Score Functions

We can use any score function as long as it treats the data symmetrically. The metric $V(x, y)$ is a conformity score function that quantifies how well or poorly a label $y$ fits an input $x$ given a frozen predictor $\hat{f}_{n_{1}}$ (the model $f$ above, fitted on the $n_{1}$ points of $D_{1}$).

A score is negatively-oriented if lower values imply a better, more accurate model prediction (e.g., standard absolute residuals $V (x, y) = ∣ y - \hat{f}_{n_{1}} (x) ∣$ ).

The valid prediction set:

$$ \hat{C}_{n}(x) = \{ y : V(x, y) \leq \hat{q}_{n_{2}} \} $$

Where $\hat{q}_{n_{2}}$ is the $\lceil (1 - \alpha)(n_{2} + 1) \rceil$ -th smallest score observed in the calibration set $D_{2}$ .

A score is positively-oriented if higher values imply a better match (common in classification settings).

For the positive case, we flip the inequality and take the threshold from the lower end of the sorted calibration scores:

$$ \hat{C}_{n}(x) = \{ y : V(x, y) \geq R_{(\lfloor \alpha (n_{2} + 1) \rfloor)} \} $$
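A minimal sketch of the positive case for classification follows, assuming the score is the predicted probability of a label. The index is the mirror image of the negative case: the $k$-th largest of $n_{2}$ scores is the $(n_{2} + 1 - k)$-th smallest, and $n_{2} + 1 - \lceil (1 - \alpha)(n_{2} + 1) \rceil = \lfloor \alpha (n_{2} + 1) \rfloor$. The function names and the convention of keeping every label when that index is zero are assumptions made for this example.

```python
import math
import numpy as np

def positive_score_threshold(cal_scores, alpha):
    # floor(alpha * (n2 + 1))-th smallest calibration score.
    s = np.sort(np.asarray(cal_scores, dtype=float))
    n2 = s.size
    j = math.floor(alpha * (n2 + 1))
    if j < 1:
        # Assumed convention: no usable order statistic, so every label is kept.
        return -math.inf
    return float(s[j - 1])

def prediction_set(label_scores, cal_scores, alpha):
    # Keep every candidate label whose score is at least the threshold.
    t = positive_score_threshold(cal_scores, alpha)
    return [y for y, v in enumerate(label_scores) if v >= t]

# Calibration scores: predicted probability of the true label on D2 (n2 = 7).
cal_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]

# alpha = 0.25 gives index floor(0.25 * 8) = 2, so the threshold is 0.4.
print(prediction_set([0.5, 0.35, 0.15], cal_scores, alpha=0.25))  # [0]
print(prediction_set([0.45, 0.4, 0.15], cal_scores, alpha=0.25))  # [0, 1]
```

A confident prediction yields a single label, while a less certain one yields a larger set.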

Negative Orientation Sorting Framework

We define the valid prediction set $\hat{C}_{n} (x)$ with the following three equivalent statements.

The Order Statistic Formulation

$$ \hat{C}_{n}(x) = \left\{ y : V(x, y) \leq \text{the } \lceil (1 - \alpha)(n_{2} + 1) \rceil \text{-th smallest of } R_{i},\ i \in D_{2} \right\} $$

The Empirical Quantile Formulation

$$ \hat{C}_{n}(x) = \left\{ y : V(x, y) \leq \mathrm{Quantile}\left( \frac{\lceil (1 - \alpha)(n_{2} + 1) \rceil}{n_{2}};\ \frac{1}{n_{2}} \sum_{i \in D_{2}} \delta_{R_{i}} \right) \right\} $$

The Empirical Counting Formulation

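$$ \hat{C}_{n}(x) = \left\{ y : \frac{1}{n_{2}} \sum_{i \in D_{2}} \mathbf{1}\{ R_{i} < V(x, y) \} < \frac{\lceil (1 - \alpha)(n_{2} + 1) \rceil}{n_{2}} \right\} $$

The inequality on the count is strict: with $k = \lceil (1 - \alpha)(n_{2} + 1) \rceil$ , we have $V(x, y) \leq R_{(k)}$ exactly when fewer than $k$ calibration residuals lie strictly below $V(x, y)$ .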
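The equivalence of the three formulations can be checked numerically. In the sketch below all function names are hypothetical, $k \leq n_{2}$ is assumed, and the quantile is taken to be the generalized inverse of the ECDF at level $k / n_{2}$, evaluated with integer counts to avoid floating-point issues. The counting form uses a strict inequality on the count, since $V \leq R_{(k)}$ holds exactly when fewer than $k$ calibration scores fall strictly below $V$.

```python
import math
import numpy as np

def in_set_order_stat(v, scores, k):
    # Order-statistic form: v <= k-th smallest calibration score.
    return bool(v <= np.sort(scores)[k - 1])

def ecdf_quantile(scores, k):
    # Generalized inverse of the ECDF at level k / n2:
    # the smallest t among the scores with #{R_i <= t} >= k.
    s = np.sort(scores)
    counts = np.searchsorted(s, s, side="right")  # #{R_i <= s_j} for each j
    return float(s[np.argmax(counts >= k)])

def in_set_quantile(v, scores, k):
    # Empirical-quantile form.
    return bool(v <= ecdf_quantile(scores, k))

def in_set_counting(v, scores, k):
    # Counting form: fewer than k calibration scores lie strictly below v.
    return bool(np.sum(scores < v) < k)

# Calibration scores with a tie on purpose (n2 = 7), alpha = 0.25, so k = 6.
scores = np.array([0.3, 0.1, 0.4, 0.1, 0.5, 0.9, 0.2])
alpha = 0.25
k = math.ceil((1 - alpha) * (scores.size + 1))

# Candidate score values: a grid plus the calibration scores themselves.
grid = np.concatenate([np.linspace(0.0, 1.0, 101), scores])
agree = all(
    in_set_order_stat(v, scores, k)
    == in_set_quantile(v, scores, k)
    == in_set_counting(v, scores, k)
    for v in grid
)
print(k, ecdf_quantile(scores, k), agree)  # 6 0.5 True
```

All three predicates give the same membership decision for every candidate value, including values that coincide with a calibration score.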

ECDF and Generalized Inverse Derivation
Mapping Counting Measures back to Order Statistics

Current Status: Formally proved the equivalence between the ECDF (counting) formulation, the empirical quantile function, and the order statistics of the calibration scores for split conformal prediction.
Next Objective: Split conformal prediction with absolute residuals gives every input an interval of the same width, $2 R_{(k)}$ , so it may overcover where the noise is low and undercover where it is high, even though the marginal coverage guarantee holds.

Gavin
