Regression Model

03/15/2026

Most people think of regression as: data exists first → find a line that fits the data
But statistically, it's the opposite: a line (mean structure) exists first → data is generated randomly around it
This is the most important perspective shift in understanding regression

The statistical model starts with: \[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2) \text{ independently} \]
This means: there exists a true line in the world, but we can never observe it directly
Instead, we observe data that is the line + random noise
The generation process:
1. \(X_i\) is chosen
2. Conditional mean is computed: \(\mu_i = \beta_0 + \beta_1 X_i\)
3. \(Y_i\) is sampled from \(N(\mu_i, \sigma^2)\)
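
The three steps above can be sketched as a small simulation. This is only an illustration: the parameter values (`beta0=1.0`, `beta1=2.0`, `sigma=1.5`), the uniform choice of \(X_i\), and the helper name `simulate_regression` are assumptions made for the example, not part of the model itself.

```python
import numpy as np

def simulate_regression(x, beta0, beta1, sigma, rng):
    """Draw Y_i ~ N(beta0 + beta1 * x_i, sigma^2) for each x_i."""
    x = np.asarray(x, dtype=float)
    mu = beta0 + beta1 * x                 # step 2: conditional mean at each X_i
    y = rng.normal(loc=mu, scale=sigma)    # step 3: sample Y_i around that mean
    return mu, y

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)           # step 1: X_i is chosen (uniformly here, an assumption)
mu, y = simulate_regression(x, beta0=1.0, beta1=2.0, sigma=1.5, rng=rng)

# In real data only (x, y) is observed; mu (the true line) stays hidden.
noise = y - mu
print("mean of noise:", noise.mean())
print("std of noise: ", noise.std(ddof=1))
```

In the simulation `mu` is available because we wrote the generator ourselves; with real data it is exactly the quantity that has to be estimated.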

At each \(X\), \(Y\) follows a normal distribution: \[ Y \mid X = x \sim N(\beta_0 + \beta_1 x, \, \sigma^2) \]
A scatter plot is not just a collection of points — it's the top-down view of vertical normal distributions at each \(X\)
The vertical spread at each \(X\) is \(\text{Var}(Y \mid X) = \sigma^2\)
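
One way to check this picture numerically is to hold \(X\) fixed and draw many values of \(Y\) at that point. The sketch below does this at \(X = 2, 5, 8\); the parameter values and the helper `conditional_draws` are assumptions chosen for illustration.

```python
import numpy as np

beta0, beta1, sigma = 1.0, 2.0, 1.5        # illustrative values, not from the text
rng = np.random.default_rng(1)

def conditional_draws(x0, n=50_000):
    """Sample Y repeatedly at a single fixed X = x0."""
    return rng.normal(beta0 + beta1 * x0, sigma, size=n)

summary = {}
for x0 in (2.0, 5.0, 8.0):
    draws = conditional_draws(x0)
    # The sample mean should sit near beta0 + beta1 * x0,
    # and the sample variance near sigma^2, whatever x0 is.
    summary[x0] = (draws.mean(), draws.var(ddof=1))
    print(f"X={x0}: mean={summary[x0][0]:.3f}, variance={summary[x0][1]:.3f}")
```

The mean moves with \(X\) while the variance stays put, which is the constant vertical spread described above.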

Figure: scatter plot with the regression line and the conditional distributions of \(Y\) at \(X = 2, 5, 8\)

The regression line connects the means of the vertical distributions: \[ \text{Regression line} = E(Y \mid X) = \beta_0 + \beta_1 X \]
So regression is not "fitting a line to data" — it's estimating the conditional expectation \(E(Y \mid X)\)
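Equivalently, among all functions \(f\) of \(X\), the conditional mean minimizes the expected squared error, and under this model that minimizer is the line: \[ E(Y \mid X) = \arg\min_f \, E\left[(Y - f(X))^2\right] = \beta_0 + \beta_1 X \]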
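
A sample analogue of this minimization can be sketched with ordinary least squares, which picks the line with the smallest mean squared error on the observed data. The simulated data, parameter values, and helper `mean_squared_error` below are assumptions for illustration; `np.polyfit` with `deg=1` is used as one convenient least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 1.0, 2.0, 1.5        # illustrative "true" values
n = 5000
x = rng.uniform(0, 10, size=n)
y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)

# Least-squares estimate of the line; for deg=1 polyfit returns [slope, intercept].
b1_hat, b0_hat = np.polyfit(x, y, deg=1)

def mean_squared_error(intercept, slope):
    """Average squared vertical distance between the data and a candidate line."""
    return np.mean((y - (intercept + slope * x)) ** 2)

mse_fitted = mean_squared_error(b0_hat, b1_hat)
mse_shifted = mean_squared_error(b0_hat + 0.5, b1_hat)   # same slope, moved up
mse_tilted = mean_squared_error(b0_hat, b1_hat + 0.1)    # same intercept, steeper

print("estimated intercept, slope:", b0_hat, b1_hat)
print("MSE fitted / shifted / tilted:", mse_fitted, mse_shifted, mse_tilted)
```

Any other line does worse on the sample than the fitted one, and the fitted line's mean squared error lands close to \(\sigma^2\), the part of \(Y\) that no function of \(X\) can explain.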

The full structure of regression:
\[ \epsilon \sim N(0, \sigma^2) \]
\[ Y = \beta_0 + \beta_1 X + \epsilon \]
\[ Y \mid X \sim N(\beta_0 + \beta_1 X, \, \sigma^2) \]
\[ E(Y \mid X) = \beta_0 + \beta_1 X \]
\[ \text{Regression line} = E(Y \mid X) \]
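Once this perspective is understood, confidence intervals, t-tests, ANOVA, and \(R^2\) all follow naturally: they are built from how the estimates vary across repeated samples from this model, and from how the variation in \(Y\) splits into a part explained by the line and a part due to noise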
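
As a rough sketch of how these quantities come out of the model, the code below computes the usual simple-regression estimates on simulated data. The data and parameter values are assumptions, and the interval uses the normal value 1.96 as a large-sample stand-in for the exact t quantile.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma = 1.0, 2.0, 1.5        # illustrative "true" values
n = 2000
x = rng.uniform(0, 10, size=n)
y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)

# Closed-form least-squares estimates for one predictor.
x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
b1_hat = np.sum((x - x_bar) * (y - y_bar)) / sxx
b0_hat = y_bar - b1_hat * x_bar

residuals = y - (b0_hat + b1_hat * x)
sse = np.sum(residuals ** 2)               # variation left around the fitted line
sst = np.sum((y - y_bar) ** 2)             # total variation in Y

s2 = sse / (n - 2)                         # estimate of sigma^2
se_b1 = np.sqrt(s2 / sxx)                  # standard error of the slope
t_stat = b1_hat / se_b1                    # t statistic for H0: beta1 = 0
r2 = 1 - sse / sst                         # share of variation explained by the line

# Approximate 95% interval for the slope (1.96 is a large-n approximation).
ci_low, ci_high = b1_hat - 1.96 * se_b1, b1_hat + 1.96 * se_b1

print("sigma^2 estimate:", s2)
print("slope, SE, t:", b1_hat, se_b1, t_stat)
print("approx. 95% CI for slope:", (ci_low, ci_high))
print("R^2:", r2)
```

Every line after the fit is a function of the residuals, which is why the noise term \(\epsilon\) in the model is what makes inference possible at all.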
This generalizes to GLM, Bayesian regression, and Gaussian process regression: all share the same structure, a mean structure plus a probability model for how the data scatter around it