Regression Model

03/15/2026

Regression is not fitting a line to data

Most people think of regression as: data exists first → find a line that fits the data.

But statistically, it's the opposite: a line (the mean structure) exists first → data is generated randomly around it.

This is the most important perspective shift in understanding regression.

Data Generating Process (DGP)

The statistical model starts with:

\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2) \]

This means there exists a true line in the world, but we can never observe it directly. Instead, we observe data that is the line plus random noise.

The generation process:

1. \(X_i\) is chosen.
2. The conditional mean is computed: \(\mu_i = \beta_0 + \beta_1 X_i\).
3. \(Y_i\) is sampled from \(N(\mu_i, \sigma^2)\).

Conditional distribution of \(Y\)

At each \(X\), \(Y\) follows a normal distribution:

\[ Y \mid X = x \sim N(\beta_0 + \beta_1 x, \, \sigma^2) \]

A scatter plot is not just a collection of points: it's the top-down view of vertical normal distributions, one at each \(X\).

The vertical spread at each \(X\) is \(\text{Var}(Y \mid X) = \sigma^2\), the same for every \(X\).

[Figure: scatter plot with regression line and conditional \(Y\) distributions at \(X = 2, 5, 8\)]

What the regression line really is

The regression line connects the means of the vertical distributions:

\[ \text{Regression line} = E(Y \mid X) = \beta_0 + \beta_1 X \]

So regression is not "fitting a line to data"; it's estimating the conditional expectation \(E(Y \mid X)\).

Equivalently, among all functions \(f\) of \(X\), the regression line minimizes the expected squared error:

\[ \beta_0 + \beta_1 X = \arg\min_f \, E\left[(Y - f(X))^2\right] \]

Summary

The full structure of regression:

\[ \epsilon \sim N(0, \sigma^2) \]
\[ Y = \beta_0 + \beta_1 X + \epsilon \]
\[ Y \mid X \sim N(\beta_0 + \beta_1 X, \, \sigma^2) \]
\[ E(Y \mid X) = \beta_0 + \beta_1 X \]
\[ \text{Regression line} = E(Y \mid X) \]

Once this perspective is understood, confidence intervals, t-tests, ANOVA, and \(R^2\) all follow naturally.

This generalizes to GLM, Bayesian regression, and Gaussian process regression. All share the same structure: a mean structure exists first, and data is generated randomly around it.
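
The three-step generation process can be tried out directly. The sketch below is only an illustration, not part of the original notes: it assumes NumPy, picks hypothetical "true" values for \(\beta_0\), \(\beta_1\), and \(\sigma\), and uses two helper names (`simulate_dgp`, `fit_line`) invented for this example. It draws \(X\) from the three values shown in the figure (2, 5, 8), samples \(Y\) around the true line, and then compares the sample mean of \(Y\) at each \(X\) with the least-squares line and with the true conditional mean.

```python
import numpy as np


def simulate_dgp(x, beta0, beta1, sigma, rng):
    """Draw Y_i ~ N(beta0 + beta1 * x_i, sigma^2) for each x_i."""
    mu = beta0 + beta1 * x  # conditional mean E(Y | X = x_i)
    return rng.normal(loc=mu, scale=sigma)


def fit_line(x, y):
    """Return least-squares estimates (intercept, slope)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]


rng = np.random.default_rng(0)

# Hypothetical "true" parameters, chosen only for illustration.
beta0, beta1, sigma = 1.0, 2.0, 1.5

# Step 1: X is chosen (here: randomly among 2, 5, and 8).
x = rng.choice([2.0, 5.0, 8.0], size=3000)

# Steps 2 and 3: compute the conditional mean, then sample Y around it.
y = simulate_dgp(x, beta0, beta1, sigma, rng)

# Estimate the line from the observed data only.
b0_hat, b1_hat = fit_line(x, y)

for x0 in (2.0, 5.0, 8.0):
    sample_mean = y[x == x0].mean()   # mean of the vertical slice at x0
    fitted = b0_hat + b1_hat * x0     # estimated E(Y | X = x0)
    truth = beta0 + beta1 * x0        # true E(Y | X = x0)
    print(f"X = {x0}: slice mean = {sample_mean:.2f}, "
          f"fitted = {fitted:.2f}, true = {truth:.2f}")
```

With a few thousand points, the slice means, the fitted values, and the true conditional means should sit close together, which is the sense in which the fitted line estimates \(E(Y \mid X)\) rather than merely passing through a cloud of points.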
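
The claim that the regression line minimizes the expected squared error is stated without a derivation. One standard way to see it, sketched here under the model's assumptions and with \(m(X)\) as shorthand (introduced only for this sketch) for \(E(Y \mid X)\), is to split the error for an arbitrary function \(f\):

\[
\begin{aligned}
E\left[(Y - f(X))^2\right]
&= E\left[(Y - m(X))^2\right] + 2\,E\left[(Y - m(X))(m(X) - f(X))\right] + E\left[(m(X) - f(X))^2\right] \\
&= \sigma^2 + E\left[(m(X) - f(X))^2\right]
\end{aligned}
\]

The cross term vanishes because, conditional on \(X\), the factor \(m(X) - f(X)\) is fixed and \(E(Y - m(X) \mid X) = 0\). The first term equals \(\sigma^2\) because \(\text{Var}(Y \mid X) = \sigma^2\). The remaining term is non-negative and is zero exactly when \(f(X) = m(X) = \beta_0 + \beta_1 X\), so the smallest achievable expected squared error is \(\sigma^2\), attained by the regression line.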