Glossary: M10 — Simple Linear Regression

Module: M10 | Formulas: Formula Sheet | Concept page: Regression Concept


Simple Linear Regression

A statistical model that estimates the linear relationship between one Dependent Variable and one Independent Variable. The estimated model form is:

Yᵢ = b₀ + b₁Xᵢ + εᵢ, with fitted values Ŷᵢ = b̂₀ + b̂₁Xᵢ

LOS: 10.a | Purpose: Explain variation in Y, test whether X helps explain Y, and make predictions. | Related: Regression Concept


Dependent Variable

The variable whose variation is being explained or predicted in a regression model. Placed on the left-hand side of the regression equation. Also called the response variable or regressand.

LOS: 10.a | Notation: Y | Contrast: Independent Variable


Independent Variable

The variable used to explain or predict the dependent variable. Placed on the right-hand side of the regression equation. Also called the explanatory variable, predictor, or regressor.

LOS: 10.a | Notation: X | Contrast: Dependent Variable


Intercept

The estimated value of the dependent variable when the independent variable equals zero. The point where the regression line crosses the Y-axis.

LOS: 10.a | Notation: b₀ | Caution: The intercept may not have a meaningful economic interpretation if X = 0 is outside the data range.


Slope Coefficient

The estimated change in the dependent variable associated with a one-unit increase in the independent variable.

LOS: 10.a | Notation: b₁ | Interpretation: If b̂₁ = 2.5, a one-unit increase in X is associated with a 2.5-unit increase in Y, on average.


Ordinary Least Squares (OLS)

The most common method for estimating regression coefficients. Minimizes the sum of squared residuals (vertical distances between observed and fitted values).

LOS: 10.b | Property: Produces the Best Linear Unbiased Estimator (BLUE) when classical assumptions hold.
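As a concrete sketch, the OLS estimates for simple linear regression can be computed directly from sample moments. The toy data and variable names below are our own, not from the glossary:

```python
import statistics

# Toy sample (illustrative data, not from the glossary)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 6.2, 8.4, 10.1]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Slope estimate: sample covariance of X and Y over sample variance of X
b1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)

# Intercept estimate: the fitted line passes through (x_bar, y_bar)
b0_hat = y_bar - b1_hat * x_bar

# Residuals (estimated errors); under OLS they sum to zero
residuals = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]
```

Minimizing the sum of squared residuals leads exactly to these two closed-form estimates, which is why no iterative optimization is needed in the simple linear case.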


Residual

The difference between the actual value Yᵢ and the fitted value Ŷᵢ for observation i: eᵢ = Yᵢ − Ŷᵢ. Also called the estimated error.

LOS: 10.b | Key: OLS minimizes Σeᵢ². Residuals sum to zero: Σeᵢ = 0.


Regression Line

The estimated line Ŷ = b̂₀ + b̂₁X that minimizes the sum of squared residuals. Passes through the point (X̄, Ȳ).

LOS: 10.b | Related: Ordinary Least Squares (OLS)


Sum of Squares Total (SST)

The total variation in the dependent variable around its mean: SST = Σ(Yᵢ − Ȳ)². Decomposed into explained and unexplained variation: SST = SSR + SSE.

LOS: 10.c | Related: Sum of Squares Regression (SSR), Sum of Squares Error (SSE)


Sum of Squares Regression (SSR)

The variation in Y explained by the regression model — the portion attributable to the independent variable: SSR = Σ(Ŷᵢ − Ȳ)².

LOS: 10.c | Also called: Explained sum of squares (ESS).


Sum of Squares Error (SSE)

The unexplained variation in Y — the portion not captured by the regression model: SSE = Σ(Yᵢ − Ŷᵢ)².

LOS: 10.c | Also called: Residual sum of squares (RSS).


Coefficient of Determination (R²)

The proportion of the total variation in the dependent variable explained by the independent variable: R² = SSR / SST = 1 − SSE / SST. Ranges from 0 to 1.

LOS: 10.d | Interpretation: R² = 0.72 means 72% of the variation in Y is explained by X. | Note: In simple linear regression, R² = r² (squared Pearson correlation).


Standard Error of Estimate (SEE)

A measure of the average magnitude of the regression residuals: SEE = √(SSE / (n − 2)) = √MSE. Indicates how closely the regression line fits the data.

LOS: 10.d | Key: Lower SEE → better fit. Used to construct prediction intervals. | Related: Mean Square Error (MSE)


Mean Square Regression (MSR)

The average explained variation per degree of freedom used by the regression model:

MSR = SSR / k

where k = number of independent variables (k = 1 for simple linear regression).

LOS: 10.e | Related: F-Statistic


Mean Square Error (MSE)

The average unexplained variation per degree of freedom — an estimate of the error variance: MSE = SSE / (n − k − 1), which equals SSE / (n − 2) in simple linear regression.

LOS: 10.e | Related: Standard Error of Estimate (SEE)


ANOVA (Analysis of Variance)

A table that partitions the total variation in into explained (regression) and unexplained (error) components. Used to test the overall significance of the regression model.

Source       SS     df           MS                        F
Regression   SSR    k            MSR = SSR / k             MSR / MSE
Error        SSE    n − k − 1    MSE = SSE / (n − k − 1)
Total        SST    n − 1

LOS: 10.e | Related: F-Statistic, F-Test
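The ANOVA quantities can be assembled directly from the sums of squares. A self-contained sketch with our own toy data and names:

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
n, k = len(x), 1  # k = 1 independent variable in simple regression

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * a for a in x]

ssr = sum((f - y_bar) ** 2 for f in y_hat)
sse = sum((b - f) ** 2 for b, f in zip(y, y_hat))
sst = sum((b - y_bar) ** 2 for b in y)

msr = ssr / k             # mean square regression
mse = sse / (n - k - 1)   # mean square error
f_stat = msr / mse        # overall significance test statistic
see = mse ** 0.5          # standard error of estimate = sqrt(MSE)
```

Each row of the ANOVA table maps to one of these variables; the F column exists only for the regression row because it compares explained to unexplained variation.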


F-Statistic

The ratio of mean square regression to mean square error. Tests the null hypothesis that all regression slope coefficients are zero (model has no explanatory power).

LOS: 10.e | Decision: Reject H₀ if F > F_critical (upper-tailed test). | In simple regression: F = t², where t is the slope t-statistic.


t-Test for Slope

Test of whether the slope coefficient is statistically significantly different from zero (or another hypothesized value B₁): t = (b̂₁ − B₁) / s(b̂₁), with n − 2 degrees of freedom.

LOS: 10.f | H₀: b₁ = 0 (no linear relationship between X and Y).


t-Test for Correlation

Test of whether the population correlation coefficient ρ is statistically significantly different from zero: t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom.

LOS: 10.f | Note: In simple linear regression, this test is equivalent to the t-Test for Slope.


t-Test for Intercept

Test of whether the intercept b₀ is statistically significantly different from zero.

LOS: 10.f | Note: Often less economically meaningful than the slope test.


Prediction Interval

An interval estimate for an individual value of Y given a specific value of X. Wider than a confidence interval because it must account for both model uncertainty and individual error.

LOS: 10.g | Related: Standard Error of Forecast


Standard Error of Forecast

The standard deviation of the forecast error for a predicted value of Y at a given X: s_f² = SEE² × [1 + 1/n + (X − X̄)² / Σ(Xᵢ − X̄)²]. Larger when X is far from X̄ or sample size is small.

LOS: 10.g | Related: Standard Error of Estimate (SEE), Prediction Interval
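A sketch of building a prediction interval from the standard error of forecast. The toy data, names, and the tabulated critical value are our own assumptions:

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
n = len(x)

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_xx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
see = (sse / (n - 2)) ** 0.5

x_new = 6.0                 # forecast point, away from the sample mean
y_pred = b0 + b1 * x_new

# Standard error of forecast: grows as x_new moves away from x_bar
s_f = see * (1 + 1 / n + (x_new - x_bar) ** 2 / s_xx) ** 0.5

t_crit = 3.182  # two-tailed 5% critical t with n - 2 = 3 df (from a t-table)
lower, upper = y_pred - t_crit * s_f, y_pred + t_crit * s_f
```

Note the leading 1 inside the square root: it is the individual-error term that makes a prediction interval wider than the corresponding confidence interval for the mean of Y.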


Homoskedasticity

The assumption that the variance of regression errors is constant across all values of the independent variable. A required classical regression assumption.

LOS: 10.h | Contrast: Heteroskedasticity


Heteroskedasticity

Violation of the Homoskedasticity assumption — the variance of the regression errors is not constant across all observations. Common in financial data.

LOS: 10.h | Consequence: OLS standard errors are biased → unreliable t- and F-statistics. | Detection: Breusch-Pagan test; visual inspection of residual plot.


Linearity Assumption

The assumption that the relationship between the dependent and independent variables is linear. Required for OLS estimates to be unbiased.

LOS: 10.h | Violation remedy: Transform variables using Log-Lin Model, Lin-Log Model, or Log-Log Model.


Independence Assumption

The assumption that the regression errors are uncorrelated with each other and with the independent variable. Violation (serial correlation) is common in time-series data.

LOS: 10.h | Detection: Durbin-Watson test for serial correlation.


Normality Assumption

The assumption that the regression errors are normally distributed. Required for valid inference in small samples (t- and F-tests).

LOS: 10.h | Note: By the CLT, this assumption is less critical in large samples.


Cross-Sectional Regression

A regression using data from multiple subjects (firms, individuals, countries) observed at a single point in time.

LOS: 10.i | Related: Cross-Sectional Data


Time-Series Regression

A regression using data from a single subject observed at multiple points in time.

LOS: 10.i | Concern: Potential serial correlation in errors. | Related: Time Series Data


Indicator Variable

A binary variable that takes the value 1 if a condition is met and 0 otherwise. Used to represent categorical variables in regression. Also called a dummy variable.

LOS: 10.j | Application: Capturing structural differences between groups (e.g., recession vs. non-recession periods).
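With a single 0/1 indicator as the only regressor, the OLS slope reduces to the difference in group means. A toy sketch with our own data and names:

```python
import statistics

d = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # indicator: 1 if the condition is met
y = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]

d_bar, y_bar = statistics.mean(d), statistics.mean(y)
b1 = sum((a - d_bar) * (b - y_bar) for a, b in zip(d, y)) / sum(
    (a - d_bar) ** 2 for a in d
)
b0 = y_bar - b1 * d_bar

# b0 is the mean of y where d = 0; b0 + b1 is the mean where d = 1
mean0 = statistics.mean(b for a, b in zip(d, y) if a == 0)
mean1 = statistics.mean(b for a, b in zip(d, y) if a == 1)
```

This is why the slope on a dummy variable is read as a structural difference between groups rather than a per-unit effect.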


Log-Lin Model

A regression model where the dependent variable is in log form and the independent variable is in level form: ln Y = b₀ + b₁X + ε.

LOS: 10.k | Interpretation: A one-unit increase in X is associated with a relative change in Y of approximately 100·b₁%.


Lin-Log Model

A regression model where the dependent variable is in level form and the independent variable is in log form: Y = b₀ + b₁ ln X + ε.

LOS: 10.k | Interpretation: A 1% increase in X is associated with an absolute change in Y of approximately 0.01·b₁ units.


Log-Log Model

A regression model where both the dependent and independent variables are in log form: ln Y = b₀ + b₁ ln X + ε.

LOS: 10.k | Interpretation: A 1% increase in X is associated with approximately a b₁% change in Y. The slope b₁ is the elasticity of Y with respect to X.
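The elasticity interpretation can be verified on data generated from an exact power law, where the log-log slope recovers the exponent. Toy data and names are our own:

```python
import math
import statistics

# Exact power law y = 2 * x ** 1.5, so the elasticity of y w.r.t. x is 1.5
x = [1.0, 2.0, 4.0, 8.0]
y = [2.0 * v ** 1.5 for v in x]

lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]

lx_bar, ly_bar = statistics.mean(lx), statistics.mean(ly)
b1 = sum((a - lx_bar) * (b - ly_bar) for a, b in zip(lx, ly)) / sum(
    (a - lx_bar) ** 2 for a in lx
)
# b1 estimates the elasticity of y with respect to x
```

Because the transformed relationship ln y = ln 2 + 1.5 ln x is exactly linear, OLS on the logs returns the elasticity up to floating-point error.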