Glossary: M10 — Simple Linear Regression

Module: M10 | Formulas: Formula Sheet | Concept page: Regression Concept


Simple Linear Regression

A statistical model that estimates the linear relationship between one Dependent Variable and one Independent Variable. The estimated model form is:

Yᵢ = b₀ + b₁Xᵢ + εᵢ, with fitted values Ŷᵢ = b̂₀ + b̂₁Xᵢ

LOS: 10.a | Purpose: Explain variation in Y, test whether X helps explain Y, and make predictions. | Related: Regression Concept


Dependent Variable

The variable whose variation is being explained or predicted in a regression model. Placed on the left-hand side of the regression equation. Also called the response variable or regressand.

LOS: 10.a | Notation: Y | Contrast: Independent Variable


Independent Variable

The variable used to explain or predict the dependent variable. Placed on the right-hand side of the regression equation. Also called the explanatory variable, predictor, or regressor.

LOS: 10.a | Notation: X | Contrast: Dependent Variable


Intercept

The estimated value of the dependent variable when the independent variable equals zero. The point where the regression line crosses the Y-axis.

LOS: 10.a | Notation: b₀ | Caution: The intercept may not have a meaningful economic interpretation if X = 0 is outside the data range.


Slope Coefficient

The estimated change in the dependent variable associated with a one-unit increase in the independent variable.

LOS: 10.a | Notation: b₁ | Interpretation: If b̂₁ = 2.5, a one-unit increase in X is associated with a 2.5-unit increase in Y, on average.


Ordinary Least Squares (OLS)

The most common method for estimating regression coefficients. Minimizes the sum of squared residuals (vertical distances between observed and fitted values).

LOS: 10.b | Property: Produces the Best Linear Unbiased Estimator (BLUE) when classical assumptions hold.
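As a concrete sketch, the OLS estimates for simple linear regression can be computed directly from sample moments. The toy data and variable names below are our own, not from the glossary:

```python
import statistics

# Toy sample (illustrative data, not from the glossary)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 6.2, 8.4, 10.1]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Slope estimate: sample covariance of X and Y over sample variance of X
b1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)

# Intercept estimate: the fitted line passes through (x_bar, y_bar)
b0_hat = y_bar - b1_hat * x_bar

# Residuals (estimated errors); under OLS they sum to zero
residuals = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]
```

Minimizing the sum of squared residuals leads exactly to these two closed-form estimates, which is why no iterative optimization is needed in the simple linear case.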


Residual

The difference between the actual value Yᵢ and the fitted value Ŷᵢ for observation i: eᵢ = Yᵢ − Ŷᵢ. Also called the estimated error.

LOS: 10.b | Key: OLS minimizes Σeᵢ². Residuals sum to zero: Σeᵢ = 0.


Regression Line

The estimated line Ŷ = b̂₀ + b̂₁X that minimizes the sum of squared residuals. Passes through the point (X̄, Ȳ).

LOS: 10.b | Related: Ordinary Least Squares (OLS)


Sum of Squares Total (SST)

The total variation in the dependent variable around its mean: SST = Σ(Yᵢ − Ȳ)². Decomposed into explained and unexplained variation: SST = SSR + SSE.

LOS: 10.c | Related: Sum of Squares Regression (SSR), Sum of Squares Error (SSE)


Sum of Squares Regression (SSR)

The variation in Y explained by the regression model — the portion attributable to the independent variable: SSR = Σ(Ŷᵢ − Ȳ)².

LOS: 10.c | Also called: Explained sum of squares (ESS).


Sum of Squares Error (SSE)

The unexplained variation in Y — the portion not captured by the regression model: SSE = Σ(Yᵢ − Ŷᵢ)².

LOS: 10.c | Also called: Residual sum of squares (RSS).


Coefficient of Determination (R²)

The proportion of the total variation in the dependent variable explained by the independent variable: R² = SSR / SST = 1 − SSE / SST. Ranges from 0 to 1.

LOS: 10.d | Interpretation: R² = 0.72 means 72% of the variation in Y is explained by X. | Note: In simple linear regression, R² = r² (squared Pearson correlation).


Standard Error of Estimate (SEE)

A measure of the average magnitude of the regression residuals: SEE = √(SSE / (n − 2)) = √MSE. Indicates how closely the regression line fits the data.

LOS: 10.d | Key: Lower SEE → better fit. Used to construct prediction intervals. | Related: Mean Square Error (MSE)


Mean Square Regression (MSR)

The average explained variation per degree of freedom used by the regression model:

MSR = SSR / k

where k = number of independent variables (k = 1 for simple linear regression).

LOS: 10.e | Related: F-Statistic


Mean Square Error (MSE)

The average unexplained variation per degree of freedom — an estimate of the error variance: MSE = SSE / (n − k − 1), which equals SSE / (n − 2) in simple linear regression.

LOS: 10.e | Related: Standard Error of Estimate (SEE)


ANOVA (Analysis of Variance)

A table that partitions the total variation in into explained (regression) and unexplained (error) components. Used to test the overall significance of the regression model.

Source       SS     df           MS                        F
Regression   SSR    k            MSR = SSR / k             MSR / MSE
Error        SSE    n − k − 1    MSE = SSE / (n − k − 1)
Total        SST    n − 1

LOS: 10.e | Related: F-Statistic, F-Test
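The ANOVA quantities can be assembled directly from the sums of squares. A self-contained sketch with our own toy data and names:

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
n, k = len(x), 1  # k = 1 independent variable in simple regression

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * a for a in x]

ssr = sum((f - y_bar) ** 2 for f in y_hat)
sse = sum((b - f) ** 2 for b, f in zip(y, y_hat))
sst = sum((b - y_bar) ** 2 for b in y)

msr = ssr / k             # mean square regression
mse = sse / (n - k - 1)   # mean square error
f_stat = msr / mse        # overall significance test statistic
see = mse ** 0.5          # standard error of estimate = sqrt(MSE)
```

Each row of the ANOVA table maps to one of these variables; the F column exists only for the regression row because it compares explained to unexplained variation.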


F-Statistic

The ratio of mean square regression to mean square error. Tests the null hypothesis that all regression slope coefficients are zero (model has no explanatory power).

LOS: 10.e | Decision: Reject H₀ if F > F_critical (upper-tailed test). | In simple regression: F = t², where t is the slope t-statistic.


t-Test for Slope

Test of whether the slope coefficient is statistically significantly different from zero (or another hypothesized value B₁): t = (b̂₁ − B₁) / s(b̂₁), with n − 2 degrees of freedom.

LOS: 10.f | H₀: b₁ = 0 (no linear relationship between X and Y).


t-Test for Correlation

Test of whether the population correlation coefficient ρ is statistically significantly different from zero: t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom.

LOS: 10.f | Note: In simple linear regression, this test is equivalent to the t-Test for Slope.


t-Test for Intercept

Test of whether the intercept b₀ is statistically significantly different from zero.

LOS: 10.f | Note: Often less economically meaningful than the slope test.


Prediction Interval

An interval estimate for an individual value of Y given a specific value of X. Wider than a confidence interval because it must account for both model uncertainty and individual error.

LOS: 10.g | Related: Standard Error of Forecast


Standard Error of Forecast

The standard deviation of the forecast error for a predicted value of Y at a given X: s_f² = SEE² × [1 + 1/n + (X − X̄)² / Σ(Xᵢ − X̄)²]. Larger when X is far from X̄ or sample size is small.

LOS: 10.g | Related: Standard Error of Estimate (SEE), Prediction Interval
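A sketch of building a prediction interval from the standard error of forecast. The toy data, names, and the tabulated critical value are our own assumptions:

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
n = len(x)

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_xx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
see = (sse / (n - 2)) ** 0.5

x_new = 6.0                 # forecast point, away from the sample mean
y_pred = b0 + b1 * x_new

# Standard error of forecast: grows as x_new moves away from x_bar
s_f = see * (1 + 1 / n + (x_new - x_bar) ** 2 / s_xx) ** 0.5

t_crit = 3.182  # two-tailed 5% critical t with n - 2 = 3 df (from a t-table)
lower, upper = y_pred - t_crit * s_f, y_pred + t_crit * s_f
```

Note the leading 1 inside the square root: it is the individual-error term that makes a prediction interval wider than the corresponding confidence interval for the mean of Y.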


Homoskedasticity

The assumption that the variance of regression errors is constant across all values of the independent variable. A required classical regression assumption.

LOS: 10.h | Contrast: Heteroskedasticity


Heteroskedasticity

Violation of the Homoskedasticity assumption — the variance of the regression errors is not constant across all observations. Common in financial data.

LOS: 10.h | Consequence: OLS standard errors are biased → unreliable t- and F-statistics. | Detection: Breusch-Pagan test; visual inspection of residual plot.


Linearity Assumption

The assumption that the relationship between the dependent and independent variables is linear. Required for OLS estimates to be unbiased.

LOS: 10.h | Violation remedy: Transform variables using Log-Lin Model, Lin-Log Model, or Log-Log Model.


Independence Assumption

The assumption that the regression errors are uncorrelated with each other and with the independent variable. Violation (serial correlation) is common in time-series data.

LOS: 10.h | Detection: Durbin-Watson test for serial correlation.


Normality Assumption

The assumption that the regression errors are normally distributed. Required for valid inference in small samples (t- and F-tests).

LOS: 10.h | Note: By the CLT, this assumption is less critical in large samples.


Cross-Sectional Regression

A regression using data from multiple subjects (firms, individuals, countries) observed at a single point in time.

LOS: 10.i | Related: Cross-Sectional Data


Time-Series Regression

A regression using data from a single subject observed at multiple points in time.

LOS: 10.i | Concern: Potential serial correlation in errors. | Related: Time Series Data


Indicator Variable

A binary variable that takes the value 1 if a condition is met and 0 otherwise. Used to represent categorical variables in regression. Also called a dummy variable.

LOS: 10.j | Application: Capturing structural differences between groups (e.g., recession vs. non-recession periods).
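With a single 0/1 indicator as the only regressor, the OLS slope reduces to the difference in group means. A toy sketch with our own data and names:

```python
import statistics

d = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # indicator: 1 if the condition is met
y = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]

d_bar, y_bar = statistics.mean(d), statistics.mean(y)
b1 = sum((a - d_bar) * (b - y_bar) for a, b in zip(d, y)) / sum(
    (a - d_bar) ** 2 for a in d
)
b0 = y_bar - b1 * d_bar

# b0 is the mean of y where d = 0; b0 + b1 is the mean where d = 1
mean0 = statistics.mean(b for a, b in zip(d, y) if a == 0)
mean1 = statistics.mean(b for a, b in zip(d, y) if a == 1)
```

This is why the slope on a dummy variable is read as a structural difference between groups rather than a per-unit effect.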


Log-Lin Model

A regression model where the dependent variable is in log form and the independent variable is in level form: ln Y = b₀ + b₁X + ε.

LOS: 10.k | Interpretation: A one-unit increase in X is associated with a relative change in Y of approximately 100·b₁%.


Lin-Log Model

A regression model where the dependent variable is in level form and the independent variable is in log form: Y = b₀ + b₁ ln X + ε.

LOS: 10.k | Interpretation: A 1% increase in X is associated with an absolute change in Y of approximately 0.01·b₁ units.


Log-Log Model

A regression model where both the dependent and independent variables are in log form: ln Y = b₀ + b₁ ln X + ε.

LOS: 10.k | Interpretation: A 1% increase in X is associated with approximately a b₁% change in Y. The slope b₁ is the elasticity of Y with respect to X.
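The elasticity interpretation can be verified on data generated from an exact power law, where the log-log slope recovers the exponent. Toy data and names are our own:

```python
import math
import statistics

# Exact power law y = 2 * x ** 1.5, so the elasticity of y w.r.t. x is 1.5
x = [1.0, 2.0, 4.0, 8.0]
y = [2.0 * v ** 1.5 for v in x]

lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]

lx_bar, ly_bar = statistics.mean(lx), statistics.mean(ly)
b1 = sum((a - lx_bar) * (b - ly_bar) for a, b in zip(lx, ly)) / sum(
    (a - lx_bar) ** 2 for a in lx
)
# b1 estimates the elasticity of y with respect to x
```

Because the transformed relationship ln y = ln 2 + 1.5 ln x is exactly linear, OLS on the logs returns the elasticity up to floating-point error.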