Glossary: M10 — Simple Linear Regression
Module: M10 | Formulas: Formula Sheet | Concept page: Regression Concept
Simple Linear Regression
A statistical model that estimates the linear relationship between one Dependent Variable and one Independent Variable. The estimated model form is:

$$Y_i = b_0 + b_1 X_i + \varepsilon_i, \qquad \hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$$

LOS: 10.a | Purpose: Explain variation in $Y$, test whether $X$ helps explain $Y$, and make predictions. | Related: Regression Concept
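The estimated line can be computed directly from the closed-form OLS formulas. A minimal sketch in Python (the `x`/`y` values are made-up illustrative data, not from the curriculum):

```python
# Fit a simple linear regression by hand: Y_hat = b0 + b1 * X
x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: sample covariance of X and Y divided by sample variance of X
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx            # -> 0.6
b0 = y_bar - b1 * x_bar     # -> 2.2 (line passes through the means)

y_hat = [b0 + b1 * xi for xi in x]  # fitted values
```

For this toy data the fitted line is $\hat{Y} = 2.2 + 0.6X$, so the first fitted value is $2.2 + 0.6(1) = 2.8$.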
Dependent Variable
The variable whose variation is being explained or predicted in a regression model. Placed on the left-hand side of the regression equation. Also called the response variable or regressand.
LOS: 10.a | Notation: $Y$ ($Y_i$ for observation $i$) | Contrast: Independent Variable
Independent Variable
The variable used to explain or predict the dependent variable. Placed on the right-hand side of the regression equation. Also called the explanatory variable, predictor, or regressor.
LOS: 10.a | Notation: $X$ ($X_i$ for observation $i$) | Contrast: Dependent Variable
Intercept
The estimated value of the dependent variable when the independent variable equals zero. The regression line’s crossing point on the $y$-axis.
LOS: 10.a | Notation: $\hat{b}_0$ | Caution: The intercept may not have a meaningful economic interpretation if $X = 0$ is outside the data range.
Slope Coefficient
The estimated change in the dependent variable associated with a one-unit increase in the independent variable:

$$\hat{b}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}$$

LOS: 10.a | Interpretation: If $\hat{b}_1 = 2.5$, a one-unit increase in $X$ is associated with a 2.5-unit increase in $Y$, on average.
Ordinary Least Squares (OLS)
The most common method for estimating regression coefficients. Minimizes the sum of squared residuals (vertical distances between observed and fitted values):

$$\min_{\hat{b}_0,\, \hat{b}_1} \sum_{i=1}^{n} \left(Y_i - \hat{b}_0 - \hat{b}_1 X_i\right)^2, \qquad \hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$$

LOS: 10.b | Property: Produces the Best Linear Unbiased Estimator (BLUE) when classical assumptions hold.
Residual
The difference between the actual value and the fitted value for observation $i$: $e_i = Y_i - \hat{Y}_i$. Also called the estimated error.
LOS: 10.b | Key: OLS minimizes $\sum e_i^2$. Residuals sum to zero: $\sum e_i = 0$.
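The zero-sum property of OLS residuals can be checked numerically. A sketch using the same made-up data as above:

```python
# Compute OLS residuals e_i = y_i - y_hat_i and verify they sum to zero
x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
resid_sum = sum(resid)  # zero up to floating-point rounding
```

Because the intercept is estimated, the residuals are forced to balance around zero; this is a property of OLS with an intercept, not a coincidence of the data.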
Regression Line
The estimated line $\hat{Y} = \hat{b}_0 + \hat{b}_1 X$ that minimizes the sum of squared residuals. Passes through the point $(\bar{X}, \bar{Y})$.
LOS: 10.b | Related: Ordinary Least Squares (OLS)
Sum of Squares Total (SST)
The total variation in the dependent variable around its mean: $\text{SST} = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$. Decomposed into explained and unexplained variation: $\text{SST} = \text{SSR} + \text{SSE}$.
LOS: 10.c | Related: Sum of Squares Regression (SSR), Sum of Squares Error (SSE)
Sum of Squares Regression (SSR)
The variation in $Y$ explained by the regression model (the portion attributable to the independent variable): $\text{SSR} = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$.
LOS: 10.c | Also called: Explained sum of squares (ESS).
Sum of Squares Error (SSE)
The unexplained variation in $Y$ (the portion not captured by the regression model): $\text{SSE} = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \sum e_i^2$.
LOS: 10.c | Also called: Residual sum of squares (RSS).
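The decomposition $\text{SST} = \text{SSR} + \text{SSE}$ can be verified directly. A sketch with the same illustrative data:

```python
# Verify SST = SSR + SSE for a hand-fit simple regression
x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained
```

For this data: SST = 6.0, SSR = 3.6, SSE = 2.4, and 3.6 + 2.4 = 6.0 as required.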
Coefficient of Determination (R²)
The proportion of the total variation in the dependent variable explained by the independent variable. Ranges from 0 to 1:

$$R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}}$$

LOS: 10.d | Interpretation: $R^2 = 0.72$ means 72% of the variation in $Y$ is explained by $X$. | Note: In simple linear regression, $R^2 = r^2$ (squared Pearson correlation).
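The equivalence $R^2 = r^2$ in simple regression can be checked numerically. A sketch with the same illustrative data:

```python
import math

# R^2 from the ANOVA decomposition vs. squared Pearson correlation
x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / s_yy            # from the regression

r = s_xy / math.sqrt(s_xx * s_yy)     # Pearson correlation
```

Here $R^2 = 0.6$ and $r \approx 0.7746$, with $r^2 = 0.6$ exactly.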
Standard Error of Estimate (SEE)
A measure of the average magnitude of the regression residuals. Indicates how closely the regression line fits the data:

$$\text{SEE} = \sqrt{\frac{\text{SSE}}{n-2}} = \sqrt{\text{MSE}}$$

LOS: 10.d | Key: Lower SEE → better fit. Used to construct prediction intervals. | Related: Mean Square Error (MSE)
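A sketch computing the SEE for the same illustrative data (note the $n - 2$ divisor, which accounts for the two estimated coefficients):

```python
import math

# Standard Error of Estimate: sqrt(SSE / (n - 2))
x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

see = math.sqrt(sse / (n - 2))   # here sqrt(2.4 / 3) = sqrt(0.8)
```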
Mean Square Regression (MSR)
The average explained variation per degree of freedom used by the regression model: $\text{MSR} = \text{SSR}/k$, where $k$ = number of independent variables ($k = 1$ for simple linear regression).
LOS: 10.e | Related: F-Statistic
Mean Square Error (MSE)
The average unexplained variation per degree of freedom, an estimate of the error variance: $\text{MSE} = \text{SSE}/(n - k - 1)$, which is $\text{SSE}/(n - 2)$ in simple linear regression.
LOS: 10.e | Related: Standard Error of Estimate (SEE)
ANOVA (Analysis of Variance)
A table that partitions the total variation in into explained (regression) and unexplained (error) components. Used to test the overall significance of the regression model.
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Regression | SSR | $k$ | MSR = SSR/$k$ | MSR/MSE |
| Error | SSE | $n - k - 1$ | MSE = SSE/$(n - k - 1)$ | |
| Total | SST | $n - 1$ | | |
LOS: 10.e | Related: F-Statistic, F-Test
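The ANOVA quantities can be assembled from the sums of squares. A sketch for simple regression ($k = 1$) with the same illustrative data:

```python
# Build the ANOVA components for a simple linear regression (k = 1)
x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n, k = len(x), 1
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = sst - sse

msr = ssr / k             # mean square regression
mse = sse / (n - k - 1)   # mean square error
f_stat = msr / mse        # F-statistic with (k, n-k-1) df
```

For this data: MSR = 3.6, MSE = 0.8, so $F = 4.5$ with (1, 3) degrees of freedom.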
F-Statistic
The ratio of mean square regression to mean square error: $F = \text{MSR}/\text{MSE}$, with $k$ and $n - k - 1$ degrees of freedom. Tests the null hypothesis that all regression slope coefficients are zero (model has no explanatory power).
LOS: 10.e | Decision: Reject $H_0$ if $F > F_{\text{crit}}$ (upper-tailed test). | In simple regression: $F = t^2$, where $t$ is the slope t-statistic.
t-Test for Slope
Test of whether the slope coefficient is statistically significantly different from zero (or another hypothesized value $B_1$):

$$t = \frac{\hat{b}_1 - B_1}{s_{\hat{b}_1}}, \qquad \text{df} = n - 2, \qquad s_{\hat{b}_1} = \frac{\text{SEE}}{\sqrt{\sum (X_i - \bar{X})^2}}$$

LOS: 10.f | $H_0$: $b_1 = 0$ (no linear relationship between $X$ and $Y$).
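A sketch of the slope t-test under $H_0: b_1 = 0$, using the same illustrative data; it also confirms that the squared t-statistic equals the F-statistic in simple regression:

```python
import math

# t-test for the slope: t = (b1 - 0) / se_b1, df = n - 2
x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
see = math.sqrt(sse / (n - 2))      # standard error of estimate
se_b1 = see / math.sqrt(s_xx)       # standard error of the slope
t_stat = b1 / se_b1                 # test statistic vs. H0: b1 = 0
```

Here $t \approx 2.121$ with 3 degrees of freedom, and $t^2 = 4.5$, matching the F-statistic from the ANOVA table.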
t-Test for Correlation
Test of whether the population correlation coefficient is statistically significantly different from zero: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, with $n - 2$ degrees of freedom.
LOS: 10.f | Note: In simple linear regression, this test is equivalent to the t-Test for Slope.
t-Test for Intercept
Test of whether the intercept is statistically significantly different from zero: $t = \hat{b}_0 / s_{\hat{b}_0}$, with $n - 2$ degrees of freedom.
LOS: 10.f | Note: Often less economically meaningful than the slope test.
Prediction Interval
An interval estimate for an individual value of $Y$ given a specific value of $X$: $\hat{Y}_f \pm t_{\text{crit}} \times s_f$. Wider than a confidence interval because it must account for both model uncertainty and individual error.
LOS: 10.g | Related: Standard Error of Forecast
Standard Error of Forecast
The standard deviation of the forecast error for a predicted value of $Y$ at a given $X_f$:

$$s_f^2 = s_e^2 \left[1 + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]$$

where $s_e$ is the SEE. Larger when $X_f$ is far from $\bar{X}$ or sample size is small.
LOS: 10.g | Related: Standard Error of Estimate (SEE), Prediction Interval
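A sketch of a 95% prediction interval at a chosen forecast point, using the same illustrative data; the critical value 3.182 is the two-tailed 5% t-value for 3 degrees of freedom, taken from a standard t-table:

```python
import math

# 95% prediction interval for Y at X_f, simple linear regression
x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se2 = sse / (n - 2)                 # squared SEE (= MSE)

x_f = 4                             # chosen forecast point
y_f = b0 + b1 * x_f                 # point prediction
sf = math.sqrt(se2 * (1 + 1 / n + (x_f - x_bar) ** 2 / s_xx))

t_crit = 3.182                      # t-table value, df = 3, 95% two-tailed
lower = y_f - t_crit * sf
upper = y_f + t_crit * sf
```

The interval is wide here because the toy sample is tiny (df = 3); with more data both $t_{\text{crit}}$ and $s_f$ shrink.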
Homoskedasticity
The assumption that the variance of regression errors is constant across all values of the independent variable. A required classical regression assumption.
LOS: 10.h | Contrast: Heteroskedasticity
Heteroskedasticity
Violation of the Homoskedasticity assumption — the variance of the regression errors is not constant across all observations. Common in financial data.
LOS: 10.h | Consequence: OLS standard errors are biased → unreliable $t$- and $F$-statistics. | Detection: Breusch-Pagan test; visual inspection of residual plot.
Linearity Assumption
The assumption that the relationship between the dependent and independent variables is linear. Required for OLS estimates to be unbiased.
LOS: 10.h | Violation remedy: Transform variables using Log-Lin Model, Lin-Log Model, or Log-Log Model.
Independence Assumption
The assumption that the regression errors are uncorrelated with each other and with the independent variable. Violation (serial correlation) is common in time-series data.
LOS: 10.h | Detection: Durbin-Watson test for serial correlation.
Normality Assumption
The assumption that the regression errors are normally distributed. Required for valid inference in small samples (t- and F-tests).
LOS: 10.h | Note: By the CLT, this assumption is less critical in large samples.
Cross-Sectional Regression
A regression using data from multiple subjects (firms, individuals, countries) observed at a single point in time.
LOS: 10.i | Related: Cross-Sectional Data
Time-Series Regression
A regression using data from a single subject observed at multiple points in time.
LOS: 10.i | Concern: Potential serial correlation in errors. | Related: Time Series Data
Indicator Variable
A binary variable that takes the value 1 if a condition is met and 0 otherwise. Used to represent categorical variables in regression. Also called a dummy variable.
LOS: 10.j | Application: Capturing structural differences between groups (e.g., recession vs. non-recession periods).
Log-Lin Model
A regression model where the dependent variable is in log form and the independent variable is in level form: $\ln Y_i = b_0 + b_1 X_i + \varepsilon_i$.
LOS: 10.k | Interpretation: A one-unit increase in $X$ is associated with an approximate $100 \times b_1\%$ change in $Y$.
Lin-Log Model
A regression model where the dependent variable is in level form and the independent variable is in log form: $Y_i = b_0 + b_1 \ln X_i + \varepsilon_i$.
LOS: 10.k | Interpretation: A 1% increase in $X$ is associated with an approximate $b_1 / 100$ unit change in $Y$.
Log-Log Model
A regression model where both the dependent and independent variables are in log form: $\ln Y_i = b_0 + b_1 \ln X_i + \varepsilon_i$.
LOS: 10.k | Interpretation: A 1% increase in $X$ is associated with an approximate $b_1\%$ change in $Y$. The slope $b_1$ is the elasticity of $Y$ with respect to $X$.
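The elasticity interpretation can be illustrated with synthetic data generated from a known power relationship, $Y = 3X^2$, where the true elasticity is exactly 2. A sketch:

```python
import math

# Log-log regression recovers the elasticity of a power relationship
x = [1, 2, 3, 4, 5]
y = [3 * xi ** 2 for xi in x]     # Y = 3 * X^2, so elasticity = 2

lx = [math.log(v) for v in x]     # regress ln(Y) on ln(X)
ly = [math.log(v) for v in y]
n = len(x)
lx_bar, ly_bar = sum(lx) / n, sum(ly) / n
b1 = (sum((a - lx_bar) * (b - ly_bar) for a, b in zip(lx, ly))
      / sum((a - lx_bar) ** 2 for a in lx))
# b1 = 2: a 1% increase in X is associated with ~2% increase in Y
```

Because $\ln Y = \ln 3 + 2 \ln X$ holds exactly here, the estimated slope is exactly 2; with noisy real data the slope is only an estimate of the elasticity.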