Linear Regression
1. Define Regression Analysis.
A statistical method used for modelling the relationship between one or more independent variables (features) and a dependent variable (target variable).
2. Define multiple linear regression.
When there are two or more independent variables it is referred to as multiple linear regression.
3. What are some of the linear regression use-cases?
- Evaluating the relationship between variables.
- Prediction
- Forecasting
4. What is the main objective of linear regression?
The goal of linear regression is to find the best-fitting linear equation that represents the relationship between the dependent and independent variables.
5. What is the main assumption about the relationship between the dependent variable and the independent variables?
Linear regression assumes that the relationship between the dependent variable and the independent variables is linear. This means that a change in the independent variable(s) will result in a proportional change in the dependent variable.
6. List the assumptions that linear regression makes.
- Linearity: The relationship between variables is linear.
- Independence: The residuals are independent of each other.
- Homoscedasticity: The variance of the residuals is constant across all levels of the independent variable(s).
- Normality: The residuals follow a normal distribution.
7. How are the coefficients of a linear regression model interpreted?
The coefficients in a linear regression model have clear and intuitive interpretations. They represent the change in the dependent variable associated with a one-unit change in the independent variable while holding all other variables constant.
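A minimal NumPy sketch of this interpretation, using hypothetical noise-free toy data so the coefficients are recovered exactly:

```python
import numpy as np

# Toy data (illustrative): y = 3 + 2*x1 - 1*x2, with no noise
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

# Prepend an intercept column and solve ordinary least squares
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

intercept, b1, b2 = coef
# b1 is the change in y for a one-unit increase in x1, holding x2 constant;
# here it recovers the true value 2
print(intercept, b1, b2)
```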
8. List the advantages of linear regression.
- Computationally efficient.
- Simple and intuitive.
- Results are easy to interpret.
9. Explain the disadvantage of the linearity assumption in linear regression.
Linear regression assumes that the relationship between the independent and dependent variables is linear. If the relationship is non-linear, linear regression may provide inaccurate results. Additionally, linear regression may not capture complex relationships in the data, such as interactions between variables or non-linear patterns. In such cases, more advanced models may be necessary.
10. Explain the disadvantage of linear regression being sensitive to outliers.
Linear regression can be sensitive to outliers, meaning that extreme data points can disproportionately influence the model’s parameters and predictions.
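A short demonstration of this sensitivity, with made-up data: a single corrupted point is enough to drag the fitted slope far from the true value.

```python
import numpy as np

# Clean data lying exactly on the line y = x
x = np.arange(10, dtype=float)
y = x.copy()
slope_clean = np.polyfit(x, y, 1)[0]

# Corrupt one observation with an extreme value
y_outlier = y.copy()
y_outlier[-1] = 100.0
slope_outlier = np.polyfit(x, y_outlier, 1)[0]

# The clean slope is 1; the single outlier inflates it severalfold
print(slope_clean, slope_outlier)
```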
11. Explain the disadvantage of assuming that the errors in linear regression are independent.
Linear regression assumes that the errors (residuals) are independent and have constant variance (homoscedasticity). Violations of these assumptions can lead to unreliable results.
12. Explain the disadvantage of multicollinearity in linear regression.
When independent variables in a linear regression model are highly correlated, it can lead to multicollinearity issues, making it difficult to interpret the individual contributions of each variable.
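One common diagnostic is the variance inflation factor (VIF): regress each feature on the others and see how much its variance is "explained" by them. A sketch with synthetic data, where `x2` is deliberately constructed as a near-copy of `x1`:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # almost collinear with x1
x3 = rng.normal(size=200)                  # independent feature
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF for column j: 1 / (1 - R^2) from regressing x_j on the rest."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    return 1 / (1 - r2)

# x1 and x2 show large VIFs; x3 stays near the ideal value of 1
print([round(vif(X, j), 1) for j in range(3)])
```

A rule of thumb often quoted is that VIF values above 5–10 signal problematic multicollinearity.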
13. List the disadvantages of linear regression.
- Linearity assumption.
- Limited to linear relationships.
- Sensitivity to outliers.
- Independence of errors.
- Multicollinearity.
14. A simple linear regression can be represented by the equation $Y = \beta_0 + \beta_1X + \epsilon$. Explain the individual terms and their factors.
- $Y$ represents the dependent variable.
- $X$ represents the independent variable.
- $\beta_0$ is the intercept, which represents the value of Y when X is 0.
- $\beta_1$ is the slope coefficient, which represents the change in Y for a one-unit change in X.
- $\epsilon$ represents the error term, which accounts for the variability in Y that is not explained by the linear relationship with X.
15. A simple linear regression can be represented by the equation $Y = \beta_0 + \beta_1X + \epsilon$. State and explain the method used to estimate the coefficients.
The least squares method is used to estimate the coefficients. It minimises the sum of squared differences between the values predicted by the regression equation and the actual values of the dependent variable.
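The least squares solution can be computed in closed form via the normal equations, $\hat\beta = (X^TX)^{-1}X^Ty$. A minimal sketch on made-up data that roughly follows $y = 2x$:

```python
import numpy as np

# Toy data, approximately y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.1, 5.9, 8.2, 9.9])

# Design matrix with an intercept column, then solve the normal equations
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(beta)  # [intercept, slope], with the slope close to 2
```

In practice `np.linalg.lstsq` (or a QR/SVD-based solver) is preferred, since explicitly forming $X^TX$ can be numerically unstable.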
16. Explain the term residuals.
Residuals are the differences between the observed values of the dependent variable and the values predicted by the regression equation.
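A quick illustration on toy data; note that when the model includes an intercept, the residuals from a least squares fit sum to (numerically) zero:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.8, 8.1])

# Fit a line; polyfit returns [slope, intercept] for degree 1
slope, intercept = np.polyfit(x, y, 1)
predicted = intercept + slope * x

residuals = y - predicted  # observed minus predicted
print(residuals, residuals.sum())
```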
17. List the metrics used to evaluate linear regressions.
- R-Squared
- Adjusted R-Squared
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
- Akaike Information Criterion (AIC)
- Bayesian Information Criterion (BIC)
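Several of these metrics are straightforward to compute by hand. A sketch with hypothetical predictions:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

mse = np.mean((y_true - y_pred) ** 2)        # Mean Squared Error
rmse = np.sqrt(mse)                          # Root Mean Squared Error
mae = np.mean(np.abs(y_true - y_pred))       # Mean Absolute Error

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mse, rmse, mae, r2)
```

AIC and BIC additionally penalise model complexity (number of parameters), so they are typically obtained from a fitted statistical model rather than from predictions alone.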