Logistic Regression

1. Explain logistic regression.
A statistical model used for classification that estimates the probability of an event occurring - usually a binary (dichotomous) outcome, such as win or loss, yes or no.
2. Explain the main difference between linear regression and logistic regression.
Like linear regression, logistic regression is used to determine the relationship between one or more independent variables and a dependent variable. However, it is applied to predict a categorical variable as opposed to a continuous one.
3. Why can’t we use the Ordinary Least Squares (OLS) method to estimate the best-fitting line for logistic regression?
In logistic regression, the probabilities are transformed onto the log(odds) scale to fit a linear model. This transformation allows for a range of values from negative infinity to positive infinity, making it compatible with linear modelling. However, the observed binary outcomes (0 and 1) map to negative and positive infinity on the log(odds) scale, which makes the residuals impossible to calculate. Consequently, the Ordinary Least Squares (OLS) method is not suitable, as it relies on minimising the sum of squared residuals to estimate the best-fitting line.
4. Given the logistic regression equation y=2+1.5w. Where, w is the weight and we are trying to estimate the log(odds of obesity). What is the log(odds of obesity) when w=0 ?
  • If w is 0, the log(odds of obesity) is 2, the intercept.
5. Given the logistic regression equation y=2+1.5w. Where, w is the weight and we are trying to estimate the log(odds of obesity). Interpret the slope 1.5.
  • For every one unit of weight gained, the log(odds of obesity) increases by 1.5.
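The two answers above can be checked with a short sketch, reusing the hypothetical equation y = 2 + 1.5w from the question:

```python
# Hypothetical log(odds) equation from the question: y = 2 + 1.5w
def log_odds_of_obesity(w):
    return 2 + 1.5 * w

print(log_odds_of_obesity(0))  # → 2.0 (the intercept)
print(log_odds_of_obesity(1))  # → 3.5
print(log_odds_of_obesity(2))  # → 5.0 (each extra unit of weight adds 1.5)
```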
6. Provide the logistic regression equation.

The logistic regression equation is a sigmoid function that compresses any real-valued input into the range 0 to 1. This allows the output to be interpreted as the probability of belonging to a specific class.

p(x) = 1 / (1 + e^(-(β0 + β1x)))
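A minimal sketch of the equation in Python (the function names are illustrative, not from any particular library):

```python
import math

def sigmoid(z):
    # Compress any real-valued input into the range (0, 1)
    return 1 / (1 + math.exp(-z))

def predict_proba(x, beta0, beta1):
    # Logistic regression: probability that x belongs to the positive class
    return sigmoid(beta0 + beta1 * x)

print(sigmoid(0))  # → 0.5, the midpoint of the curve
```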

7. State the method to estimate the best-fitting line for logistic regression.
Maximum Likelihood Estimation (MLE).
8. For logistic regression, provide the steps to estimate the best fitting line.
  1. Transform the probabilities into log(odds) to enable linear modelling.
  2. Fit a line in the logit space (aka log-odds space).
  3. Project the original data points onto the line to calculate the log(odds) values.
  4. Transform the log(odds) into probabilities.
  5. Calculate the log likelihood for the observed status.
  6. Repeat steps 2-5 with new candidate lines:
    • Iterate until the line with the maximum likelihood is found; that is the best-fitting line.
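The steps above can be sketched with plain gradient ascent on the log-likelihood - a simplified stand-in for the iterative schemes statistical packages actually use (e.g. Newton-Raphson/IRLS); the toy data is assumed for illustration:

```python
import math

# Toy data (assumed): x is the predictor, y the observed binary status
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0,   0,   1,   0,   1,   1]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def log_likelihood(b0, b1):
    # Step 5: log-likelihood of the observed statuses under the candidate line
    return sum(y * math.log(sigmoid(b0 + b1 * x)) +
               (1 - y) * math.log(1 - sigmoid(b0 + b1 * x))
               for x, y in zip(xs, ys))

# Steps 2-6: adjust the line in log-odds space until the likelihood stops improving
b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(10000):
    b0 += lr * sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
    b1 += lr * sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))

print(b1 > 0, log_likelihood(b0, b1) > log_likelihood(0, 0))  # → True True
```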
9. In logistic regression, what is the purpose of transforming the probabilities into log(odds)?
The purpose is to model the relationship between the independent variables and the binary variable in a way that allows for linear modelling. By taking the logarithm of the odds, we convert the non-linear probability values (ranging from 0 to 1) into a linear scale that can be analysed using linear regression techniques. This linearity simplifies the estimation of coefficients and makes the model interpretable.
10. Provide the formula to transform the probabilities into log(odds).

The probability is transformed to the log(odds) scale

log(odds) = log(p / (1 - p))

Where p is the observed probability.
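A quick sketch of the transformation (the function name is illustrative):

```python
import math

def log_odds(p):
    # Transform a probability p in (0, 1) to the log(odds) scale
    return math.log(p / (1 - p))

print(log_odds(0.5))  # → 0.0 (even odds)
print(log_odds(0.9))  # ≈ 2.197
print(log_odds(0.1))  # ≈ -2.197, the mirror image of p = 0.9
```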

11. Given the log of odds function log(odds) = log(p / (1 - p)), we need to predict the probabilities instead of the log of odds. Detail the steps to convert log(odds) into probabilities.
  • Exponentiate both sides:

    e^(log(p / (1 - p))) = e^(log(odds))

    • Using the rule e^(ln(x)) = x, we get p / (1 - p) = e^(log(odds))
  • Rearrange the equation to solve for p:

    p / (1 - p) = e^(log(odds))
    p = (1 - p) × e^(log(odds))
    p = e^(log(odds)) - p × e^(log(odds))
    p + p × e^(log(odds)) = e^(log(odds))
    p × (1 + e^(log(odds))) = e^(log(odds))
    p = e^(log(odds)) / (1 + e^(log(odds)))

12. Given that the log(odds) = -3.48, convert the log(odds) into a probability.
p = e^(log(odds)) / (1 + e^(log(odds)))
e^(-3.48) = 0.031
p = 0.031 / (1 + 0.031) = 0.03
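The arithmetic can be verified numerically (values rounded as in the worked example):

```python
import math

def log_odds_to_prob(lo):
    # p = e^(log(odds)) / (1 + e^(log(odds)))
    return math.exp(lo) / (1 + math.exp(lo))

p = log_odds_to_prob(-3.48)
print(round(math.exp(-3.48), 3))  # → 0.031
print(round(p, 2))                # → 0.03
```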
13. Transform the equation p = e^(log(odds)) / (1 + e^(log(odds))) into the sigmoid function, which is the generalised logistic regression equation.

p = e^(log(odds)) / (1 + e^(log(odds)))

p = 1 / (1 + e^(-log(odds)))

p(x) = 1 / (1 + e^(-(β0 + β1x)))

  • Knowing that 1 / (1 + e^(-x)) = e^x / (1 + e^x)
  • Substitute log(odds) with β0 + β1x
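The two algebraic forms can be checked numerically against each other:

```python
import math

def form_a(x):
    return math.exp(x) / (1 + math.exp(x))   # e^x / (1 + e^x)

def form_b(x):
    return 1 / (1 + math.exp(-x))            # 1 / (1 + e^(-x))

# The two forms agree for any input
for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    assert abs(form_a(x) - form_b(x)) < 1e-12
```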

Supplementary questions to help you understand the concepts better.

1. What is the log(odds) when the probability of an outcome is 1?

log(odds) = log(p / (1 - p))
log(1 / (1 - 1)) = log(1 / 0) = log(1) - log(0)

log(0) is defined as negative infinity and log(1) = 0, so the whole term will be positive infinity.

2. What is the log(odds) when the probability of an outcome is 0?

log(odds) = log(p / (1 - p))
log(0 / (1 - 0)) = log(0 / 1) = log(0) - log(1)

log(0) is defined as negative infinity and log(1) = 0, so the whole term will be negative infinity.
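Both limits can be seen numerically by pushing p towards 1 and towards 0 (floating point never reaches the limits, but the trend is clear):

```python
import math

def log_odds(p):
    return math.log(p / (1 - p))

# As p approaches 1, log(odds) grows without bound ...
print(log_odds(0.9), log_odds(0.99), log_odds(0.999999))
# ... and as p approaches 0, it falls without bound
print(log_odds(0.1), log_odds(0.01), log_odds(0.000001))
```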

3. Explain the use-case of maximum likelihood estimator (MLE).
The goal of MLE is to determine the optimal parameters for a chosen probability distribution that best explain the observed data. It involves identifying the type of distribution that fits the data and finding the specific parameter values that maximise the likelihood of observing the given data. Simply put, MLE helps us figure out the best-fitting distribution and where to center it, making the observed data most probable under that distribution.
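A minimal example of MLE for the simplest case, a Bernoulli (coin-flip) parameter, where the analytical MLE is just the sample mean (toy data assumed):

```python
import math

data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # assumed observed binary outcomes

def log_likelihood(p):
    # Bernoulli log-likelihood of the data for parameter p
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y in data)

p_hat = sum(data) / len(data)  # analytical MLE: the sample mean
print(p_hat)  # → 0.7

# No candidate on a fine grid beats the analytical MLE
best_on_grid = max(log_likelihood(i / 100) for i in range(1, 100))
assert log_likelihood(p_hat) >= best_on_grid
```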
4. Explain odds and provide the equation.

Odds provide a measure of the likelihood of a particular outcome by expressing the likelihood of an event happening compared to the likelihood of it not happening.

o = p / (1 - p)

5. Explain the difference between odds and probabilities.
Probability is a measure of how likely an event is to occur, expressed as a value between 0 and 1, whereas odds represent the ratio of the probability of an event occurring to the probability of it not occurring.
6. Explain why we use log(odds).
  • To solve the asymmetry in odds: odds against an event are bounded between 0 and 1, whereas odds in favour of an event can range from 1 to infinity.

  • Using the log of the odds makes the values symmetric around 0, making them easier to interpret and perform statistical analysis on.

  • For example, applying log to odds against of 1 to 6 and odds in favour of 6 to 1 gives:

    log(1/6) = -1.79
    log(6/1) = 1.79
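Checked in Python (natural log, as used throughout these notes):

```python
import math

odds_against = 1 / 6    # odds against: 1 to 6
odds_in_favour = 6 / 1  # odds in favour: 6 to 1

print(round(math.log(odds_against), 2))    # → -1.79
print(round(math.log(odds_in_favour), 2))  # → 1.79
```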

7. Prove eln(x)=x
e^(ln(x)) = x
ln(e^(ln(x))) = ln(x)
ln(x) × ln(e) = ln(x)
ln(x) × 1 = ln(x)
ln(x) = ln(x)
8. Prove 11+ex=ex1+ex
1 / (1 + e^(-x)) = 1 / (1 + 1/e^x)
1 / (1 + 1/e^x) × (e^x / e^x) = e^x / (1 + e^x)
Last updated on 20 Jan 2024