Logistic Regression

yueyuan
1 min read · Jul 30, 2020

Part 1: Coefficient

Part 2

Part 3: R-squared and p-value

In linear regression, R-squared and the p-value are calculated using the residuals. In brief, we square the residuals and add them up; we call this SS(fit), the sum of squared residuals around the best fitting line. We compare that to the sum of squared residuals around the worst fitting line, the horizontal line at the mean of the y-axis values. We call this SS(mean).

R-squared is the percentage of variation around the mean that goes away when you fit a line to the data.

R² = (SS(mean) - SS(fit)) / SS(mean)

It goes from 0 to 1.
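As a rough sketch of that calculation (the data and variable names below are made up for illustration, assuming NumPy is available):

```python
import numpy as np

# Made-up example data, just for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.8])

# Fit the best fitting line by least squares
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# SS(fit): squared residuals around the best fitting line, summed
ss_fit = np.sum((y - y_hat) ** 2)

# SS(mean): squared residuals around the worst fitting line, the mean of y
ss_mean = np.sum((y - y.mean()) ** 2)

# R-squared: the fraction of variation around the mean that goes away with the fit
r_squared = (ss_mean - ss_fit) / ss_mean
print(r_squared)
```

With a perfect fit, SS(fit) is 0 and R² is 1; with a fit no better than the mean, SS(fit) equals SS(mean) and R² is 0.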

Difference with linear regression

One big difference between linear regression and logistic regression is how the line is fit to the data.

With linear regression, we fit the line using “least squares”. In other words, we find the line that minimizes the sum of the squares of these residuals. We also use the residuals to calculate R² and to compare simple models to complicated models.
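To make "least squares" concrete, here is a small sketch (assuming SciPy, with the same made-up data as above) that minimizes the sum of squared residuals directly and checks it against NumPy's closed-form fit:

```python
import numpy as np
from scipy.optimize import minimize

# Same made-up example data as above
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.8])

def sum_of_squared_residuals(params):
    slope, intercept = params
    return np.sum((y - (slope * x + intercept)) ** 2)

# "Least squares": find the slope and intercept that minimize SS(fit)
result = minimize(sum_of_squared_residuals, x0=[0.0, 0.0])
print(result.x)              # numerically minimized slope and intercept
print(np.polyfit(x, y, 1))   # the closed-form least squares answer agrees
```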

For logistic regression, as with linear regression, we need a measure of a good fit to compare to a measure of a bad fit. Unfortunately, the residuals for logistic regression are all infinite: the line is fit on the log(odds) axis, where the raw data points sit at plus or minus infinity, so we can't use them. Instead, we project the data onto the candidate line, translate each point's log(odds) back to a probability, and then calculate the log-likelihood of the data given that best fitting squiggle.
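A minimal sketch of that last step, assuming scikit-learn and made-up binary data (the values below are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up binary data: x could be weight, y could be obese (1) or not (0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# Fit the best fitting "squiggle" (a very large C keeps regularization negligible)
model = LogisticRegression(C=1e6).fit(x, y)

# Project each point onto the fitted line in log(odds) space...
log_odds = model.decision_function(x)

# ...then translate the log(odds) back to probabilities
p = 1.0 / (1.0 + np.exp(-log_odds))

# Log-likelihood of the data given the best fitting squiggle:
# log(p) for the 1s plus log(1 - p) for the 0s, summed
log_likelihood = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(log_likelihood)
```

`model.predict_proba(x)[:, 1]` would give the same probabilities directly; the manual translation is shown only to mirror the log(odds) step described above.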
