yueyuan
2 min read · Oct 23, 2020


Linear Regression

What are the main assumptions of a linear regression?

(1) The relationship between y and x is linear. (2) The residual errors from the regression fit are normally distributed.

What are the most common types of linear regression? / What are the most common estimation techniques for linear regression?

(1) Ordinary least squares (2) Generalized least squares (3) Penalized least squares: L1 (LASSO), L2 (Ridge)
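Ordinary least squares has a closed-form solution in the simple (one-feature) case: the slope is the sample covariance of x and y divided by the variance of x. A minimal pure-Python sketch, using illustrative toy data:

```python
def ols_fit(x, y):
    """Return (beta0, beta1) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # beta1 = cov(x, y) / var(x)
    beta1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
            / sum((xi - mean_x) ** 2 for xi in x)
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

# Toy data generated from y = 1 + 2x, so the fit should recover those values
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
beta0, beta1 = ols_fit(x, y)  # beta0 = 1.0, beta1 = 2.0
```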

Logistic Regression

What is the formula, and how is it used for binary classification?

f(x) = sigma(t) = 1/(1 + e^-t), where t = beta0 + beta1*x

The sigmoid maps (-inf, inf) to (0, 1).

Use the result of f(x) as the probability of the data point being in the positive class.
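The formula above can be sketched directly; the beta values here are hypothetical, chosen only to illustrate the 0.5 decision threshold:

```python
import math

def sigmoid(t):
    """Logistic function: maps (-inf, inf) to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def predict_proba(x, beta0, beta1):
    """Probability of the positive class for a single feature x."""
    return sigmoid(beta0 + beta1 * x)

# Classify as positive when the predicted probability exceeds 0.5
p = predict_proba(2.0, beta0=-1.0, beta1=1.0)  # sigmoid(1.0) ~ 0.73
label = 1 if p > 0.5 else 0
```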

Decision Tree

How does a decision tree decide on its splits?

The tree searches for the feature (and threshold) that splits the target class into the purest possible child nodes, using information gain as the splitting criterion.

This measure of purity is called the information. It represents the expected amount of information needed to specify whether a new instance should be classified 0 or 1, given the examples that reached the node.

Entropy is a measure of impurity (the opposite). For a binary class with values a/b it is defined as: -p(a)*log(p(a)) - p(b)*log(p(b)).

By comparing the entropy before and after the split, we obtain a measure of information gain, or how much information we gained by doing the split using that particular feature:

information_gain = entropy_before - entropy_after, where entropy_after is the weighted average entropy of the child nodes.
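The entropy and information-gain formulas above can be sketched in a few lines of pure Python (base-2 logs, binary 0/1 labels):

```python
import math

def entropy(labels):
    """Binary entropy: -p(a)*log2(p(a)) - p(b)*log2(p(b))."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of positive (1) labels
    if p in (0.0, 1.0):  # a pure node has zero entropy
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(parent, left, right):
    """Entropy before the split minus the weighted entropy after."""
    n = len(parent)
    after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - after

# A perfect split drives both child entropies to zero: gain = 1 bit
parent = [0, 0, 1, 1]
gain = information_gain(parent, left=[0, 0], right=[1, 1])  # 1.0
```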

What advantages does a decision tree have over other machine learning methods?

Very easy to interpret and understand

Works on both continuous and categorical features

No normalization or scaling necessary

Prediction runs very fast, even on large datasets

Random forest vs. gradient boosting: how do they differ?

Boosting trees: an iterative algorithm where each round builds on the previous results. (1) Reweight samples based on the results of previous iterations of classification (as in AdaBoost). (2) Harder-to-classify points receive larger weights.

Random forest: (1) Apply bootstrap aggregation (bagging) to train many different trees. (2) This creates an ensemble of diverse individual decision trees.

In the random forest algorithm, each split does not consider all features: a random subset of features is drawn at each node, and the best split (by information gain or Gini index) is chosen within that subset.
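The two sources of randomness above (bootstrap sampling and random feature subsets) can be sketched as follows; `bootstrap_sample` and `random_feature_subset` are hypothetical helper names, not from any library:

```python
import random

def bootstrap_sample(X, y, rng):
    """Sample n rows with replacement (bagging)."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def random_feature_subset(n_features, rng):
    """Pick a random subset of features to consider at one split
    (commonly about sqrt(n_features) for classification)."""
    k = max(1, int(n_features ** 0.5))
    return rng.sample(range(n_features), k)

rng = random.Random(0)
X = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
y = [0, 1, 1]
Xb, yb = bootstrap_sample(X, y, rng)      # one tree's training set
feats = random_feature_subset(len(X[0]), rng)  # features tried at a split
```

Each tree in the forest would repeat this pair of draws, then be trained on its own bootstrap sample.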

Naive Bayes methods

Given data set of features X and labels y, what assumptions are made?

Each feature is assumed independent of the others, conditional on the class label y.
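That conditional-independence assumption lets the posterior factor as P(y|x) ∝ P(y) * Π P(x_i|y). A minimal sketch with hand-set, hypothetical probabilities for a two-feature spam example:

```python
def posterior_scores(features, priors, likelihoods):
    """Unnormalized P(y|x) per class, assuming feature independence given y."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(features):
            score *= likelihoods[c][i][value]  # P(x_i = value | class c)
        scores[c] = score
    return scores

# Hypothetical toy numbers, not learned from data
priors = {"spam": 0.4, "ham": 0.6}
# likelihoods[class][feature_index][feature_value] = P(value | class)
likelihoods = {
    "spam": [{1: 0.9, 0: 0.1}, {1: 0.7, 0: 0.3}],
    "ham":  [{1: 0.2, 0: 0.8}, {1: 0.1, 0: 0.9}],
}
scores = posterior_scores([1, 1], priors, likelihoods)
prediction = max(scores, key=scores.get)  # "spam": 0.252 vs 0.012
```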

SVM

How does SVM work?

It attempts to find the hyperplane that separates the classes by maximizing the margin.

It can employ the kernel trick, which maps linearly non-separable inputs into a higher-dimensional space where they become more easily separable.

It can perform non-linear classification.
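A common kernel for this is the RBF (Gaussian) kernel; it computes similarities as if the points lived in a higher-dimensional space, without ever constructing that space. A minimal sketch (the gamma value is an illustrative choice):

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel k(u, v) = exp(-gamma * ||u - v||^2).
    Implicitly corresponds to an inner product in a high-dimensional space."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

# Identical points have similarity 1; distant points approach 0
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # 1.0
far = rbf_kernel([0.0, 0.0], [10.0, 10.0])  # ~0
```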

Overfitting

What is overfitting? What causes it? How can it be avoided?

The model fits the training data too closely, including its noise, and does not generalize well to unseen data.

Increase the training data size; regularization; early stopping; k-fold cross-validation.
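Of these, k-fold cross-validation is the most mechanical to sketch: split the indices into k validation folds, and train on the remaining indices each round. A minimal pure-Python version (contiguous folds for simplicity; real splitters usually shuffle first):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k (train, validation) index pairs."""
    folds = []
    fold_size, extra = divmod(n, k)
    start = 0
    for i in range(k):
        size = fold_size + (1 if i < extra else 0)
        val = list(range(start, start + size))        # held-out fold
        train = [j for j in range(n) if j not in val]  # everything else
        folds.append((train, val))
        start += size
    return folds

folds = k_fold_indices(10, 5)  # 5 folds, each holding out 2 points
```

Averaging the validation score over the k folds gives a less noisy estimate of generalization than a single train/test split.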

Accuracy, Precision and Recall

Differences?

Accuracy = (TP + TN)/(TP+FP+FN+TN)

Precision = TP/(TP+FP): the ratio of correctly predicted positive observations to the total predicted positive observations.

Recall (Sensitivity) = TP/(TP+FN): the ratio of correctly predicted positive observations to all observations in the actual positive class.
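The three formulas above, applied to illustrative confusion-matrix counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Example counts: 8 TP, 2 FP, 4 FN, 6 TN
acc, prec, rec = classification_metrics(tp=8, fp=2, fn=4, tn=6)
# acc = 14/20 = 0.7, prec = 8/10 = 0.8, rec = 8/12 ~ 0.667
```

Note how the model here looks fine on accuracy and precision but misses a third of the actual positives, which is exactly the gap recall is meant to expose.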
