﻿ Logistic Regression - Data Science from Scratch: First Principles with Python (2015) ﻿

## Data Science from Scratch: First Principles with Python (2015)

### Chapter 16.Logistic Regression

A lot of people say there’s a fine line between genius and insanity. I don’t think there’s a fine line, I actually think there’s a yawning gulf.

Bill Bailey

# The Problem

``x` `=` `[[``1]` `+` `row[:2]` `for` `row` `in` `data]`  `# each element is [1, experience, salary]``
``y` `=` `[``row[2]` `for` `row` `in` `data]`         `# each element is paid_account``

``rescaled_x` `=` `rescale(x)``
``beta` `=` `estimate_beta(rescaled_x,` `y)`  `# [0.26, 0.43, -0.43]``
``predictions` `=` `[``predict(x_i,` `beta)` `for` `x_i` `in` `rescaled_x]``
` `
``plt.scatter(predictions,` `y)``
``plt.xlabel("predicted")``
``plt.ylabel("actual")``
``plt.show()``

§  We’d like for our predicted outputs to be 0 or 1, to indicate class membership. It’s fine if they’re between 0 and 1, since we can interpret these as probabilities — an output of 0.25 could mean 25% chance of being a paid member. But the outputs of the linear model can be huge positive numbers or even negative numbers, which it’s not clear how to interpret. Indeed, here a lot of our predictions were negative.

§  The linear regression model assumed that the errors were uncorrelated with the columns of x. But here, the regression coefficent for `experience` is 0.43, indicating that more experience leads to a greater likelihood of a premium account. This means that our model outputs very large values for people with lots of experience. But we know that the actual values must be at most 1, which means that necessarily very large outputs (and therefore very large values of `experience`) correspond to very large negative values of the error term. Because this is the case, our estimate of`beta` is biased.

# The Logistic Function

``def` `logistic(x):``
`    `return` `1.0` `/` `(``1` `+` `math.exp(-x))``
``def` `logistic_prime(x):``
`    `return` `logistic(x)` `*` `(``1` `-` `logistic(x))``

``def` `logistic_log_likelihood_i(x_i,` `y_i,` `beta):``
`    `if` `y_i` `==` `1:``
`        `return` `math.log(logistic(dot(x_i,` `beta)))``
`    `else:``
`        `return` `math.log(1` `-` `logistic(dot(x_i,` `beta)))``
``def` `logistic_log_likelihood(x,` `y,` `beta):``
`    `return` `sum(logistic_log_likelihood_i(x_i,` `y_i,` `beta)``
`               `for` `x_i,` `y_i` `in` `zip(x,` `y))``
``def` `logistic_log_partial_ij(x_i,` `y_i,` `beta,` `j):``
`    `"""here i is the index of the data point,``
``    j the index of the derivative"""``
` `
`    `return` `(``y_i` `-` `logistic(dot(x_i,` `beta)))` `*` `x_i[j]``
` `
``def` `logistic_log_gradient_i(x_i,` `y_i,` `beta):``
`    `"""the gradient of the log likelihood``
``    corresponding to the ith data point"""``
` `
`    `return` `[``logistic_log_partial_ij(x_i,` `y_i,` `beta,` `j)``
`            `for` `j,` `_` `in` `enumerate(beta)]``
` `
``def` `logistic_log_gradient(x,` `y,` `beta):``
`    `return` `reduce(vector_add,``
`                  `[``logistic_log_gradient_i(x_i,` `y_i,` `beta)``
`                   `for` `x_i,` `y_i` `in` `zip(x,y)])``

# Applying the Model

``random.seed(0)``
``x_train,` `x_test,` `y_train,` `y_test` `=` `train_test_split(rescaled_x,` `y,` `0.33)``
` `
``# want to maximize log likelihood on the training data``
``fn` `=` `partial(logistic_log_likelihood,` `x_train,` `y_train)``
``gradient_fn` `=` `partial(logistic_log_gradient,` `x_train,` `y_train)``
` `
``# pick a random starting point``
``beta_0` `=` `[``random.random()` `for` `_` `in` `range(3)]``
` `
``# and maximize using gradient descent``
``beta_hat` `=` `maximize_batch(fn,` `gradient_fn,` `beta_0)``
``beta_hat` `=` `maximize_stochastic(logistic_log_likelihood_i,``
`                               `logistic_log_gradient_i,``
`                               `x_train,` `y_train,` `beta_0)``
``beta_hat` `=` `[-``1.90,` `4.05,` `-``3.87]``
``beta_hat_unscaled` `=` `[``7.61,` `1.42,` `-``0.000249]``

# Goodness of Fit

``true_positives` `=` `false_positives` `=` `true_negatives` `=` `false_negatives` `=` `0``
` `
``for` `x_i,` `y_i` `in` `zip(x_test,` `y_test):``
`    `predict` `=` `logistic(dot(beta_hat,` `x_i))``
` `
`    `if` `y_i` `==` `1` `and` `predict` `>=` `0.5:`  `# TP: paid and we predict paid``
`        `true_positives` `+=` `1``
`    `elif` `y_i` `==` `1:`                   `# FN: paid and we predict unpaid``
`        `false_negatives` `+=` `1``
`    `elif` `predict` `>=` `0.5:`             `# FP: unpaid and we predict paid``
`        `false_positives` `+=` `1``
`    `else:`                            `# TN: unpaid and we predict unpaid``
`        `true_negatives` `+=` `1``
` `
``precision` `=` `true_positives` `/` `(``true_positives` `+` `false_positives)``
``recall` `=` `true_positives` `/` `(``true_positives` `+` `false_negatives)``
``predictions` `=` `[``logistic(dot(beta_hat,` `x_i))` `for` `x_i` `in` `x_test]``
``plt.scatter(predictions,` `y_test)``
``plt.xlabel("predicted probability")``
``plt.ylabel("actual outcome")``
``plt.title("Logistic Regression Predicted vs. Actual")``
``plt.show()``
Support Vector Machines
For Further Investigation

§  scikit-learn has modules for both Logistic Regression and Support Vector Machines.

§  libsvm is the support vector machine implementation that scikit-learn is using behind the scenes. Its website has a variety of useful documentation about support vector machines.

﻿

﻿