Logistic regression statistics assignment

1. Implement the logistic regression algorithm using Gradient Descent in
analogy with neural networks (as described in the lectures, Chapter 4, or
[1], Sections 11.3–11.5) for p attributes. The objective function, which
your program should minimize, can be either the training MSE
\[ \mathrm{MSE} := \frac{1}{n} \sum_{i=1}^{n} (y_i - p(x_i))^2 \tag{1} \]
(cf. “Brier loss” on slide 99 in Chapter 4) or the negative log-likelihood
\[ \mathrm{NLL} := -\sum_{i:\, y_i = 1} \log p(x_i) - \sum_{i:\, y_i = 0} \log(1 - p(x_i)) \]
(as described on slide 44 of Chapter 4); the choice is yours. (Be careful
not to confuse p as the number of attributes and p(x) as the predicted
probability of 1.) Your program can be written either in R or in MATLAB. You
are not allowed to use any existing implementations of logistic
regression or Gradient Descent in R, MATLAB, or any other language,
and should code logistic regression from first principles. However, you are
allowed to set the number of attributes p to a specific value that allows
you to do the following tasks.
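
As an illustration of item 1, here is a minimal sketch of one Gradient Descent step for the MSE objective. It is written in Python only to make the arithmetic concrete; the submitted program itself must be in R or MATLAB. The sketch assumes the model p(x) = 1/(1 + exp(−(w_0 + w_1 x_1 + ... + w_p x_p))), for which the derivative of the MSE with respect to w_j is (2/n) Σ_i (p(x_i) − y_i) p(x_i) (1 − p(x_i)) x_ij, with x_i0 = 1 for the intercept. The names sigmoid, predict, mse, and gradient_step are illustrative and do not come from the lectures.

```python
import math

def sigmoid(z):
    # Logistic function, written so that exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    # w[0] is the intercept; w[1:] are paired with the attributes in x.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def mse(w, X, y):
    # Average squared difference between labels and predicted probabilities.
    return sum((yi - predict(w, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)

def gradient_step(w, X, y, eta):
    # One Gradient Descent update: w <- w - eta * grad(MSE).
    n = len(y)
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        p = predict(w, xi)
        common = 2.0 * (p - yi) * p * (1.0 - p) / n
        grad[0] += common                      # intercept term (x_i0 = 1)
        for j, xij in enumerate(xi, start=1):
            grad[j] += common * xij
    return [wj - eta * gj for wj, gj in zip(w, grad)]
```

Calling gradient_step repeatedly for a fixed number of steps gives the simplest training loop. If the NLL is chosen instead, only the factor computed in `common` changes.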
2. Use the Auto data set. Create a new variable high that takes values
\[ \mathrm{high} := \begin{cases} 1 & \text{if } \mathrm{mpg} \ge 23, \\ 0 & \text{otherwise (i.e., if } \mathrm{mpg} < 23\text{).} \end{cases} \]
Apply your program to the Auto data set and new variable to predict
high given horsepower, weight, year, and origin. (In other words,
high is the label and horsepower, weight, year, and origin are the
attributes.) Since origin is a qualitative variable, you will have to create
appropriate dummy variables. Normalize the attributes, as described in
Lab Worksheet 4, Section 5 or Exercise 9 (or Section 4.6.6 of [2]).
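
The preprocessing in item 2 can be sketched as follows, again in Python purely for illustration. Two assumptions are made here: "normalize" is taken to mean subtracting the column mean and dividing by the sample standard deviation (check Lab Worksheet 4 for the intended method), and the dummy variables drop the first level of origin as the baseline. The function names are hypothetical.

```python
import math

def make_high(mpg):
    # Label from the assignment: 1 if mpg >= 23, 0 otherwise.
    return 1 if mpg >= 23 else 0

def dummy_columns(values, levels):
    # One 0/1 indicator per level except the first, which acts as the baseline.
    return [[1.0 if v == lvl else 0.0 for lvl in levels[1:]] for v in values]

def standardize(column):
    # Assumed normalization: zero mean and unit sample standard deviation.
    # A constant column would give sd = 0 and is not handled in this sketch.
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]
```

With origin coded as 1, 2, 3, calling dummy_columns(origin, [1, 2, 3]) yields two indicator columns, so the model has p = 5 attributes in total: horsepower, weight, year, and the two dummies.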
3. Split the data set randomly into two equal parts, which will serve as the
training set and the test set. Use your birthday (in the format MMDD)
as the seed for the pseudorandom number generator. The same training
and test sets should be used throughout this assignment.
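
A seeded random split into two equal halves, as asked for in item 3, might look like the sketch below (Python, for illustration only). The seed 314 stands for a hypothetical birthday of 14 March; in Python the leading zero of MMDD has to be dropped because 0314 is not a valid integer literal. The function name split_half is an assumption.

```python
import random

def split_half(n_rows, seed):
    # Shuffle the row indices with a dedicated, seeded generator,
    # then cut the shuffled list in the middle.
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    half = n_rows // 2
    return sorted(idx[:half]), sorted(idx[half:])
```

Computing the two index lists once and reusing them everywhere keeps the training and test sets identical across all later experiments.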
4. Train your algorithm on the training set using independent random numbers in the range [−0.7, 0.7] as the initial weights. Find the MSE on the
test set (it is defined by (1) except that the average should be over the
test rather than training set). Try different values of the learning rate η
and of the number of training steps (so that your stopping rule is to stop
after a given number of steps). Give a small table of test and training
MSEs in your report.
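
One way to organize the experiment in item 4 is sketched below in Python (illustration only). It draws the initial weights uniformly from [−0.7, 0.7], trains for a fixed number of steps on the MSE objective, and collects one row of training and test MSE per combination of η and step count. Reseeding before every run, so that each table cell starts from the same initial weights, is a design choice of this sketch and not a requirement of the assignment. All function names are hypothetical.

```python
import math
import random

def predict(w, x):
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))          # clip to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def mse(w, X, y):
    return sum((yi - predict(w, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)

def train(X, y, eta, n_steps, rng):
    # Initial weights: independent uniform draws from [-0.7, 0.7].
    w = [rng.uniform(-0.7, 0.7) for _ in range(len(X[0]) + 1)]
    n = len(y)
    for _ in range(n_steps):              # stopping rule: fixed number of steps
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = predict(w, xi)
            c = 2.0 * (p - yi) * p * (1.0 - p) / n
            grad[0] += c
            for j, xij in enumerate(xi, start=1):
                grad[j] += c * xij
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w

def mse_table(X_train, y_train, X_test, y_test, etas, step_counts, seed):
    # One row (eta, n_steps, training MSE, test MSE) per combination.
    rows = []
    for eta in etas:
        for n_steps in step_counts:
            rng = random.Random(seed)     # same initial weights for every cell
            w = train(X_train, y_train, eta, n_steps, rng)
            rows.append((eta, n_steps, mse(w, X_train, y_train), mse(w, X_test, y_test)))
    return rows
```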
Optional: Try different stopping rules, such as: stop when the value of
the objective function (the training MSE) does not change by more than
1% of its initial value over the last 10 training steps.
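
The optional stopping rule can be read in more than one way. The Python sketch below (illustration only) assumes that "does not change by more than 1%" means that the spread between the largest and smallest objective values over the last 10 steps is at most 1% of the initial value. The function name and its parameters are hypothetical.

```python
def should_stop(history, window=10, rel_tol=0.01):
    # history[k] is the objective after k training steps; history[0] is the initial value.
    if len(history) < window + 1:
        return False                      # fewer than `window` steps taken so far
    recent = history[-(window + 1):]      # the values spanning the last `window` steps
    return max(recent) - min(recent) <= rel_tol * abs(history[0])
```

In a training loop, the current training MSE would be appended to history after every step and the loop would end as soon as should_stop(history) returns True.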
6. Run your logistic regression program for a fixed value of η and for a fixed
stopping rule (producing reasonable results in your experiments so far)
100 times, for different values of the initial weights (produced as above,
as independent random numbers in [−0.7, 0.7]). In each of the 100 cases
compute the test MSE and show it in your report as a boxplot.
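
The repetition in item 6 can be sketched as below (Python, illustration only). The argument run_once is a hypothetical function that trains from fresh initial weights drawn with the supplied generator and returns the resulting test MSE. The five-number summary is the set of values a boxplot displays; the quartile convention chosen here is an assumption and may differ slightly from the one used by the plotting function in R or MATLAB.

```python
import random
import statistics

def repeated_test_mses(run_once, n_runs=100, seed=0):
    # A single generator is shared by all runs, so every run draws
    # different initial weights but the whole experiment is reproducible.
    rng = random.Random(seed)
    return [run_once(rng) for _ in range(n_runs)]

def five_number_summary(values):
    # Minimum, lower quartile, median, upper quartile, maximum.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)
```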
7. Optional: Redo the experiments in items 4–5 modifying the training
procedure as follows. Instead of training logistic regression once using
Gradient Descent, train it 4 times using Gradient Descent with different
values of the initial weights and then choose the prediction rule with the
best training MSE.
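
The restart procedure in item 7 amounts to the selection step sketched below (Python, illustration only). The argument train_once is a hypothetical function that trains from fresh initial weights and returns a pair (weights, training MSE).

```python
import random

def best_of_restarts(train_once, n_restarts=4, seed=0):
    # Train several times from different initial weights and keep the run
    # whose training MSE is smallest.
    rng = random.Random(seed)
    candidates = [train_once(rng) for _ in range(n_restarts)]
    return min(candidates, key=lambda c: c[1])
```

The selection uses the training MSE only, as the item specifies; the test MSE of the chosen rule is computed afterwards.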
