ML Linear Regression

October 02, 2020

Points and Lines

In the last exercise, you were probably able to make a rough estimate about the next data point for Sandra’s lemonade stand without thinking too hard about it. For our program to make the same level of guess, we have to determine what a line would look like through those data points.

A line is determined by its slope and its intercept. In other words, for each point y on a line we can say:

 $y = m x + b$

where m is the slope, and b is the intercept. y is a given point on the y-axis, and it corresponds to a given x on the x-axis.

The slope is a measure of how steep the line is, while the intercept is a measure of where the line hits the y-axis.

When we perform Linear Regression, the goal is to get the “best” m and b for our data.

import codecademylib3_seaborn
import matplotlib.pyplot as plt
months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]

#slope:
m = 10
#intercept:
b = 55

y = [m*month + b for month in months]

plt.plot(months, revenue, "o")
plt.plot(months,y)

plt.show()
To make the line steeper, increase the m value. To move the line up, increase the b value.

Loss
When we think about how we can assign a slope and intercept to fit a set of points, we have to define what the best fit is.
For each data point, we calculate loss, a number that measures how bad the model’s (in this case, the line’s) prediction was. You may have seen this being referred to as error.
We can think about loss as the squared distance from the point to the line. We do the squared distance (instead of just the distance) so that points above and below the line both contribute to total loss in the same way:
In this example:
For point A, the squared distance is 9 (3²)
For point B, the squared distance is 1 (1²)
So the total loss, with this model, is 10. If we found a line that had less loss than 10, that line would be a better model for this data.


x = [1, 2, 3]
y = [5, 1, 3]

#y = x
m1 = 1
b1 = 0

#y = 0.5x + 1
m2 = 0.5
b2 = 1
y_predicted1 = [m1*x_value + b1 for x_value in x]
y_predicted2 = [m2*x_value + b2 for x_value in x]
total_loss1 = 0
total_loss2 = 0

for i in range(len(y)):
  total_loss1 += (y[i] - y_predicted1[i]) ** 2

for i in range(len(y)):
  total_loss2 += (y[i] - y_predicted2[i]) ** 2

print(total_loss1)
print(total_loss2)
better_fit = 2

Search This Blog

Python Data Science Coding Reference

ML Linear Regression

Comments

Post a Comment

Popular posts from this blog

Binomial Test in Python

Python Syntax and Functions Part2 (Summary Statistics)

Slicing and Indexing in Python Pandas