linear regression python implementation
This is a python implementation of Linear Regression. This repository is based on linear_regression_demo. The original code doesn’t actually implement all the fundamental codes. Of course, we can use libraries like sklearn to use linear regression. But I hope this codes help you study the basic principal of it.
To use the demo.py
which uses sklearn
, you need to install it first. For visualization, you will need matplotlib
. And linear-regression.py
uses numpy
libraries for data-processing. So you need to install three dependencies below.
You can just run pip install -r requirements.txt
in terminal to install the necessary dependencies. Here is a link to pip if you don’t already have it.
Type python demo.py
into terminal and you’ll see the scattered data and best fit line. demo.py
uses sklearn
library for learning.
If you want learn how the linear regression works, you’d better read the linear-regression.py
which implement the model with numpy
. There are three hyper parameters.
epoch : The number times that the learning algorithm will work through the entire training dataset.
learning rate : How much the weights would be updated.
train data ratio : How much to use as learning data. The rest of it would be used for validating.
I set default values for the hyper parameters, but I recommend you to edit them and observe the changes.
Because of train data ratio
, there could be a little difference between demo.py
and linear-regression.py
.
Type python linear-regression.py data_name.txt
into terminal.
In fire_and_theft.txt
, you can find two columns of float data that indicate fire and theft occurrence in Chicago.
With demo.py
and fire_and_theft.txt
, you will get a plot below.
With linear-regression.py
and fire_and_theft.txt
, you will get plots below.
We are going to find out the relationship between fire and theft occurrence. Specifically, we will get a function of y = w*x + b
which defines the relationship. In this case, you will get a single value of w
and b
.
w
stands for weightb
stands for bias
If you are using single attribute input, it will make a graph image of scattered data and result function.
I tried to make the model be able to be applied even if there are multiple attributes.
In pm2.5.txt
, you can find several columns. Attribute information is below.
month, day, hour is what it is.
DEWP : Dew Point
TEMP : Temperature
PRES : Pressure
lws : Cumulated wind speed
lr : Cumulated hours of rain
We will say the number of attributes as k
. With pm2.5.txt
, k
will be 8. Then, we’ll try to figure out the formular of y = w1*x1 + w2*x2 + w3*x3 + w4*x4 + w5*x5 + w6*x6 + w7*x7 + w8*x8 + b
.
All w
s are handled by a matrix. So don’t be confused.
When you test pm2.5.text
with linear regression, you will find that it’s really hard to stabilize the model. It’s because of the scale difference among the attributes. So you need to pre-process the data.
pm2.5_normalized.txt
is literally normalized data. You will get a better result with this data.
If you are trying without display, you can run without_display.py
. It will create image files of graph.png
and cost.png
so that you can check to result later.
Enjoy the codes.