Project author: satishrath185

Project description:
Analyzing bike sharing trends and predicting demand for bike rentals
Primary language: Jupyter Notebook
Project URL: git://github.com/satishrath185/Bike-Sharing.git
Created: 2020-08-21T07:14:11Z
Project community: https://github.com/satishrath185/Bike-Sharing



Bike-Sharing

Analyzing bike sharing trends and forecasting demand

Business Value

With health and the environment becoming trending topics, the use of bicycles as a mode of transportation has gained traction in recent years. To encourage bike usage, cities across the world have rolled out bike sharing programs. Under such schemes, riders can rent bicycles for defined periods from manual or automated kiosks spread across the city. In most cases, riders can pick up a bike at one location and return it at any other designated place.
Bike sharing platforms around the world generate many kinds of data, including travel time, start and end locations, and rider demographics. Combined with other sources of information such as weather, traffic, and terrain, this data is attractive for a range of research areas.

Problem Statement

Predict the demand for bike rentals in this program.

Data

Each row represents the bike demand observed under a given set of attribute values, and each column represents one attribute.

dteday : Date

Season : Season of the year

yr : Year the record belongs to (the data covers two years)

mnth : Month of the Year

hr : Time/Hour of the day (24 hours format)

holiday : Whether the day was a holiday

weekday : Day of the week

workingday : Whether the day was a working day (neither a weekend nor a holiday)

weathersit , temp , atemp , hum , windspeed : Weather conditions for the day

casual , registered : Counts of non-registered (casual) and registered users

cnt : Total count (demand) under the given conditions on that date
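
As a rough illustration of how a file with these columns might be loaded, renamed, and type cast with pandas, here is a minimal sketch. The sample rows are made up, and the readable column names in `RENAMES`, the choice of categorical columns, and the helper `load_bike_data` are assumptions for illustration; the notebook's actual names may differ.

```python
import io

import pandas as pd

# A tiny stand-in for the real data file; the values are made up for illustration.
SAMPLE_CSV = """dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.80,0.0,8,32,40
2011-01-03,1,0,1,8,0,1,1,2,0.20,0.2576,0.86,0.1,5,27,32
"""

# Hypothetical readable names for the terse columns.
RENAMES = {
    "dteday": "datetime",
    "yr": "year",
    "mnth": "month",
    "hr": "hour",
    "weathersit": "weather_condition",
    "hum": "humidity",
    "cnt": "total_count",
}

# Columns treated as categorical here (an assumption, not the notebook's list).
CATEGORICAL = ["season", "year", "month", "hour", "holiday",
               "weekday", "workingday", "weather_condition"]


def load_bike_data(source):
    """Read the CSV, rename terse columns, and cast column types."""
    df = pd.read_csv(source).rename(columns=RENAMES)
    df["datetime"] = pd.to_datetime(df["datetime"])
    return df.astype({col: "category" for col in CATEGORICAL})


bikes = load_bike_data(io.StringIO(SAMPLE_CSV))
print(bikes.dtypes)
```

Passing a file path instead of the `io.StringIO` object would read the real data in the same way.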

Approach

  • Loading Data

    • Renaming attributes
    • Performing type casting
  • Data Exploration and Visualization

  • Splitting Data into Train and Test Sets

  • Data Preprocessing

    • Label Encoding

    • One Hot Encoding of Categorical Values

  • Training Model

    • Linear Regression
    • Decision Tree
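
The splitting and one-hot encoding steps listed above could look roughly like the following sketch with scikit-learn. The synthetic data, the 70/30 split, and the choice of which columns are categorical are assumptions for illustration. The encoder is fitted on the training split only, so the test split is transformed with the categories learned from training data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data using a few of the dataset's column names.
rng = np.random.default_rng(0)
n = 200
frame = pd.DataFrame({
    "season": rng.integers(1, 5, n),       # values 1..4
    "weathersit": rng.integers(1, 4, n),   # values 1..3
    "temp": rng.random(n),
    "cnt": rng.integers(0, 500, n),
})

X = frame.drop(columns="cnt")
y = frame["cnt"]

# Hold out 30% of the rows for testing (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

categorical = ["season", "weathersit"]
numeric = ["temp"]

# Fit on the training split only; unseen test categories encode as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
train_cat = encoder.fit_transform(X_train[categorical]).toarray()
test_cat = encoder.transform(X_test[categorical]).toarray()

# Combine the one-hot columns with the untouched numeric columns.
X_train_enc = np.hstack([train_cat, X_train[numeric].to_numpy()])
X_test_enc = np.hstack([test_cat, X_test[numeric].to_numpy()])
print(X_train_enc.shape, X_test_enc.shape)
```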

To measure the performance of each model, Mean Squared Error (MSE) and R-squared are used.
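
For reference, both metrics can be computed by hand. The sketch below uses plain Python and made-up numbers; the function names `mse` and `r_squared` are illustrative, and the notebook most likely relies on library implementations instead.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual variation to total variation around the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up demand values, purely for illustration.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

print(mse(actual, predicted))        # (4 + 4 + 9 + 9) / 4 = 6.5
print(r_squared(actual, predicted))  # 1 - 26 / 500, about 0.948
```

A lower MSE and an R-squared closer to 1 both indicate a better fit.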

Data Exploration and Visualization

Data Info

Data Distribution

Seasonwise Demand

Daily Hourly Distribution of Demand

Monthly Distribution of Demand

Year Wise Demand

Demand for Registered users and weather conditions

Hourly Demand Outliers

Working day and Holiday Demand Distribution

Correlation Heat Map of All Features

Model Building and Training

  1. Linear Regression

    Training Data

    Cross validation

    R-squared::[0.26906982 0.22635143 0.26170837 0.25691577 0.32148609 0.31196355 0.26400442 0.28760857 0.27964318 0.31554976]

    MSE :: [-23850.18552648 -24978.92079695 -24408.88562232 -22517.19952749 -22117.37661082 -24746.90893471 -25327.37663767 -25328.57026724 -25094.24038362 -21876.91293786]

    Testing Model

    R-squared::0.2867680055878652

    MSE::22753.14
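
The ten scores per metric reported above are consistent with 10-fold cross-validation, and the negative MSE values are consistent with scikit-learn's `neg_mean_squared_error` scorer, which negates the error so that higher is always better. The following is a minimal sketch under those assumptions, run on synthetic data and not on the project's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for the encoded training set.
rng = np.random.default_rng(0)
X = rng.random((300, 3))
y = 100 * X[:, 0] - 50 * X[:, 1] + rng.normal(0, 10, 300)

model = LinearRegression()

# One score per fold; cv=10 is assumed from the ten values reported.
r2_scores = cross_val_score(model, X, y, cv=10, scoring="r2")
neg_mse_scores = cross_val_score(
    model, X, y, cv=10, scoring="neg_mean_squared_error"
)

# Flip the sign to recover the ordinary, non-negative MSE.
mse_scores = -neg_mse_scores

print("R-squared per fold:", r2_scores)
print("MSE per fold:", mse_scores)
```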
  2. Decision Tree Model

    Training Data

    Grid search with Cross validation

    R-Squared::0.29472840660973104

    Best Hyperparameters::{'criterion': 'mse', 'max_depth': 8, 'max_leaf_nodes': 500, 'min_samples_leaf': 20, 'min_samples_split': 5}

    Avg R-squared::0.30744642968000135

    MSE::-23089.90043082429

    Testing Model

    R-squared::0.3238351376141839

    MSE::21570.64
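
A grid search with cross-validation over a decision tree regressor might be set up along these lines. The data is synthetic, and the candidate values in `param_grid` and the fold count are assumptions; only the hyperparameter names come from the results above. The `criterion` setting is left at its default here because recent scikit-learn releases renamed `'mse'` to `'squared_error'`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with a step-like pattern that a tree can capture.
rng = np.random.default_rng(0)
X = rng.random((400, 3))
y = 200 * (X[:, 0] > 0.5) + 50 * X[:, 1] + rng.normal(0, 10, 400)

# Hypothetical candidate values; the notebook's actual grid is not shown.
param_grid = {
    "max_depth": [4, 8],
    "max_leaf_nodes": [100, 500],
    "min_samples_leaf": [20, 40],
    "min_samples_split": [5, 10],
}

# Every combination is scored by cross-validated R-squared.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best mean R-squared:", search.best_score_)
```

After fitting, `search.best_estimator_` holds a tree refitted on all the training data with the best combination, which can then be scored on the held-out test set.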

Conclusion

Comparing the two models, the Decision Tree has a higher R-squared value and a lower Mean Squared Error than the Linear Regression model on the test data, so it is the more favourable of the two. However, further experiments with other algorithms may yield an even better R-squared and a lower MSE.