BI (Business Intelligence) - Bike-Sharing - PROSAGA (码农传奇)

Analyzing Bike Sharing Trends and Forecasting Demand

Business Value

With health and the environment becoming prominent topics, the use of bicycles as a mode of transportation has gained traction in recent years. To encourage bike usage, cities across the world have successfully rolled out bike sharing programs. Under such schemes, riders can rent bicycles for defined periods from manual or automated kiosks spread across the city. In most cases, riders can pick up a bike at one location and return it at any other designated location.
Bike sharing platforms across the world generate many kinds of data, including travel time, start and end locations, and rider demographics. Combined with other sources of information such as weather, traffic, and terrain, this data is an attractive proposition for a range of research areas.

Problem Statement

To predict the demand for bike rentals in this program.

Data

Each row represents the bike demand observed under a given set of attribute values, and each column represents one attribute.

dteday : Date

Season : Season of the year

yr : Year; the data covers two years, so this indicates the year to which the record belongs

mnth : Month of the Year

hr : Time/Hour of the day (24 hours format)

holiday : Whether the day was a holiday

weekday : Day of the week

workingday : Whether the day was a working day or a holiday

weathersit , temp , atemp , hum , windspeed : Weather conditions for the day

casual , registered : Non-Registered and registered users

cnt : total count/demand for the particular condition on the date

Approach

Loading Data
- Renaming attributes
- Performing type casting
Data Exploration and Visualization
Splitting Data for Train and Test
Data Preprocessing
- Label Encoding
- One Hot Encoding of Categorical Values
Training Model
- Linear Regression
- Decision Tree
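
The loading and preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes pandas is used, replaces the real data file with a tiny hand-made sample, and the new column names (`datetime`, `hour`, `weather_condition`, `total_count`) are hypothetical choices. Label encoding and model training are not shown.

```python
import pandas as pd

# Tiny hand-made sample standing in for the real hourly data (illustrative values only).
raw = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-01", "2011-07-04", "2012-12-25"],
    "season": [1, 1, 3, 1],
    "hr": [0, 8, 17, 12],
    "weathersit": [1, 2, 1, 3],
    "temp": [0.24, 0.30, 0.80, 0.20],
    "cnt": [16, 120, 450, 60],
})

# Renaming attributes: the target names are hypothetical, chosen for readability.
df = raw.rename(columns={
    "dteday": "datetime",
    "hr": "hour",
    "weathersit": "weather_condition",
    "cnt": "total_count",
})

# Type casting: parse the date and mark discrete attributes as categorical.
df["datetime"] = pd.to_datetime(df["datetime"])
categorical_cols = ["season", "hour", "weather_condition"]
for col in categorical_cols:
    df[col] = df[col].astype("category")

# Splitting data for train and test (75% / 25% here, an assumed ratio).
train = df.sample(frac=0.75, random_state=42)
test = df.drop(train.index)

# One-hot encoding of categorical values, fitted on the training columns.
encode_cols = ["season", "weather_condition"]
train_encoded = pd.get_dummies(train, columns=encode_cols)
test_encoded = pd.get_dummies(test, columns=encode_cols)

# Guard against any column mismatch between the two sets.
test_encoded = test_encoded.reindex(columns=train_encoded.columns, fill_value=0)
```

Aligning the test columns to the training columns keeps the feature layout identical for both sets, which a fitted model requires at prediction time.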

To measure model performance, Mean Squared Error (MSE) and R-squared are used.
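
As a reminder of what the two metrics compute, here is a small dependency-free sketch using the standard definitions. The function names and the sample numbers are made up for illustration and are not taken from the project.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

mse = mean_squared_error(actual, predicted)  # 6.5
r2 = r_squared(actual, predicted)            # 0.948
```

A lower MSE means smaller errors in the units of the squared target, while an R-squared closer to 1 means the model explains more of the variance in demand.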

Data Exploration and Visualization

Data Info

Data Distribution

Seasonwise Demand

Daily Hourly Distribution of Demand

Monthly Distribution of Demand

Year Wise Demand

Demand for Registered users and weather conditions

Hour-wise Outliers in Demand

Working day and Holiday Demand Distribution

Correlation Heat Map of All Features

Model Building and Training

Linear Regression
Training Data

Cross validation

R-squared::[0.26906982 0.22635143 0.26170837 0.25691577 0.32148609 0.31196355 0.26400442 0.28760857 0.27964318 0.31554976]

MSE :: [-23850.18552648 -24978.92079695 -24408.88562232 -22517.19952749 -22117.37661082 -24746.90893471 -25327.37663767 -25328.57026724 -25094.24038362 -21876.91293786]
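
The negative MSE values above are consistent with scikit-learn's `neg_mean_squared_error` scoring, which negates the error so that higher scores are always better. Assuming scikit-learn was used (the source does not say so explicitly), a 10-fold cross validation along these lines would produce scores of that shape. The data below is synthetic and only there to make the sketch runnable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data (not the bike sharing dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=200)

model = LinearRegression()

# Ten folds, matching the ten scores reported above.
r2_scores = cross_val_score(model, X, y, cv=10, scoring="r2")
neg_mse_scores = cross_val_score(model, X, y, cv=10, scoring="neg_mean_squared_error")

# Flip the sign to recover the ordinary (non-negative) MSE per fold.
mse_scores = -neg_mse_scores

print("R-squared per fold:", r2_scores)
print("MSE per fold:", mse_scores)
```

Read this way, the cross-validated MSE in the results above is roughly 22,000 to 25,000 per fold, in line with the test MSE reported below.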

Testing Model

R-squared :: 0.2867680055878652

MSE :: 22753.14

Decision Tree Model
Training Data

Grid search with Cross validation

R-Squared::0.29472840660973104

Best Hyperparameters::{'criterion': 'mse', 'max_depth': 8, 'max_leaf_nodes': 500, 'min_samples_leaf': 20, 'min_samples_split': 5}

Avg R-squared::0.30744642968000135

MSE::-23089.90043082429
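
A grid search with cross validation over a decision tree regressor might look like the sketch below, assuming scikit-learn's `GridSearchCV` and `DecisionTreeRegressor`. The candidate values in `param_grid` are assumptions (only the best values are reported above), the data is synthetic, and the `criterion` parameter is left at its default because recent scikit-learn releases name the squared-error criterion `squared_error` rather than `mse`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic, non-linear data standing in for hourly demand (not the real dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 24, size=(300, 2))
y = 100 * np.sin(X[:, 0] / 24 * 2 * np.pi) + 5 * X[:, 1] + rng.normal(scale=10, size=300)

# Hypothetical search space; the reported best values are included among the candidates.
param_grid = {
    "max_depth": [4, 8],
    "max_leaf_nodes": [50, 500],
    "min_samples_leaf": [5, 20],
    "min_samples_split": [5, 10],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=5,            # assumed fold count
    scoring="r2",
)
search.fit(X, y)

best_params = search.best_params_
best_r2 = search.best_score_  # mean cross-validated R-squared of the best candidate

print("Best hyperparameters:", best_params)
print("Best mean R-squared:", best_r2)
```

Constraints such as `max_depth` and `min_samples_leaf` limit how finely the tree can split the data, which is the usual way to keep a single decision tree from overfitting.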

Testing Model

R-squared :: 0.3238351376141839

MSE :: 21570.64

Conclusion

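Comparing the models on the test data, the Decision Tree model has a higher R-squared value and a lower Mean Squared Error than the Linear Regression model (R-squared 0.324 vs 0.287; MSE 21570.64 vs 22753.14). It is therefore the more favourable of the two. Both models still explain only about 30% of the variance in demand, so further experiments with other algorithms could yield a better R-squared and a lower MSE.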