Analyzing Bike Sharing Trends and Predicting Bike Rental Demand
Analyzing Bike sharing trend and demand forecasting
With health and the environment becoming trending topics, the use of bicycles as a mode of transportation has gained traction in recent years. To encourage bike usage, cities across the world have rolled out bike sharing programs. Under such schemes, riders can rent bicycles for defined periods from manual or automated kiosks spread across the city. In most cases, riders can pick up a bike at one location and return it to any other designated place.
The bike sharing platforms from across the world are hotspots of all sorts of data, ranging from travel time, start and end location, demographics of riders, and so on. This data along with alternate sources of information such as weather, traffic, terrain, and so on makes it an attractive proposition for different research areas.
Objective: predict bike rental demand for such a program.
Each row of the data represents the bike demand observed under a given set of attributes, and each column represents one attribute.
dteday : Date
season : Season of the year
yr : Year of the record (the data covers two years)
mnth : Month of the Year
hr : Time/Hour of the day (24 hours format)
holiday : Whether the day was a holiday
weekday : Day of the week
workingday : Whether the day was a working day or a holiday
weathersit , temp , atemp , hum , windspeed : Weather conditions for the day
casual , registered : Number of rentals by casual (non-registered) and registered users
cnt : Total rental count (demand) for the given date, hour, and conditions
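As a minimal sketch of how a table with these columns might be loaded, the snippet below reads CSV text with pandas, parses `dteday` as a date, and marks the coded columns as categorical. The two sample rows, the `load_hourly` helper, and the choice of which columns to treat as categorical are assumptions made for illustration; the source does not show its loading code.

```python
import io

import pandas as pd

# Two made-up rows in the column layout described above (illustrative values only).
SAMPLE_CSV = """dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
2011-01-01,1,0,1,0,0,6,0,1,0.24,0.29,0.81,0.0,3,13,16
2011-01-01,1,0,1,1,0,6,0,1,0.22,0.27,0.80,0.0,8,32,40
"""

# Columns assumed to hold category codes rather than measurements.
CATEGORICAL = ["season", "yr", "mnth", "hr", "holiday",
               "weekday", "workingday", "weathersit"]


def load_hourly(source):
    """Read the table, parse the date column, and cast coded columns to category."""
    df = pd.read_csv(source, parse_dates=["dteday"])
    for col in CATEGORICAL:
        df[col] = df[col].astype("category")
    return df


df = load_hourly(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

In practice `source` would be the path to the dataset file instead of an in-memory string.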
Loading Data
Data Exploration and Visualization
Splitting Data for Train and Test
Data Preprocessing
Label Encoding
One Hot Encoding of Categorical Values
Training Model
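The splitting and encoding steps listed above could look roughly like the following. This is a sketch on synthetic data, assuming scikit-learn's `train_test_split` and `OneHotEncoder`; the 33% test share, the random seed, and the reduced set of columns are assumptions, not values taken from the source.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the hourly table; values are random, not real rentals.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "season": rng.integers(1, 5, n),
    "weathersit": rng.integers(1, 4, n),
    "temp": rng.random(n),
    "hum": rng.random(n),
    "cnt": rng.integers(0, 500, n),
})

cat_cols = ["season", "weathersit"]
num_cols = ["temp", "hum"]

# Split first so that the encoder only ever sees training rows when fitting.
X_train, X_test, y_train, y_test = train_test_split(
    df[cat_cols + num_cols], df["cnt"], test_size=0.33, random_state=42
)

# Fit the one-hot encoder on the training split, then reuse it on the test split.
encoder = OneHotEncoder(handle_unknown="ignore")
train_cat = encoder.fit_transform(X_train[cat_cols]).toarray()
test_cat = encoder.transform(X_test[cat_cols]).toarray()

# Append the numeric columns unchanged.
X_train_enc = np.hstack([train_cat, X_train[num_cols].to_numpy()])
X_test_enc = np.hstack([test_cat, X_test[num_cols].to_numpy()])
print(X_train_enc.shape, X_test_enc.shape)
```

Fitting the encoder on the training split only keeps information from the test rows out of preprocessing, and `handle_unknown="ignore"` prevents an error if the test split contains a category the encoder never saw.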
To measure the performance of each model, Mean Squared Error (MSE) and R-squared are used.
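For reference, both metrics can be written out directly. The sketch below uses the standard definitions, MSE as the mean of squared residuals and R-squared as one minus the ratio of residual to total sum of squares, on a small made-up example; the function names are illustrative.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the target's variance explained by the model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up demand values and predictions.
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 37]

print(mse(y_true, y_pred))                  # 6.5
print(round(r_squared(y_true, y_pred), 3))  # 0.948
```

A lower MSE and an R-squared closer to 1 both indicate a better fit; an R-squared near 0 means the model does little better than always predicting the mean.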
Data Info
Data Distribution
Season-wise Demand
Daily Hourly Distribution of Demand
Monthly Distribution of Demand
Year Wise Demand
Demand for Registered users and weather conditions
Outliers in Hourly Demand
Working day and Holiday Demand Distribution
Correlation Heat Map of All Features
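The exploration steps listed above mostly come down to grouped aggregates and a correlation matrix. The sketch below shows both on synthetic data, assuming pandas; the generated values and the relationship between `temp` and `cnt` are invented purely so the code runs, and are not findings from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 20 observations for each hour of the day.
rng = np.random.default_rng(1)
n = 480
df = pd.DataFrame({
    "hr": np.tile(np.arange(24), n // 24),
    "workingday": rng.integers(0, 2, n),
    "temp": rng.random(n),
    "hum": rng.random(n),
})
# Invented target that depends on temperature plus noise.
df["cnt"] = (200 * df["temp"] + rng.normal(0, 10, n)).clip(lower=0).round()

# Mean demand per hour, with one column per workingday value.
hourly = df.groupby(["workingday", "hr"])["cnt"].mean().unstack("workingday")

# Pairwise Pearson correlations: the matrix a correlation heat map is drawn from.
corr = df[["temp", "hum", "cnt"]].corr()

print(hourly.head())
print(corr.round(2))
```

The `hourly` table can be passed to a line plot to compare working and non-working days, and `corr` to a heat map function from a plotting library.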
Training Data (Linear Regression)
Cross validation
R-squared::[0.26906982 0.22635143 0.26170837 0.25691577 0.32148609 0.31196355 0.26400442 0.28760857 0.27964318 0.31554976]
MSE :: [-23850.18552648 -24978.92079695 -24408.88562232 -22517.19952749 -22117.37661082 -24746.90893471 -25327.37663767 -25328.57026724 -25094.24038362 -21876.91293786]
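Ten scores per metric are consistent with 10-fold cross-validation, and the negative MSE values match scikit-learn's `neg_mean_squared_error` scoring, which negates the error so that larger is always better. The sketch below shows that convention on synthetic data; it assumes the first model is the linear regression named in the closing comparison, and the data and coefficients are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression problem (not the bike data).
rng = np.random.default_rng(0)
X = rng.random((300, 4))
y = X @ np.array([50.0, -20.0, 10.0, 5.0]) + rng.normal(0, 5, 300)

model = LinearRegression()

# One R-squared value per fold.
r2_scores = cross_val_score(model, X, y, cv=10, scoring="r2")

# scikit-learn reports the negated MSE so that a higher score is better.
neg_mse_scores = cross_val_score(
    model, X, y, cv=10, scoring="neg_mean_squared_error"
)

# Flip the sign to recover the usual, non-negative MSE.
mse_scores = -neg_mse_scores

print("R-squared per fold:", r2_scores.round(3))
print("MSE per fold:", mse_scores.round(2))
```

Read this way, the fold-level MSE values printed above are roughly 21,900 to 25,300 once the sign is flipped.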
__Testing Model__
__R-squared__ ::0.2867680055878652
__MSE__: 22753.14
Training Data (Decision Tree)
Grid search with Cross validation
R-Squared::0.29472840660973104
Best Hyperparameters::{'criterion': 'mse', 'max_depth': 8, 'max_leaf_nodes': 500, 'min_samples_leaf': 20, 'min_samples_split': 5}
Avg R-squared::0.30744642968000135
MSE::-23089.90043082429
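A grid search over the hyperparameters named above might be set up as follows. This is a sketch assuming scikit-learn's `GridSearchCV` with a `DecisionTreeRegressor`; the candidate values, the fold count, and the synthetic data are assumptions, since the source reports only the winning combination.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic non-linear regression problem (not the bike data).
rng = np.random.default_rng(0)
X = rng.random((400, 4))
y = 100 * np.sin(6 * X[:, 0]) + 50 * X[:, 1] + rng.normal(0, 5, 400)

# Hypothetical candidate values for the hyperparameters reported above.
# 'criterion' is left at its default: the name 'mse' was renamed to
# 'squared_error' in scikit-learn 1.0 and later removed.
param_grid = {
    "max_depth": [4, 8],
    "max_leaf_nodes": [100, 500],
    "min_samples_leaf": [20, 40],
    "min_samples_split": [5, 10],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best mean CV R-squared:", round(search.best_score_, 3))
```

By default `GridSearchCV` refits the best combination on all the data passed to `fit`, so `search.best_estimator_` can then be scored on the held-out test split.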
__Testing Model__
__R-squared__::0.3238351376141839
__MSE__: 21570.64
The model comparison shows that the Decision Tree model achieves a higher test R-squared (0.324 vs. 0.287) and a lower test MSE (21570.64 vs. 22753.14) than the Linear Regression model, so it is the more favourable of the two. Both scores are low in absolute terms, however, so it is worth experimenting with further algorithms to obtain a higher R-squared and a lower MSE.
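A side-by-side comparison of this kind can be organised as a loop over candidate models that all share one train/test split. The sketch below assumes scikit-learn and uses synthetic data, so its numbers say nothing about the bike dataset; the tree settings reuse `max_depth` and `min_samples_leaf` from the grid search output, and everything else is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem (not the bike data).
rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = 100 * np.sin(6 * X[:, 0]) + 30 * X[:, 1] + rng.normal(0, 5, 500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(
        max_depth=8, min_samples_leaf=20, random_state=0
    ),
}

# Fit every model on the same training split and score it on the same test split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name}: R-squared={scores['r2']:.3f}, MSE={scores['mse']:.2f}")
```

Adding another algorithm to the experiment only requires a new entry in the `models` dictionary.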