Project author: sanjeevai

Project description:
Rank-based, collaborative filtering, and matrix factorisation techniques for a recommendation engine on the IBM Watson Studio platform
Primary language: HTML
Project URL: git://github.com/sanjeevai/Recommendations_with_IBM.git
Created: 2019-01-25T13:05:34Z
Project community: https://github.com/sanjeevai/Recommendations_with_IBM

License:



Data Scientist Nanodegree

Recommendation Engines

Project: Recommendations with IBM

Blog Version: @sanjeevai/recommendations-with-ibm-7f89d25375fc


Project Introduction

For this project I will analyze the interactions that users have with articles
on the IBM Watson Studio platform, and make recommendations to them about
new articles I think they will like. Below is an example of what the
dashboard could look like displaying articles on the IBM Watson Platform.

In order to determine which articles to show to each user, I will be performing a study of the data available on the IBM Watson Studio platform.

[Image: example dashboard]

Exploratory Data Analysis

[Figure: user-item interactions]

Most users have at most 3 interactions with any article on the platform, and the distribution is highly skewed because interactions are sparse.
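As a minimal sketch of how such a distribution can be inspected, the snippet below counts interactions per user with pandas on a toy table. The column names `user_id` and `article_id` and all the values are assumptions made for illustration; they are not taken from the project data.

```python
import pandas as pd

# Toy interaction log (hypothetical values, not the project data)
interactions = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 3, 3, 4, 5, 5, 5, 5, 5, 5],
    "article_id": [10, 20, 30, 10, 10, 40, 20, 10, 20, 30, 40, 50, 60],
})

# Number of interactions recorded for each user
per_user = interactions.groupby("user_id")["article_id"].count()

# Summary statistics: a few heavy users pull the maximum far above the median
median_interactions = per_user.median()
max_interactions = per_user.max()
skewness = per_user.skew()  # positive when the tail is on the right
```

A median well below the maximum, together with positive skewness, is the pattern described above.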

Rank Based Recommendations

This type of recommendation system returns the most-viewed articles in the
dataset.

We can set how many recommendations to provide.
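A rank-based recommender of this kind might look like the sketch below. The function name `get_top_articles`, the column names, and the toy data are assumptions for illustration, not the notebook's actual code.

```python
import pandas as pd

def get_top_articles(df, n=10):
    """Return the titles of the n articles with the most interactions."""
    # value_counts() sorts titles by interaction count, highest first
    counts = df["title"].value_counts()
    return counts.head(n).index.tolist()

# Toy interaction log; one row per user-article interaction (hypothetical)
interactions = pd.DataFrame({
    "user_id":    [1, 1, 2, 3, 3, 4],
    "article_id": [10, 20, 10, 10, 30, 20],
    "title": ["intro to svd", "pandas tips", "intro to svd",
              "intro to svd", "deep learning", "pandas tips"],
})

# n controls how many recommendations are returned
top_two = get_top_articles(interactions, n=2)
```

Because the ranking ignores who is asking, the same list can be served to brand-new users with no interaction history.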

User-user Based Collaborative Filtering

We provide a user_id for which we want recommendations. Then we sort all other
users by their similarity to the given user_id.

For each user in this sorted order, we find the articles that user has
interacted with and add them to the recommendations list.

Then we select the top m recommendations, m being the number of recommendations
to provide for a specific user_id.
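The steps above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's implementation: similarity is taken to be the dot product of binary interaction vectors, articles the target user has already seen are skipped, and every function and column name is hypothetical.

```python
import pandas as pd

def create_user_item_matrix(df):
    """Binary matrix: rows are users, columns are articles, 1 = interacted."""
    return (pd.crosstab(df["user_id"], df["article_id"]) > 0).astype(int)

def find_similar_users(user_id, user_item):
    """Other users sorted by similarity to user_id, most similar first."""
    # Assumption: similarity = number of articles both users interacted with
    similarity = user_item.dot(user_item.loc[user_id])
    similarity = similarity.drop(user_id)
    return similarity.sort_values(ascending=False, kind="mergesort").index.tolist()

def user_user_recs(user_id, user_item, m=10):
    """Collect up to m unseen articles from the most similar users."""
    seen = set(user_item.columns[user_item.loc[user_id] == 1])
    recs = []
    for neighbour in find_similar_users(user_id, user_item):
        neighbour_articles = user_item.columns[user_item.loc[neighbour] == 1]
        for article in neighbour_articles:
            # Assumption: skip articles the target user already interacted with
            if article not in seen and article not in recs:
                recs.append(article)
            if len(recs) >= m:
                return recs
    return recs

# Toy data (hypothetical): user 2 overlaps most with user 1, then user 3
interactions = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 2, 3, 3, 4],
    "article_id": [10, 20, 10, 20, 30, 10, 40, 50],
})
user_item = create_user_item_matrix(interactions)
recs = user_user_recs(1, user_item, m=2)
```

Stopping as soon as m articles are collected means the closest neighbours contribute first, which matches the "sort, then take the top m" description.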

Matrix Factorisation

In this section we first perform SVD on the user-item interactions matrix. We
then examine how accuracy varies with the number of latent features. Since the
data is highly imbalanced, we also check how the F1 score varies with the
number of latent features. The F1 score increases up to a limit and then
declines asymptotically.

The data set is highly imbalanced because there are few interactions on the platform.
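A minimal sketch of this experiment with NumPy is shown below. It scores the reconstruction against the same matrix it was fitted on, which is a simplification; the toy matrix and the helper names `reconstruct` and `accuracy_and_f1` are assumptions for illustration.

```python
import numpy as np

def reconstruct(user_item, k):
    """Rank-k SVD reconstruction of a binary matrix, rounded back to 0/1."""
    u, s, vt = np.linalg.svd(user_item, full_matrices=False)
    approx = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]
    return np.clip(np.around(approx), 0, 1).astype(int)

def accuracy_and_f1(actual, predicted):
    """Element-wise accuracy and F1 score for the positive (1) class."""
    tp = np.sum((actual == 1) & (predicted == 1))
    fp = np.sum((actual == 0) & (predicted == 1))
    fn = np.sum((actual == 1) & (predicted == 0))
    accuracy = float(np.mean(actual == predicted))
    denom = 2 * tp + fp + fn
    f1 = float(2 * tp / denom) if denom else 0.0
    return accuracy, f1

# Toy user-item matrix (hypothetical): rows are users, columns are articles
matrix = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
])

# Score the reconstruction for each possible number of latent features
results = {k: accuracy_and_f1(matrix, reconstruct(matrix, k))
           for k in range(1, min(matrix.shape) + 1)}
```

On a sparse matrix, predicting all zeros already gives high accuracy, which is why F1 on the positive class is the more informative metric here. Evaluating on held-out users rather than in-sample would be needed to see a rise and fall in F1 like the one described above.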

Conclusion

There were only 20 users for whom we could try to provide recommendations. With
more data, the performance of the recommendation engine could be evaluated more
reliably. The data is highly imbalanced because of the many zeroes in the
user-item interaction matrix. I will try content-based recommendation in
future iterations to tackle the cold start problem.

Files

.
├── Recommendations_with_IBM.html----------# HTML EXPORT OF JUPYTER NOTEBOOK
├── Recommendations_with_IBM.ipynb---------# ANALYSIS NOTEBOOK
├── data
│   ├── articles_community.csv-------------# INFORMATION ABOUT ARTICLES
│   └── user-item-interactions.csv---------# USER-ARTICLE INTERACTIONS
├── project_tests.py-----------------------# UNIT TESTS FOR PROJECT
├── top_10.p-------------------------------# BINARY FILE TO CHECK MY SOLUTION
├── top_20.p-------------------------------# BINARY FILE TO CHECK MY SOLUTION
├── top_5.p--------------------------------# BINARY FILE TO CHECK MY SOLUTION
├── user_item_matrix.p---------------------# BINARY FILE TO CHECK MY SOLUTION
└── visuals.py-----------------------------# CUSTOM PLOTS CREATED IN PLOTLY

Software and Libraries

This project uses Python 3.6.6; the necessary libraries are listed in the requirements file.