Project author: saidsef

Project description:
Classify news articles into different categories using Machine Learning
Primary language: Jupyter Notebook
Repository: git://github.com/saidsef/ml-classifier.git
Created: 2018-06-28T17:02:35Z
Project community: https://github.com/saidsef/ml-classifier

License: MIT License


Machine Learning - News Articles classification with sklearn

Classify news articles into different categories using Machine Learning. The dataset consists of 6000 documents and 47 categories.

My goal is to show you how to create predictive models that assign classification labels to news articles.

Objective

  • To classify news articles
  • Learn the basics of natural language processing
  • Build models using sklearn and choose the best one
  • Use sklearn’s make_pipeline function
  • Learn how to turn it into a service
  • Learn how to make it composable and portable
  • Profit?

Prerequisite

  • Python >= v3.11
  • Jupyter Notebook
  • Some knowledge of Machine Learning

Python Libs

  • NumPy
  • Pandas
  • SciPy
  • Matplotlib
  • Jupyter
  • Scikit-learn (the library that we will use later in this post when creating the classifier model(s))

We Will

  • Apply some preprocessing steps to prepare the data
  • Perform a descriptive analysis of the data to better understand its main characteristics
  • Practice training different machine learning models with scikit-learn, one of the most popular Python libraries for machine learning
  • Use a subset of the dataset for training
  • Iterate on and evaluate the trained models using unseen data, compare them until we find a model that meets our expectations, and combine unfitted estimators in a VotingClassifier with soft voting
  • Once we have chosen the candidate model(s), use them to make predictions and create a simple web application that consumes the predictive model
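
The training steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy texts, the two labels, and the choice of TfidfVectorizer, LogisticRegression and MultinomialNB are all assumptions made for the example. It shows make_pipeline chaining a vectorizer with a soft-voting VotingClassifier, trained on a subset and evaluated on held-out data.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the real dataset
# (which has 6000 documents and 47 categories).
texts = [
    "The team won the championship after a late goal",
    "The striker scored twice in the cup final",
    "The coach praised the defence after the match",
    "The club signed a new goalkeeper this season",
    "Shares fell as the central bank raised interest rates",
    "The company reported record quarterly profits",
    "Investors worry about inflation and bond yields",
    "The merger created the largest bank in the region",
]
labels = ["Sport"] * 4 + ["Business"] * 4

# Hold out unseen data for evaluation; train on the remaining subset.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Soft voting averages the predicted class probabilities of each estimator.
# The estimators are passed unfitted; VotingClassifier fits clones of them.
voter = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
    ],
    voting="soft",
)

# make_pipeline chains text vectorization and classification into one model.
model = make_pipeline(TfidfVectorizer(), voter)
model.fit(X_train, y_train)

predictions = model.predict(X_test)          # predicted category per document
probabilities = model.predict_proba(X_test)  # averaged class probabilities
accuracy = model.score(X_test, y_test)       # mean accuracy on unseen data
print(predictions, accuracy)
```

Because the whole pipeline is a single estimator, it can be serialized and loaded by a web application as one object, so raw text goes in and a category comes out.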

Getting started with the machine learning tutorial

See Jupyter Notebook

Deployment

As a container:

  docker run -d -p 7070:7070 docker.io/saidsef/ml-classifier:latest

As a Python application:

  pip3 install -r requirements.txt
  PORT=7070 classifier-ml.py

JSON Format

The payload must be JSON:

  { "body": "text-goes-here" }

The Request

The request must be a POST with a JSON body:

  curl -XPOST http://localhost:7070/api/v1/news -H 'Content-Type: application/json' -d @test/test.json

The response will be JSON:

  {
    "score": 1,
    "category": "Opinion"
  }
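
For callers who prefer Python over curl, the same request and response handling might look like the sketch below. It uses only the standard library and assumes the endpoint, payload and response fields shown above; the names API_URL, Prediction, build_request, parse_response and classify are hypothetical helpers introduced for this example, not part of the project.

```python
import json
import urllib.request
from dataclasses import dataclass

# Endpoint taken from the curl example above; adjust host and port as needed.
API_URL = "http://localhost:7070/api/v1/news"


@dataclass
class Prediction:
    """Hypothetical container for the two fields in the documented response."""
    category: str
    score: float


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying the {"body": ...} JSON payload."""
    payload = json.dumps({"body": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw: bytes) -> Prediction:
    """Decode the JSON response into a Prediction."""
    data = json.loads(raw)
    return Prediction(category=data["category"], score=float(data["score"]))


def classify(text: str, url: str = API_URL, timeout: float = 10.0) -> Prediction:
    """Send the text to a running service and return its prediction."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return parse_response(resp.read())
```

With the container or Python application running locally, calling classify("text-goes-here") would return a Prediction holding the category and score from the service.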

Kubernetes

  kubectl apply -k ./deployment