Project author: antoniod20

Project description:
Reproducible pipeline from Twitter API using DVC
Language: Python
Repository: git://github.com/antoniod20/dvc-twitter.git
Created: 2020-12-17T17:02:58Z
Project page: https://github.com/antoniod20/dvc-twitter



Reproducible pipeline from Twitter API using DVC

In this project I built a DVC pipeline from a notebook I had created earlier, called Twitter API.
Because the notebook is large, I put only the most important parts of my work into the pipeline.
These parts are:

  • Creation of the dataset (reduced in size because of time constraints)
  • Creation of a NetworkX graph
  • Generation of the image of the graph
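
As an illustration of the graph step, here is a minimal sketch of building a NetworkX graph from an edge list and extracting an ego network. The edge list and the helper names `build_graph` and `ego_network` are hypothetical and are not taken from the project; the actual pipeline derives its data from the Twitter API. Only `nx.Graph`, `add_edges_from` and `nx.ego_graph` are NetworkX calls.

```python
import networkx as nx

# Hypothetical edge list: each pair means "user A interacted with user B".
# The real project builds its dataset from tweets fetched via the Twitter API.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("dave", "erin"),
]


def build_graph(edge_list):
    """Build an undirected NetworkX graph from (source, target) pairs."""
    graph = nx.Graph()
    graph.add_edges_from(edge_list)
    return graph


def ego_network(graph, center, radius=1):
    """Return the subgraph induced by nodes within `radius` hops of `center`."""
    return nx.ego_graph(graph, center, radius=radius)


graph = build_graph(edges)
ego = ego_network(graph, "alice")
print(sorted(ego.nodes()))
```

The resulting graph could then be drawn with Matplotlib, for example through `nx.draw`, to produce an image like the one generated by the last step.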

The pipeline graph is as follows:

    +-------+
    | fetch |
    +-------+
        *
        *
        *
    +-------+
    | graph |
    +-------+
        *
        *
        *
    +------------+
    | egonetwork |
    +------------+
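
The diagram above is a linear chain of dependencies: each stage needs the output of the one before it. The following sketch models that chain and derives the execution order with the standard-library `graphlib` module (Python 3.9 or later, which is newer than the 3.6.9 used for the project). It only illustrates the ordering; it is not how DVC resolves stages internally, and `stage_dependencies` and `execution_order` are hypothetical names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each stage maps to the set of stages it depends on,
# mirroring the fetch -> graph -> egonetwork diagram.
stage_dependencies = {
    "fetch": set(),
    "graph": {"fetch"},
    "egonetwork": {"graph"},
}


def execution_order(dependencies):
    """Return the stages in an order that respects their dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(stage_dependencies)
print(order)
```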

Setup

Download

To download the project, clone the repository:

    git clone https://github.com/antoniod20/dvc-twitter.git

Configuration

The project was developed with Python 3.6.9, so Python 3 is required; version 3.6 or later is advisable.
To install all the libraries needed to run the project, run this command:

    pip install -r src/requirements.txt
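
After installing, a quick check like the following can confirm that the libraries are importable. This is a minimal sketch: the module names are taken from the Resources & Libraries list in this README and are assumed to match what `src/requirements.txt` installs; `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names of the libraries listed in this README (assumed to match
# what src/requirements.txt installs).
REQUIRED_MODULES = ["tweepy", "networkx", "pandas", "matplotlib"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All listed libraries are importable.")
```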

Run

To launch the pipeline, run the following commands in order:

  • Change into the project directory
        cd .\dvc-twitter\dvc-twitter-api\
  • Reproduce the pipeline
        dvc repro
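
The `cd` command above uses Windows path separators. As a platform-independent alternative, the two steps can be wrapped in a small Python helper. This is only a sketch: `repro_command` and `run_pipeline` are hypothetical names, and the helper assumes that `dvc` is available on the PATH.

```python
import subprocess
from pathlib import Path


def repro_command():
    """Return the DVC command that reproduces the pipeline."""
    return ["dvc", "repro"]


def run_pipeline(project_dir):
    """Run `dvc repro` inside `project_dir` and return its exit code."""
    project_dir = Path(project_dir)
    if not project_dir.is_dir():
        raise FileNotFoundError(f"Project directory not found: {project_dir}")
    # cwd= replaces the separate `cd` step; the command runs in that directory.
    completed = subprocess.run(repro_command(), cwd=project_dir)
    return completed.returncode


# Example call, using the directory from the `cd` step:
# run_pipeline(Path("dvc-twitter") / "dvc-twitter-api")
```

Building the path with `Path(...) / ...` avoids hard-coding backslashes, so the same call works on Windows, Linux and macOS.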

Resources & Libraries

  • Tweepy - Python client for the Twitter API
  • NetworkX - Useful to handle the study of graphs and networks
  • Pandas - Useful to handle the CSV file
  • Matplotlib - Provides functions for embedding plots into applications

Author