Project author: ponder-lab

Project description:
Python script to mine for GitHub issues + comments and classify them.
Language: Python
Repository: git://github.com/ponder-lab/GitHub-Issue-Classifier.git
Created: 2021-02-08T04:06:06Z
Project community: https://github.com/ponder-lab/GitHub-Issue-Classifier

License: MIT License


GitHub-Issue-Classifier

Python script to mine for GitHub issues + comments and classify them using analysis and detection of information types of open source software issue discussions. Our tool automates the process of querying GitHub for particular issues and feeding them into classification models. It supports both interactive and non-interactive modes.

CLI Tool Screenshot

Setup:

1) Ensure Python 3.9.1 and the corresponding pip 3.9 are installed.
2) Install requirements: pip3.9 install -r requirements.txt
3) Download the nltk data packages wordnet and stopwords from the Python terminal (only needed once per environment):

    import nltk
    nltk.download('wordnet')
    nltk.download('stopwords')

  • If you run into an SSL error, make sure the Python SSL certificates are installed: bash /Applications/Python\ 3.9/Install\ Certificates.command

4) Add a GitHub Personal Access Token to the access token file, access_token.json. Replace <YOUR_GITHUB_PERSONAL_ACCESS_TOKEN> with your token.

Instructions on how to create a GitHub Personal Access Token.
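
As a minimal sketch of how the token from step 4 might be read, the snippet below loads a JSON file and rejects the unreplaced placeholder. The key name access_token and the helper load_access_token are assumptions made for illustration; check access_token.json in the repository for the actual layout.

```python
import json
from pathlib import Path

# Placeholder text quoted in the setup instructions above.
PLACEHOLDER = "<YOUR_GITHUB_PERSONAL_ACCESS_TOKEN>"


def load_access_token(path, key="access_token"):
    """Return the token stored under `key` in the JSON file at `path`.

    `key` is a hypothetical field name, not confirmed by the project.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    token = str(data.get(key, "")).strip()
    # Fail early if the file is missing the token or still has the placeholder.
    if not token or token == PLACEHOLDER:
        raise ValueError(f"No usable token under {key!r} in {path}")
    return token
```

Failing early on the placeholder gives a clearer error than an authentication failure returned later by GitHub.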

Run Instructions:

There are two ways to run this program: through the interactive command line, or by passing command-line arguments directly (parsed with argparse).

Below is the -h help/man page:

  Usage: python mine-issues.py [-h] [-i] [-v] [-m MAX_RESULTS] [-s SORT_BY] [-p PREFIX_FILENAME] query

  positional arguments:
    query

  optional arguments:
    -h, --help            show this help message and exit
    -i, --interactive     toggle the interactive CLI
    -v, --verbose         print additional logs
    -m MAX_RESULTS, --max-results MAX_RESULTS
                          (int) max results to query
    -s SORT_BY, --sort-by SORT_BY
                          sort by one of: [comments, best-match]
    -p PREFIX_FILENAME, --prefix-filename PREFIX_FILENAME
                          (string) file name prefix for result output files
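
For readers unfamiliar with argparse, the following sketch shows one way a parser matching the help page above could be declared. It is not the project's actual source: the function name build_parser is hypothetical, and the defaults (1000, comments, results_) are taken from the option descriptions in this README.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the help page (illustrative only)."""
    parser = argparse.ArgumentParser(prog="python mine-issues.py")
    parser.add_argument("query")
    parser.add_argument("-i", "--interactive", action="store_true",
                        help="toggle the interactive CLI")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print additional logs")
    parser.add_argument("-m", "--max-results", type=int, default=1000,
                        help="(int) max results to query")
    parser.add_argument("-s", "--sort-by", choices=["comments", "best-match"],
                        default="comments",
                        help="sort by one of: [comments, best-match]")
    parser.add_argument("-p", "--prefix-filename", default="results_",
                        help="(string) file name prefix for result output files")
    return parser


# Example: parse a command line without running the tool itself.
args = build_parser().parse_args(["-m", "50", "-s", "best-match", "memory leak"])
```

argparse converts dashes in long option names to underscores, so the parsed values are available as args.max_results, args.sort_by, and args.prefix_filename.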

Usage:

Run python3.9 mine-issues.py <QUERY> with the following optional parameters:

-i or --interactive: launches the interactive CLI and ignores all other parameters. See the screenshot above of the interface.

-v or --verbose: prints additional logs, such as the entire result object as formatted JSON.

-m or --max-results: limits the number of results retrieved from the search query (1000 by default).

-s or --sort-by: sorts the search results by either comments or best-match (comments by default).

-p or --prefix-filename: string to prefix to the resulting file name (results_ by default).

-f or --filter: one of the 16 categories to filter out from the results. Can be given multiple times (e.g., -f foo -f bar …)

Categories that can be filtered:

['Expected Behaviour', 'Motivation', 'Observed Bug Behaviour', 'Bug Reproduction', 'Investigation and Exploration', 'Solution Discussion', 'Contribution and Commitment', 'Task Progression', 'Testing', 'Future Plan', 'New Issues and Requests', 'Solution Usage', 'WorkArounds', 'Issue Content Management', 'Action on Issue', 'Social Conversation']
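
To illustrate what filtering out categories could look like, here is a small sketch. It assumes each classified comment is a dict with hypothetical text and category keys; the tool's real result format and its filtering code in /utils may differ.

```python
CATEGORIES = [
    'Expected Behaviour', 'Motivation', 'Observed Bug Behaviour',
    'Bug Reproduction', 'Investigation and Exploration', 'Solution Discussion',
    'Contribution and Commitment', 'Task Progression', 'Testing', 'Future Plan',
    'New Issues and Requests', 'Solution Usage', 'WorkArounds',
    'Issue Content Management', 'Action on Issue', 'Social Conversation',
]


def filter_out(results, excluded):
    """Drop every result whose category is in `excluded`."""
    excluded = set(excluded)
    unknown = excluded - set(CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown categories: {sorted(unknown)}")
    return [r for r in results if r["category"] not in excluded]


# Made-up sample data, used only to show the call.
sample = [
    {"text": "Thanks, great work!", "category": "Social Conversation"},
    {"text": "Crashes when the input is empty.", "category": "Observed Bug Behaviour"},
    {"text": "Added a regression test.", "category": "Testing"},
]
kept = filter_out(sample, ["Social Conversation", "Testing"])
```

Validating names against the category list catches misspelled filters, which would otherwise silently match nothing.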

Testing

Run the tests to verify that the script is functioning properly. The Travis CI build also runs them as part of its build status checks.
1) Ensure that pytest is properly set up for your Python 3.9 environment.
2) Run pytest and check that all tests pass.

Folder Structure

/models - Contains the serialized files of the classification model.

/utils - Contains utility function files, such as IO, filtering results and processing comments.

/test - Contains the test files run by pytest.

/result - Folder to output result files to. Contains a .gitignore to ignore all files in this folder to prevent results from being committed.

/config - Contains configuration files for the app to run, such as the personal access token.

Citation

Please cite our tool using the bibliographic information on Zenodo.