Project author: matjazmav

Project description:
Offensive language exploratory analysis
Primary language: Jupyter Notebook
Repository: git://github.com/matjazmav/fri-2021-nlp-project.git
Created: 2021-03-03T15:06:47Z
Project community: https://github.com/matjazmav/fri-2021-nlp-project


Cross-Lingual Offensive Language Identification

Authors: Nikolina Grabovica, Selma Halilčević, Matjaž Mav

Advisors: Slavko Žitnik

Organization: University of Ljubljana, Faculty of Computer and Information Science

Course: Natural Language Processing 2020/2021


Description

In this short paper we review a few publicly available datasets and several methods for offensive language identification. We explore traditional methods using handcrafted features, contextual embeddings and embedding alignment methods, and current state-of-the-art transformer models.

Report: report.pdf


Requirements

Installation

Folder structure

├── .gitignore               Git ignore config
├── README.md                This file
├── requirements.txt         Conda environment definition
├── data/                    Contains datasets
├── reports/                 Contains reports
├── results/                 Contains final results and visualizations
├── checkpoints/             !!Contains downloaded checkpoints, see installation steps!!
├── elmoformanylanguages/    Contains pre-trained ELMo for the EN and SI languages
├── outputs/                 Contains pre-trained BERT, mBERT, T5 and mT5 models
├── .gitignore
└── src/                     Contains source files
    └── eval-*.ipynb         Model evaluation notebooks