Indonesian Twitter dataset for emotion classification task
This dataset contains 4,403 Indonesian tweets, each labeled with one of five emotion classes: love, anger, sadness, joy, and fear.
Each line consists of a tweet and its respective emotion label, separated by a comma (,). The first line is a header. A tweet that contains a comma (,) inside its text is enclosed in double quotes (" ") so that the comma is not treated as a column separator.
The tweets in this dataset have been pre-processed using the following criteria:
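The file layout described above (header line, comma separator, quoted tweets) can be read with a standard CSV parser. The following is a minimal sketch in Python, not the authors' own loading code. The column names tweet and label, and the three sample rows, are assumptions made up for illustration; check the header line of the actual file and adjust the names accordingly.

```python
import csv
import io
from collections import Counter

# Made-up sample in the layout described above. These rows are NOT taken
# from the dataset, and the header names "tweet" and "label" are assumed.
SAMPLE = '''tweet,label
"Senang sekali, akhirnya libur juga",joy
Aku takut gelap,fear
"Kesal banget, macet lagi",anger
'''

# The five emotion classes named in this README.
EMOTIONS = {"love", "anger", "sadness", "joy", "fear"}


def read_rows(handle):
    """Return a list of (tweet, label) pairs from an open CSV handle."""
    # DictReader consumes the header line and handles quoted fields,
    # so a comma inside a quoted tweet does not split the column.
    reader = csv.DictReader(handle)
    rows = []
    for row in reader:
        label = row["label"].strip().lower()
        if label not in EMOTIONS:
            raise ValueError(f"unexpected label: {label!r}")
        rows.append((row["tweet"], label))
    return rows


rows = read_rows(io.StringIO(SAMPLE))
counts = Counter(label for _, label in rows)
print(len(rows), dict(counts))
```

To read the real file, pass a handle opened with open(path, newline="", encoding="utf-8") instead of the in-memory sample, where path is the location of your downloaded copy. The UTF-8 encoding is also an assumption.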
We have trained Word2Vec and FastText word embeddings on 1 million Indonesian tweets. These pre-trained word embeddings can be downloaded here: /g/personal/mei_silviana_office_ui_ac_id/EkwS7R4C2F9FnYHVLvvkmeoBGRpJ3abNa7Lti9ceG2TWFw?e=dRzvmG
If you want to publish a paper using this dataset or the pre-trained word embeddings, please cite this publication:
Mei Silviana Saputri, Rahmad Mahendra, and Mirna Adriani, “Emotion Classification on Indonesian Twitter Dataset”, in Proceedings of International Conference on Asian Language Processing 2018. 2018.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.