Project author: retroinspect

Project description:
Given tweets about six top NASDAQ stocks (AAPL, GOOG, GOOGL, TSLA, AMZN, MSFT), is there any relationship between tweet sentiment and stock price?
Primary language: Python
Repository: git://github.com/retroinspect/tweetstock.git
Created: 2021-08-16T05:31:15Z
Project page: https://github.com/retroinspect/tweetstock

License: MIT License


tweetstock: Regression Analysis of Public Tweet Sentiment against NASDAQ Stock Prices

Given tweets about six top NASDAQ stocks (AAPL, GOOG, GOOGL, TSLA, AMZN, MSFT), is there any relationship between tweet sentiment and stock price?

Directory structure

  |- data
     |- company-sentiment
        : sentiment-labeled tweet IDs for NASDAQ stocks
     |- raw
        |- company-tweets.csv
        |- company-values.csv
        |- nasdaq-tweets.csv
        |- (sentiment-labeled financial tweets)
     |- regression
        : data for the regression of public tweet sentiment against market value
     |- sample
        : data for debugging the data generator
     |- sentiment-classifier
        : data to fine-tune BERTweet for sentiment classification of financial tweets
  |- calculate.py
     : calculate public sentiment and stock price difference of NASDAQ stocks per interval
  |- finetune_classifier.py
     : fine-tune the model from `fintweet_sentiment_classifier.py`
  |- fintweet_sentiment_classifier.py
     : model copied and slightly modified from the HuggingFace Transformers RoBERTa
  |- nasdaq_tweet_sentiment_tagger.py
     : tag NASDAQ tweets with sentiment using the fine-tuned classifier
  |- preproces_tweets.py
     : drop duplicated and suspicious NASDAQ tweets and sentiment-labeled financial tweets
  |- regression.py
     : output the relation between public sentiment and stock price direction
  |- utils.py
     : miscellaneous functions

Experiment Pipeline

Step 1. Preprocess tweets to get rid of spam

  • Involves preproces_tweets.py
  • Drops duplicated tweets and suspected spam
  • The spam filter is based on a Kaggle notebook by aramacus [1]
  • Because of limited computing power, only a 30% sample of the tweets was used
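
As a rough illustration of the deduplication and 30% sampling described above, here is a minimal standard-library sketch. It is not the code in preproces_tweets.py: the `body` field name, the normalization rule, and the fixed seed are assumptions made for the example, and the spam heuristics from the Kaggle notebook [1] are left out.

```python
import random


def normalize(text):
    """Lowercase and collapse whitespace so near-identical copies compare equal."""
    return " ".join(text.lower().split())


def dedupe_and_sample(tweets, fraction=0.3, seed=0):
    """Drop repeated tweet bodies, then keep a random `fraction` of the rest.

    `tweets` is a list of dicts with a hypothetical "body" key.
    """
    seen = set()
    unique = []
    for tweet in tweets:
        key = normalize(tweet["body"])
        if key not in seen:
            seen.add(key)
            unique.append(tweet)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    keep = round(len(unique) * fraction)
    return rng.sample(unique, keep)
```

Sampling after deduplication, rather than before, keeps the 30% figure relative to distinct tweets; whether the original script orders the two steps this way is not stated.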

Step 2. Fine-tune BERTweet with sentiment-labeled financial tweets

  • Involves fintweet_sentiment_classifier.py, finetune_classifier.py
  • To build a sentiment classifier for financial tweets, fine-tune BERTweet [2] (RoBERTa pretrained on Twitter data) on sentiment-labeled financial tweets

Step 3. Calculate public sentiment of NASDAQ tweets and the stock price difference

  • Involves nasdaq_tweet_sentiment_tagger.py, calculate.py
  • Tag NASDAQ tweets with sentiment
  • Calculate public sentiment over 3-day intervals
  • Calculate the price difference between:
    • the open price of the day right before the interval
    • the close price of the day right after the interval
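
The interval arithmetic above can be sketched as follows. This is one possible reading, not the logic of calculate.py: it assumes sentiment is a numeric score per tweet (for example +1 or -1), that intervals are counted in calendar days, that the difference is close minus open, and that prices live in a dict keyed by date with hypothetical `open` and `close` fields.

```python
from datetime import date, timedelta
from statistics import mean


def window_sentiment(tweets, start, days=3):
    """Mean sentiment score of tweets dated within [start, start + days)."""
    end = start + timedelta(days=days)
    scores = [t["sentiment"] for t in tweets if start <= t["date"] < end]
    return mean(scores) if scores else None


def price_change(prices, start, days=3):
    """Close on the day after the interval minus open on the day before it."""
    before = start - timedelta(days=1)
    after = start + timedelta(days=days)
    if before not in prices or after not in prices:
        return None  # no quote, e.g. a weekend or market holiday
    return prices[after]["close"] - prices[before]["open"]
```

A positive `price_change` would count as a rise and a negative one as a fall. Intervals that border a non-trading day return `None` here; a real pipeline would need a rule for those cases, such as moving to the nearest trading day.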

Step 4. Regression

  • Involves regression.py
  • Outputs the relation between public sentiment and stock price direction
  • Trained on tweets from 2015/01/01 to 2017/12/31
  • Tested on tweets from 2018/01/01 to 2019/12/31
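
To make the chronological train/test split concrete, here is a small standard-library sketch. The README does not say which regression regression.py uses, so the one-feature logistic regression below is an assumption chosen for illustration, as are the function names, the learning rate, and the epoch count.

```python
import math
from datetime import date


def split_by_date(rows, cutoff):
    """Rows dated before `cutoff` go to train, the rest to test."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(rise) = sigmoid(w * x + b) by batch gradient descent."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, xs, ys):
    """Share of intervals whose predicted direction matches the actual one."""
    hits = sum((w * x + b > 0) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)
```

With the dates above, the split would be `split_by_date(rows, date(2018, 1, 1))`, where each row holds an interval's sentiment as the feature and 1 or 0 for rise or fall as the label. Splitting by date rather than at random keeps later tweets out of the training data.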

Results

Stock   Model / Train   Model / Test   Predict always rise / Test   Predict always rise / All
AAPL    63.5%           74.2%          63.7%                        58.8%
GOOG    66.4%           60.9%          58.6%                        56.5%
GOOGL   65.6%           65.2%          58.0%                        60.0%
AMZN    63.9%           62.6%          63.2%                        61.9%
TSLA    62.2%           54.9%          56.0%                        55.1%
MSFT    58.1%           66.4%          66.4%                        61.2%

*All values are accuracy of predicting rise or fall
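
The "predict always rise" columns are a baseline: the accuracy obtained by guessing a rise for every interval, which equals the share of intervals that actually rose. The sketch below shows that calculation and the gap between model and baseline on the test period. It reads the second and third value of each table row as model test accuracy and baseline test accuracy; the function and variable names are illustrative.

```python
def always_rise_accuracy(directions):
    """Accuracy of predicting a rise for every interval (1 = rise, 0 = fall)."""
    return sum(directions) / len(directions)


# Test-period accuracies in percent, copied from the results table:
# (model, "predict always rise" baseline).
TEST_ACCURACY = {
    "AAPL": (74.2, 63.7),
    "GOOG": (60.9, 58.6),
    "GOOGL": (65.2, 58.0),
    "AMZN": (62.6, 63.2),
    "TSLA": (54.9, 56.0),
    "MSFT": (66.4, 66.4),
}


def margin_over_baseline(table):
    """Percentage points by which the model beats (or trails) the baseline."""
    return {stock: round(model - base, 1) for stock, (model, base) in table.items()}
```

A positive margin means the sentiment-based model did better than always guessing a rise; a zero or negative margin means it did not.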

Discussion

References

[1]: https://www.kaggle.com/aramacus/bot-hunting-or-how-many-tweets-were-made-by-bots
[2]:
@inproceedings{bertweet,
title = {{BERTweet: A pre-trained language model for English Tweets}},
author = {Dat Quoc Nguyen and Thanh Vu and Anh Tuan Nguyen},
booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
year = {2020},
pages = {9--14}
}