This repository extends the traditional Word2Vec library, released in 2013, with additional features.
This project was part of a larger effort involving sentiment analysis and sarcasm detection. Further project details can be found in the repo-description section.
Clone the repository directly and start using it. How the files are named and stored is described below:
-> The model can be trained using three options: the Skip-gram model, Continuous Bag of Words (CBOW), and Negative Sampling.
-> You can change model parameters such as the window size, vector size, and learning rate.
-> The optimizer used is SGD.
-> Besides the main code ipynb, and the entire training and testing corpora (NumPy arrays) of the Reuters dataset (which can be downloaded directly from here), I have included an example directory where I have applied the code base.
For more information, please refer to the uploaded project report.
-> The folder contains NumPy arrays of the training and testing datasets.
-> The folder also contains a pickle file holding the dictionary:
key - word (string)
value - one-hot encoding (NumPy array)
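As a minimal sketch of how these artifacts could be consumed, the snippet below saves and reloads toy stand-ins for the dataset arrays and the one-hot dictionary. The file names here are placeholders for illustration, not the actual file names shipped with this repository.

```python
import pickle
import numpy as np

# Toy stand-ins for the repository's artifacts; substitute the real
# file names from the folder described above.
np.save("train_demo.npy", np.eye(3, dtype=np.float32))

# Dictionary: word (string) -> one-hot encoding (NumPy array)
vocab = {"cat": np.array([1, 0, 0]), "dog": np.array([0, 1, 0])}
with open("onehot_dict_demo.pkl", "wb") as f:
    pickle.dump(vocab, f)

# Loading mirrors what a consumer of the repository would do:
train = np.load("train_demo.npy")
with open("onehot_dict_demo.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded["cat"])  # one-hot vector for "cat"
```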
-> The naming convention for the weight1 files is:
../Window{window size}/{choice of model}/{learning rate}{vector size}weight_numpy.npy
These weight matrices must be multiplied with the one-hot vectors to obtain the actual embeddings, in
the form W'X (W - weight matrix; X - one-hot vector).
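The path construction and the W'X lookup can be sketched as follows. The parameter values below (window size, model choice, learning rate, vector size) and the toy weight matrix are illustrative assumptions, not values taken from this repository.

```python
import numpy as np

# Illustrative hyperparameter values; the actual ones depend on
# your training run.
window_size, model = 5, "Skipgram"
learning_rate, vector_size = 0.01, 100
path = f"../Window{window_size}/{model}/{learning_rate}{vector_size}weight_numpy.npy"
# W = np.load(path)  # weight matrix, shape (vocab size, vector size)

# Toy stand-ins to demonstrate the W'X lookup:
W = np.arange(12, dtype=np.float32).reshape(4, 3)  # 4 words, 3-dim embeddings
x = np.array([0, 1, 0, 0], dtype=np.float32)       # one-hot vector for word 1

embedding = W.T @ x  # W'X: selects the row of W matching the one-hot word
```

Because X is one-hot, W'X simply picks out the row of W corresponding to that word, which is why the weight matrix itself serves as the embedding table.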