Bangla Word Embedding
What is Word2vec?
Word2vec is a method for efficiently learning word embeddings. A word2vec model is essentially a two-layer neural network that processes text. Word2vec is widely used in deep learning applications such as machine translation, language modelling, question answering, image captioning, and speech recognition. You can use this Bangla word2vec model in your own projects to get better results. The procedure for loading the pretrained weights into a project is given below.
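The two-layer architecture described above can be sketched in PyTorch. This is an illustrative skip-gram-style sketch, not the repository's actual training code; the class name, vocabulary size, and word indices are assumptions.

```python
import torch
import torch.nn as nn

# Minimal skip-gram-style sketch (hypothetical sizes, not this repo's model):
# layer 1 maps a word id to a dense vector, layer 2 projects that vector
# back to scores over the vocabulary (its context words).
class SkipGram(nn.Module):
    def __init__(self, vocab_size, embedding_dim=300):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, embedding_dim)  # word -> vector
        self.out_proj = nn.Linear(embedding_dim, vocab_size)     # vector -> context scores

    def forward(self, center_ids):
        vectors = self.in_embed(center_ids)  # (batch, embedding_dim)
        return self.out_proj(vectors)        # (batch, vocab_size) logits

model = SkipGram(vocab_size=5000)
logits = model(torch.tensor([1, 42]))  # two example word ids
print(logits.shape)
```

After training such a network, the first layer's weight matrix is what gets exported as the pretrained embedding.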
Install requirements packages by the following command
pip install -r requirments.txt
If you want to load the pretrained embedding weights into your project, follow the procedure below.
python export_weights.py
It will export the pretrained weights as "pretrained_weights.pickle" in the results folder.
import pickle

import torch
import torch.nn as nn

# num_embeddings must match the vocabulary size used during training
embedding = nn.Embedding(num_embeddings, embedding_dim=300)

with open("results/pretrained_weights.pickle", "rb") as f:
    weight = pickle.load(f)

# the pickled weights are a NumPy array; convert to a tensor first
weight = torch.from_numpy(weight)
embedding.weight = nn.Parameter(weight)
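Equivalently, `nn.Embedding.from_pretrained` builds the layer and copies the weights in one step. The sketch below uses a random stand-in matrix and made-up word indices in place of the real unpickled array and vocabulary, so the shapes and indices are assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-in for the unpickled array: in practice this is the
# (vocab_size, 300) matrix loaded from pretrained_weights.pickle.
weight = np.random.rand(5000, 300).astype(np.float32)

# from_pretrained copies the weights in; freeze=True keeps the pretrained
# Bangla vectors fixed while the rest of the model trains.
embedding = nn.Embedding.from_pretrained(torch.from_numpy(weight), freeze=True)

# Look up two (hypothetical) word indices and compare them by cosine similarity.
vec_a = embedding(torch.tensor(10))
vec_b = embedding(torch.tensor(20))
similarity = torch.cosine_similarity(vec_a, vec_b, dim=0)
print(similarity.item())
```

Set `freeze=False` instead if you want the embeddings to be fine-tuned along with your downstream task.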
Copy the text file into the "data/" folder.
To preprocess, run the following script
python preprocess.py
To train the model with the default training configuration, run the following script.
python train.py
To evaluate, run this script.
python eval.py
Note: Bangla words can't be displayed properly in the terminal, so it is better to evaluate them in a Jupyter notebook. If you don't have Jupyter, install Anaconda Python and run "jupyter notebook" in the terminal.