Doc2Vec implementation using JAX and Haiku
This is a JAX-based implementation of Le and Mikolov’s Doc2Vec algorithm, which
builds on Word2Vec to generate document-level (bag of words) representations.
The paper proposes two main model variants, Paragraph Vector-Distributed Memory (PV-DM) and Distributed Bag of Words (DBOW), although the authors found that PV-DM performed better in most situations.
PV-DM architecture:
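
To make the PV-DM idea concrete, here is a minimal sketch in plain JAX: a paragraph vector is combined with the vectors of the surrounding context words to predict a target word. It is an illustration under stated assumptions, not this repository's code. All names (`init_params`, `pvdm_logits`, `pvdm_loss`) are hypothetical, Haiku is left out for brevity, averaging is used where the paper also allows concatenation, and a full softmax stands in for whatever output layer the real implementation uses.

```python
import jax
import jax.numpy as jnp


def init_params(key, num_docs, vocab_size, embed_dim):
    """Randomly initialise paragraph vectors, word vectors, and a softmax layer."""
    k_doc, k_word, k_out = jax.random.split(key, 3)
    return {
        "doc_embed": 0.1 * jax.random.normal(k_doc, (num_docs, embed_dim)),
        "word_embed": 0.1 * jax.random.normal(k_word, (vocab_size, embed_dim)),
        "out_w": 0.1 * jax.random.normal(k_out, (embed_dim, vocab_size)),
        "out_b": jnp.zeros((vocab_size,)),
    }


def pvdm_logits(params, doc_ids, context_ids):
    """doc_ids: (batch,), context_ids: (batch, window) -> logits: (batch, vocab)."""
    doc_vec = params["doc_embed"][doc_ids]        # (batch, dim)
    ctx_vecs = params["word_embed"][context_ids]  # (batch, window, dim)
    # Assumption: average the paragraph vector with the context word vectors.
    stacked = jnp.concatenate([doc_vec[:, None, :], ctx_vecs], axis=1)
    hidden = stacked.mean(axis=1)                 # (batch, dim)
    return hidden @ params["out_w"] + params["out_b"]


def pvdm_loss(params, doc_ids, context_ids, target_ids):
    """Mean cross-entropy of the target word under a full softmax."""
    log_probs = jax.nn.log_softmax(pvdm_logits(params, doc_ids, context_ids))
    picked = jnp.take_along_axis(log_probs, target_ids[:, None], axis=1)
    return -picked.mean()


# Toy batch: two (document, context window, target word) examples.
params = init_params(jax.random.PRNGKey(0), num_docs=4, vocab_size=10, embed_dim=8)
doc_ids = jnp.array([0, 1])
context_ids = jnp.array([[1, 2, 3, 4], [5, 6, 7, 8]])
target_ids = jnp.array([9, 0])

logits = pvdm_logits(params, doc_ids, context_ids)
loss, grads = jax.value_and_grad(pvdm_loss)(params, doc_ids, context_ids, target_ids)
```

Only the paragraph vectors that appear in the batch receive a non-zero gradient, which is what lets each document learn its own representation while the word vectors are shared across documents.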
DBOW architecture:
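
The DBOW variant can be sketched in the same way: the context words are dropped from the input, and the paragraph vector alone is trained to predict words sampled from its document. This is a hedged sketch in plain JAX rather than the repository's implementation. The names `init_dbow_params`, `dbow_loss`, and `sgd_step` are hypothetical, and the full softmax and plain SGD update are simplifying assumptions.

```python
import jax
import jax.numpy as jnp


def init_dbow_params(key, num_docs, vocab_size, embed_dim):
    """DBOW needs only paragraph vectors and an output layer (no input word vectors)."""
    k_doc, k_out = jax.random.split(key)
    return {
        "doc_embed": 0.1 * jax.random.normal(k_doc, (num_docs, embed_dim)),
        "out_w": 0.1 * jax.random.normal(k_out, (embed_dim, vocab_size)),
        "out_b": jnp.zeros((vocab_size,)),
    }


def dbow_loss(params, doc_ids, target_ids):
    """Predict a word sampled from the document using its paragraph vector only."""
    doc_vec = params["doc_embed"][doc_ids]                # (batch, dim)
    logits = doc_vec @ params["out_w"] + params["out_b"]  # (batch, vocab)
    log_probs = jax.nn.log_softmax(logits)
    picked = jnp.take_along_axis(log_probs, target_ids[:, None], axis=1)
    return -picked.mean()


@jax.jit
def sgd_step(params, doc_ids, target_ids, lr):
    """One plain gradient-descent update over every parameter in the dict."""
    loss, grads = jax.value_and_grad(dbow_loss)(params, doc_ids, target_ids)
    new_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return new_params, loss


# Toy data: document 0 contains word 3, document 1 contains word 7.
dbow_params = init_dbow_params(jax.random.PRNGKey(0), num_docs=2, vocab_size=10, embed_dim=8)
dbow_doc_ids = jnp.array([0, 0, 1, 1])
dbow_targets = jnp.array([3, 3, 7, 7])

initial_loss = dbow_loss(dbow_params, dbow_doc_ids, dbow_targets)
for _ in range(50):
    dbow_params, _ = sgd_step(dbow_params, dbow_doc_ids, dbow_targets, 0.5)
final_loss = dbow_loss(dbow_params, dbow_doc_ids, dbow_targets)
```

Because there is no word-embedding table on the input side, this variant stores fewer parameters than PV-DM, at the cost of ignoring word order within the context window.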
doc2vec contains the core implementation, including code to prepare documents for training
experiments contains key experiments described in the original paper, reimplemented here

Pre-requisite: ensure nvidia-docker is installed
$ sudo docker build -t doc2vec-jax .
$ sudo docker run --gpus all -it -v ~/doc2vec:/doc2vec doc2vec-jax:latest /bin/bash
$ poetry shell
$ python -m doc2vec.train ...
PVDM model variant
DBOW model variant