Code for training and testing a deep learning model for context-aware citation recommendation using the Microsoft Academic Graph
Pipeline for training the Neural Citation Network (NCN) with the Microsoft Academic Graph (MAG) dataset.
To set up the processing service, see https://github.com/sebastiancelis98/CitationRexApp
This repository covers the training part only.
input: MAG dump (one tab-separated .tsv text file per table in the MAG)
output: weights for the model
Pipeline (using bwUniCluster)
$ws_allocate _name 10
to create a new workspace with more than enough temporary storage for 10 days. Copy the relevant files to your home directory.
$pip3 install -r req.txt
$python -m spacy download en_core_web_lg
$cut -f1,20 Papers.txt > Papers+Counts.txt
in the directory where the MAG dump is placed
$sbatch --partition=insert_node_here job.sh
after specifying the needed memory and the file to execute in job.sh (also change the path in job.sh to match the created venv)
data preparation: (you can also execute all data.py functions in one job, but that requires a longer estimated run time, so the job waits longer before the HPC cluster accepts it)
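The cut step above keeps only fields 1 and 20 of Papers.txt. Where cut is not available, the same extraction can be sketched in Python as below. This is a minimal sketch, not part of the repository: the function name extract_columns is hypothetical, and the code assumes the dump is UTF-8 text with one record per line and no quoted fields.

```python
def extract_columns(src_path, dst_path, columns=(1, 20)):
    """Copy the given 1-based tab-separated fields of each line to dst_path.

    Hypothetical helper that roughly mirrors `cut -f1,20`; like cut, it
    silently skips requested fields that a line does not have.
    """
    indices = [c - 1 for c in columns]  # convert to 0-based positions
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        # Stream line by line so a large table never has to fit in memory.
        for line in src:
            fields = line.rstrip("\n").split("\t")
            kept = [fields[i] for i in indices if i < len(fields)]
            dst.write("\t".join(kept) + "\n")


# Example call, run in the directory where the MAG dump is placed:
# extract_columns("Papers.txt", "Papers+Counts.txt")
```

Reading the file line by line keeps memory use flat, which matters for a table the size of Papers.txt.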
training:
$ws_list
gives the location of your workspace(s)
$scontrol show jobid _insert_id
gives the current status of your job
$python3 -u xyz.py
makes the stdout stream unbuffered, so the output of print statements shows up in the job’s slurm_jobid.out file while the job is still running.
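If passing -u is not convenient, a similar effect can be had for individual messages by flushing on each print call. The sketch below is an illustration under that assumption; the helper name log and the example message are hypothetical and do not come from the repository.

```python
import sys


def log(message, stream=None):
    """Write one progress line and flush it right away.

    Hypothetical helper: flush=True pushes the text out of Python's buffer
    immediately, so it reaches the job's output file without waiting for
    the buffer to fill.
    """
    if stream is None:
        stream = sys.stdout
    print(message, file=stream, flush=True)


# Example use inside a training loop:
# log("epoch 1 finished")
```

Unlike -u, this only affects the messages that go through the helper, so output from other print calls may still be delayed.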