Source code for: On the Effect of Low-Frequency Terms on Neural-IR Models, SIGIR'19
SIGIR’19, Sebastian Hofstätter, Navid Rekabsaz, Carsten Eickhoff, and Allan Hanbury
Low-frequency terms are a recurring challenge for information retrieval models; neural IR frameworks in particular struggle to adequately capture infrequently observed words. While these terms are often removed from neural models, mainly as a concession to efficiency demands, they traditionally play an important role in the performance of IR models. In this paper, we analyze the effects of low-frequency terms on the performance and robustness of neural IR models. We conduct controlled experiments on three recent neural IR models, trained on a large-scale passage retrieval collection. We evaluate the neural IR models with various vocabulary sizes for their respective word embeddings, considering different levels of constraints on the available GPU memory.
We observe that despite the significant benefits of using larger vocabularies, the performance gap between the vocabularies can be, to a great extent, mitigated by extensive tuning of a related parameter: the number of documents to re-rank. We further investigate the use of subword-token embedding models, and in particular FastText, for neural IR models. Our experiments show that using FastText brings slight improvements to the overall performance of the neural IR models in comparison to models trained on the full vocabulary, while the improvement becomes much more pronounced for queries containing low-frequency terms.
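The subword approach described above (FastText) represents each word as a bag of character n-grams, so even a rare or unseen term gets a composed vector instead of an out-of-vocabulary placeholder. Below is a minimal, stdlib-only sketch of that idea (hashed n-gram pseudo-embeddings; this is illustrative only, not the actual FastText implementation or the paper's models):

```python
# Sketch of FastText-style subword embeddings: a word vector is the
# average of its character-n-gram vectors, so low-frequency and OOV
# words still receive a deterministic, non-random representation.
import hashlib

DIM = 8  # toy embedding dimension

def ngrams(word, n_min=3, n_max=5):
    """Character n-grams with FastText-style boundary markers < and >."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def gram_vector(gram):
    """Deterministic pseudo-embedding for one n-gram via hashing."""
    h = hashlib.md5(gram.encode()).digest()
    return [(b - 128) / 128 for b in h[:DIM]]

def word_vector(word):
    """Average the n-gram vectors; works for any word, seen or unseen."""
    vecs = [gram_vector(g) for g in ngrams(word)]
    return [sum(c) / len(vecs) for c in zip(*vecs)]
```

In a real FastText model the n-gram vectors are trained and stored in hash buckets; here they are only hashed, to keep the sketch self-contained.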
Get the full paper here: http://arxiv.org/abs/1904.12683
Please cite the paper:
@inproceedings{hofstaetter_sigir_2019,
author = {Hofst{\"a}tter, Sebastian and Rekabsaz, Navid and Eickhoff, Carsten and Hanbury, Allan},
title = {On the Effect of Low-Frequency Terms on Neural-IR Models},
booktitle = {Proceedings of SIGIR},
year = {2019},
publisher = {ACM}
}
If you have any questions or suggestions, please feel free to open an issue or write an email to Sebastian (email in the paper). Of course, we are also open to future collaborations in the field of neural IR.
Thanks to all the original authors for their inspiring papers! We re-implemented the following models:
We show that all three models work well on the MS MARCO test collection, if implemented and tuned correctly.
Requirements: PyTorch 1.0+ and AllenNLP
For the re-ranking depth evaluation, you need BM25 ranks (we recommend using Anserini to generate them)
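The re-ranking depth (how many BM25 candidates the neural model re-scores) is the tuning parameter the paper highlights. A hypothetical sketch of what varying that depth means, with a stand-in `neural_score` function in place of the actual neural models:

```python
# Hypothetical sketch of depth-limited re-ranking: re-score only the
# top-`depth` BM25 candidates with a neural model and keep the rest of
# the BM25 ranking unchanged. `neural_score` is a stand-in scorer.

def rerank(bm25_ranking, neural_score, query, depth):
    """Re-order the top-`depth` candidates by neural score; keep the tail."""
    head = bm25_ranking[:depth]
    tail = bm25_ranking[depth:]
    head = sorted(head, key=lambda doc: neural_score(query, doc), reverse=True)
    return head + tail

# Toy usage: a fake scorer that prefers smaller doc-id numbers.
bm25 = ["d3", "d1", "d7", "d2", "d9"]
score = lambda q, d: -int(d[1:])  # stand-in, not a real model
print(rerank(bm25, score, "some query", depth=3))
# → ['d1', 'd3', 'd7', 'd2', 'd9']  (d2 and d9 stay in BM25 order)
```

Sweeping `depth` over the BM25 run files is what produces the re-ranking depth curves discussed in the paper.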
train.py is the main trainer -> it uses a multiprocess batch generation pipeline
python matchmaker/preprocessing/tokenize_files.py --in-file <path> --out-file <path> --reader-type <labeled_tuple or triple>
Run ./generate_file_split.sh <base_file> <n file chunks> <output_folder_and_prefix> <batch_size>: 1x for training.tsv and 1x for top1000dev.tsv (the validation set). This creates the file splits for the train + validation sets.
Start train.py with python -W ignore train.py --run-name experiment1 --config-file configs/your_file.yaml
(-W ignore suppresses the spurious spaCy import warnings that come up for every subprocess, and there are many of them)
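The multiprocess batch generation pipeline that train.py uses can be sketched roughly as follows. This is a hypothetical stand-in built on Python's multiprocessing module, not the repo's actual implementation; worker processes turn pre-split file chunks into batches and push them onto a shared queue:

```python
# Sketch of a multiprocess batch generation pipeline: one worker per
# file chunk fills a shared queue with fixed-size batches, and the
# trainer consumes them as they arrive. Illustrative stand-in only.
import multiprocessing as mp

def batch_worker(file_chunk, batch_size, queue):
    """Read items from one chunk and emit fixed-size batches."""
    batch = []
    for line in file_chunk:  # stands in for reading a .tsv chunk
        batch.append(line)
        if len(batch) == batch_size:
            queue.put(batch)
            batch = []
    if batch:
        queue.put(batch)  # flush the last partial batch
    queue.put(None)       # sentinel: this worker is done

def generate_batches(file_chunks, batch_size):
    """Spawn one worker per chunk and yield batches as they arrive."""
    queue = mp.Queue()
    workers = [mp.Process(target=batch_worker, args=(c, batch_size, queue))
               for c in file_chunks]
    for w in workers:
        w.start()
    done = 0
    while done < len(workers):
        item = queue.get()
        if item is None:
            done += 1
        else:
            yield item
    for w in workers:
        w.join()

if __name__ == "__main__":
    chunks = [["q1\td1\td2", "q2\td3\td4"], ["q3\td5\td6"]]
    for batch in generate_batches(chunks, batch_size=2):
        print(batch)
```

This mirrors why generate_file_split.sh runs first: pre-splitting the .tsv files lets each worker process stream its own chunk independently.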