Author: luopeixiang

Description:
PyTorch implementation of a Deep CNN Encoder + LSTM Decoder with Attention for Image to LaTeX
Language: Python
Repository: git://github.com/luopeixiang/im2latex.git
Created: 2019-03-26T11:51:02Z
Project page: https://github.com/luopeixiang/im2latex

License: MIT License


Im2Latex

Deep CNN Encoder + LSTM Decoder with Attention for Image to LaTeX: a PyTorch implementation of the model architecture used by "Seq2Seq for LaTeX generation".

Sample results from this implementation

[Image: sample_result]

Experimental results on the IM2LATEX-100K test dataset

BLEU-4    Edit Distance    Exact Match
40.80     44.23            0.27
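
To make two of the metric names above concrete, here is a minimal sketch of token-level edit distance and exact match. It is not taken from this repository's `evaluate.py`; the function names are hypothetical, formulas are assumed to be whitespace-separated token strings, and the README does not say how the reported Edit Distance figure is normalized, so only the raw Levenshtein distance is shown.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences (lists of strings)."""
    # prev[j] holds the distance between the tokens of `a` seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def exact_match_rate(references, hypotheses):
    """Fraction of hypothesis formulas whose tokens equal the reference tokens."""
    pairs = list(zip(references, hypotheses))
    if not pairs:
        return 0.0
    hits = sum(ref.split() == hyp.split() for ref, hyp in pairs)
    return hits / len(pairs)
```

Under these assumptions, `exact_match_rate` returns a fraction between 0 and 1, and `edit_distance` returns the number of token insertions, deletions, and substitutions separating a prediction from its reference.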

Getting Started

Install dependencies:

    pip install -r requirement.txt

Download the dataset for training:

    cd data
    wget http://lstm.seas.harvard.edu/latex/data/im2latex_validate_filter.lst
    wget http://lstm.seas.harvard.edu/latex/data/im2latex_train_filter.lst
    wget http://lstm.seas.harvard.edu/latex/data/im2latex_test_filter.lst
    wget http://lstm.seas.harvard.edu/latex/data/formula_images_processed.tar.gz
    wget http://lstm.seas.harvard.edu/latex/data/im2latex_formulas.norm.lst
    tar -zxvf formula_images_processed.tar.gz
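
Before preprocessing, it can help to confirm that every download finished. The following is a small sketch, not part of the repository, that checks a data directory for the five files fetched above; the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the wget commands in this README.
EXPECTED_FILES = [
    "im2latex_validate_filter.lst",
    "im2latex_train_filter.lst",
    "im2latex_test_filter.lst",
    "formula_images_processed.tar.gz",
    "im2latex_formulas.norm.lst",
]


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_files("data")` from the repository root returns an empty list once all downloads are in place.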

Preprocess:

    python preprocess.py

Build vocab:

    python build_vocab.py
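
As an illustration of what a vocabulary-building step generally does, here is a minimal sketch. It is not the code in `build_vocab.py`: the special tokens, the `min_count` parameter, and the function names are assumptions, and formulas are assumed to be whitespace-separated token strings.

```python
from collections import Counter

# Hypothetical special tokens; the repository may use different ones.
SPECIAL_TOKENS = ["<pad>", "<unk>", "<s>", "</s>"]


def build_vocab(formulas, min_count=1):
    """Map each sufficiently frequent token to an integer id."""
    counts = Counter(tok for formula in formulas for tok in formula.split())
    vocab = {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS)}
    # Most frequent tokens first; ties broken alphabetically for determinism.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(formula, vocab):
    """Convert a formula string to token ids, using <unk> for unseen tokens."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in formula.split()]
```

Raising `min_count` in this sketch sends rare tokens to `<unk>`, which keeps the decoder's output layer smaller at the cost of never producing those tokens.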

Train:

    python train.py \
        --data_path=[data dir] \
        --save_dir=[the dir for saving ckpts] \
        --dropout=0.2 --add_position_features \
        --epoches=25 --max_len=150
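
For readers unfamiliar with how such flags are usually consumed, the following is a hypothetical `argparse` sketch that accepts the options shown in the command above. The types and defaults are assumptions inferred from the example values, not the actual definitions in `train.py`.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the training command."""
    parser = argparse.ArgumentParser(description="Sketch of train.py options")
    parser.add_argument("--data_path", type=str, required=True,
                        help="directory holding the preprocessed data")
    parser.add_argument("--save_dir", type=str, required=True,
                        help="directory for saving checkpoints")
    parser.add_argument("--dropout", type=float, default=0.2)   # assumed default
    parser.add_argument("--add_position_features", action="store_true",
                        help="boolean switch; off unless the flag is given")
    parser.add_argument("--epoches", type=int, default=25)      # assumed default
    parser.add_argument("--max_len", type=int, default=150)     # assumed default
    return parser
```

Note that the flag is spelled `--epoches` in the command, so that spelling is kept here.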

Evaluate:

    python evaluate.py --split=test \
        --model_path=[the path to model] \
        --data_path=[data dir] \
        --batch_size=32 \
        --ref_path=[the file to store reference] \
        --result_path=[the file to store decoding result]

Features