Project author: yxtay

Project description:
Character Embeddings Recurrent Neural Network Text Generation Models
Language: Python
Repository: git://github.com/yxtay/char-rnn-text-generation.git
Created: 2017-07-02T03:40:40Z
Project community: https://github.com/yxtay/char-rnn-text-generation

License: MIT License

Character Embeddings Recurrent Neural Network Text Generation Models

Inspired by Andrej Karpathy's
The Unreasonable Effectiveness of Recurrent Neural Networks.

This repository attempts to replicate the models, with slight modifications, in different Python deep learning frameworks.

Frameworks

The models are implemented, with minor differences, in Keras, TensorFlow, PyTorch, Chainer and MXNet; see Benchmarks below for a comparison.

Default Model Specification

Layer Type  Output Shape   Param #  Remarks
Embedding   (64, 64, 32)   3136     vocab size: 98, embedding size: 32
Dropout     (64, 64, 32)   0        dropout rate: 0.0
LSTM        (64, 64, 128)  82432    output size: 128
Dropout     (64, 64, 128)  0        dropout rate: 0.0
LSTM        (64, 64, 128)  131584   output size: 128
Dropout     (64, 64, 128)  0        dropout rate: 0.0
Dense       (64, 64, 98)   12642    output size: 98
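
For reference, a minimal Keras 2 sketch of a model matching the table above (the stateful flag, softmax output and exact layer arguments are illustrative assumptions, not necessarily the repository's code):

from keras.layers import Dense, Dropout, Embedding, LSTM, TimeDistributed
from keras.models import Sequential

VOCAB_SIZE = 98       # number of distinct characters (from the table above)
EMBEDDING_SIZE = 32
RNN_SIZE = 128
BATCH_SIZE = 64
SEQ_LEN = 64
DROP_RATE = 0.0

model = Sequential()
# character embedding: (batch, seq_len) -> (batch, seq_len, embedding_size)
model.add(Embedding(VOCAB_SIZE, EMBEDDING_SIZE,
                    batch_input_shape=(BATCH_SIZE, SEQ_LEN)))
model.add(Dropout(DROP_RATE))
# two LSTM layers, each returning the full output sequence
model.add(LSTM(RNN_SIZE, return_sequences=True, stateful=True))
model.add(Dropout(DROP_RATE))
model.add(LSTM(RNN_SIZE, return_sequences=True, stateful=True))
model.add(Dropout(DROP_RATE))
# per-timestep softmax over the character vocabulary
model.add(TimeDistributed(Dense(VOCAB_SIZE, activation="softmax")))
model.summary()  # parameter counts should match the table above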

Training Specification

  • Batch size: 64
  • Sequence length: 64
  • Number of epochs: 32
  • Learning rate: 0.001
  • Max gradient norm: 5.0
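
Continuing the Keras sketch above, the learning rate and gradient clipping would be wired in roughly as follows (the choice of Adam and of sparse categorical cross-entropy is an assumption; the repository's training scripts may differ):

from keras.optimizers import Adam

LEARNING_RATE = 0.001
CLIP_NORM = 5.0

# gradients are clipped to a maximum norm of 5.0 via the optimizer
model.compile(optimizer=Adam(lr=LEARNING_RATE, clipnorm=CLIP_NORM),
              loss="sparse_categorical_crossentropy")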

Setup

# clone repo
git clone git@github.com:yxtay/char-rnn-text-generation.git && cd char-rnn-text-generation

# create conda environment
conda env create -f environment.yml

# activate environment
source activate dl

Usage

Training

usage: <framework>_model.py train [-h] --checkpoint-path CHECKPOINT_PATH
                                  --text-path TEXT_PATH
                                  [--restore [RESTORE]]
                                  [--seq-len SEQ_LEN]
                                  [--embedding-size EMBEDDING_SIZE]
                                  [--rnn-size RNN_SIZE]
                                  [--num-layers NUM_LAYERS]
                                  [--drop-rate DROP_RATE]
                                  [--learning-rate LEARNING_RATE]
                                  [--clip-norm CLIP_NORM]
                                  [--batch-size BATCH_SIZE]
                                  [--num-epochs NUM_EPOCHS]
                                  [--log-path LOG_PATH]

optional arguments:
  -h, --help            show this help message and exit
  --checkpoint-path CHECKPOINT_PATH
                        path to save or load model checkpoints
  --text-path TEXT_PATH
                        path of text file for training
  --restore [RESTORE]   whether to restore from checkpoint_path or from
                        another path if specified
  --seq-len SEQ_LEN     sequence length of inputs and outputs (default: 64)
  --embedding-size EMBEDDING_SIZE
                        character embedding size (default: 32)
  --rnn-size RNN_SIZE   size of rnn cell (default: 128)
  --num-layers NUM_LAYERS
                        number of rnn layers (default: 2)
  --drop-rate DROP_RATE
                        dropout rate for rnn layers (default: 0.0)
  --learning-rate LEARNING_RATE
                        learning rate (default: 0.001)
  --clip-norm CLIP_NORM
                        max norm to clip gradient (default: 5.0)
  --batch-size BATCH_SIZE
                        training batch size (default: 64)
  --num-epochs NUM_EPOCHS
                        number of epochs for training (default: 32)
  --log-path LOG_PATH   path of log file (default: main.log)

Example:

python tf_model.py train \
    --checkpoint=checkpoints/tf_tinyshakespeare/model.ckpt \
    --text=data/tinyshakespeare.txt
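
For context, a char-RNN of this kind is typically trained on windows of seq-len characters whose targets are the inputs shifted by one character. A minimal sketch of such a batch generator, under that assumption (not necessarily the repository's exact implementation):

import numpy as np

def batch_generator(encoded_text, batch_size=64, seq_len=64):
    """Yield (inputs, targets) batches; targets are inputs shifted by one character."""
    encoded_text = np.asarray(encoded_text)
    # keep only as many characters as fit evenly into batch_size parallel streams
    num_windows = (len(encoded_text) - 1) // (batch_size * seq_len)
    rounded = num_windows * batch_size * seq_len
    inputs = encoded_text[:rounded].reshape(batch_size, -1)
    targets = encoded_text[1:rounded + 1].reshape(batch_size, -1)
    for i in range(0, inputs.shape[1], seq_len):
        yield inputs[:, i:i + seq_len], targets[:, i:i + seq_len]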


Text Generation

usage: <framework>_model.py generate [-h] --checkpoint-path CHECKPOINT_PATH
                                     (--text-path TEXT_PATH | --seed SEED)
                                     [--length LENGTH] [--top-n TOP_N]
                                     [--log-path LOG_PATH]

optional arguments:
  -h, --help            show this help message and exit
  --checkpoint-path CHECKPOINT_PATH
                        path to load model checkpoints
  --text-path TEXT_PATH
                        path of text file to generate seed
  --seed SEED           seed character sequence
  --length LENGTH       length of character sequence to generate (default:
                        1024)
  --top-n TOP_N         number of top choices to sample (default: 3)
  --log-path LOG_PATH   path of log file (default: main.log)
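
The --top-n option restricts sampling at each step to the n most probable next characters. A minimal sketch of that sampling rule (an illustration, not necessarily the repository's exact code):

import numpy as np

def sample_top_n(probs, top_n=3):
    """Sample a character index from the top_n most probable choices."""
    probs = np.asarray(probs, dtype=np.float64)
    top = np.argsort(probs)[-top_n:]           # indices of the n largest probabilities
    top_probs = probs[top] / probs[top].sum()  # renormalise over those choices
    return np.random.choice(top, p=top_probs)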

Example:

python tf_model.py generate \
    --checkpoint=checkpoints/tf_tinyshakespeare/model.ckpt \
    --seed="KING RICHARD"

Sample output:

KING RICHARDIIIIl II I tell thee,
As I have no mark of his confection,
The people so see my son.
SEBASTIAN:
I have men's man in the common to his sounds,
And so she said of my soul, and to him,
And too marry his sun their commanded
As thou shalt be alone too means
As he should to thy sensess so far to mark of
these foul trust them fringer whom, there would he had
As the word of merrous and subject.
GLOUCESTER:
A spack, a service the counsel son and here.
What is a misin the wind and to the will
And shall not streaks of this show into all heard.
KING EDIN YORK:
I will be suppet on himself tears as the sends.
KING EDWARD IV:
No looks and them, and while, a will, when this way.
BAPTHIO:
A mortain and me to the callant our souls
And the changed and such of the son.
CORIOLANUS:
I will, so show me with the child to the could sheep
To beseence, and shall so so should but hear
Than him with her fair to be that soul,
Whishe it is no meach of my lard and
And this, and with my love and the senter'd with marked
And her should

Benchmarks

Below are the training duration and loss of each framework on tinyshakespeare.txt.

Framework   Duration (s)  Loss
Keras       5270          1.42505
TensorFlow  3003          1.45795
PyTorch     5868          1.32285
Chainer     4954          1.22930
MXNet       7348          1.34199