Set up char-rnn and torch-rnn for character-level language models in Torch

Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch


This code is written in Lua and requires Torch, so follow the tutorial in my previous blog post to install Torch.

Lua Prerequisites

# Install most things using luarocks
luarocks install torch
luarocks install nn
luarocks install optim
luarocks install lua-cjson

# We need to install torch-hdf5 from GitHub
git clone https://github.com/deepmind/torch-hdf5.git
cd torch-hdf5
luarocks make hdf5-0-0.rockspec
cd ..


This code implements multi-layer Recurrent Neural Network (RNN, LSTM, and GRU) for training/sampling from character-level language models. In other words the model takes one text file as input and trains a Recurrent Neural Network that learns to predict the next character in a sequence. The RNN can then be used to generate text character by character that will look like the original training data.
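To make the "predict the next character" setup concrete, here is a minimal Python sketch (not part of char-rnn; the text and sequence length are made up for illustration) of how raw text becomes (input, target) training pairs for a character-level model:

```python
# Build next-character training pairs from raw text, the way a
# character-level language model sees its data.
text = "hello world"
chars = sorted(set(text))                      # vocabulary of unique characters
char_to_ix = {ch: i for i, ch in enumerate(chars)}

seq_length = 4
pairs = []
for i in range(len(text) - seq_length):
    inputs = [char_to_ix[ch] for ch in text[i:i + seq_length]]
    target = char_to_ix[text[i + seq_length]]  # the character to predict
    pairs.append((inputs, target))

print(len(chars))   # vocabulary size
print(pairs[0])     # first (input indices, target index) pair
```

The RNN is trained to map each input sequence to its target index; at sampling time the predicted character is fed back in as the next input.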

More info here in this blog by Andrej Karpathy.


1. Clone char-rnn repository

git clone https://github.com/karpathy/char-rnn.git

2. Download the char-rnn-files.tar.gz archive from the following link

3. Delete the data folder from the char-rnn folder, then copy the data provided in the char-rnn-files archive and paste it into char-rnn.

4. Train the model. Now execute the script provided in the char-rnn folder

cd char-rnn
mkdir checkpoints checkpoints/linux checkpoints/obama checkpoints/tinyshakespeare checkpoints/tolstoy
chmod +x

Edit the script if you want to change settings, e.g. to run on the CPU rather than the GPU, in which case you have to use the flag -gpuid -1.

5. Optional: if you want to manually train a model using train.lua, then, for example, if the data you want to train on is tolstoy, do the following:

cd char-rnn (wherever it is)  
th train.lua -data_dir data/tolstoy -rnn_size 512 -num_layers 2 -dropout 0.5

Checkpoints. While the model is training it will periodically write checkpoint files to the cv folder. The frequency with which these checkpoints are written is controlled by the eval_val_every option (e.g. if this is 1 then a checkpoint is written every iteration). The filename of these checkpoints contains a very important number: the loss. For example, a checkpoint with filename lm_lstm_epoch0.95_2.0681.t7 indicates that at this point the model was on epoch 0.95 (i.e. it has almost done one full pass over the training data), and the loss on validation data was 2.0681. This number is very important: the lower it is, the better the checkpoint works. Once you start to generate data (discussed below), you will want to use the model checkpoint that reports the lowest validation loss. Note that this is not necessarily the last checkpoint written during training (due to possible overfitting).
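Since the validation loss is embedded in each checkpoint filename, a small helper can pick the best one automatically. This is an illustrative Python sketch (not part of char-rnn); the filenames below are hypothetical:

```python
import re

def best_checkpoint(filenames):
    """Return the checkpoint filename with the lowest validation loss.

    Filenames are expected to look like lm_lstm_epoch<epoch>_<val_loss>.t7.
    """
    pattern = re.compile(r"lm_lstm_epoch([\d.]+)_([\d.]+)\.t7")
    best = None
    for name in filenames:
        m = pattern.match(name)
        if m:
            loss = float(m.group(2))
            if best is None or loss < best[0]:
                best = (loss, name)
    return best[1] if best else None

checkpoints = [
    "lm_lstm_epoch0.95_2.0681.t7",
    "lm_lstm_epoch1.90_1.8423.t7",
    "lm_lstm_epoch2.85_1.9010.t7",  # later in training but worse: overfitting
]
print(best_checkpoint(checkpoints))  # lm_lstm_epoch1.90_1.8423.t7
```

Note how the last checkpoint is not the winner here, matching the overfitting caveat above.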

6. Sampling output. Given a checkpoint file (such as those written to cv) we can generate new text. For example:

# -gpuid 0 selects the GPU card; to run on the CPU use -gpuid -1
th sample.lua checkpoints/tinyshakespeare/cv/lm_lstm_epoch<epoch>_<smallest-loss-value>.t7 -gpuid 0


Make sure that if your checkpoint was trained with GPU it is also sampled from with GPU, or vice versa. Otherwise the code will (currently) complain. As with the train script, see $ th sample.lua -help for full options. One important one is (for example) -length 10000 which would generate 10,000 characters (default = 2000).

Temperature. An important parameter you may want to play with is -temperature, which takes a number in the range (0, 1] (0 not included), default = 1. The temperature divides the predicted log probabilities before the Softmax, so a lower temperature causes the model to make more likely, but also more boring and conservative, predictions. Higher temperatures cause the model to take more chances and increase the diversity of results, but at the cost of more mistakes.
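The temperature mechanic can be sketched in a few lines of Python (illustrative only; the log-probability values are made up, not produced by a real model):

```python
import math

def sample_probs(logprobs, temperature=1.0):
    """Divide predicted log-probabilities by the temperature, then softmax."""
    scaled = [lp / temperature for lp in logprobs]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logprobs = [-0.5, -1.5, -3.0]                # hypothetical next-character log-probs
hot = sample_probs(logprobs, temperature=1.0)
cold = sample_probs(logprobs, temperature=0.5)

# Lower temperature concentrates probability mass on the likeliest character,
# which is why low-temperature samples look conservative and repetitive.
print(max(hot), max(cold))
```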

Priming. It’s also possible to prime the model with some starting text using -primetext. This starts out the RNN with some hardcoded characters to warm it up with some context before it starts generating text. E.g. a fun primetext might be -primetext "the meaning of life is ".

For example:

th sample.lua checkpoints/tinyshakespeare/cv/lm_lstm_epoch<epoch>_<smallest-loss-value>.t7 -gpuid 0 -primetext "the meaning of life is  " -temperature 0.50



torch-rnn provides high-performance, reusable RNN and LSTM modules for torch7, and uses these modules for character-level language modeling similar to char-rnn.

You can find documentation for the RNN and LSTM modules here; they have no dependencies other than torch and nn, so they should be easy to integrate into existing projects.

Compared to char-rnn, torch-rnn is up to 1.9x faster and uses up to 7x less memory.


1. Clone torch-rnn repository

git clone https://github.com/jcjohnson/torch-rnn.git
cd torch-rnn

2. Preprocess the data

You can use any text file for training models. Before training, you’ll need to preprocess the data using the script scripts/preprocess.py; this will generate an HDF5 file and a JSON file containing a preprocessed version of the data.

python scripts/preprocess.py \
  --input_txt data/tiny-shakespeare.txt \
  --output_h5 data/tiny-shakespeare.h5 \
  --output_json data/tiny-shakespeare.json

This will produce files tiny-shakespeare.h5 and tiny-shakespeare.json that will be passed to the training script.
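Roughly, the preprocessing step builds a character vocabulary, encodes the text as integer ids, and saves the vocabulary as JSON. This is a simplified Python sketch under my own assumptions (the sample text is made up; the real script also splits the data into train/val/test and writes the encoded text to an HDF5 file):

```python
import json

text = "First Citizen:\nBefore we proceed any further, hear me speak.\n"

# Map each distinct character to an integer id. Ids start at 1 here
# because the encoded data is consumed by Lua, which indexes from 1.
token_to_idx = {ch: i + 1 for i, ch in enumerate(sorted(set(text)))}
encoded = [token_to_idx[ch] for ch in text]

# The JSON side of the output: the vocabulary needed to decode samples later.
vocab_json = json.dumps({"token_to_idx": token_to_idx})
print(len(token_to_idx), len(encoded))
```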

3. Train the model. After preprocessing the data, you’ll need to train the model using the train.lua script. This will be the slowest step. You can run the training script like this:

th train.lua -input_h5 data/tiny-shakespeare.h5 -input_json data/tiny-shakespeare.json

This will read the data stored in tiny-shakespeare.h5 and tiny-shakespeare.json, run for a while, and save checkpoints to files with names like cv/checkpoint_1000.t7.

[Optional] You can change the RNN model type, hidden state size, and number of RNN layers like this:

th train.lua -input_h5 data/tiny-shakespeare.h5 -input_json data/tiny-shakespeare.json -model_type rnn -num_layers 3 -rnn_size 256

By default this will run in GPU mode using CUDA; to run in CPU-only mode, add the flag -gpu -1.

4. Sample from the model. After training a model, you can generate new text by sampling from it using the script sample.lua. Run it like this:

th sample.lua -checkpoint cv/checkpoint_10000.t7 -length 2000

This will load the trained checkpoint cv/checkpoint_10000.t7 from the previous step, sample 2000 characters from it, and print the results to the console.


Monitoring Validation Loss vs. Training Loss

If you’re somewhat new to Machine Learning or Neural Networks it can take a bit of expertise to get good models. The most important quantity to keep track of is the difference between your training loss (printed during training) and the validation loss (printed once in a while when the RNN is run on the validation data, by default every 1000 iterations). In particular:

  • If your training loss is much lower than your validation loss then the network might be overfitting. Solutions are to decrease your network size, or to increase dropout (for example, try a dropout of 0.5).
  • If your training and validation losses are about equal then your model is underfitting. Increase the size of your model (either the number of layers or the raw number of neurons per layer).
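The two rules of thumb above can be written down as a tiny helper. This is a crude, hypothetical heuristic (the 0.1 gap threshold is my own arbitrary choice, not something from char-rnn):

```python
def diagnose(train_loss, val_loss, gap=0.1):
    """Crude heuristic mirroring the two rules of thumb above."""
    if val_loss - train_loss > gap:
        return "overfitting: decrease network size or increase dropout"
    return "underfitting: increase layers or neurons per layer"

print(diagnose(1.2, 1.8))    # large gap between train and val loss
print(diagnose(1.5, 1.52))   # train and val loss about equal
```

In practice you would eyeball the printed losses during training rather than automate this, but the decision logic is the same.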


For minimal python-numpy model of char-rnn: min-char-rnn

For fancy torch based char-rnn: char-rnn

For optimized char-rnn called torch-rnn: torch-rnn



