BeamSearch + Simplification + Visualization #3
… decoder.py) + visualization (predictions and attention mechanism)
Wow @Edouard360, this is awesome! Please give me a few days to check out the code, as I'm a little busy with work now. Have you had a chance to train the model with these changes?
Hi @ssampang! Thank you for your answer, I am glad you are interested :) I did try to train the model, reducing the depth of the hidden layers for both the convolutional network and the LSTM (I have limited resources). I achieved around 60% on the test set with “greedy” decoding. Interestingly, a wide beam search decreased final accuracy, but with a reasonable beam_width of 3, I had results a few % higher. I will train the full original network from scratch and confirm those results in this thread. Take your time answering! I would be very happy to contribute to your great repo.
TensorFlow's seq2seq module can be used to reproduce exactly the behavior of decoder.py, with the advantage of being more compact and relatively tunable (e.g. the attention mechanism).
The seq2seq module provides convenient classes that can facilitate both training and inference.
For training, we can use:
- TrainingHelper to guide the training so that the decoder predicts only one step ahead, with the ground truth label as input to the LSTM.
- ScheduledEmbeddingTrainingHelper, which behaves almost the same except that, with probability p, it chooses the next input from the network's output (which might be wrong).

For inference, we can either use:

- BasicDecoder combined with the GreedyEmbeddingHelper, which reproduces the same behavior as the current implementation.
- BeamSearchDecoder, to experiment for performance gains.

Following the design advice given in the TensorFlow nmt tutorial, I separated the graphs for Training and Inference.
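To make the difference between the two inference options concrete, here is a minimal pure-Python sketch of greedy versus beam search decoding. It is not the seq2seq module's BeamSearchDecoder and not the code in this PR: TOY_MODEL, next_log_probs and beam_search are hypothetical names, and the probabilities are made up for illustration. A real decoder would get them from the LSTM and attention outputs, and would also handle end-of-sequence tokens, which this sketch skips.

```python
import math

# Hypothetical toy model: next-token probabilities conditioned on the prefix.
# The numbers are invented for illustration only.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.55, "b": 0.45},
    ("b",): {"a": 0.1, "b": 0.9},
}


def next_log_probs(prefix):
    """Return {token: log-probability} for the token that follows `prefix`."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[tuple(prefix)].items()}


def beam_search(beam_width, num_steps):
    """Keep the `beam_width` highest-scoring prefixes at every step.

    With beam_width=1 this reduces to greedy decoding.
    """
    beams = [((), 0.0)]  # list of (prefix, cumulative log-probability)
    for _ in range(num_steps):
        candidates = []
        for prefix, score in beams:
            for tok, logp in next_log_probs(prefix).items():
                candidates.append((prefix + (tok,), score + logp))
        # Sort by cumulative score and prune to the beam width.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]  # best (sequence, score) found


greedy_seq, greedy_score = beam_search(beam_width=1, num_steps=2)
beam_seq, beam_score = beam_search(beam_width=3, num_steps=2)

print("greedy:", greedy_seq, round(math.exp(greedy_score), 2))
print("beam  :", beam_seq, round(math.exp(beam_score), 2))
```

In this toy setup, greedy decoding commits to "a" at the first step and ends with a sequence of probability 0.33, while a beam of width 3 keeps "b" alive and finds the sequence with probability 0.36. A higher sequence probability under the model does not guarantee higher accuracy against the ground truth, so the beam width remains something to tune empirically.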
Finally I implemented two visualisation tools for: