BeamSearch + Simplification + Visualization#3

Open
Edouard360 wants to merge 1 commit into ssampang:master from Edouard360:master


@Edouard360

TensorFlow's seq2seq module can reproduce the behavior of decoder.py exactly, with the advantage of being more compact and relatively tunable (e.g. the attention mechanism).

The seq2seq module provides convenient classes that can facilitate both training and inference.
For training, we can use:

  • The TrainingHelper, which guides training so that the decoder predicts only one step ahead, with the ground-truth label as input to the LSTM.
  • Or the ScheduledEmbeddingTrainingHelper, which behaves almost identically except that, with probability p, it takes the next input from the network's own output (which might be wrong).
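
To make the difference between the two helpers concrete, here is a minimal library-free sketch of the input selection rule. It is not the TensorFlow implementation: `choose_decoder_inputs`, the token lists and the `rng` argument are hypothetical names used only for illustration.

```python
import random


def choose_decoder_inputs(ground_truth, predictions, sampling_probability, rng):
    """Pick the token fed to the decoder after each step.

    ground_truth[t] is the true token at step t and predictions[t] is what
    the model emitted at step t. The next input is normally ground_truth[t]
    (teacher forcing); with probability `sampling_probability` it is
    predictions[t] instead.
    """
    inputs = []
    for truth_token, predicted_token in zip(ground_truth, predictions):
        if rng.random() < sampling_probability:
            inputs.append(predicted_token)  # feed back the model's own output
        else:
            inputs.append(truth_token)      # teacher forcing
    return inputs


truth = ["x", "^", "2", "+", "1"]
predicted = ["x", "_", "2", "-", "1"]  # hypothetical outputs, two of them wrong

# p = 0 is plain teacher forcing, p = 1 always feeds back the predictions.
teacher_forced = choose_decoder_inputs(truth, predicted, 0.0, random.Random(0))
free_running = choose_decoder_inputs(truth, predicted, 1.0, random.Random(0))
mixed = choose_decoder_inputs(truth, predicted, 0.25, random.Random(0))
```

With p = 0 the rule reduces to the first bullet, and any p in between mixes ground-truth and predicted tokens position by position.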

For inference, we can use either:

  • A BasicDecoder combined with the GreedyEmbeddingHelper, which reproduces the behavior of the current implementation.
  • A BeamSearchDecoder, to experiment for performance gains.
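
As a rough illustration of how the two decoding strategies relate, the sketch below runs a simplified beam search over a toy next-token table. It is an assumption-laden example, not the seq2seq module's code: `beam_search`, `toy_model` and `TABLE` are hypothetical, and no length normalization is applied. A beam width of 1 corresponds to greedy decoding.

```python
import math


def beam_search(step_log_probs, beam_width, max_len, end_token):
    """Return the best (sequence, log-probability) found with this beam width.

    step_log_probs(prefix) returns a dict mapping each candidate next token
    to its log-probability given the tuple `prefix`.
    """
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, log_p in step_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + log_p))
        # Keep only the `beam_width` best extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            if prefix[-1] == end_token:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])


# Hypothetical next-token probabilities for a tiny vocabulary.
TABLE = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"<eos>": 0.55, "c": 0.45},
    ("b",): {"<eos>": 0.9, "c": 0.1},
    ("a", "c"): {"<eos>": 1.0},
    ("b", "c"): {"<eos>": 1.0},
}


def toy_model(prefix):
    return {token: math.log(p) for token, p in TABLE[prefix].items()}


greedy_seq, greedy_score = beam_search(toy_model, 1, 3, "<eos>")
beam_seq, beam_score = beam_search(toy_model, 2, 3, "<eos>")
```

In this toy table, greedy decoding commits to "a" because it is the most likely first token, while a beam of width 2 keeps "b" alive and ends up with a more probable complete sequence.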

Following the design advice given in the TensorFlow NMT tutorial, I separated the graphs for training and inference.

Finally, I implemented two visualisation tools for:

  • The results (the model's predictions)

  • The attention model, to give insight into the evolution of the highest alignment scores
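
For the attention view, the quantity being tracked can be sketched as follows, assuming the alignments are available as one list of weights per decoding step. The function name `strongest_alignments` and the example matrix are hypothetical and are not taken from the code in this pull request.

```python
def strongest_alignments(alignments):
    """For each decoding step, return (source_index, score) of the largest weight.

    alignments[t][i] is the attention weight that decoding step t puts on
    source position i.
    """
    peaks = []
    for step_weights in alignments:
        best_index = max(range(len(step_weights)), key=lambda i: step_weights[i])
        peaks.append((best_index, step_weights[best_index]))
    return peaks


# Hypothetical alignment scores: 3 decoding steps over 4 source positions.
example = [
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.05, 0.30, 0.60],
]
peaks = strongest_alignments(example)
```

Plotting the returned positions step by step shows where the attention peak moves as the output is generated.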

[Image: attention1]

… decoder.py) + visualization (predictions and attention mechanism)
@ssampang
Owner

ssampang commented Jan 2, 2018

Wow @Edouard360, this is awesome! Please give me a few days to check out the code, as I'm a little busy with work now. Have you had a chance to train the model with these changes?

@Edouard360
Author

Hi @ssampang! Thank you for your answer, I am glad you are interested :)

I did try training the model, with reduced depth of the hidden layers for both the convolutional network and the LSTM (I have limited resources). I achieved around 60% on the test set with “greedy” decoding. Interestingly, a wide beam search decreased final accuracy, but with a reasonable beam_width of 3, the results were a few percent higher.

I will train the full original network from scratch and confirm those results in this thread. Take your time to answer! I would be very happy to contribute to your great repo.
