
API Documentation

Richard Csaky edited this page Jul 19, 2019 · 6 revisions

Config

Contains all arguments used in the program. You can set them here or provide them as command line arguments.

Variables

  • bleu_smoothing: Smoothing method to be used for BLEU calculation.
  • t: t value for confidence interval calculation.
  • train_source: Path to the train source file, where each line corresponds to one train input.
  • test_source: Path to the test source file, where each line corresponds to one test input.
  • test_target: Path to the test target file, where each line corresponds to one test target.
  • text_vocab: A file where each line is a word in the vocab.
  • vector_vocab: A file where each line is a word in the vocab followed by a vector.
  • test_responses: Path to the test model responses file, or to a directory containing different test response files.
  • metrics: A dict, where the keys are the 17 metrics, and the values are either 0 or 1, depending on whether you want the specific metric to be computed.
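A minimal sketch of how such a config might look as a plain Python object; the attribute values, paths, and metric keys below are illustrative, not the tool's actual defaults:

```python
# Illustrative Config sketch; paths and metric keys are examples only,
# not the tool's actual defaults.
class Config:
    def __init__(self):
        self.bleu_smoothing = 4                     # nltk smoothing method index
        self.t = 1.97                               # t value for confidence intervals
        self.train_source = "data/trainSource.txt"  # one train input per line
        self.test_source = "data/testSource.txt"
        self.test_target = "data/testTarget.txt"
        self.text_vocab = "data/vocab.txt"
        self.vector_vocab = "data/vector_vocab.txt"
        self.test_responses = "data/responses.txt"
        # 0/1 switch per metric; the keys shown are examples.
        self.metrics = {"bleu-1": 1, "distinct-1": 1, "embedding-average": 0}

cfg = Config()
enabled = sum(cfg.metrics.values())  # number of metrics switched on
```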

Metrics

Main class which computes all the metrics and saves them to the output file.

Variables

  • project_path: Path to the current file.
  • test_responses: Path to test responses file or directory.
  • config: A Config instance.
  • distro: The train data distribution of words / bigrams.
  • vocab: Vocab containing word vectors.
  • input_dir: The directory containing the response files.
  • output_path: Path to the output file.
  • which_metrics: A dict, where each metric has either a 0 or 1 value depending on whether it will be calculated.
  • metrics: A dict keyed by filename, where each value is a dict mapping metric names to per-example metric lists.
  • train_source: Path to the train source file, where each line corresponds to one train input.
  • test_source: Path to the test source file, where each line corresponds to one test input.
  • test_target: Path to the test target file, where each line corresponds to one test target.
  • text_vocab: A file where each line is a word in the vocab.
  • vector_vocab: A file where each line is a word in the vocab followed by a vector.
  • objects: A dict containing the instances of the other Metrics classes, keyed by their name.

init(config):

Initialize the Metrics objects and other variables based on the Config instance.

these_metrics(metric):

Check whether at least one metric in the metric family given by the parameter is enabled. returns: A bool.

download_fasttext():

Downloads fastText word embeddings.

get_vocab():

Builds a vocab based on the training data file.

get_fast_text_embedding():

Generate the fastText embeddings for the vocab. Also generate the vocab if necessary.

delete_from_metrics(metric_list):

Set a list of metrics to 0 so they won't be computed.

build_vocab():

Load the vocab from file.

run():

Main loop to compute metrics for all files.

write_metrics():

Compute the mean, standard deviation, and confidence interval of the metrics and save them to the output file.
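As a sketch, the mean, sample standard deviation, and t-based confidence interval half-width for one metric's per-example list could be computed like this (the function name is illustrative, and the real output format may differ):

```python
import math

def confidence_interval(values, t):
    # Mean, sample standard deviation (n - 1 denominator), and t-based
    # confidence interval half-width for a list of per-example values.
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    std = math.sqrt(var)
    conf = t * std / math.sqrt(n)
    return mean, std, conf

mean, std, conf = confidence_interval([0.2, 0.4, 0.6, 0.8], t=1.97)
```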

EmbeddingMetrics

A class that computes the embedding-average, embedding-extrema, and embedding-greedy metrics.

Variables

  • vocab: Dict containing the vocab and word vectors.
  • emb_dim: Embedding dimension of word vectors.
  • distro: Training data distribution of words.
  • average: Whether to compute embedding-average.
  • metrics: Dict containing the metric lists.

init(vocab, distro, emb_dim, average):

Initialize the provided parameters.

update_metrics(resp_words, gt_words, source_words):

Compute the metrics for the provided example sentence (as word list).

avg_embedding(words):

Compute the average word embedding. returns: An np.array representation.
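Assuming the vocab maps words to np.array vectors, a minimal sketch of the averaging step (the two-dimensional vectors are toy values, not real embeddings):

```python
import numpy as np

# Toy stand-in for the real vocab of word vectors.
vocab = {"hello": np.array([1.0, 0.0]), "world": np.array([0.0, 1.0])}

def avg_embedding(words):
    # Average the vectors of in-vocab words; zero vector if none are found.
    vectors = [vocab[w] for w in words if w in vocab]
    if not vectors:
        return np.zeros(2)
    return np.mean(vectors, axis=0)

print(avg_embedding(["hello", "world"]))  # [0.5 0.5]
```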

extrema_embedding(words):

Compute the extrema embedding. returns: An np.array representation.
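A sketch of the extrema step, which keeps, per dimension, the value with the largest absolute magnitude across the word vectors (toy vocab again):

```python
import numpy as np

# Toy word vectors; real embeddings would be higher-dimensional.
vocab = {"good": np.array([0.9, -0.2]), "bad": np.array([-0.3, 0.8])}

def extrema_embedding(words):
    vectors = np.array([vocab[w] for w in words if w in vocab])
    # For each dimension, find the row with the largest absolute value...
    idx = np.argmax(np.abs(vectors), axis=0)
    # ...and keep that (signed) value.
    return vectors[idx, np.arange(vectors.shape[1])]

print(extrema_embedding(["good", "bad"]))  # [0.9 0.8]
```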

greedy_embedding(words1, words2):

Compute the greedy score from one side. returns: A float, the greedy score.
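One-sided greedy matching pairs each word of the first sentence with its most cosine-similar word in the second and averages these maxima; a sketch with toy two-dimensional vectors:

```python
import numpy as np

# Toy vocab: "a" and "b" are orthogonal unit vectors.
vocab = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}

def cos_sim(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def greedy_embedding(words1, words2):
    # For each word in words1, take the best cosine match in words2,
    # then average these maxima to get the one-sided greedy score.
    sims = []
    for w1 in words1:
        best = max(cos_sim(vocab[w1], vocab[w2]) for w2 in words2)
        sims.append(best)
    return sum(sims) / len(sims)

print(greedy_embedding(["a", "b"], ["a"]))  # (1.0 + 0.0) / 2 = 0.5
```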

CoherenceMetrics

Handles the computation of coherence.
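Coherence in this family of metrics is typically the cosine similarity between the average word embedding of the source and that of the response; a sketch with toy vectors (check the source for the exact definition used here):

```python
import numpy as np

# Toy word vectors standing in for real embeddings.
vocab = {"hi": np.array([1.0, 1.0]), "bye": np.array([1.0, -1.0])}

def avg_emb(words):
    return np.mean([vocab[w] for w in words if w in vocab], axis=0)

def coherence(source_words, resp_words):
    # Cosine similarity between the source and response average embeddings.
    s, r = avg_emb(source_words), avg_emb(resp_words)
    return float(np.dot(s, r) / (np.linalg.norm(s) * np.linalg.norm(r)))

print(coherence(["hi"], ["hi"]))   # identical sentences -> 1.0
print(coherence(["hi"], ["bye"]))  # orthogonal toy vectors -> 0.0
```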

DivergenceMetrics

Handles the computation of the KL divergence.

Variables

  • vocab: A dict containing word vectors of the vocab.
  • gt_path: Path to the ground truth file.
  • metrics: A dict containing the metric lists.

init(vocab, gt_path):

Initialize the given parameters.

update_metrics(resp, gt_words, source):

Compute the metrics for the provided example sentence (as word list).

setup(filename):

Set up the ground truth and test distributions from the given filename.

filter_distros(test, true):

Keep only the keys present in both dictionaries. returns: The filtered dictionaries.
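A sketch of the filtering step followed by a KL divergence over the renormalized, intersected distributions (the helper names mirror the API above; the data is toy):

```python
import math

def filter_distros(test, true):
    # Keep only the words that appear in both distributions.
    keys = test.keys() & true.keys()
    return ({k: test[k] for k in keys}, {k: true[k] for k in keys})

def kl_divergence(p, q):
    # Renormalize the filtered counts, then sum p * log(p / q).
    zp, zq = sum(p.values()), sum(q.values())
    return sum((v / zp) * math.log((v / zp) / (q[k] / zq))
               for k, v in p.items())

test = {"a": 2, "b": 2, "x": 1}
true = {"a": 1, "b": 3, "y": 1}
p, q = filter_distros(test, true)
print(sorted(p))                       # only the shared words survive
print(round(kl_divergence(p, q), 3))
```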

BleuMetrics

Handles the computation of BLEU score.

Variables

  • metrics: Dict containing the metric lists.
  • smoothing: The smoothing method from nltk.

init(smoothing):

Initialize the smoothing function.

update_metrics(resp, gt, source):

Compute the metrics for the provided example sentence (as word list).
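With nltk this roughly corresponds to the following (the weights and smoothing index are illustrative; the actual call depends on the Config):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# e.g. bleu_smoothing = 4 would select method4.
smoothing = SmoothingFunction().method4

gt = ["the", "cat", "sat", "on", "the", "mat"]
resp = ["the", "cat", "sat", "on", "the", "mat"]

# Sentence-level BLEU against a single reference.
score = sentence_bleu([gt], resp, smoothing_function=smoothing)
print(round(score, 2))  # exact match -> 1.0
```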

EntropyMetrics

Handles the computation of entropy metrics.

Variables

  • vocab: Dict containing the word vectors of the vocab.
  • distro: Dict containing the training data word distribution.
  • metrics: Dict containing the metric lists.

init(vocab, distro):

Initialize the given parameters.

update_metrics(resp_words, gt_words, source_words):

Compute the metrics for the provided example sentence (as word list).
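As a sketch, the per-word entropy of a response under the training unigram distribution can be computed like this (the tool also has bigram and utterance-level variants; the names and data here are illustrative):

```python
import math

def word_entropy(words, distro):
    # Average negative log-probability of the in-distribution words,
    # i.e. -(1/|U|) * sum(log2 p(w)).
    total = sum(distro.values())
    probs = [distro[w] / total for w in words if w in distro]
    return -sum(math.log2(p) for p in probs) / len(probs)

# Toy training-data word counts.
distro = {"the": 4, "cat": 2, "sat": 2}
print(word_entropy(["the", "cat"], distro))  # (1 + 2) / 2 = 1.5
```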

DistinctMetrics

Handles the computation of distinct-1 and distinct-2.

Variables

  • vocab: Dict containing word vectors of the vocab.
  • metrics: Dict containing the metric lists.

init(vocab):

Initialize the given parameters.

distinct(distro):

Calculate the distinct value for a distribution. returns: A float.
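Note that distinct here operates on a precomputed n-gram distribution; an equivalent sketch working directly on a token list (unique n-grams divided by total n-grams):

```python
def ngrams(words, n):
    # All contiguous n-grams of the token list, as tuples.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def distinct(words, n):
    # Ratio of unique n-grams to total n-grams.
    grams = ngrams(words, n)
    return len(set(grams)) / len(grams)

words = ["i", "am", "i", "am", "sure"]
print(distinct(words, 1))  # 3 unique words / 5 tokens = 0.6
print(distinct(words, 2))  # 3 unique bigrams / 4 bigrams = 0.75
```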

calculate_metrics(filename):

Calculate distinct metrics for a given file.

update_metrics(a, s, d):

A no-op placeholder, kept so the class matches the interface of the other metric classes; the distinct metrics are instead computed per file in calculate_metrics.