
API Documentation

Richard Csaky edited this page Jul 19, 2019 · 6 revisions

Config

Contains all arguments used in the program. You can set them here or provide them as command line arguments.

Variables

  • bleu_smoothing: Smoothing method to be used for BLEU calculation.
  • t: t value for confidence interval calculation.
  • train_source: Path to the train source file, where each line corresponds to one train input.
  • test_source: Path to the test source file, where each line corresponds to one test input.
  • test_target: Path to the test target file, where each line corresponds to one test target.
  • text_vocab: A file where each line is a word in the vocab.
  • vector_vocab: A file where each line is a word in the vocab followed by a vector.
  • test_responses: Path to the test model responses file, or to a directory containing different test response files.
  • metrics: A dict, where the keys are the 17 metrics, and the values are either 0 or 1, depending on whether you want the specific metric to be computed.
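A minimal sketch of how such a config might look as a plain Python object; the attribute values, paths, and metric keys below are illustrative, not the tool's actual defaults:

```python
# Illustrative Config sketch; paths and metric keys are examples only,
# not the tool's actual defaults.
class Config:
    def __init__(self):
        self.bleu_smoothing = 4                     # nltk smoothing method index
        self.t = 1.97                               # t value for confidence intervals
        self.train_source = "data/trainSource.txt"  # one train input per line
        self.test_source = "data/testSource.txt"
        self.test_target = "data/testTarget.txt"
        self.text_vocab = "data/vocab.txt"
        self.vector_vocab = "data/vector_vocab.txt"
        self.test_responses = "data/responses.txt"
        # 0/1 switch per metric; the keys shown are examples.
        self.metrics = {"bleu-1": 1, "distinct-1": 1, "embedding-average": 0}

cfg = Config()
enabled = sum(cfg.metrics.values())  # number of metrics switched on
```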

Metrics

Main class which computes all the metrics and saves them to the output file.

Variables

  • project_path: Path to the current file.
  • test_responses: Path to test responses file or directory.
  • config: A Config instance.
  • distro: The train data distribution of words / bigrams.
  • vocab: Vocab containing word vectors.
  • input_dir: The directory containing the response files.
  • output_path: Path to the output file.
  • which_metrics: A dict, where each metric has either a 0 or 1 value depending on whether it will be calculated.
  • metrics: A dict keyed by filename, where each value is a dict mapping metric names to per-example metric lists.
  • train_source: Path to the train source file, where each line corresponds to one train input.
  • test_source: Path to the test source file, where each line corresponds to one test input.
  • test_target: Path to the test target file, where each line corresponds to one test target.
  • text_vocab: A file where each line is a word in the vocab.
  • vector_vocab: A file where each line is a word in the vocab followed by a vector.
  • objects: A dict containing the instances of the other Metrics classes, keyed by their name.

init(config):

Initialize the Metrics objects and other variables based on the Config instance.

these_metrics(metric):

Check whether at least one metric in the metric family given by the parameter is enabled. returns: A bool.

download_fasttext():

Downloads fastText word embeddings.

get_vocab():

Builds a vocab based on the training data file.

get_fast_text_embedding():

Generate the fastText embeddings for the vocab. Also generate the vocab if necessary.

delete_from_metrics(metric_list):

Set a list of metrics to 0 so they won't be computed.

build_vocab():

Load the vocab from file.

run():

Main loop to compute metrics for all files.

write_metrics():

Compute the mean, standard deviation, and confidence interval of the metrics and save them to the output file.
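As a sketch, the mean, sample standard deviation, and t-based confidence interval half-width for one metric's per-example list could be computed like this (the function name is illustrative, and the real output format may differ):

```python
import math

def confidence_interval(values, t):
    # Mean, sample standard deviation (n - 1 denominator), and t-based
    # confidence interval half-width for a list of per-example values.
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    std = math.sqrt(var)
    conf = t * std / math.sqrt(n)
    return mean, std, conf

mean, std, conf = confidence_interval([0.2, 0.4, 0.6, 0.8], t=1.97)
```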

EmbeddingMetrics

A class that computes the embedding-average, embedding-extrema, and embedding-greedy metrics.

Variables

  • vocab: Dict containing the vocab and word vectors.
  • emb_dim: Embedding dimension of word vectors.
  • distro: Training data distribution of words.
  • average: Whether to compute embedding-average.
  • metrics: Dict containing the metric lists.

init(vocab, distro, emb_dim, average):

Initialize the provided parameters.

update_metrics(resp_words, gt_words, source_words):

Compute the metrics for the provided example sentence (as word list).

avg_embedding(words):

Compute the average word embedding. returns: An np.array representation.
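Assuming the vocab maps words to np.array vectors, a minimal sketch of the averaging step (the two-dimensional vectors are toy values, not real embeddings):

```python
import numpy as np

# Toy stand-in for the real vocab of word vectors.
vocab = {"hello": np.array([1.0, 0.0]), "world": np.array([0.0, 1.0])}

def avg_embedding(words):
    # Average the vectors of in-vocab words; zero vector if none are found.
    vectors = [vocab[w] for w in words if w in vocab]
    if not vectors:
        return np.zeros(2)
    return np.mean(vectors, axis=0)

print(avg_embedding(["hello", "world"]))  # [0.5 0.5]
```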

extrema_embedding(words):

Compute the extrema embedding. returns: An np.array representation.
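A sketch of the extrema step, which keeps, per dimension, the value with the largest absolute magnitude across the word vectors (toy vocab again):

```python
import numpy as np

# Toy word vectors; real embeddings would be higher-dimensional.
vocab = {"good": np.array([0.9, -0.2]), "bad": np.array([-0.3, 0.8])}

def extrema_embedding(words):
    vectors = np.array([vocab[w] for w in words if w in vocab])
    # For each dimension, find the row with the largest absolute value...
    idx = np.argmax(np.abs(vectors), axis=0)
    # ...and keep that (signed) value.
    return vectors[idx, np.arange(vectors.shape[1])]

print(extrema_embedding(["good", "bad"]))  # [0.9 0.8]
```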

greedy_embedding(words1, words2):

Compute the greedy score from one side. returns: A float, the greedy score.
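One-sided greedy matching pairs each word of the first sentence with its most cosine-similar word in the second and averages these maxima; a sketch with toy two-dimensional vectors:

```python
import numpy as np

# Toy vocab: "a" and "b" are orthogonal unit vectors.
vocab = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}

def cos_sim(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def greedy_embedding(words1, words2):
    # For each word in words1, take the best cosine match in words2,
    # then average these maxima to get the one-sided greedy score.
    sims = []
    for w1 in words1:
        best = max(cos_sim(vocab[w1], vocab[w2]) for w2 in words2)
        sims.append(best)
    return sum(sims) / len(sims)

print(greedy_embedding(["a", "b"], ["a"]))  # (1.0 + 0.0) / 2 = 0.5
```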

CoherenceMetrics

Handles the computation of coherence.
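Coherence in this family of metrics is typically the cosine similarity between the average word embedding of the source and that of the response; a sketch with toy vectors (check the source for the exact definition used here):

```python
import numpy as np

# Toy word vectors standing in for real embeddings.
vocab = {"hi": np.array([1.0, 1.0]), "bye": np.array([1.0, -1.0])}

def avg_emb(words):
    return np.mean([vocab[w] for w in words if w in vocab], axis=0)

def coherence(source_words, resp_words):
    # Cosine similarity between the source and response average embeddings.
    s, r = avg_emb(source_words), avg_emb(resp_words)
    return float(np.dot(s, r) / (np.linalg.norm(s) * np.linalg.norm(r)))

print(coherence(["hi"], ["hi"]))   # identical sentences -> 1.0
print(coherence(["hi"], ["bye"]))  # orthogonal toy vectors -> 0.0
```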

DivergenceMetrics

Handles the computation of the KL divergence.

Variables

  • vocab: A dict containing word vectors of the vocab.
  • gt_path: Path to the ground truth file.
  • metrics: A dict containing the metric lists.

init(vocab, gt_path):

Initialize the given parameters.

update_metrics(resp, gt_words, source):

Compute the metrics for the provided example sentence (as word list).

setup(filename):

Set up the ground truth and test distributions from the given filename.

filter_distros(test, true):

Keep only the keys present in both dictionaries. returns: The filtered dictionaries.
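A sketch of the filtering step followed by a KL divergence over the renormalized, intersected distributions (the helper names mirror the API above; the data is toy):

```python
import math

def filter_distros(test, true):
    # Keep only the words that appear in both distributions.
    keys = test.keys() & true.keys()
    return ({k: test[k] for k in keys}, {k: true[k] for k in keys})

def kl_divergence(p, q):
    # Renormalize the filtered counts, then sum p * log(p / q).
    zp, zq = sum(p.values()), sum(q.values())
    return sum((v / zp) * math.log((v / zp) / (q[k] / zq))
               for k, v in p.items())

test = {"a": 2, "b": 2, "x": 1}
true = {"a": 1, "b": 3, "y": 1}
p, q = filter_distros(test, true)
print(sorted(p))                       # only the shared words survive
print(round(kl_divergence(p, q), 3))
```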

BleuMetrics

Handles the computation of BLEU score.

Variables

  • metrics: Dict containing the metric lists.
  • smoothing: The smoothing method from nltk.

init(smoothing):

Initialize the smoothing function.

update_metrics(resp, gt, source):

Compute the metrics for the provided example sentence (as word list).
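With nltk this roughly corresponds to the following (the weights and smoothing index are illustrative; the actual call depends on the Config):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# e.g. bleu_smoothing = 4 would select method4.
smoothing = SmoothingFunction().method4

gt = ["the", "cat", "sat", "on", "the", "mat"]
resp = ["the", "cat", "sat", "on", "the", "mat"]

# Sentence-level BLEU against a single reference.
score = sentence_bleu([gt], resp, smoothing_function=smoothing)
print(round(score, 2))  # exact match -> 1.0
```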

EntropyMetrics

Handles the computation of entropy metrics.

Variables

  • vocab: Dict containing the word vectors of the vocab.
  • distro: Dict containing the training data word distribution.
  • metrics: Dict containing the metric lists.

init(vocab, distro):

Initialize the given parameters.

update_metrics(resp_words, gt_words, source_words):

Compute the metrics for the provided example sentence (as word list).
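As a sketch, the per-word entropy of a response under the training unigram distribution can be computed like this (the tool also has bigram and utterance-level variants; the names and data here are illustrative):

```python
import math

def word_entropy(words, distro):
    # Average negative log-probability of the in-distribution words,
    # i.e. -(1/|U|) * sum(log2 p(w)).
    total = sum(distro.values())
    probs = [distro[w] / total for w in words if w in distro]
    return -sum(math.log2(p) for p in probs) / len(probs)

# Toy training-data word counts.
distro = {"the": 4, "cat": 2, "sat": 2}
print(word_entropy(["the", "cat"], distro))  # (1 + 2) / 2 = 1.5
```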

DistinctMetrics

Handles the computation of distinct-1 and distinct-2.

Variables

  • vocab: Dict containing word vectors of the vocab.
  • metrics: Dict containing the metric lists.

init(vocab):

Initialize the given parameters.

distinct(distro):

Calculate the distinct value for a distribution. returns: A float.
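Note that distinct here operates on a precomputed n-gram distribution; an equivalent sketch working directly on a token list (unique n-grams divided by total n-grams):

```python
def ngrams(words, n):
    # All contiguous n-grams of the token list, as tuples.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def distinct(words, n):
    # Ratio of unique n-grams to total n-grams.
    grams = ngrams(words, n)
    return len(set(grams)) / len(grams)

words = ["i", "am", "i", "am", "sure"]
print(distinct(words, 1))  # 3 unique words / 5 tokens = 0.6
print(distinct(words, 2))  # 3 unique bigrams / 4 bigrams = 0.75
```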

calculate_metrics(filename):

Calculate distinct metrics for a given file.

update_metrics(a, s, d):

A no-op placeholder, kept so the class matches the interface of the other metric classes; the distinct metrics are instead computed per file in calculate_metrics.