ninarao/wer_calc

wer_calc

Python script that calculates word error rate (WER) for pairs of reference and generated TXT, VTT, or SRT files and writes the results to a CSV file.

Usage

python wer_calc.py [path/to/reference-directory] [path/to/generated-directory] [path/to/output.csv]
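As a rough illustration of the three positional arguments in the command above, here is a minimal sketch using argparse. The argument names (reference_dir, generated_dir, output_csv) and the parse_args helper are hypothetical and are not taken from wer_calc.py, which may read its arguments differently.

```python
import argparse


def parse_args(argv):
    """Parse the three positional paths shown in the usage line."""
    parser = argparse.ArgumentParser(prog="wer_calc.py")
    # Hypothetical argument names, chosen only for this sketch.
    parser.add_argument("reference_dir", help="directory containing reference files")
    parser.add_argument("generated_dir", help="directory containing generated files")
    parser.add_argument("output_csv", help="CSV file listing file pairs; WER is written here")
    return parser.parse_args(argv)


args = parse_args(["path/to/reference-directory", "path/to/generated-directory", "output.csv"])
print(args.reference_dir, args.generated_dir, args.output_csv)
```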

Requirements

Before running the script, create the CSV file that the WER values will be written to (for example, "output.csv") with the following headers: "Reference", "Generated", "WER". Put the name of each reference file in the "Reference" column and the name of its corresponding generated file in the "Generated" column of the same row. Leave the "WER" column empty. For example:

| Reference      | Generated      | WER |
| -------------- | -------------- | --- |
| reference1.srt | generated1.srt |     |
| reference2.txt | generated2.txt |     |
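If you prefer to generate this file rather than write it by hand, the following is a minimal sketch using Python's csv module. The write_template helper is hypothetical and not part of wer_calc.py; it writes to an in-memory buffer here, and you would pass an open file instead to create a real "output.csv".

```python
import csv
import io


def write_template(pairs, out):
    """Write the header row plus one row per (reference, generated) pair, leaving WER empty."""
    writer = csv.writer(out)
    writer.writerow(["Reference", "Generated", "WER"])
    for reference_name, generated_name in pairs:
        writer.writerow([reference_name, generated_name, ""])


buffer = io.StringIO()
write_template(
    [("reference1.srt", "generated1.srt"), ("reference2.txt", "generated2.txt")],
    buffer,
)
print(buffer.getvalue())
```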

The script matches reference files with generated files by looking up the pairs of filenames in each row. The corresponding WER will be written to the "WER" column in the same row.
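The row-by-row lookup described above could be sketched as follows. This is an assumption about the general approach, not the script's actual code: fill_wer and the placeholder scorer are hypothetical, and a real scorer would load both files and compute their WER.

```python
import csv
import io


def fill_wer(rows, score):
    """Return copies of the rows with "WER" set to score(reference_name, generated_name)."""
    filled = []
    for row in rows:
        updated = dict(row)
        updated["WER"] = score(row["Reference"], row["Generated"])
        filled.append(updated)
    return filled


csv_text = "Reference,Generated,WER\nreference1.srt,generated1.srt,\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Placeholder scorer (hypothetical): always reports a WER of 0.0.
result = fill_wer(rows, lambda reference_name, generated_name: 0.0)
print(result)
```

Because each result is written back to the row it came from, the order of rows in the CSV does not need to match the order of files in either directory.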

Files can be in TXT (".txt"), SRT (".srt"), or VTT (".vtt") format.
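SRT and VTT files carry cue numbers and timestamps that have to be stripped before the text can be compared. The sketch below shows one simple way to do that with the standard library. It is not the srt2text-derived code used by this script, and subtitles_to_text is a hypothetical helper.

```python
def subtitles_to_text(content):
    """Keep only the spoken text from SRT or VTT content and join it into one string."""
    kept = []
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue  # blank separator between cues
        if line.startswith("WEBVTT"):
            continue  # VTT file header
        if "-->" in line:
            continue  # timestamp line
        if line.isdigit():
            continue  # SRT cue number (this also drops spoken lines made only of digits)
        kept.append(line)
    return " ".join(kept)


srt = (
    "1\n00:00:01,000 --> 00:00:04,000\nHello there.\n\n"
    "2\n00:00:05,000 --> 00:00:07,000\nGeneral Kenobi.\n"
)
vtt = "WEBVTT\n\n00:01.000 --> 00:04.000\nHello there.\n"

print(subtitles_to_text(srt))
print(subtitles_to_text(vtt))
```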

Before running this script, install werpy: pip install werpy or pip3 install werpy

This script relies on werpy to do the following:

  • preprocess/normalize the input text: remove punctuation, collapse duplicated spaces, strip leading/trailing blanks, and convert all words to lowercase
  • calculate the word error rate (WER) for each pair of reference and hypothesis texts
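To make the two steps above concrete, here is a standard-library sketch of the same idea: normalize both texts, then divide the word-level edit distance by the number of reference words. This is not werpy's implementation, and the normalize and word_error_rate functions below are illustrative stand-ins whose handling of edge cases (apostrophes, for instance) may differ from werpy's.

```python
import string


def normalize(text):
    """Lowercase, strip ASCII punctuation, and collapse repeated, leading, and trailing whitespace."""
    without_punctuation = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(without_punctuation.lower().split())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of words in the reference."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words seen so far and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1] / len(ref)


print(word_error_rate("The quick brown fox.", "the quick  brown dog"))
```

In the example, one of the four reference words is substituted, so the result is 0.25.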

Licensing

This script is released under the MIT License.

werpy is released under the terms of the BSD 3-Clause License. Please refer to its LICENSE file for full details.

werpy also includes third-party packages distributed under the BSD 3-Clause License (NumPy, Pandas) and the Apache License 2.0 (Cython). The full NumPy, Pandas, and Cython licenses can be found in the werpy LICENSES directory, as well as in the source code of each of those projects.

Conversion from SRT is adapted from srt2text. Please refer to its LICENSE for full details.

Misc

Feedback, comments, and suggestions are welcome!
