This repository runs two approaches to extract knowledge triples from the text. The first approach is based on dependency parsing while the second approach uses Bert token classification to classify the knowledge triples. Currently this repository only supports 20 News Dataset.
- Downloads and cleans the data.
- Extracts triples.
- Saves extracted triples in data/20NewsGroups.csv
To do the extraction via dependency parse, use 'dep' mode in command line.
python3 extract.py --mode depThe script will extract the triples and will save them in data/20NewsGroups.csv in 'triples' column.
For extracting triples doing inference on transformer based model, use 'bert' mode in command line.
But, beforehand, please make sure the following points are in check.
- The checkpoint has been downloaded from the google drive link (provided in the email) and the
config.pyfile has the location of checkpoint.
python3 extract.py --mode bertThe GPU based training can be done using Colab_training: Triples.ipynb notebook.
The GPU based k-fold inference can be done using Colab_inference: Triples.ipynb notebook.
To use these notebooks, please point the required data/checkpoints in the Config->Globals section.
Thank you.