InferDoc

This repo contains code to generate synthetic question-answering training data using https://github.com/facebookresearch/UnsupervisedQA and to train QA models on that data using https://github.com/deepset-ai/haystack

SQuAD-style QA dataset generation

cd self_supervised_qa && python -m unsupervisedqa.generate_synthetic_qa_data example_input.txt example_output
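Synthetic QA pairs are only useful for training if each answer span lines up with its context. The sketch below assumes the generated file follows the public SQuAD v1.1 JSON layout (`data` → `paragraphs` → `qas` → `answers`). The exact files written under `example_output` are not documented here, so the path passed to `load_squad` and the helper names are hypothetical.

```python
import json


def load_squad(path):
    """Load a SQuAD-style JSON file (path is whatever file the generator wrote)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def iter_qa_pairs(squad):
    """Yield (context, question, answer_text, answer_start) tuples."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield context, qa["question"], answer["text"], answer["answer_start"]


def find_misaligned(squad):
    """Return the questions whose answer text is not located at answer_start."""
    bad = []
    for context, question, text, start in iter_qa_pairs(squad):
        if context[start:start + len(text)] != text:
            bad.append(question)
    return bad


# A tiny hand-written sample in the assumed SQuAD v1.1 layout.
CONTEXT = "Haystack is developed by deepset."
sample = {
    "version": "1.1",
    "data": [{
        "title": "example",
        "paragraphs": [{
            "context": CONTEXT,
            "qas": [{
                "id": "q1",
                "question": "Who develops Haystack?",
                "answers": [{"text": "deepset", "answer_start": CONTEXT.index("deepset")}],
            }],
        }],
    }],
}

print(find_misaligned(sample))
```

Running `find_misaligned(load_squad(...))` on a generated file before training gives a quick list of questions to drop or repair.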

Transformer QA model training, evaluation, and CLI testing

Usage:

qa_model.py train --data_dir=<data_dir> --train_file_name=<train_file_name> --dev_file_name=<dev_file_name> --save_dir=<save_dir>
qa_model.py test --data_dir=<data_dir> --eval_file_name=<eval_file_name> --save_dir=<save_dir>
qa_model.py cli --data_dir=<data_dir> --save_dir=<save_dir>

Options:

--data_dir=<data_dir>........The directory containing the SQuAD-formatted .txt train or eval files
--train_file_name=<train_file_name>..............The name of the train file in the data dir
--dev_file_name=<dev_file_name>..............The file to be used as the development set, expected in SQuAD JSON format
--eval_file_name=<eval_file_name>..............The file to be used as the evaluation file, expected in SQuAD JSON format
--save_dir=<save_dir>............The directory to save the trained model to or to load the model from
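When the three modes are driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch based only on the usage lines above. It assumes `qa_model.py` is run with the current Python interpreter from the repo root, and the helper names `build_command` and `run` are hypothetical, not part of the repo.

```python
import subprocess
import sys

# Options each mode takes, copied from the usage lines above.
MODE_OPTIONS = {
    "train": ["data_dir", "train_file_name", "dev_file_name", "save_dir"],
    "test": ["data_dir", "eval_file_name", "save_dir"],
    "cli": ["data_dir", "save_dir"],
}


def build_command(mode, **options):
    """Return the argument list for one qa_model.py mode (hypothetical helper)."""
    if mode not in MODE_OPTIONS:
        raise ValueError(f"unknown mode: {mode}")
    missing = [name for name in MODE_OPTIONS[mode] if name not in options]
    if missing:
        raise ValueError(f"missing options for {mode}: {missing}")
    flags = [f"--{name}={options[name]}" for name in MODE_OPTIONS[mode]]
    return [sys.executable, "qa_model.py", mode] + flags


def run(mode, **options):
    """Run qa_model.py and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(mode, **options), check=True)


# Example with placeholder paths; nothing is executed here.
print(build_command("test", data_dir="data", eval_file_name="dev.json", save_dir="model"))
```

Calling `run("train", ...)` followed by `run("test", ...)` with the same `save_dir` trains a model and then evaluates it, matching the order of the usage lines.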

Todo

Add automatic dataset generation within https://github.com/cdqa-suite/cdQA-ui to enable human-in-the-loop semi-supervised training

Make fine-tuning on domain-specific data more robust with deepset-ai/FARM#141

