
scottbouma/katecheo

Katecheo is a modular system for topical question answering built on Kubernetes. It is portable to any Kubernetes cluster (in the cloud or on-prem), and it allows developers to integrate state-of-the-art question answering into their applications via its REST API.

You can learn more about Katecheo in:

Deploy Katecheo

System Prerequisites

Katecheo runs on Kubernetes and utilizes Seldon to serve predictions. You will need:

  • A Kubernetes cluster (see here for more information)
  • Seldon deployed on the Kubernetes cluster (see here for more info about Seldon)
  • Ambassador or Istio installed on the Kubernetes cluster (see here for more info)

Data/model Preparation

To classify input messages by topic, Katecheo requires a spaCy NER model trained to detect topical entities. To match topical questions with appropriate knowledge base articles, Katecheo relies on a dataset of knowledge base articles and their corresponding titles.
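
As a rough illustration of the classification step, the sketch below picks a topic from the entities that each topic's NER model found in a message. `select_topic` is a hypothetical helper, not part of Katecheo, and choosing the topic with the most entities is an assumption; Katecheo's target-classifier module may decide differently. With spaCy, each list could come from `[ent.text for ent in nlp(message).ents]`, where `nlp` is the loaded NER model for that topic.

```python
from typing import Dict, List, Optional


def select_topic(entities_by_topic: Dict[str, List[str]]) -> Optional[str]:
    """Return the topic whose NER model found the most entities, or None.

    `entities_by_topic` maps a topic name (e.g. "faith") to the entity
    strings that topic's NER model extracted from the input message.
    Ties go to the topic listed first.
    """
    best_topic, best_count = None, 0
    for topic, entities in entities_by_topic.items():
        # A topic only qualifies if its model detected at least one entity.
        if len(entities) > best_count:
            best_topic, best_count = topic, len(entities)
    return best_topic
```

For example, `select_topic({"faith": ["Bible"], "health": []})` returns `"faith"`, and a message with no detected entities in any topic yields `None`.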

You will need the following for each topic you want to enable in Katecheo:

  • A pre-trained spaCy NER model, bundled into a single zip file
  • A JSON file containing knowledge base articles (structure as shown here)
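
One simple way to match a question to a knowledge base article is to count the words it shares with each article title after dropping common stopwords. The sketch below does exactly that. It assumes, hypothetically, that the knowledge base JSON is a list of objects with `title` and `body` keys (the actual structure is defined by the example referenced above), and Katecheo's own kb-search module may score matches differently. `STOPWORDS`, `tokenize`, and `best_article` are illustrative names.

```python
import json
import re
from typing import List, Optional, Set

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "does", "do", "say",
             "about", "of", "to", "in", "and", "how", "why"}


def tokenize(text: str) -> Set[str]:
    # Lowercase, keep alphanumeric words, and drop stopwords.
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def best_article(question: str, articles: List[dict]) -> Optional[dict]:
    """Return the article whose title shares the most words with the question."""
    q_tokens = tokenize(question)
    best, best_score = None, 0
    for article in articles:
        score = len(q_tokens & tokenize(article["title"]))
        if score > best_score:
            best, best_score = article, score
    return best


# Stand-in for the contents of a file such as kb_faith.json (schema assumed).
kb = json.loads("""[
  {"title": "Is vegetarianism biblical?", "body": "..."},
  {"title": "What is the Trinity?", "body": "..."}
]""")
```

Here `best_article("What does the Bible say about vegetarianism?", kb)` returns the "Is vegetarianism biblical?" entry, while a question sharing no title words returns `None`.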

Deploy

  1. Clone this repo.

  2. Move into the deploy directory and copy the template configuration file:

    $ cd deploy && cp config.template.json config.json
    
  3. Fill in the links to your NER model(s) and knowledge base article files in config.json. When you are done, the config file should look something like the following, shown here for a scenario that enables Q&A on two topics: faith (Christianity) and health (medical sciences):

    [
      {
        "name": "faith",
        "ner_model": "https://storage.googleapis.com/pachyderm-neuralbot/ner_models/faith.zip",
        "kb_file": "https://storage.googleapis.com/pachyderm-neuralbot/knowledge_bases/kb_faith.json"
      },
      {
        "name": "health",
        "ner_model": "https://storage.googleapis.com/pachyderm-neuralbot/ner_models/health.zip",
        "kb_file": "https://storage.googleapis.com/pachyderm-neuralbot/knowledge_bases/kb_health.json"
      }
    ]
    
  4. Make sure your local kubectl is connected to your cluster.

  5. Run the deploy script.

    $ ./deploy.sh
    
  6. This will deploy all of the Katecheo modules to your cluster. Once the Katecheo pod is in a running state, you will be able to serve multi-topic answers at the following endpoint: http://<ingress IP>/seldon/default/katecheo/api/v0.1/predictions
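
Before running `./deploy.sh`, it can help to sanity-check config.json. The sketch below is a hypothetical helper, not part of the repo: it checks that every entry has the three keys shown in step 3, that topic names are unique, and that both links are http(s) URLs ending in `.zip` (NER model) and `.json` (knowledge base). These rules are inferred from the example config; the deploy script itself may accept other forms.

```python
from typing import List
from urllib.parse import urlparse

REQUIRED_KEYS = ("name", "ner_model", "kb_file")


def validate_config(topics: List[dict]) -> List[str]:
    """Return a list of problems found in a parsed config.json (empty if none)."""
    problems = []
    seen_names = set()
    for i, topic in enumerate(topics):
        # Every entry needs all three keys with non-empty values.
        for key in REQUIRED_KEYS:
            if not topic.get(key):
                problems.append(f"entry {i}: missing '{key}'")
        name = topic.get("name")
        if name:
            if name in seen_names:
                problems.append(f"entry {i}: duplicate name '{name}'")
            seen_names.add(name)
        # Both artifacts are given as links, so require http(s) URLs
        # whose path ends in the expected file extension.
        for key, suffix in (("ner_model", ".zip"), ("kb_file", ".json")):
            url = topic.get(key)
            if not url:
                continue
            parsed = urlparse(url)
            if parsed.scheme not in ("http", "https"):
                problems.append(f"entry {i}: '{key}' is not an http(s) URL")
            elif not parsed.path.endswith(suffix):
                problems.append(f"entry {i}: '{key}' does not end in {suffix}")
    return problems
```

From the deploy directory, `validate_config(json.load(open("config.json")))` should return an empty list for a file like the two-topic example above.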

Usage

Example request (Question):

$ curl -X POST -H 'Content-Type: application/json' -d '{"data": {"names": ["message"], "ndarray": ["What does the Bible say about vegetarianism?"]}}' http://35.201.10.193/seldon/default/katecheo/api/v0.1/predictions
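
The same request can be sent from Python using only the standard library. This is a sketch rather than an official client: `predictions_url`, `build_payload`, and `ask` are hypothetical helper names, and the payload simply mirrors the curl example above.

```python
import json
import urllib.request


def predictions_url(ingress_ip: str) -> str:
    # Endpoint path as given in the deploy instructions.
    return f"http://{ingress_ip}/seldon/default/katecheo/api/v0.1/predictions"


def build_payload(question: str) -> dict:
    # Same request body as the curl example.
    return {"data": {"names": ["message"], "ndarray": [question]}}


def ask(ingress_ip: str, question: str, timeout: float = 30.0) -> dict:
    """POST a question to Katecheo and return the decoded JSON response."""
    req = urllib.request.Request(
        predictions_url(ingress_ip),
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `ask("<your ingress IP>", "What does the Bible say about vegetarianism?")` with your cluster's ingress address should return a response shaped like the example below.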

Example response (Answer):

{
  "meta": {
    "puid": "nf7ukk5cur2dp8bcpe7qe53hsb",
    "tags": {
      "proceed": true,
      "topic": "faith"
    },
    "routing": {
      "target-classifier": -1,
      "question-detector": -1,
      "kb-search": -1
    },
    "requestPath": {
      "target-classifier": "cvdigital/target-classifier:v0.1.0",
      "question-detector": "cvdigital/question-detector:v0.1.0",
      "comprehension": "cvdigital/comprehension:v0.1.0",
      "kb-search": "cvdigital/kb-search:v0.1.0"
    },
    "metrics": []
  },
  "strData": "I think you would be hard pressed to say that the Bible commands a vegetarian diet"
}
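
A client typically needs only two fields from this response: the detected topic in `meta.tags` and the answer text in `strData`. A minimal sketch for extracting them follows. `parse_response` and `Answer` are hypothetical names, and treating a missing or false `proceed` tag as "no answer" is an assumption, since only a successful response is shown above.

```python
from typing import NamedTuple, Optional


class Answer(NamedTuple):
    topic: Optional[str]
    text: Optional[str]


def parse_response(response: dict) -> Answer:
    """Pull the detected topic and answer text out of a Katecheo response."""
    tags = response.get("meta", {}).get("tags", {})
    # Assumption: a missing or false "proceed" tag means no answer was produced.
    if not tags.get("proceed", False):
        return Answer(topic=tags.get("topic"), text=None)
    return Answer(topic=tags.get("topic"), text=response.get("strData"))
```

Applied to the example response, `parse_response` returns `Answer(topic="faith", text="I think you would be hard pressed to say that the Bible commands a vegetarian diet")`.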

Future extensions

In the future we intend to:

  • Extend our knowledge base search methodology (e.g., to use bigrams and TF-IDF)
  • Enable usage of a wider variety of pre-trained models (BERT, XLNet, etc.)
  • Explore other topic matching/modeling techniques to remove our NER model dependency (non-negative matrix factorization and/or latent Dirichlet allocation)

Citing

If you use Katecheo in your research, please cite Katecheo: A Portable and Modular System for Multi-Topic Question Answering:

@inproceedings{CV2019Katecheo,
  title={Katecheo: A Portable and Modular System for Multi-Topic Question Answering},
  author={Shirish Hirekodi and Seban Sunny and Leonard Topno and Alwin Daniel and Reuben Skewes and Stuart Cranney and Daniel Whitenack},
  year={2019},
  eprint={arXiv:1907.00854},
}

All material is licensed under the Apache License Version 2.0, January 2004.

About

Modular, multi-topic question answering on top of Kubernetes

Languages

  • Python 97.7%
  • Shell 2.3%