Introduction to Pipelines
In cTAKES, processing of text is done in a Pipeline.
A pipeline is an ordered collection of components that work together,
building and refining knowledge of the content within a document.
cTAKES contains several pre-built pipelines, but users can create their own using components included with cTAKES or custom components.
The Default Clinical Pipeline is a great place for a new user to start.
The Default Clinical Pipeline produces the most commonly desired output from cTAKES. This includes annotations for Anatomical sites, Signs/Symptoms, Procedures, Diseases/Disorders and Medications. Each annotation carries normalized UMLS Concept Unique Identifiers (CUIs), plus attribute values for negation, uncertainty and subject.
A sample sentence processed by the Default Clinical Pipeline:
The patient underwent a CT scan in April which did not reveal lesions in his liver.

In cTAKES, pipelines are usually configured using Piper Files,
which contain short commands to add components to the pipeline, set values,
and allow interaction with the user's environment.
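To make the description above concrete, here is a minimal sketch of a Piper File, written out from the shell so it can be tried directly. The file name MyPipeline.piper is hypothetical. The sub-pipeline, component and parameter names (DefaultTokenizerPipeline, POSTagger, DictionarySubPipe, minimumSpan, LookupXml and so on) are assumptions loosely modeled on the default pipeline and may differ between cTAKES versions, so check them against the Piper Files shipped with your installation.

```shell
#!/bin/sh
# Write an illustrative Piper File. Every name inside it is an assumption;
# compare against the .piper files in your own cTAKES installation.
PIPER_FILE="${PIPER_FILE:-MyPipeline.piper}"

cat > "$PIPER_FILE" <<'EOF'
// Lines starting with // are comments.

// Interact with the user's environment:
// map the -l command-line option to the LookupXml parameter.
cli LookupXml=l

// Load another Piper File as a starting point.
load DefaultTokenizerPipeline

// Set a value that components added later can use.
set minimumSpan=2

// Add components, in the order they should run.
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Load further sub-pipelines (chunking, dictionary lookup, attributes).
load ChunkerSubPipe
load DictionarySubPipe
load AttributeCleartkSubPipe
EOF

echo "Wrote $PIPER_FILE"
```

The three kinds of command mentioned above each appear once: `add` and `load` build up the ordered list of components, `set` assigns a value, and `cli` ties a parameter to a command-line option.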
TODO -> This step-by-step guide should be in a how-to. That how-to can have a wiki link on this page.
On the command line run:
bin/runClinicalPipeline -i inputDirectory --htmlOut outputDirectory --key _umlsPasskey
The output directory will then contain HTML files that display the note text, with underlines and other markers indicating the discovered entities and their attributes.
If runClinicalPipeline fails with "ERROR PipelineBuilder - No Collection Reader specified.", verify that you passed the input directory with -i inputDirectory.
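One way to catch a missing or wrong input directory before starting cTAKES is a small pre-flight check. The sketch below is not part of cTAKES: check_input_dir is a hypothetical helper, and UMLS_KEY is an assumed environment variable holding your UMLS key.

```shell
#!/bin/sh
# check_input_dir DIR
# Returns 0 only if DIR is given, exists, and contains at least one entry.
# Hypothetical helper; not shipped with cTAKES.
check_input_dir() {
    dir="$1"
    if [ -z "$dir" ]; then
        echo "No input directory given; pass one with -i inputDirectory" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "Input directory not found: $dir" >&2
        return 1
    fi
    if [ -z "$(ls -A "$dir")" ]; then
        echo "Input directory is empty: $dir" >&2
        return 1
    fi
    return 0
}

# Example use (UMLS_KEY is an assumed environment variable):
# check_input_dir inputDirectory &&
#     bin/runClinicalPipeline -i inputDirectory --htmlOut outputDirectory --key "$UMLS_KEY"
```

Because the pipeline is only started when the check succeeds, a mistyped directory is reported immediately instead of surfacing later as a pipeline error.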
The bin/runClinicalPipeline command runs the Piper File DefaultFastPipeline.piper, located in resources/org/apache/ctakes/clinical/pipeline/.
