
Introduction to Pipelines

Sean Finan edited this page Dec 19, 2025 · 1 revision

cTAKES processes text in a pipeline.
A pipeline is an ordered collection of components that work together, building and refining knowledge of the content within a document.
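
The idea of an ordered collection of components can be sketched in a few lines of Python. This is only a conceptual illustration, not the cTAKES API (cTAKES itself is written in Java on top of Apache UIMA). The names `Document`, `tokenize`, `tag_negation_cues` and `run_pipeline` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Text plus the knowledge accumulated about it so far."""
    text: str
    annotations: Dict[str, List[str]] = field(default_factory=dict)


# A component is anything that reads a document and adds to its annotations.
Component = Callable[[Document], None]


def tokenize(doc: Document) -> None:
    # First component: split the text into whitespace-delimited tokens.
    doc.annotations["tokens"] = doc.text.split()


def tag_negation_cues(doc: Document) -> None:
    # Later component: relies on the tokens produced by an earlier component.
    cues = {"not", "no", "without"}
    doc.annotations["negation_cues"] = [
        token for token in doc.annotations["tokens"] if token.lower() in cues
    ]


def run_pipeline(components: List[Component], doc: Document) -> Document:
    # Order matters: each component sees what the previous ones added.
    for component in components:
        component(doc)
    return doc


doc = run_pipeline(
    [tokenize, tag_negation_cues],
    Document("The scan did not reveal lesions."),
)
print(doc.annotations["negation_cues"])
```

Running the components in the opposite order would fail with a `KeyError`, because the negation step would look for tokens that have not been produced yet. That dependency is what makes a pipeline an ordered collection rather than a plain set of components.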

cTAKES contains several pre-built pipelines, but users can create their own using components included with cTAKES or custom components.

Default Clinical Pipeline

The Default Clinical Pipeline is a great place for a new user to start.

The Default Clinical Pipeline produces the most commonly desired output from cTAKES. This includes annotations for Anatomical sites, Signs/Symptoms, Procedures, Diseases/Disorders and Medications. For each annotation there are normalized UMLS CUIs, plus values for negation, uncertainty and subject.

A sample sentence processed by the Default Clinical Pipeline:

The patient underwent a CT scan in April which did not reveal lesions in his liver.
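
To make the attributes listed above concrete, the following Python sketch models what a reader might expect for the sample sentence. It is hand-written for illustration and is not actual cTAKES output: the `EntityAnnotation` class is hypothetical, the semantic group assigned to each mention is an assumption, and the CUIs are deliberately left empty rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityAnnotation:
    text: str            # span of note text covered by the annotation
    semantic_group: str  # e.g. "Procedure", "Sign/Symptom", "Anatomical site"
    cui: Optional[str]   # normalized UMLS CUI; None here instead of a guessed value
    negated: bool
    uncertain: bool
    subject: str


sentence = (
    "The patient underwent a CT scan in April "
    "which did not reveal lesions in his liver."
)

# Hypothetical, hand-written expectations (assumptions, not pipeline output).
expected = [
    EntityAnnotation("CT scan", "Procedure", None, False, False, "patient"),
    EntityAnnotation("lesions", "Sign/Symptom", None, True, False, "patient"),
    EntityAnnotation("liver", "Anatomical site", None, False, False, "patient"),
]

# Every expected mention should literally occur in the sentence.
all_present = all(annotation.text in sentence for annotation in expected)

# "did not reveal" is the cue that would mark "lesions" as negated.
negated = [annotation.text for annotation in expected if annotation.negated]
print(all_present, negated)
```

The point of the sketch is the shape of the data: each discovered entity carries its own negation, uncertainty and subject values alongside its normalized concept.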

In cTAKES, pipelines are usually configured using Piper Files, which contain short commands to add components to the pipeline, set values, and allow interaction with the user's environment.


TODO -> This step-by-step guide should be in a how-to. That how-to can have a wiki link on this page.

Step-by-step guide

On the command line, run the following, substituting your own input directory, output directory and UMLS key:

bin/runClinicalPipeline -i inputDirectory --htmlOut outputDirectory --key _umlsPasskey

The output directory will then contain HTML files that display the note text, with underlines and other markers indicating the discovered entities and their attributes.

If runClinicalPipeline fails with "ERROR PipelineBuilder - No Collection Reader specified.", verify that you passed -i inputDirectory.
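
For users who launch cTAKES from a script, the command above can be assembled programmatically. The Python sketch below is a minimal example under stated assumptions: the helper names `build_clinical_pipeline_command` and `run_clinical_pipeline` are hypothetical, only the three options shown in the command above are used, and the script is assumed to be started from the cTAKES installation directory. Rejecting an empty input directory up front avoids the "No Collection Reader specified" failure described above.

```python
import subprocess
from typing import List


def build_clinical_pipeline_command(
    input_dir: str,
    output_dir: str,
    umls_key: str,
    script: str = "bin/runClinicalPipeline",
) -> List[str]:
    """Assemble the argument list for the command shown above."""
    if not input_dir:
        # Without -i the pipeline has no collection reader and fails.
        raise ValueError("an input directory is required (passed as -i)")
    return [script, "-i", input_dir, "--htmlOut", output_dir, "--key", umls_key]


def run_clinical_pipeline(input_dir: str, output_dir: str, umls_key: str) -> None:
    """Run the pipeline; assumes the current directory is the cTAKES install."""
    command = build_clinical_pipeline_command(input_dir, output_dir, umls_key)
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(command, check=True)


# Build (but do not execute) the command with the placeholder values above.
command = build_clinical_pipeline_command(
    "inputDirectory", "outputDirectory", "_umlsPasskey"
)
print(" ".join(command))
```

Passing the arguments as a list, rather than as a single shell string, keeps directory names that contain spaces intact.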

The bin/runClinicalPipeline script runs the Piper File DefaultFastPipeline.piper, located in resources/org/apache/ctakes/clinical/pipeline/
