CDAtransformer

CDAtransformer is an R Shiny application for parsing C-CDA (CDA XML) and FHIR (JSON) clinical documents into analysis-ready tabular outputs.

The application supports both single-document uploads and table-based data sets where each row contains a complete document payload associated with a patient identifier.

Overview

Clinical data are frequently distributed as nested XML or JSON documents (C-CDA or FHIR), which are difficult to analyze directly in R. CDAtransformer extracts structured elements from these documents and returns tidy tables suitable for filtering, joins, and downstream analysis.

This tool is intended for research, informatics, and data science workflows using exported electronic health record (EHR) data.

Features

- Parse C-CDA (CDA XML) documents
- Parse FHIR JSON, including Bundles such as searchset responses
- Accept either:
  - Single document uploads, or
  - CSV data sets containing one full document per row
- Preserve patient identifiers in all outputs
- Multiple export formats:
  - Long (tidy, recommended)
  - Wide (single-row summary)
  - FHIR split export (one CSV per resource type)

Supported Input Formats

C-CDA (CDA XML)

Single document upload

Upload a .txt file containing one complete C-CDA XML document.

Data set upload (CSV)

The CSV file must contain exactly two columns:

Column name	Description
`patient_id`	Identifier used to link results back to the patient
`doc`	Full C-CDA XML document as a single string

Important:
Each value in the doc column must contain one complete XML document. If a document is split across rows because of quoting or newline problems, parsing will fail.
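One way to avoid split rows is to let a CSV writer handle the quoting rather than assembling the file by hand. The following is a minimal sketch using base R; the sample XML, patient identifiers, and file path are hypothetical and only illustrate the two-column layout described above.

```r
# Hypothetical example documents; real C-CDA files are much larger.
docs <- c(
  "<ClinicalDocument>\n  <title>Visit \"A\"</title>\n</ClinicalDocument>",
  "<ClinicalDocument>\n  <title>Visit B, follow-up</title>\n</ClinicalDocument>"
)

ccda_tbl <- data.frame(
  patient_id = c("P001", "P002"),
  doc = docs,
  stringsAsFactors = FALSE
)

# write.csv() quotes character fields and doubles embedded quotes,
# so commas, quotes, and newlines stay inside a single cell.
csv_path <- tempfile(fileext = ".csv")
write.csv(ccda_tbl, csv_path, row.names = FALSE)

# Read the file back and confirm there is one complete document per row.
check <- read.csv(csv_path, colClasses = "character")
stopifnot(nrow(check) == 2, identical(check$doc, docs))
```

If the read-back check fails for an export produced by another tool, the quoting in that export is the first thing to inspect.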

FHIR

Single document upload

Upload a .txt file where each non-empty line is a valid JSON object, either:

- a FHIR resource, or
- a FHIR Bundle (e.g., searchset)
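Before uploading, it can be useful to confirm that every non-empty line parses as JSON. This is a rough sketch, assuming the jsonlite package; the file contents are hypothetical, and the check covers JSON validity only, not FHIR conformance.

```r
# Hypothetical upload file: two JSON lines separated by a blank line.
txt_path <- tempfile(fileext = ".txt")
writeLines(
  c(
    '{"resourceType":"Patient","id":"p1"}',
    "",
    '{"resourceType":"Bundle","type":"searchset","entry":[]}'
  ),
  txt_path
)

# Keep only non-empty lines, then test each one for valid JSON.
json_lines <- readLines(txt_path, warn = FALSE)
json_lines <- json_lines[nzchar(trimws(json_lines))]
is_valid <- vapply(
  json_lines,
  function(x) isTRUE(jsonlite::validate(x)),
  logical(1),
  USE.NAMES = FALSE
)

# Positions (after dropping blank lines) of lines that need fixing.
bad_lines <- which(!is_valid)
```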

Data set upload (CSV)

The CSV file must contain exactly two columns:

Column name	Description
`patient_id`	Identifier used to link results back to the patient
`record_body`	JSON string containing a FHIR resource or Bundle

FHIR Bundles with entry[].resource elements are supported.
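To illustrate what entry[].resource means in practice, the sketch below unwraps a Bundle into its individual resources with jsonlite. The helper name extract_resources and the sample Bundle are hypothetical; this is not necessarily how the application itself handles Bundles.

```r
# Hypothetical single-line searchset Bundle with two entries.
bundle_json <- paste0(
  '{"resourceType":"Bundle","type":"searchset","entry":[',
  '{"resource":{"resourceType":"Patient","id":"p1"}},',
  '{"resource":{"resourceType":"Condition","id":"c1"}}]}'
)

# Return a list of resources: Bundle entries are unwrapped, and any
# other resource is returned as a one-element list.
extract_resources <- function(json_line) {
  parsed <- jsonlite::fromJSON(json_line, simplifyVector = FALSE)
  if (identical(parsed$resourceType, "Bundle")) {
    lapply(parsed$entry, function(e) e$resource)
  } else {
    list(parsed)
  }
}

resources <- extract_resources(bundle_json)
resource_types <- vapply(resources, function(r) r$resourceType, character(1))
```

Treating a bare resource as a one-element list lets the same downstream code handle both input shapes.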

Output Formats

Long format (recommended)

A tidy, row-based output suitable for analysis.
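As a rough illustration of the long layout, the sketch below flattens one nested resource into one row per element. The column names path and value, and the sample resource, are assumptions made for this example; the application's actual output columns may differ.

```r
# Hypothetical parsed FHIR resource, as a JSON parser might return it.
resource <- list(
  resourceType = "Condition",
  id = "c1",
  code = list(
    coding = list(
      list(system = "http://snomed.info/sct", code = "44054006")
    )
  )
)

# unlist() flattens the nested list and joins nested names with dots.
flat <- unlist(resource)

long <- data.frame(
  patient_id = "P001",
  path = names(flat),
  value = unname(flat),
  stringsAsFactors = FALSE
)

# Long rows filter cleanly, e.g. every path that ends in "code".
codes <- long[grepl("code$", long$path), ]
```

Because each element sits on its own row, repeated elements need no special handling, which is why this layout suits filtering and joins.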

Wide format

Collapses repeated paths into a single row per document. Intended for inspection only; not recommended for analysis.
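The sketch below shows one way repeated paths could be collapsed into a single row, using tidyr::pivot_wider. The column names, the sample rows, and the "; " separator are assumptions for illustration, not the application's documented behavior.

```r
# Hypothetical long-format rows; P001 has the same path twice.
long <- data.frame(
  patient_id = c("P001", "P001", "P001", "P002", "P002"),
  path = c("id", "code.coding.code", "code.coding.code",
           "id", "code.coding.code"),
  value = c("c1", "44054006", "38341003", "c2", "195967001"),
  stringsAsFactors = FALSE
)

# Repeated paths are pasted into one cell so each document fits on a row.
wide <- tidyr::pivot_wider(
  long,
  id_cols = patient_id,
  names_from = path,
  values_from = value,
  values_fn = function(x) paste(x, collapse = "; ")
)
```

Once values are pasted together like this, they have to be split again before they can be filtered or joined, which is the practical reason the wide layout is better kept for inspection.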

FHIR split export

Exports one CSV file per FHIR resource type (e.g., Patient.csv, Condition.csv, Observation.csv), packaged as a ZIP archive.

Each file includes patient_id for linkage.
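The split export can be pictured as follows: group rows by resource type, write one CSV per group, and bundle the files into an archive. This is a minimal sketch assuming the zip package; the resource_type column, the sample rows, and the output paths are hypothetical.

```r
# Hypothetical long-format rows; resource_type is an assumed column name.
long <- data.frame(
  patient_id = c("P001", "P001", "P002"),
  resource_type = c("Patient", "Condition", "Patient"),
  path = c("id", "code.text", "id"),
  value = c("p1", "Diabetes", "p2"),
  stringsAsFactors = FALSE
)

out_dir <- file.path(tempdir(), "fhir_split")
dir.create(out_dir, showWarnings = FALSE)

# Write one CSV per resource type; patient_id stays in every file.
by_type <- split(long, long$resource_type)
csv_files <- vapply(names(by_type), function(type) {
  path <- file.path(out_dir, paste0(type, ".csv"))
  write.csv(by_type[[type]], path, row.names = FALSE)
  path
}, character(1))

# "cherry-pick" stores the files at the top level of the archive.
zip_path <- file.path(tempdir(), "fhir_split.zip")
zip::zip(zip_path, files = unname(csv_files), mode = "cherry-pick")
```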

File Size Limits

Upload size is controlled in app.R:

options(shiny.maxRequestSize = 250 * 1024^2)  # 250 MB

When deployed behind a proxy (e.g., nginx, Shiny Server, Posit Connect), additional upload limits may apply outside of R.
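Before uploading a large data set, it can help to compare the file's size with the configured limit. The helper below is a hypothetical convenience, not part of the application, and it only reflects the R-side limit; it cannot see limits imposed by a proxy.

```r
# Limit taken from the app.R setting shown above: 250 MB in bytes.
max_upload_bytes <- 250 * 1024^2

# Hypothetical helper: TRUE when the file exists and fits within the limit.
fits_upload_limit <- function(path, limit = max_upload_bytes) {
  size <- file.size(path)
  !is.na(size) && size <= limit
}

# Small placeholder file used only to demonstrate the check.
small_file <- tempfile(fileext = ".csv")
writeLines("patient_id,doc", small_file)
fits_upload_limit(small_file)
```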

Installation

Prerequisites

- R (>= 4.0)
- RStudio (recommended)

Install dependencies

# "tools" ships with base R, so it is not installed from CRAN
install.packages(c(
  "shiny", "xml2", "jsonlite", "dplyr", "tidyr", "DT",
  "tibble", "zip", "readr"
))

Install from GitHub

install.packages("devtools")
devtools::install_github("BoyceLab/CDAtransformer", force = TRUE)

Usage

From the repository root (the directory containing app.R), run the Shiny application:

library(shiny)
runApp(".")

Typical data set workflow

1. Export clinical documents to CSV
2. Confirm required columns:
   - C-CDA: patient_id, doc
   - FHIR: patient_id, record_body
3. Upload the CSV into the application
4. Select document type (C-CDA or FHIR)
5. Choose an export format
6. Download the results
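The column check in the workflow above can be scripted before uploading. The sketch below uses base R only; the helper name check_upload_columns and the doc_type labels "ccda" and "fhir" are hypothetical and exist only for this example.

```r
# Required columns per document type, as listed in this README.
required_columns <- list(
  ccda = c("patient_id", "doc"),
  fhir = c("patient_id", "record_body")
)

# Hypothetical helper: TRUE when the CSV has exactly the required columns.
check_upload_columns <- function(csv_path, doc_type = c("ccda", "fhir")) {
  doc_type <- match.arg(doc_type)
  # Reading a single row is enough to inspect the header.
  header <- names(read.csv(csv_path, nrows = 1, check.names = FALSE))
  identical(sort(header), sort(required_columns[[doc_type]]))
}

# Small hypothetical FHIR data set used to demonstrate the check.
fhir_csv <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    patient_id = "P001",
    record_body = '{"resourceType":"Patient","id":"p1"}',
    stringsAsFactors = FALSE
  ),
  fhir_csv,
  row.names = FALSE
)

fhir_ok <- check_upload_columns(fhir_csv, "fhir")
ccda_ok <- check_upload_columns(fhir_csv, "ccda")
```

Here fhir_ok is TRUE and ccda_ok is FALSE, because the file carries record_body rather than doc.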

Contributing

Contributions are welcome. Please open an issue or submit a pull request via GitHub.

License

MIT License. See the LICENSE file for details.
