The INDICATE Data Dictionary is designed to be forked and reused by other teams. This guide walks through:
- Initial setup — fork, configure, wipe INDICATE content
- Day-to-day — adding your own concept sets, building
- Updating — pulling the latest code from upstream while keeping your content
- Deployment — publishing the static site (GitHub Pages or GitLab Pages)
The recommended path is "Use this template" on GitHub: it gives you a clean, separate history rather than a fork that always points back at INDICATE. On https://github.com/indicate-eu/data-dictionary, click Use this template → Create a new repository, name it (e.g. <your-org>/data-dictionary), then clone it locally:
git clone git@github.com:<your-org>/<your-repo>.git
cd <your-repo>
If you prefer GitLab, mirror or import the repo there: GitLab's New project → Import project → Repository by URL accepts https://github.com/indicate-eu/data-dictionary.git. The repo ships a .gitlab-ci.yml so GitLab Pages works out of the box (see §4.2).
The config.json file at the repo root holds everything that identifies the dictionary as yours:
{
  "title": "My Team Data Dictionary",
  "languages": ["en", "fr"],
  "defaultLanguage": "en",
  "github": {
    "repo": "<your-org>/<your-repo>",
    "branch": "main",
    "upstream": "https://github.com/indicate-eu/data-dictionary.git",
    "upstreamBranch": "main"
  },
  "organization": {
    "name": "My Team",
    "url": "https://my-team.example.org"
  },
  "customVocabulary": {
    "id": "MYTEAM",
    "codePrefix": "MYTEAM-"
  },
  "branding": {
    "logo": "logo.png",
    "favicon": "favicon.png",
    "logoAlt": "My Team"
  },
  "tabs": {
    "showProjects": true,
    "showMappingRecommendations": true
  }
}
Key fields:
- title: shown in the browser tab and the SPA header. This is the name of your dictionary.
- github.repo: the URL fragment used for "Propose on GitHub" links (https://github.com/<repo>/edit/<branch>/...).
- github.upstream: kept pointing at INDICATE's repo so update_from_upstream.py knows where to pull code updates from.
- organization: default metadata.organization written into new concept sets.
- customVocabulary: vocabulary id and code prefix used when a user adds a custom concept (not from OMOP) inside the SPA.
- tabs: set false to hide the Projects or Mapping Recommendations tabs if you don't need them.
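As a rough illustration of how github.repo and github.branch combine into a "Propose on GitHub" link, the sketch below builds an edit URL from a config dictionary. The helper name propose_edit_url and the inline example config are hypothetical; the SPA's actual implementation is not shown in this guide.

```python
import json


def propose_edit_url(config: dict, path: str) -> str:
    """Build a GitHub edit URL from github.repo and github.branch (hypothetical helper)."""
    gh = config["github"]
    return f"https://github.com/{gh['repo']}/edit/{gh['branch']}/{path.lstrip('/')}"


# Inline example config; a real script would load config.json from the repo root.
example_config = json.loads("""
{
  "github": {"repo": "my-org/data-dictionary", "branch": "main"}
}
""")

print(propose_edit_url(example_config, "concept_sets/42.json"))
```

If the links in your deployed site point at the wrong repository, these two fields are the first thing to check.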
Not configurable (intentionally): the application name and version shown in the footer always reference the master upstream INDICATE app, since that's what's running here. Your dictionary content is versioned per concept set (version field in each concept_sets/<id>.json), independently of the app version.
Replace these files in docs/ with your own:
- docs/logo.png: header logo (transparent PNG, ~64–128 px tall)
- docs/favicon.png: favicon (square PNG)
- docs/data_dictionary.png: README screenshot (optional)
If you change the file names, update branding.logo / branding.favicon in config.json.
The Documentation page (#/documentation, served from docs/documentation.js) is INDICATE-specific content (mission, partners, references). Edit docs/documentation.js to describe your own dictionary. Until you do, leave it as-is — it still renders, it just talks about INDICATE.
python3 reset.py
This wipes concept_sets/, projects/, and concept_sets_resolved/, resets units/recommended_units.json to [], and resets the counters in id_counters.json to 1. Generic content (units/unit_conversions.json, mapping_recommendations/) and configuration are kept. After you confirm, the script also runs build.py to regenerate docs/data.json.
Flags:
- --yes: skip the confirmation prompt
- --keep-units: also keep recommended_units.json (otherwise reset to [])
- --no-build: skip the rebuild step
Copy config.local.example.json to config.local.json (gitignored):
cp config.local.example.json config.local.json
config.local.json looks like this:
{
  "ohdsiVocab": "/path/to/ohdsi_vocabularies.duckdb_or_folder_with_CSV_or_Parquet",
  "loincPath": "/path/to/loinc_distribution",
  "snomedPath": "/path/to/snomed_rf2_release",
  "umlsPath": "/path/to/umls_metathesaurus",
  "npuCodesPath": "/path/to/npu-codes-latest.csv"
}
All entries are optional; fill in only what you have. The tools prompt you for missing paths when needed. Each key points to a different terminology resource:
- ohdsiVocab: the OHDSI vocabulary, used by resolve.py and the resolve-concept-sets skill to expand concept sets (descendants, mapped concepts) using the CONCEPT, CONCEPT_ANCESTOR, and CONCEPT_RELATIONSHIP tables. Download the vocabularies you need (LOINC, SNOMED, RxNorm, ATC, UCUM, etc.) from ATHENA (free, OHDSI account required). Three formats are accepted, auto-detected:
  - a folder of Parquet files (recommended): faster to load, much smaller on disk, and the in-browser SPA uses them to let you browse the hierarchy of any OMOP concept (including concepts not yet in the catalog). With CSV, in-browser hierarchy browsing is limited to concepts already used in existing concept sets.
  - a folder of CSV files as downloaded from Athena: works out of the box, no conversion needed.
  - a .duckdb database file, if you've already loaded the vocabularies into DuckDB.
Athena ships CSV by default. To convert to Parquet, run one of the following in the folder containing the Athena CSV files (CSV and Parquet can coexist; Parquet files are preferred when both are present):
DuckDB CLI (install: brew install duckdb on macOS, or see duckdb.org/docs/installation):
cd /path/to/athena_download
for f in CONCEPT CONCEPT_ANCESTOR CONCEPT_RELATIONSHIP CONCEPT_SYNONYM \
         RELATIONSHIP VOCABULARY DOMAIN CONCEPT_CLASS DRUG_STRENGTH; do
  [ -f "$f.csv" ] && duckdb -c \
    "COPY (SELECT * FROM read_csv('$f.csv', delim='\t', header=true, quote='')) \
     TO '$f.parquet' (FORMAT PARQUET);"
done
Python (pip install duckdb):
cd /path/to/athena_download
python3 -c "
import duckdb, os
for f in ['CONCEPT','CONCEPT_ANCESTOR','CONCEPT_RELATIONSHIP','CONCEPT_SYNONYM',
          'RELATIONSHIP','VOCABULARY','DOMAIN','CONCEPT_CLASS','DRUG_STRENGTH']:
    if os.path.exists(f+'.csv'):
        duckdb.sql(\"COPY (SELECT * FROM read_csv('\"+f+\".csv', delim='\t', header=true, quote='')) TO '\"+f+\".parquet' (FORMAT PARQUET)\")
        print(f'  {f}.csv -> {f}.parquet')
"
- loincPath: the official LOINC distribution, used by the describe-concept-set skill to retrieve LOINC Part descriptions and EXAMPLE_UCUM_UNITS (the source of truth for recommended units on laboratory concepts). Download from https://loinc.org/downloads/ (free, registration required) and point this to the unzipped folder containing LoincTable/Loinc.csv and AccessoryFiles/.
- snomedPath: the SNOMED CT International RF2 release, used by the describe-concept-set skill to retrieve SNOMED Fully Specified Names and textual definitions. Download from https://www.nlm.nih.gov/healthit/snomedct/international.html (free, UMLS licence required) and point this to the unzipped RF2 snapshot folder.
- umlsPath: the UMLS Metathesaurus, used by the describe-concept-set skill as a fallback source of clinical definitions when LOINC Part descriptions and SNOMED definitions are sparse. Download from https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html (free, UMLS licence required) and point this to the Metathesaurus folder containing MRCONSO.RRF and MRDEF.RRF.
- npuCodesPath: the NPU (Nomenclature for Properties and Units) database, used by the describe-concept-set skill as the primary citable source for laboratory measurement definitions (NPU is the IFCC/IUPAC reference for clinical biology). Download from https://npu-terminology.org/npu-database/ (free) and point this to the npu-codes-latest.csv file.
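To make the ohdsiVocab auto-detection concrete, here is a minimal sketch of how the three accepted formats could be told apart, with Parquet taking precedence over CSV when both are present. The function name detect_vocab_format and the choice of CONCEPT as the probe file are assumptions; the real detection logic in resolve.py may differ.

```python
from pathlib import Path


def detect_vocab_format(path: str) -> str:
    """Return 'duckdb', 'parquet', or 'csv' for a vocabulary location (assumed rules)."""
    p = Path(path)
    if p.is_file() and p.suffix == ".duckdb":
        return "duckdb"
    if p.is_dir():
        # Parquet is preferred when both formats sit in the same folder.
        if (p / "CONCEPT.parquet").exists():
            return "parquet"
        if (p / "CONCEPT.csv").exists():
            return "csv"
    raise ValueError(f"No recognizable OHDSI vocabulary at {path}")
```

Running this against the path you plan to put in ohdsiVocab is a quick way to confirm that a Parquet conversion actually produced files in the expected place.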
For more detail on what each terminology contains and how the skills use them, see the in-app Documentation → Sources page.
Once filled in:
- python3 resolve.py runs without --vocab
- The Claude skills describe-concept-set and resolve-concept-sets no longer ask for these paths each run
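Since every entry in config.local.json is optional, it can help to see at a glance which ones are set and which point at something that exists. The sketch below is a hypothetical checker, not part of the repo; it only uses the five key names listed above.

```python
import json
from pathlib import Path

KNOWN_KEYS = ["ohdsiVocab", "loincPath", "snomedPath", "umlsPath", "npuCodesPath"]


def check_local_config(config: dict) -> dict:
    """Map each known key to 'ok', 'missing path', or 'not set'."""
    status = {}
    for key in KNOWN_KEYS:
        value = config.get(key)
        if not value:
            status[key] = "not set"
        elif Path(value).exists():
            status[key] = "ok"
        else:
            status[key] = "missing path"
    return status


if __name__ == "__main__":
    local = Path("config.local.json")
    # An absent file is treated as an empty config, since all entries are optional.
    config = json.loads(local.read_text()) if local.exists() else {}
    for key, state in check_local_config(config).items():
        print(f"{key}: {state}")
```

Keys reported as "not set" are not errors; the tools simply prompt for them when they are needed.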
git add config.json docs/logo.png docs/favicon.png docs/data.json docs/data_inline.js
git add concept_sets/ projects/ units/recommended_units.json id_counters.json
git commit -m "Initialize fork for <my team>"
git push
Then publish the static site (see §4 below).
Either:
- Use the SPA at https://<your-org>.github.io/<your-repo>/: create concept sets locally (stored in localStorage), then "Propose on GitHub" to commit them.
- Or edit concept_sets/<id>.json directly, then run python3 build.py.
When creating a new concept set or project file by hand, increment the matching counter in id_counters.json (build.py validates this and bumps it automatically if too low).
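The counter check can be pictured with a small sketch. It assumes each counter holds the next id to hand out, which is a guess based on the counters starting at 1 after a reset; the layout of id_counters.json and the exact rule in build.py are not documented here, and bumped_counter is a hypothetical name.

```python
def bumped_counter(counter: int, used_ids: list[int]) -> int:
    """Return a counter that is strictly above every id already in use.

    Assumption: the counter stores the next id to assign, so it must be
    at least max(used_ids) + 1. A counter that is already high enough is kept.
    """
    highest = max(used_ids, default=0)
    return max(counter, highest + 1)


# A hand-written file with id 7 pushes a stale counter of 3 up to 8.
print(bumped_counter(3, [1, 2, 7]))
```

Under this assumption, forgetting to increment the counter by hand is harmless as long as build.py runs before the next concept set is created in the SPA.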
python3 resolve.py # all sets, uses config.local.json
python3 resolve.py --id 42 # single set
python3 build.py
After any change to concept_sets/, projects/, units/, mapping_recommendations/, or concept_sets_resolved/, regenerate docs/data.json and docs/data_inline.js and commit them.
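One way to notice a forgotten rebuild is to compare modification times of the source folders against the generated files. The helper below is a hypothetical sketch, not something the repo ships, and it is only a heuristic: git does not preserve modification times, so it is meaningful in a working tree you have been editing, not on a fresh clone.

```python
from pathlib import Path

SOURCE_DIRS = ["concept_sets", "projects", "units",
               "mapping_recommendations", "concept_sets_resolved"]
GENERATED = ["docs/data.json", "docs/data_inline.js"]


def needs_rebuild(root: str = ".") -> bool:
    """True if a generated file is missing or older than any source file."""
    base = Path(root)
    outputs = [base / g for g in GENERATED]
    if not all(o.exists() for o in outputs):
        return True
    oldest_output = min(o.stat().st_mtime for o in outputs)
    for d in SOURCE_DIRS:
        folder = base / d
        if not folder.is_dir():
            continue  # a fork may not have every source folder yet
        for f in folder.rglob("*"):
            if f.is_file() and f.stat().st_mtime > oldest_output:
                return True
    return False
```

Calling needs_rebuild() from the repo root before committing gives a quick hint that python3 build.py should be run first.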
INDICATE keeps shipping fixes and new features. To pull them into your fork:
python3 update_from_upstream.py
What it does:
- Adds (or updates) a git remote named upstream pointing at the URL in config.json -> github.upstream.
- Runs git fetch upstream <branch>.
- Checks out a fixed list of code paths from upstream/<branch>: build.py, resolve.py, reset.py, update_from_upstream.py, all of docs/*.js, docs/*.html, docs/*.css, .claude/skills/, CLAUDE.md, FORKING.md, config.local.example.json, .gitignore, .gitlab-ci.yml.
- Leaves your content alone: concept_sets/, projects/, units/, mapping_recommendations/, id_counters.json, config.json, config.local.json, docs/logo.png, docs/favicon.png, docs/data_dictionary.png, and the generated docs/data.json / docs/data_inline.js.
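The steps above correspond roughly to three git invocations. The sketch below only builds the command lists so the sequence can be inspected without touching a repository; the function name, the remote_exists parameter, and the shortened path list in the example are assumptions, and the real script may order or batch things differently.

```python
def upstream_update_commands(url: str, branch: str, paths: list[str],
                             remote_exists: bool) -> list[list[str]]:
    """Return the git commands for an upstream code update (illustrative only)."""
    # Add the remote on first use, otherwise repoint the existing one.
    if remote_exists:
        remote_cmd = ["git", "remote", "set-url", "upstream", url]
    else:
        remote_cmd = ["git", "remote", "add", "upstream", url]
    return [
        remote_cmd,
        ["git", "fetch", "upstream", branch],
        # Overwrites only the listed code paths; content folders are not named here.
        ["git", "checkout", f"upstream/{branch}", "--", *paths],
    ]


for cmd in upstream_update_commands(
        "https://github.com/indicate-eu/data-dictionary.git", "main",
        ["build.py", "resolve.py", "docs/*.js"], remote_exists=False):
    print(" ".join(cmd))
```

Because the checkout names explicit paths, anything outside that list, such as concept_sets/ or config.json, is never touched by the update.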
Flags:
- --dry-run: show what would change without modifying anything
- --yes: skip the confirmation prompt
- --upstream <url>: override the upstream URL
- --branch <name>: override the upstream branch
After it finishes:
git diff --stat # see what changed
git diff # review changes in detail
python3 build.py # if build.py or any data file changed
git commit -am "Update from upstream"
git push
If a code change conflicts with a local customization (e.g. you edited docs/documentation.js for your own dictionary), git checkout upstream/main -- <path> will overwrite your version. Resolve by re-applying your local edits on top, or by removing that path from the UPSTREAM_PATHS list in update_from_upstream.py.
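To spot such overwrites before they happen, you can compare your locally modified files against the upstream-managed paths. The sketch below mirrors the path list from this guide as glob patterns; the actual format of UPSTREAM_PATHS in update_from_upstream.py is not shown here, so the pattern list, the KEPT set, and the at_risk helper are all assumptions.

```python
from fnmatch import fnmatch

# Upstream-managed paths as described in this guide, written as glob patterns.
UPSTREAM_PATTERNS = [
    "build.py", "resolve.py", "reset.py", "update_from_upstream.py",
    "docs/*.js", "docs/*.html", "docs/*.css", ".claude/skills/*",
    "CLAUDE.md", "FORKING.md", "config.local.example.json",
    ".gitignore", ".gitlab-ci.yml",
]
# Generated files that match docs/*.js but are documented as left alone.
KEPT = {"docs/data_inline.js"}


def at_risk(modified: list[str]) -> list[str]:
    """Return the locally modified files that an upstream update would overwrite."""
    return [
        path for path in modified
        if path not in KEPT
        and any(fnmatch(path, pattern) for pattern in UPSTREAM_PATTERNS)
    ]


print(at_risk(["docs/documentation.js", "concept_sets/1.json", "config.json"]))
```

The list of modified files could come from something like git diff --name-only; any path the function returns is worth backing up or re-applying after the update.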
update_from_upstream.py is for routine updates. If upstream has made a breaking change (e.g. renamed config.json keys), read the upstream CHANGELOG or commit log first: you may need to migrate your config.json by hand before running the update.
The static site lives in docs/. Both GitHub Pages and GitLab Pages can serve it directly without a build step (because docs/data.json and docs/data_inline.js are committed — they are regenerated locally with python3 build.py whenever source data changes).
In the GitHub repo: Settings → Pages → Source: Deploy from a branch → Branch: main, folder: /docs → Save. After 1–2 minutes the site is published at:
https://<your-org>.github.io/<your-repo>/
Each push to main redeploys automatically. You can also use a custom domain via the same settings page.
The repo ships a .gitlab-ci.yml that publishes docs/ to GitLab Pages on every push to the default branch. No configuration needed beyond pushing the repo to GitLab — GitLab Pages is enabled by default for public projects, and the pages job runs as part of the standard pipeline.
After the first successful pipeline, the site is published at:
https://<group>.gitlab.io/<project>/
(or, for personal namespaces, https://<username>.gitlab.io/<project>/). Custom domains are configured in Settings → Pages.
If you would rather not commit the generated files (docs/data.json, docs/data_inline.js, docs/resolved_concept_ids.json, docs/concept_sets_resolved/), add them to .gitignore and uncomment the build step inside .gitlab-ci.yml (or set up an equivalent GitHub Actions workflow). The default setup commits them to keep CI minimal — the data is the canonical artifact, the page just serves it.