Modern decentralized software ecosystems evolve through crowdsourced improvement proposals (IPs) that are continuously shaped and autonomously implemented by independent actors. As a result, these ecosystems exhibit so-called Community-Driven Variability (CDV) [1], a novel paradigm that extends beyond traditional variability-intensive systems. This tool lets you explore the proposal space of such ecosystems through interactive visualizations and insights into their evolution, authorship, classification, conformity, and inter-proposal relationships.
CDV Explorer is an ecosystem-agnostic pipeline for mining and analysing improvement proposals (IPs). At the moment, the explorer ships with active source integrations for the following two CDV-exhibiting ecosystems:
| Ecosystem | Proposals | Source repository |
|---|---|---|
| Bitcoin | Bitcoin Improvement Proposals (BIPs) | bitcoin/bips |
| Nostr | Nostr Implementation Possibilities (NIPs) | nostr-protocol/nips |
The live site is available at seg-unibe.github.io/cdv-explorer, with a demo video on YouTube.
| Tool | Version | macOS using brew | Linux | Windows using winget |
|---|---|---|---|---|
| Python | 3.12+ | brew install python | sudo apt install python3 | winget install Python.Python.3 |
| Node.js | 22+ (npm bundled) | brew install node | sudo apt install nodejs npm | winget install OpenJS.NodeJS |
| Git | any | brew install git | sudo apt install git | winget install Git.Git |
git clone https://github.com/SEG-UNIBE/cdv-explorer.git
cd cdv-explorer
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
The run command clones/updates the source repository, extracts and enriches proposal data, builds analysis artifacts, and produces React-ready exports, all in one step.
Bitcoin (BIPs):
python main.py run -e bitcoin -s 2026-03-16 --skipllm
Nostr (NIPs):
python main.py run -e nostr -s 2026-03-16 --skipllm
Note: Omit --skipllm to also run the OpenAI-based inter-proposal relation extraction. Set the model in the ecosystem YAML under llm.model, and provide the API key via the OPENAI_API_KEY environment variable or an apikey.secret file in the project root. The pipeline picks up the file automatically when the environment variable is not set.
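The credential lookup order described in the note can be sketched as follows. This is a minimal illustration, not the project's code: the function name resolve_api_key and its parameters are assumptions, and only the OPENAI_API_KEY variable and the apikey.secret file name come from the text above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(project_root: Path, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the key from the environment, else from apikey.secret, else None.

    Hypothetical helper: name and signature are illustrative assumptions.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if key:
        return key  # the environment variable takes precedence
    secret_file = project_root / "apikey.secret"
    if secret_file.is_file():
        # Fall back to the file in the project root; ignore an empty file.
        return secret_file.read_text(encoding="utf-8").strip() or None
    return None
```

Passing the environment as a parameter keeps the lookup easy to test without touching the real process environment.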
Snapshot date: -s is required. The pipeline resolves to the last commit whose committer timestamp falls on or before YYYY-MM-DD 23:59:59 and checks out the repository at that point.
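The snapshot rule can be illustrated with a small selection function. This is a sketch under stated assumptions rather than the pipeline's implementation: the name resolve_snapshot_commit and the (hash, timestamp) input shape are hypothetical, and the cutoff is taken in UTC because the text does not say which time zone applies.

```python
from datetime import date, datetime, time, timezone
from typing import Iterable, Optional, Tuple

Commit = Tuple[str, datetime]  # (commit hash, committer timestamp)


def resolve_snapshot_commit(commits: Iterable[Commit], snapshot: str) -> Optional[str]:
    """Return the hash of the latest commit at or before snapshot 23:59:59.

    Assumption: timestamps are timezone-aware and the cutoff is in UTC.
    """
    day = date.fromisoformat(snapshot)  # expects YYYY-MM-DD
    cutoff = datetime.combine(day, time(23, 59, 59), tzinfo=timezone.utc)
    eligible = [(ts, sha) for sha, ts in commits if ts <= cutoff]
    if not eligible:
        return None  # no commit exists that early
    return max(eligible)[1]  # newest eligible timestamp wins
```

On the command line, `git rev-list -1 --before="2026-03-16 23:59:59" HEAD` performs a comparable lookup, since Git's date limiting also works on committer dates.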
cd react
npm install
npm start # Vite dev server, typically at http://localhost:5173
The frontend now uses Vite for local development and production builds.
npm start and npm run dev both regenerate the snapshot index, proposal link index, and ecosystem metadata before launching the dev server.
For a production build:
npm run build
For the frontend test suite:
npm test -- --run
CDV Explorer is driven by a Typer CLI.
Run python main.py --help for a full overview.
python main.py run [OPTIONS]
Options:
  -e, --ecosystem TEXT   Ecosystem slug (default: first registered)
      --source TEXT      Source slug (default: all sources for that ecosystem)
  -s, --snapshot TEXT    Snapshot date YYYY-MM-DD [required]
      --skipllm          Skip LLM-based extraction
      --focus TEXT       Process only specific proposals (e.g. '1-9,30-44,85,A0')
Snapshot rebuild workflow:
When updating LLM extraction, dependency analysis, or other enrichment logic, rebuild only the affected proposals to avoid re-processing large snapshots:
# Rebuild analysis and postprocess for a specific proposal
python main.py run -e bitcoin -s 2026-03-16 --focus 340 --skipllm
# Rebuild analysis for multiple proposals
python main.py run -e bitcoin -s 2026-03-16 --focus 1-10,320-340 --skipllm
# Rebuild an entire snapshot (regenerate all four pipeline stages)
python main.py run -e bitcoin -s 2026-03-16 --skipllm
Note: The --focus flag skips harvest and preprocess stages, re-running only analysis and postprocess on the targeted proposals. This preserves existing enrichment (compliance checks, Git history, word clouds) while updating derived metrics.
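To make the --focus syntax concrete, here is one way such a specification could be expanded. The parsing rules are assumptions inferred from the example value '1-9,30-44,85,A0': a-b is read as an inclusive numeric range, and any token that is not purely numeric is kept verbatim. The function name parse_focus is hypothetical.

```python
from typing import List


def parse_focus(spec: str) -> List[str]:
    """Expand a focus spec such as '1-9,30-44,85,A0' into proposal identifiers.

    Assumed semantics: 'a-b' is an inclusive numeric range; other tokens
    (for example 'A0') are passed through unchanged.
    """
    ids: List[str] = []
    for token in (part.strip() for part in spec.split(",")):
        if not token:
            continue  # tolerate stray commas
        low, sep, high = token.partition("-")
        if sep and low.isdigit() and high.isdigit():
            start, end = int(low), int(high)
            if start > end:
                raise ValueError(f"descending range: {token!r}")
            ids.extend(str(n) for n in range(start, end + 1))
        else:
            ids.append(token)
    return ids
```

Under these assumptions, the spec 1-10,320-340 from the example above expands to 31 proposal identifiers.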
python main.py snapshots
python main.py snapshots -e bitcoin
python main.py doctor
The doctor command runs read-only checks for required tooling, installed Python packages, configured ecosystem sources, snapshot artifacts, generated frontend indexes, and optional LLM credentials.
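The tooling part of such a read-only check can be sketched with the standard library alone. This is an illustration, not the doctor command's actual code: the function name check_tooling is hypothetical, and the sketch only tests whether Git, Node.js, and npm are on the PATH, without verifying their versions.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_tooling(min_python: Tuple[int, int] = (3, 12)) -> Dict[str, bool]:
    """Report which required tools are available, without changing anything."""
    return {
        # Compare the running interpreter against the documented minimum.
        "python": sys.version_info >= min_python,
        # shutil.which returns None when the executable is not on the PATH.
        "git": shutil.which("git") is not None,
        "node": shutil.which("node") is not None,
        "npm": shutil.which("npm") is not None,
    }
```

Calling check_tooling() returns a mapping from tool name to a boolean, which a caller could print as a simple status table.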
python main.py artifacts rebuild -e bitcoin -s 2026-03-16
python main.py artifacts rebuild -e bitcoin --all
Use this when the raw/preprocessed proposal JSON is already available and you only want to rebuild analysis and React-facing exports after changing downstream logic.
python main.py ground-truth sample-ips --wizard
This interactive helper pre-fills ground_truth/ips.csv from a stratified sample of IPs, thereby enlarging the ground-truth data set of manually reviewed IPs and their interrelations (maintained in ground_truth/interrelations.csv).
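For readers unfamiliar with stratified sampling, the idea is to draw a fixed number of items from each group so that small groups are not crowded out. The sketch below is generic: the text does not say which attribute the helper stratifies on, so the key parameter (for instance a proposal's status) and the function name stratified_sample are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_sample(
    proposals: Sequence[Dict[str, str]],
    key: str,
    per_stratum: int,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Draw up to per_stratum proposals from each group sharing a key value."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    strata = defaultdict(list)
    for proposal in proposals:
        strata[proposal[key]].append(proposal)
    sample: List[Dict[str, str]] = []
    for value in sorted(strata):  # deterministic stratum order
        group = strata[value]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample
```

A stratum with fewer members than per_stratum simply contributes all of them.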
python main.py ecosystems list # show all registered ecosystems
python main.py ecosystems show bitcoin # dump full YAML config as JSON
python main.py ecosystems add # scaffold a new ecosystem YAML (interactive)
python main.py ecosystems add-source bitcoin # add a second IP catalog to an ecosystem
For local test/development work, install the dev requirements instead of the runtime-only set:
pip install -r requirements-dev.txt
python -m pytest
For the React frontend:
cd react
npm install
npm test -- --run
npm run build
The frontend uses Vite for bundling/dev serving and Vitest for unit tests.
Build output is written to react/build to stay compatible with the existing GitHub Pages and Cloudflare deployment workflows.
The pipeline transforms raw IP corpora into versioned, frontend-ready datasets in four stages: Harvest → Preprocess → Analysis → Postprocess. Ecosystem-specific logic is confined to the first two stages, keeping the analysis and frontend layers fully reusable across ecosystems.
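The stage split can be modelled in a few lines. This is a conceptual sketch, not the project's code: the Stage enum and the stages_to_run function are hypothetical names, and the focus behaviour mirrors the --focus note earlier in this document, which states that harvest and preprocess are skipped.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    HARVEST = 1
    PREPROCESS = 2
    ANALYSIS = 3
    POSTPROCESS = 4


# Per the description above, only the first two stages are ecosystem-specific.
ECOSYSTEM_SPECIFIC = {Stage.HARVEST, Stage.PREPROCESS}


def stages_to_run(focus: bool = False) -> List[Stage]:
    """Return the stages in execution order.

    With focus=True, the ecosystem-specific stages are skipped and only the
    reusable analysis and postprocess stages remain.
    """
    ordered = list(Stage)  # Enum iteration preserves definition order
    if focus:
        return [stage for stage in ordered if stage not in ECOSYSTEM_SPECIFIC]
    return ordered
```

Keeping the ecosystem-specific set explicit makes the reuse boundary visible: a new ecosystem only has to supply the two stages in that set.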
.
├── ecosystems/                 # ecosystem configs (YAML), one file per ecosystem
├── pipeline/
│   ├── harvest/                # Stage I, ecosystem-specific: clone & snapshot checkout
│   └── preprocess/             # Stage II, ecosystem-specific: preamble extraction & enrichment
├── analysis/                   # Stage III/IV, ecosystem-agnostic analysis modules & postprocess
│   ├── authorship/
│   ├── classification/
│   ├── conformity/
│   ├── dependencies/
│   ├── evolution/
│   └── wordcloud/
├── react/                      # interactive frontend (D3, PrimeReact)
└── ip_data/
    └── <ecosystem>/            # e.g. bitcoin, nostr, ...
        ├── <source>/           # e.g. bips, slips, ...
        │   ├── 01_harvest/     # raw IP documents [gitignored]
        │   ├── 02_preprocess/  # IP object model (JSON) ← Stage II output
        │   ├── 03_analysis/    # analysis artifacts ← Stage III output
        │   └── 04_postprocess/ # frontend payloads ← Stage IV output
        ├── _combined/          # precomputed multi-source artifacts
        └── ground_truth/       # curated benchmark CSVs
Each proposal is stored as a JSON file under 02_preprocess/<snapshot>/.
The schema has three top-level blocks: raw (verbatim preamble), meta (Git-derived history and timestamps), and insights (compliance checks, word list, status history, inter-proposal relations).
{
  "raw": {
    "preamble": [dict]
  },
  "meta": {
    "last_commit": [datetime],
    "total_commits": [int],
    "git_history": [/* ... */]
  },
  "insights": {
    "formal_compliance": [/* ... */],
    "word_list": [dict],
    "changes_in_status": [/* ... */],
    "interrelations": {
      "preamble_extracted": [set of IPs],
      "body_extracted_regex": [set of IPs],
      "body_extracted_llm": [set of IPs]
    }
  }
}
The exact object shapes inside interrelations are source-aware and method-specific.
In particular, targets use source_slug:id keys, regex-derived entries carry occurrence counts, and LLM-derived entries are stored as timestamped runs with per-dependency metadata.
Concrete examples: bip-0340.json (Schnorr Signatures) · nip-10.json (Text Notes and Threads)
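As an illustration of the source_slug:id key convention, the sketch below splits such keys and regroups regex-style occurrence counts by source. The flat {key: count} mapping used here is an assumed simplification for demonstration; the actual stored shapes may differ, and the names Target, parse_target, and group_by_source are hypothetical.

```python
from typing import Dict, NamedTuple


class Target(NamedTuple):
    source_slug: str
    proposal_id: str


def parse_target(key: str) -> Target:
    """Split a 'source_slug:id' key into its two parts."""
    source_slug, sep, proposal_id = key.partition(":")
    if not sep or not source_slug or not proposal_id:
        raise ValueError(f"expected 'source_slug:id', got {key!r}")
    return Target(source_slug, proposal_id)


def group_by_source(occurrences: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Regroup {'source_slug:id': count} entries by source slug.

    Assumption: counts are stored in a flat mapping keyed by target.
    """
    grouped: Dict[str, Dict[str, int]] = {}
    for key, count in occurrences.items():
        target = parse_target(key)
        grouped.setdefault(target.source_slug, {})[target.proposal_id] = count
    return grouped
```

Prefixing the identifier with the source slug keeps targets unambiguous when one ecosystem has several proposal catalogs, such as bips and slips.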
- Run python main.py ecosystems add and answer the prompts; a scaffolded ecosystems/<slug>.yml is created.
- Edit the YAML to fill in classification dimensions, conformity standards, preamble field rules, and any other config.
- Implement the ecosystem-specific Stage I & II logic in pipeline/:
  - harvest/: a harvester that clones or fetches the IP source and checks out a snapshot
  - preprocess/: an extractor that parses raw documents into the canonical IP object model, and a compliance checker under preprocess/checkers/
- Add a corresponding adapter under react/src/ecosystems/<slug>/ (copy bitcoin/ or nostr/ as a template).
- Run python main.py run -e <slug> -s <date> --skipllm to verify the pipeline end-to-end.
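The three ecosystem-specific pieces named in the steps above could be described by interfaces along these lines. Everything here is hypothetical: the class names, method names, and signatures are illustrative assumptions, not the project's actual base classes, and the toy extractor only demonstrates the raw/meta/insights top-level blocks of the object model.

```python
from pathlib import Path
from typing import Any, Dict, List, Protocol


class Harvester(Protocol):
    """Hypothetical Stage I interface."""

    def harvest(self, snapshot: str, workdir: Path) -> Path:
        """Clone or fetch the IP source, check out the snapshot, return its path."""
        ...


class Extractor(Protocol):
    """Hypothetical Stage II interface."""

    def extract(self, raw_document: str) -> Dict[str, Any]:
        """Parse one raw document into the IP object model."""
        ...


class ComplianceChecker(Protocol):
    """Hypothetical checker interface."""

    def check(self, proposal: Dict[str, Any]) -> List[str]:
        """Return a list of compliance findings for one proposal."""
        ...


class KeyValueExtractor:
    """Toy extractor: reads 'Key: value' preamble lines up to the first blank line."""

    def extract(self, raw_document: str) -> Dict[str, Any]:
        preamble: Dict[str, str] = {}
        for line in raw_document.splitlines():
            if not line.strip():
                break  # the preamble ends at the first blank line
            key, sep, value = line.partition(":")
            if sep:
                preamble[key.strip()] = value.strip()
        return {"raw": {"preamble": preamble}, "meta": {}, "insights": {}}
```

Structural typing via Protocol means KeyValueExtractor satisfies Extractor without inheriting from it, which suits a setup where each ecosystem ships its own independent implementation.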
Production is deployed to GitHub Pages via .github/workflows/deploy-prod.yml after a successful CI run on main or master.
Development builds are deployed to Cloudflare Pages via .github/workflows/deploy-dev.yml after a successful CI run on dev.
To enable GitHub Pages on a fork, go to Settings > Pages and set the source to GitHub Actions.
Both workflows build the Vite app and publish the generated react/build directory.
Deactivate the virtual environment:
deactivate
Optionally remove individual artifacts without deleting the repository:
rm -rf .venv
rm -rf ip_data/*/*/01_harvest # harvested source repos under <ecosystem>/<source>/ (gitignored, can be large)
rm -rf react/node_modules react/build
Or remove everything at once:
cd .. && rm -rf cdv-explorer
Footnotes
[1] Bögli, R. et al. Community-driven variability: characterizing a new software variability paradigm. Autom Softw Eng 33, 67 (2026). doi:10.1007/s10515-026-00594-0
