updating README

EvanDietzMorris · EvanDietzMorris · commit 95f45bdecc0e · 2026-03-26T01:58:35.000-04:00
diff --git a/README.md b/README.md
@@ -2,202 +2,148 @@
 
 ### Operational Routine for the Ingest and Output of Networks
 
-This package takes data sets from various sources and converts them into Knowledge Graphs.
+ORION ingests data from knowledge sources and converts them into [Biolink Model](https://biolink.github.io/biolink-model/) knowledge graphs in [KGX](https://github.com/biolink/kgx) format.
 
-Each data source will go through the following pipeline before it can be included in a graph:
+Each data source goes through the following pipeline:
 
-1. Fetch (retrieve an original data source)
-2. Parse (convert the data source into KGX files)
-3. Normalize (use normalization services to convert identifiers and ontology terms to preferred synonyms)
-4. Supplement (add supplementary knowledge specific to that source)
+1. **Fetch** - retrieve the original data source
+2. **Parse** - transform the data into KGX files
+3. **Normalize** - use normalization services to convert identifiers and ontology terms to preferred synonyms
+4. **Supplement** - add supplementary knowledge specific to that source
 
-To build a graph use a Graph Spec yaml file to specify the sources you want. Some examples live in `graph_specs` folder.
+Sources are defined in a Graph Spec yaml file (see examples in the `graph_specs/` directory). ORION automatically runs each specified source through the pipeline and merges them into a Knowledge Graph.
 
-ORION will automatically run each data source specified through the necessary pipeline. Then it will merge the specified sources into a Knowledge Graph.
+### Installation
 
-### Installing and Configuring ORION
+ORION requires [uv](https://docs.astral.sh/uv/) for dependency management.
 
-Create a parent directory:
-
-```
-mkdir ~/ORION_root
-```
-
-Clone the code repository:
-
-```
-cd ~/ORION_root
+```bash
 git clone https://github.com/RobokopU24/ORION.git
+cd ORION
+uv sync --extra robokop
 ```
 
-Next create directories where data sources, graphs, and logs will be stored.
-
-**ORION_STORAGE** - for storing data sources
+The core library is also available on PyPI (`pip install robokop-orion`), but the full repository is needed to use the ingest modules from the [ROBOKOP](https://robokop.renci.org/) project.
 
-**ORION_GRAPHS** - for storing knowledge graphs
+### CLI Commands
 
-**ORION_LOGS** - for storing logs
+After installation, the following commands are available (prefix with `uv run` if not using a uv-managed shell):
 
-You can do this manually, or use the script indicated below to set up a default workspace.
+| Command | Description                                           |
+|---|-------------------------------------------------------|
+| `orion-build` | Build complete knowledge graphs from a Graph Spec     |
+| `orion-ingest` | Run the ingest pipeline for individual data sources   |
+| `orion-merge` | Merge KGX node/edge files                             |
+| `orion-meta-kg` | Generate MetaKG and test data files                   |
+| `orion-redundant-kg` | Generate edge files with redundant biolink predicates |
+| `orion-ac` | Generate AnswerCoalesce files                         |
+| `orion-neo4j-dump` | Generate Neo4j database dumps                         |
+| `orion-memgraph-dump` | Generate Memgraph database dumps                      |
 
-Option 1: Use this script to create the directories and set the environment variables:
+### Configuring ORION
 
-```
-cd ~/ORION_root/ORION/
-source ./set_up_test_env.sh
-```
+ORION uses three directories for its data, configured via environment variables:
 
-Option 2: Create three directories and set environment variables specifying paths to the locations of those directories.
+| Variable | Purpose                              |
+|---|--------------------------------------|
+| `ORION_STORAGE` | Data ingest pipeline storage         |
+| `ORION_GRAPHS` | Knowledge graph outputs              |
+| `ORION_LOGS` | Log files                            |
 
-```
-mkdir ~/ORION_root/storage/
-export ORION_STORAGE=~/ORION_root/storage/
+You can set these up manually or use the provided script:
 
-mkdir ~/ORION_root/graphs/
-export ORION_GRAPHS=~/ORION_root/graphs/
-
-mkdir ~/ORION_root/logs/
-export ORION_LOGS=~/ORION_root/logs/
+```bash
+source ./set_up_test_env.sh
 ```
 
-#### Specify Graph Spec file.
-
-Next create or select a Graph Spec yaml file, where the content of knowledge graphs to be built is specified.
+#### Graph Spec
 
-Set either of the following environment variables, but not both:
+A Graph Spec yaml file defines which sources to include in a knowledge graph. Set one of the following environment variables (not both):
 
-Option 1: ORION_GRAPH_SPEC - the name of a Graph Spec file located in the graph_specs directory of ORION
-
-```
+```bash
+# Option 1: Name of a file in the graph_specs/ directory
 export ORION_GRAPH_SPEC=example-graph-spec.yaml
-```
-
-Option 2: ORION_GRAPH_SPEC_URL - a URL pointing to a Graph Spec yaml file
 
-```
+# Option 2: URL pointing to a Graph Spec yaml file
 export ORION_GRAPH_SPEC_URL=https://stars.renci.org/var/data_services/graph_specs/default-graph-spec.yaml
 ```
 
-#### Building graph
-
-To build a custom graph, alter a Graph Spec file, which is composed of a list of graphs.
-
-For each graph, specify:
+Here is a simple Graph Spec example:
 
-**graph_id** - a unique identifier string for the graph, with no spaces
-
-**sources** - a list of sources identifiers for data sources to include in the graph
-
-See the full list of data sources and their identifiers in the [data sources file](https://github.com/RobokopU24/ORION/blob/master/orion/data_sources.py).
-
-Here is a simple example.
-
-```
+```yaml
 graphs:
   - graph_id: Example_Graph
     graph_name: Example Graph
     graph_description: A free text description of what is in the graph.
     output_format: neo4j
     sources:
-      - source_id: CTD
+      - source_id: DrugCentral
       - source_id: HGNC
 ```
 
-There are variety of ways to further customize a knowledge graph. The following are parameters you can set for a particular data source. Mostly, these parameters are used to indicate that you'd like to use a previously built version of a data source or a specific normalization of a source. If you specify versions that are not the latest, and haven't previously built a data source or graph with those versions, it probably won't work.
-
-**source_version** - the version of the data source, as determined by ORION
-
-**parsing_version** - the version of the parsing code in ORION for this source
-
-**merge_strategy** - used to specify alternative merge strategies
-
-The following are parameters you can set for the entire graph, or for an individual data source:
-
-**node_normalization_version** - the version of the node normalizer API (see: https://nodenormalization-sri.renci.org/openapi.json)
-
-**edge_normalization_version** - the version of biolink model used to normalize predicates and validate the KG
+See the full list of data sources and their identifiers in the [data sources file](https://github.com/RobokopU24/ORION/blob/master/orion/data_sources.py).
 
-**strict_normalization** - True or False specifying whether to discard nodes, node types, and edges connected to those nodes when they fail to normalize
+#### Graph Spec Parameters
 
-**conflation** - True or False flag specifying whether to conflate genes with proteins and chemicals with drugs
+The following parameters can be set per data source:
 
-For example, we could customize the previous example:
+- **merge_strategy** - specifies an alternative merge strategy for the source
+- **strict_normalization** - whether to discard nodes that fail to normalize (true/false)
+- **conflation** - whether to conflate genes with proteins and chemicals with drugs (true/false)
 
-```
-graphs:
-  - graph_id: Example_Graph
-    graph_name: Example Graph
-    graph_description: A free text description of what is in the graph.
-    output_format: neo4j
-    sources:
-      - source_id: CTD
-      - source_id: HGNC
-```
+The following can be set at the graph level:
 
-See the `graph_specs` directory for more examples.
+- **add_edge_id** - whether to add unique identifiers to edges (true/false)
+- **edge_id_type** - the type of identifier to add when `add_edge_id` is true (uuid or orion)
 
-### Running ORION
+See the `graph_specs/` directory for more examples.
 
-Install Docker to create and run the necessary containers.
+### Running with Docker
 
-Use the following command to build the necessary images.
+Build the image:
 
-```
+```bash
 docker compose build
 ```
 
-To build every graph in your Graph Spec use the following command. This runs `orion-build all` on the image.
+Build all graphs in the configured Graph Spec:
 
-```
+```bash
 docker compose up
 ```
 
-#### Building specific graphs
-
-To build an individual graph use `orion-build` with a graph_id from the Graph Spec.
+Build a specific graph:
 
-Usage: `orion-build [-h] graph_id`
-positional arguments:
-`graph_id` : ID of the graph to build. Must match an ID from the configured Graph Spec.
-
-Example command to create a graph from a Graph Spec with graph_id: Example_Graph:
-
-```
+```bash
 docker compose run --rm orion orion-build Example_Graph
 ```
 
-#### Run ORION Pipeline on a single data source.
+Run the ingest pipeline for a single data source:
 
-To run the ORION pipeline for a single data source and transform it into KGX files, you can use `orion-load`.
-
-```
-optional arguments:
-  -h, --help : show this help message and exit
-  -t, --test_mode : Test mode will process a small sample version of the data.
-  -f, --fresh_start_mode : Fresh start mode will ignore previous states and overwrite previous data.
-  -l, --lenient_normalization : Lenient normalization mode will allow nodes that do not normalize to persist in the finalized kgx files.
+```bash
+docker compose run --rm orion orion-ingest DrugCentral
 ```
 
-Example command to convert data source CTD to KGX files.
+See available data sources and options:
 
-```
-docker compose run --rm orion orion-load CTD
+```bash
+docker compose run --rm orion orion-ingest -h
 ```
 
-To see the available arguments and a list of supported data sources:
+### Development
 
-```
-docker compose run --rm orion orion-load -h
-```
+Install dev dependencies with [uv](https://docs.astral.sh/uv/):
 
-#### Testing and Troubleshooting
+```bash
+uv sync --extra robokop --group dev
+```
 
-If you are experiencing issues or errors you may want to run tests:
+Run tests:
 
-```
-docker-compose run --rm orion pytest /ORION
+```bash
+uv run pytest tests/
 ```
 
-#### Contributing to ORION
+### Contributing
 
-Contributions are welcome, see the [Contributer README](README-CONTRIBUTER.md).
+Contributions are welcome; see the [Contributor README](README-CONTRIBUTER.md).
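
The updated README says the three directories can be set up manually, but it only shows the `set_up_test_env.sh` route. A minimal sketch of the manual route follows, based on the commands this commit removes. The `ORION_ROOT` variable and its default path are assumptions made for illustration; ORION itself only reads `ORION_STORAGE`, `ORION_GRAPHS`, and `ORION_LOGS`.

```bash
# Hypothetical base directory. ORION_ROOT is an illustrative name only;
# ORION does not read this variable.
ORION_ROOT="${ORION_ROOT:-$HOME/ORION_root}"

# Create the three directories ORION uses (no error if they already exist).
mkdir -p "$ORION_ROOT/storage" "$ORION_ROOT/graphs" "$ORION_ROOT/logs"

# Point ORION at the directories through its environment variables.
export ORION_STORAGE="$ORION_ROOT/storage/"
export ORION_GRAPHS="$ORION_ROOT/graphs/"
export ORION_LOGS="$ORION_ROOT/logs/"
```

Run these lines in the shell that will launch ORION (or `source` a file containing them) so the exported variables persist for later commands.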