Commit 95f45bd

updating README
1 parent eecd862 commit 95f45bd

1 file changed

Lines changed: 76 additions & 130 deletions

README.md

@@ -2,202 +2,148 @@
22

33
### Operational Routine for the Ingest and Output of Networks
44

5-
This package takes data sets from various sources and converts them into Knowledge Graphs.
5+
ORION ingests data from knowledge sources and converts them into [Biolink Model](https://biolink.github.io/biolink-model/) knowledge graphs in [KGX](https://github.com/biolink/kgx) format.
66

7-
Each data source will go through the following pipeline before it can be included in a graph:
7+
Each data source goes through the following pipeline:
88

9-
1. Fetch (retrieve an original data source)
10-
2. Parse (convert the data source into KGX files)
11-
3. Normalize (use normalization services to convert identifiers and ontology terms to preferred synonyms)
12-
4. Supplement (add supplementary knowledge specific to that source)
9+
1. **Fetch** - retrieve the original data source
10+
2. **Parse** - transform the data into KGX files
11+
3. **Normalize** - use normalization services to convert identifiers and ontology terms to preferred synonyms
12+
4. **Supplement** - add supplementary knowledge specific to that source
1313

14-
To build a graph use a Graph Spec yaml file to specify the sources you want. Some examples live in `graph_specs` folder.
14+
Sources are defined in a Graph Spec yaml file (see examples in the `graph_specs/` directory). ORION automatically runs each specified source through the pipeline and merges them into a Knowledge Graph.
1515

16-
ORION will automatically run each data source specified through the necessary pipeline. Then it will merge the specified sources into a Knowledge Graph.
16+
### Installation
1717

18-
### Installing and Configuring ORION
18+
ORION requires [uv](https://docs.astral.sh/uv/) for dependency management.
1919

20-
Create a parent directory:
21-
22-
```
23-
mkdir ~/ORION_root
24-
```
25-
26-
Clone the code repository:
27-
28-
```
29-
cd ~/ORION_root
20+
```bash
3021
git clone https://github.com/RobokopU24/ORION.git
22+
cd ORION
23+
uv sync --extra robokop
3124
```
3225

33-
Next create directories where data sources, graphs, and logs will be stored.
34-
35-
**ORION_STORAGE** - for storing data sources
26+
The core library is also available on PyPI (`pip install robokop-orion`), but the full repository is needed to utilize ingest modules from the [ROBOKOP](https://robokop.renci.org/) project.
3627

37-
**ORION_GRAPHS** - for storing knowledge graphs
28+
### CLI Commands
3829

39-
**ORION_LOGS** - for storing logs
30+
After installation, the following commands are available (prefix with `uv run` if not using a uv-managed shell):
4031

41-
You can do this manually, or use the script indicated below to set up a default workspace.
32+
| Command | Description |
33+
|---|-------------------------------------------------------|
34+
| `orion-build` | Build complete knowledge graphs from a Graph Spec |
35+
| `orion-ingest` | Run the ingest pipeline for individual data sources |
36+
| `orion-merge` | Merge KGX node/edge files |
37+
| `orion-meta-kg` | Generate MetaKG and test data files |
38+
| `orion-redundant-kg` | Generate edge files with redundant biolink predicates |
39+
| `orion-ac` | Generate AnswerCoalesce files |
40+
| `orion-neo4j-dump` | Generate Neo4j database dumps |
41+
| `orion-memgraph-dump` | Generate Memgraph database dumps |
4242

43-
Option 1: Use this script to create the directories and set the environment variables:
43+
### Configuring ORION
4444

45-
```
46-
cd ~/ORION_root/ORION/
47-
source ./set_up_test_env.sh
48-
```
45+
ORION uses three directories for its data, configured via environment variables:
4946

50-
Option 2: Create three directories and set environment variables specifying paths to the locations of those directories.
47+
| Variable | Purpose |
48+
|---|--------------------------------------|
49+
| `ORION_STORAGE` | Data ingest pipeline storage |
50+
| `ORION_GRAPHS` | Knowledge graph outputs |
51+
| `ORION_LOGS` | Log files |
5152

52-
```
53-
mkdir ~/ORION_root/storage/
54-
export ORION_STORAGE=~/ORION_root/storage/
53+
You can set these up manually or use the provided script:
5554

56-
mkdir ~/ORION_root/graphs/
57-
export ORION_GRAPHS=~/ORION_root/graphs/
58-
59-
mkdir ~/ORION_root/logs/
60-
export ORION_LOGS=~/ORION_root/logs/
55+
```bash
56+
source ./set_up_test_env.sh
6157
```
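
For the manual route, a minimal sketch follows. It assumes the layout used in the lines this commit removes: a parent directory at `~/ORION_root` with `storage`, `graphs`, and `logs` subdirectories. `ORION_ROOT` is a helper variable introduced for this sketch only.

```bash
# ORION_ROOT is a hypothetical helper variable for this sketch; the README
# only documents ORION_STORAGE, ORION_GRAPHS, and ORION_LOGS.
ORION_ROOT="${ORION_ROOT:-$HOME/ORION_root}"

# Create the three directories (-p: no error if they already exist).
mkdir -p "$ORION_ROOT/storage" "$ORION_ROOT/graphs" "$ORION_ROOT/logs"

# Point ORION at them; export so child processes inherit the values.
export ORION_STORAGE="$ORION_ROOT/storage/"
export ORION_GRAPHS="$ORION_ROOT/graphs/"
export ORION_LOGS="$ORION_ROOT/logs/"
```

These exports last only for the current shell session; add them to a shell profile to make them persistent.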
6258

63-
#### Specify Graph Spec file.
64-
65-
Next create or select a Graph Spec yaml file, where the content of knowledge graphs to be built is specified.
59+
#### Graph Spec
6660

67-
Set either of the following environment variables, but not both:
61+
A Graph Spec yaml file defines which sources to include in a knowledge graph. Set one of the following environment variables (not both):
6862

69-
Option 1: ORION_GRAPH_SPEC - the name of a Graph Spec file located in the graph_specs directory of ORION
70-
71-
```
63+
```bash
64+
# Option 1: Name of a file in the graph_specs/ directory
7265
export ORION_GRAPH_SPEC=example-graph-spec.yaml
73-
```
74-
75-
Option 2: ORION_GRAPH_SPEC_URL - a URL pointing to a Graph Spec yaml file
7666

77-
```
67+
# Option 2: URL pointing to a Graph Spec yaml file
7868
export ORION_GRAPH_SPEC_URL=https://stars.renci.org/var/data_services/graph_specs/default-graph-spec.yaml
7969
```
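
Because only one of the two variables may be set, a small guard can catch a misconfigured shell before a build starts. This is a sketch: `check_graph_spec_env` is a hypothetical helper, not part of ORION, and it only inspects the environment.

```bash
# Hypothetical helper: succeed only if exactly one of ORION_GRAPH_SPEC and
# ORION_GRAPH_SPEC_URL is set to a non-empty value.
check_graph_spec_env() {
    if [ -n "${ORION_GRAPH_SPEC:-}" ] && [ -n "${ORION_GRAPH_SPEC_URL:-}" ]; then
        echo "Set ORION_GRAPH_SPEC or ORION_GRAPH_SPEC_URL, not both." >&2
        return 1
    fi
    if [ -z "${ORION_GRAPH_SPEC:-}" ] && [ -z "${ORION_GRAPH_SPEC_URL:-}" ]; then
        echo "Set one of ORION_GRAPH_SPEC or ORION_GRAPH_SPEC_URL." >&2
        return 1
    fi
    return 0
}

# Example: select a local Graph Spec and make sure the URL variant is unset.
export ORION_GRAPH_SPEC=example-graph-spec.yaml
unset ORION_GRAPH_SPEC_URL
check_graph_spec_env && echo "Graph Spec configuration looks consistent."
```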
8070

81-
#### Building graph
82-
83-
To build a custom graph, alter a Graph Spec file, which is composed of a list of graphs.
84-
85-
For each graph, specify:
71+
Here is a simple Graph Spec example:
8672

87-
**graph_id** - a unique identifier string for the graph, with no spaces
88-
89-
**sources** - a list of sources identifiers for data sources to include in the graph
90-
91-
See the full list of data sources and their identifiers in the [data sources file](https://github.com/RobokopU24/ORION/blob/master/orion/data_sources.py).
92-
93-
Here is a simple example.
94-
95-
```
73+
```yaml
9674
graphs:
9775
  - graph_id: Example_Graph
9876
    graph_name: Example Graph
9977
    graph_description: A free text description of what is in the graph.
10078
    output_format: neo4j
10179
    sources:
102-
      - source_id: CTD
80+
      - source_id: DrugCentral
10381
      - source_id: HGNC
10482
```
10583
106-
There are variety of ways to further customize a knowledge graph. The following are parameters you can set for a particular data source. Mostly, these parameters are used to indicate that you'd like to use a previously built version of a data source or a specific normalization of a source. If you specify versions that are not the latest, and haven't previously built a data source or graph with those versions, it probably won't work.
107-
108-
**source_version** - the version of the data source, as determined by ORION
109-
110-
**parsing_version** - the version of the parsing code in ORION for this source
111-
112-
**merge_strategy** - used to specify alternative merge strategies
113-
114-
The following are parameters you can set for the entire graph, or for an individual data source:
115-
116-
**node_normalization_version** - the version of the node normalizer API (see: https://nodenormalization-sri.renci.org/openapi.json)
117-
118-
**edge_normalization_version** - the version of biolink model used to normalize predicates and validate the KG
84+
See the full list of data sources and their identifiers in the [data sources file](https://github.com/RobokopU24/ORION/blob/master/orion/data_sources.py).
11985
120-
**strict_normalization** - True or False specifying whether to discard nodes, node types, and edges connected to those nodes when they fail to normalize
86+
#### Graph Spec Parameters
12187
122-
**conflation** - True or False flag specifying whether to conflate genes with proteins and chemicals with drugs
88+
The following parameters can be set per data source:
12389
124-
For example, we could customize the previous example:
90+
- **merge_strategy** - alternative merge strategies
91+
- **strict_normalization** - whether to discard nodes that fail to normalize (true/false)
92+
- **conflation** - whether to conflate genes with proteins and chemicals with drugs (true/false)
12593
126-
```
127-
graphs:
128-
  - graph_id: Example_Graph
129-
    graph_name: Example Graph
130-
    graph_description: A free text description of what is in the graph.
131-
    output_format: neo4j
132-
    sources:
133-
      - source_id: CTD
134-
      - source_id: HGNC
135-
```
94+
The following can be set at the graph level:
13695
137-
See the `graph_specs` directory for more examples.
96+
- **add_edge_id** - whether to add unique identifiers to edges (true/false)
97+
- **edge_id_type** - if add_edge_id is true, the type of identifier can be specified (uuid or orion)
13898
139-
### Running ORION
99+
See the `graph_specs/` directory for more examples.
140100

141-
Install Docker to create and run the necessary containers.
101+
### Running with Docker
142102

143-
Use the following command to build the necessary images.
103+
Build the image:
144104

145-
```
105+
```bash
146106
docker compose build
147107
```
148108

149-
To build every graph in your Graph Spec use the following command. This runs `orion-build all` on the image.
109+
Build all graphs in the configured Graph Spec:
150110

151-
```
111+
```bash
152112
docker compose up
153113
```
154114

155-
#### Building specific graphs
156-
157-
To build an individual graph use `orion-build` with a graph_id from the Graph Spec.
115+
Build a specific graph:
158116

159-
Usage: `orion-build [-h] graph_id`
160-
positional arguments:
161-
`graph_id` : ID of the graph to build. Must match an ID from the configured Graph Spec.
162-
163-
Example command to create a graph from a Graph Spec with graph_id: Example_Graph:
164-
165-
```
117+
```bash
166118
docker compose run --rm orion orion-build Example_Graph
167119
```
168120

169-
#### Run ORION Pipeline on a single data source.
121+
Run the ingest pipeline for a single data source:
170122

171-
To run the ORION pipeline for a single data source and transform it into KGX files, you can use `orion-load`.
172-
173-
```
174-
optional arguments:
175-
-h, --help : show this help message and exit
176-
-t, --test_mode : Test mode will process a small sample version of the data.
177-
-f, --fresh_start_mode : Fresh start mode will ignore previous states and overwrite previous data.
178-
-l, --lenient_normalization : Lenient normalization mode will allow nodes that do not normalize to persist in the finalized kgx files.
123+
```bash
124+
docker compose run --rm orion orion-ingest DrugCentral
179125
```
180126

181-
Example command to convert data source CTD to KGX files.
127+
See available data sources and options:
182128

183-
```
184-
docker compose run --rm orion orion-load CTD
129+
```bash
130+
docker compose run --rm orion orion-ingest -h
185131
```
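
To ingest several sources in sequence, the single-source command can be wrapped in a loop. This sketch assumes `HGNC` is accepted by `orion-ingest` in the same way as `DrugCentral` (it appears as a `source_id` in the Graph Spec example). `ingest_sources` and `DRY_RUN` are names invented for this example; by default the function only prints the commands it would run.

```bash
# DRY_RUN is a hypothetical switch for this sketch, not an ORION option.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run the ingest pipeline for each source ID given,
# stopping at the first failure.
ingest_sources() {
    local source_id
    for source_id in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "docker compose run --rm orion orion-ingest $source_id"
        else
            docker compose run --rm orion orion-ingest "$source_id" || return 1
        fi
    done
}

ingest_sources DrugCentral HGNC
```

Setting `DRY_RUN=0` before calling the function runs the containers for real.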
186132

187-
To see the available arguments and a list of supported data sources:
133+
### Development
188134

189-
```
190-
docker compose run --rm orion orion-load -h
191-
```
135+
Install dev dependencies with [uv](https://docs.astral.sh/uv/):
192136

193-
#### Testing and Troubleshooting
137+
```bash
138+
uv sync --extra robokop --group dev
139+
```
194140

195-
If you are experiencing issues or errors you may want to run tests:
141+
Run tests:
196142

197-
```
198-
docker-compose run --rm orion pytest /ORION
143+
```bash
144+
uv run pytest tests/
199145
```
200146

201-
#### Contributing to ORION
147+
### Contributing
202148

203-
Contributions are welcome, see the [Contributer README](README-CONTRIBUTER.md).
149+
Contributions are welcome, see the [Contributor README](README-CONTRIBUTER.md).
