Properly handle relative paths for schema imports #81

@jonasfreimuth

Description

Currently, as far as I can tell, relative paths in the imports field of a PEP schema are resolved relative to the current working directory, because peppy calls os.path.abspath on them (https://github.com/pepkit/peppy/blob/60523c51cc7243680797f681a9d49a1ecec3c6ca/peppy/utils.py#L154). However, as a developer using PEPkit, I would expect relative paths in the imports field to be resolved relative to the location of the PEP schema that imports the upstream schemas (the "importing schema").
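To illustrate the behaviour described above, here is a minimal sketch using only the standard library. It does not call peppy or Eido; resolve_against_cwd is a hypothetical stand-in for the os.path.abspath call, and the directory names are made up for the demonstration. The same relative import string resolves to different absolute paths depending on the working directory.

```python
import os
import tempfile


def resolve_against_cwd(path):
    """Hypothetical stand-in for the current behaviour: resolve against the cwd."""
    return os.path.abspath(path)


relative_import = os.path.join(
    "..", "..", "common", "schemas", "common_pep_validation.yaml"
)

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.realpath(tmp)
    # Directory that contains the importing schema.
    schema_dir = os.path.join(root, "workflow", "schemas")
    # An arbitrary working directory chosen by the pipeline user.
    output_dir = os.path.join(root, "some", "deep", "output")
    os.makedirs(schema_dir)
    os.makedirs(output_dir)
    try:
        os.chdir(schema_dir)
        from_schema_dir = resolve_against_cwd(relative_import)
        os.chdir(output_dir)
        from_output_dir = resolve_against_cwd(relative_import)
    finally:
        # Restore the working directory before the temp dir is cleaned up.
        os.chdir(original_cwd)

# The two results differ although the import string is identical.
print(from_schema_dir)
print(from_output_dir)
```

Only the first result points at the intended common schema location; the second depends entirely on where the user happened to run the pipeline.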

Using the current working directory for relative path resolution is a problem when the (pipeline) developer writing the PEP schema has no control over where the working directory is in relation to the importing schema, as is the case for Snakemake PEP schemas. Here, the working directory is chosen by the user running the Snakemake pipeline (by specifying the output dir), while the pipeline developer might like to specify a common base schema for all their workflows (e.g. via a git submodule at a path relative to the pipeline's schema). In this case, Eido cannot correctly resolve the path to the imported base schema in the pipeline's PEP schema (except in the special case that the output directory specified by the pipeline user happens to be the same as the location of the pipeline's PEP schema, which is likely not the case most of the time). Therefore, relative paths currently cannot be used in the imports field of a PEP schema without also restricting the working directory at the time the schema is constructed.

As a solution, I would propose that when a schema imports another schema and the imports field contains a relative path, that path is resolved against the directory of the importing schema, provided the importing schema is identifiable by a path. Otherwise, the current working directory may still be used, if applicable. All of this should be possible to implement in the read_schema function.
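The proposed resolution rule could look roughly like the sketch below. This is not Eido's actual code: resolve_import_path and its importing_schema parameter are hypothetical names, and the sketch assumes plain filesystem paths only (schemas referenced by URL would need separate handling).

```python
import os


def resolve_import_path(import_path, importing_schema=None):
    """Resolve one entry of a schema's imports field (hypothetical helper).

    import_path: the string found in the imports field.
    importing_schema: filesystem path of the schema containing the import,
        or None if that schema is not identifiable by a path.
    """
    # Absolute paths are already unambiguous.
    if os.path.isabs(import_path):
        return import_path
    if importing_schema is not None:
        # Resolve against the directory of the importing schema.
        base_dir = os.path.dirname(os.path.abspath(importing_schema))
    else:
        # Fall back to the current working directory (the current behaviour).
        base_dir = os.getcwd()
    return os.path.normpath(os.path.join(base_dir, import_path))


resolved = resolve_import_path(
    "../../common/schemas/common_pep_validation.yaml",
    importing_schema="/project_root/workflow/schemas/pep_validation.yaml",
)
print(resolved)
```

With this rule the result no longer depends on the working directory whenever the importing schema's path is known, while the fallback keeps existing setups working.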

Example

Project structure:

- project_root
  - workflow
    - Snakefile
    - schemas
      - pep_validation.yaml
  - common
    - schemas
      - common_pep_validation.yaml

Contents of workflow/schemas/pep_validation.yaml:

[...]
imports:
  - "../../common/schemas/common_pep_validation.yaml"
[...]

Contents of workflow/Snakefile:

[...]
pepschema: "schemas/pep_validation.yaml"
[...]

This will lead to an error that Eido cannot find [working dir]../../common/schemas/common_pep_validation.yaml whenever the working directory is not nested in the same way as the directory containing the importing schema, i.e. whenever the directory two levels above the working directory does not contain common. In this example, the working directory is set via snakemake --directory working_dir, i.e., it is the Snakemake pipeline's output directory.

For interactive use, I have fleshed out the example and attached it:

import_relative_path_demo.tar.gz

Set up the environment via conda with the following command:

conda env create --prefix ./test_env --file environment.yaml

and run

conda run --prefix test_env/ snakemake -n --config pep_config=./pep/config.yaml

and it should complain about not finding the common config file, while

conda run -p test_env/ snakemake -n --config pep_config=../../pep/config.yaml --directory workflow/test_output/

should succeed because the output dir is nested in the same way as the pipeline-specific schema.
