Hi Tony,
I updated a few functions in pipeline_utils.py to support mouse (the mt- gene prefix and versioned Ensembl IDs), and the pipeline now runs end-to-end. I have a few follow-up questions about the reference indices and output paths:
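For context, the mouse handling is along these lines. This is a simplified sketch rather than the actual pipeline_utils.py code; both function names are hypothetical, and it assumes mitochondrial genes are identified by an "mt-" name prefix (matched case-insensitively) and that Ensembl versions appear as a trailing ".N" suffix:

```python
import re


def strip_ensembl_version(gene_id: str) -> str:
    """Drop a trailing version suffix, e.g. 'ENSMUSG00000064351.1' -> 'ENSMUSG00000064351'."""
    return re.sub(r"\.\d+$", "", gene_id)


def is_mito_gene(gene_name: str) -> bool:
    """Match both the human-style 'MT-' and mouse-style 'mt-' prefixes."""
    return gene_name.upper().startswith("MT-")
```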
1. Reference naming
nascent_all / nascent_basic vs kb-python “standard” / “nac (nascent)”
In the kb-python documentation the two workflows are typically described as:
• standard (spliced/cDNA only)
• nac / nascent (cDNA + nascent/intron-containing)
In our references directory, I see two subfolders:
• nascent_all
• nascent_basic
Could you clarify which one corresponds to kb-python’s standard vs nascent/nac, and what exactly “all” vs “basic” means here (e.g., gene biotype inclusion, pseudogenes, etc.)?
2. Snakefile hard-codes nascent_all (config.yaml nascent_genome seems unused)
In config.yaml we have:
• nascent_genome: "nascent_all" # Subdirectory with genome index files
But in the Snakefile (perturb_pipeline/analysis/Snakefile, lines 312 to 315 at commit 9dd7137) the inputs appear hard-coded to nascent_all:
    index_idx=f"{REFERENCES}/nascent_all/index.idx",
    t2g=f"{REFERENCES}/nascent_all/t2g.txt",
    cdna=f"{REFERENCES}/nascent_all/cdna.txt",
    nascent=f"{REFERENCES}/nascent_all/nascent.txt",
Would you prefer that we modify the Snakefile to read the subdirectory from the config instead, e.g.:
    f"{REFERENCES}/{config['nascent_genome']}/..."
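To make the proposal concrete, here is a minimal sketch of how the four input paths could be derived from the config value. The helper name reference_paths and the fallback to "nascent_all" when the key is missing are my assumptions, not anything currently in the Snakefile:

```python
from pathlib import Path


def reference_paths(references, config, default="nascent_all"):
    """Build the index file paths from the configured reference subdirectory."""
    # Fall back to the currently hard-coded value if the key is absent (assumption).
    subdir = config.get("nascent_genome", default)
    base = Path(references) / subdir
    return {
        "index_idx": (base / "index.idx").as_posix(),
        "t2g": (base / "t2g.txt").as_posix(),
        "cdna": (base / "cdna.txt").as_posix(),
        "nascent": (base / "nascent.txt").as_posix(),
    }


# Example with a placeholder references root and the other subfolder selected.
paths = reference_paths("/path/to/references", {"nascent_genome": "nascent_basic"})
```

In the Snakefile, the rule inputs could then be filled from this dictionary, so switching between nascent_all and nascent_basic would only require editing config.yaml.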
3. Output location: annotated h5ad / MTX written to $SCRATCH
At the “7. SAVING OUTPUTS…” stage, the log indicates the annotated .h5ad and .mtx outputs are written under $SCRATCH (e.g., /scratch/users/.../counts_filtered/...).
Should these be manually copied/moved to the final results directory, or is there a rule that is supposed to sync outputs from scratch back to the project/results path?
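If there is no such rule yet, I could add a small copy step. A rough sketch, assuming the outputs only need to be mirrored by file extension with their relative layout preserved; the function name sync_outputs and the default patterns are hypothetical:

```python
import shutil
from pathlib import Path


def sync_outputs(scratch_dir, results_dir, patterns=("*.h5ad", "*.mtx")):
    """Copy matching files from scratch to results, keeping relative subpaths."""
    scratch_dir, results_dir = Path(scratch_dir), Path(results_dir)
    copied = []
    for pattern in patterns:
        # rglob searches the scratch directory recursively for each pattern.
        for src in sorted(scratch_dir.rglob(pattern)):
            dest = results_dir / src.relative_to(scratch_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(dest)
    return copied
```

This copies rather than moves, so the scratch files would stay in place until scratch is cleaned up.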
Thanks a lot!
Best,
Shanshan