Questions on kb ref index naming and Snakemake output locations

Hi Tony,

I updated a few functions in pipeline_utils.py to support mouse (mt-/versioned Ensembl IDs), and the pipeline now runs end-to-end. I have a couple of follow-up questions about the reference indices and output paths:

### 1. Reference naming
nascent_all / nascent_basic vs kb-python “standard” / “nac (nascent)”
In the [kb-python documentation](https://github.com/pachterlab/kallisto-transcriptome-indices) the two workflows are typically described as:

- standard (spliced/cDNA only)
- nac / nascent (cDNA + nascent/intron-containing)

In our references directory, I see two subfolders:

- nascent_all
- nascent_basic

Could you clarify which one corresponds to kb-python’s standard vs nascent/nac, and what exactly “all” vs “basic” means here (e.g., gene biotype inclusion, pseudogenes, etc.)?

### 2. Snakefile hard-codes nascent_all (yaml nascent_genome seems unused)

In config.yaml we have:

- `nascent_genome: "nascent_all"  # Subdirectory with genome index files`

But in the Snakefile the inputs appear hard-coded to nascent_all:

https://github.com/tkzeng/perturb_pipeline/blob/9dd71376970de9ae64587e5375fe988e1832ede3/analysis/Snakefile#L312-L315

Would you prefer we modify the Snakefile to build these paths from the config value instead, e.g.:
`f"{REFERENCES}/{config['nascent_genome']}/..."`
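To make the proposal concrete, here is a minimal sketch of what I have in mind. The helper name `nascent_reference_paths`, the fallback to `nascent_all`, and the example references path are my own assumptions, not anything in the current Snakefile; the four file names are the ones in the linked lines.

```python
def nascent_reference_paths(references, config, default="nascent_all"):
    """Build the four reference paths from the configured subdirectory.

    `references` stands in for the Snakefile's REFERENCES variable and
    `config` for the parsed config.yaml. Falling back to "nascent_all"
    when the key is missing is an assumption that keeps today's behavior.
    """
    subdir = config.get("nascent_genome", default)
    base = f"{references}/{subdir}"
    return {
        "index_idx": f"{base}/index.idx",
        "t2g": f"{base}/t2g.txt",
        "cdna": f"{base}/cdna.txt",
        "nascent": f"{base}/nascent.txt",
    }


# Hypothetical usage: select nascent_basic through the config.
paths = nascent_reference_paths(
    "/path/to/references", {"nascent_genome": "nascent_basic"}
)
```

In the Snakefile itself, the same effect could probably be had by swapping the literal `nascent_all` for `config['nascent_genome']` inside the existing f-strings, if you would rather avoid a helper.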

### 3. Output location: annotated h5ad / MTX written to $SCRATCH
At the “7. SAVING OUTPUTS…” stage, the log indicates the annotated .h5ad and .mtx outputs are written under $SCRATCH (e.g., /scratch/users/.../counts_filtered/...).
Should these be manually copied/moved to the final results directory, or is there a rule that is supposed to sync outputs from scratch back to the project/results path?
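If manual copying is the expected route, this is roughly what I would run in the meantime. It is only a sketch: the function name `copy_outputs`, the glob patterns, and the idea of mirroring the scratch layout under the results directory are my assumptions, not part of the pipeline.

```python
import shutil
from pathlib import Path


def copy_outputs(scratch_dir, results_dir, patterns=("*.h5ad", "*.mtx")):
    """Copy matching output files from scratch into the results directory.

    The relative directory layout under `scratch_dir` is preserved.
    The default patterns are an assumption; compressed matrices
    (e.g. *.mtx.gz) would need an extra pattern.
    """
    scratch_dir = Path(scratch_dir)
    results_dir = Path(results_dir)
    copied = []
    for pattern in patterns:
        # rglob searches all subdirectories, e.g. counts_filtered/...
        for src in sorted(scratch_dir.rglob(pattern)):
            dest = results_dir / src.relative_to(scratch_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(dest)
    return copied
```

If there is already a rule that does this, I would rather use that than maintain a separate script.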

Thanks a lot!
Best,
Shanshan

	index_idx=f"{REFERENCES}/nascent_all/index.idx",
	t2g=f"{REFERENCES}/nascent_all/t2g.txt",
	cdna=f"{REFERENCES}/nascent_all/cdna.txt",
	nascent=f"{REFERENCES}/nascent_all/nascent.txt",
