Running and fine-tuning ecole on other diseases #6

@saifeldeen-bio

We want to use ecole to call large SVs, specifically deletions (DELs) and duplications (DUPs). We tried to run ecole-model on 112 WES samples from a different, non-cancer disorder. At first I used your default parameters and got in all samples; then I calculated the mean and standard deviation for my samples and passed them via the --normalize parameter:
151.619584955357 (mean), 31.8524058935725 (std)
I also used another BED file containing the coordinates of each exon in a specific gene panel that we use in our research. I also got

This is the command we used:

python3 ./scripts/ECOLE_call.py --model ecole --input ./processed_samples --output ./ecole_calls_All_samples_exonlevel_batch_size_16 --cnv exonlevel --batch_size 16 --normalize ecole_stats.txt
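Regarding the --normalize values: if the mean and standard deviation are meant to be pooled over every depth value of every sample (an assumption; the exact quantity ecole expects, and whether it uses the population or sample standard deviation, should be checked against its documentation), they have to be combined from per-sample counts rather than by averaging per-sample std values. A minimal sketch using Welford's algorithm per sample and the Chan et al. pairwise merge across samples; the helper names `sample_stats` and `pool_stats` are hypothetical:

```python
import math
from typing import Iterable, List, Tuple


def sample_stats(values: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, M2) for one sample via Welford's algorithm.

    M2 is the running sum of squared deviations from the mean.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def pool_stats(per_sample: List[Tuple[int, float, float]]) -> Tuple[float, float]:
    """Merge per-sample (count, mean, M2) into one pooled mean and population std."""
    n_tot, mean_tot, m2_tot = 0, 0.0, 0.0
    for n, mean, m2 in per_sample:
        if n == 0:
            continue
        delta = mean - mean_tot
        new_n = n_tot + n
        mean_tot += delta * n / new_n
        m2_tot += m2 + delta * delta * n_tot * n / new_n
        n_tot = new_n
    if n_tot == 0:
        raise ValueError("no values to pool")
    return mean_tot, math.sqrt(m2_tot / n_tot)


# Toy depth values for two hypothetical samples (not real data).
per_sample = [sample_stats([10.0, 12.0, 14.0]), sample_stats([100.0, 110.0])]
mean, std = pool_stats(per_sample)
print(f"{mean} {std}")
```

The merge gives the same result as computing the mean and std over all values concatenated, without holding every sample's depth values in memory at once.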

PS: We already knew that these samples carry SVs at specific locations, but when we tested ecole to see whether it could detect them, it was not able to.

We then decided to fine-tune ecole-model on 29 WES samples, using the calls from their matched short-read WGS (sr-WGS), and for two of them also matched long-read WGS (lr-WGS), as ground truth. With the gene-panel exon BED file we created, this gave 63 INS, 93 DEL, and 27 DUP ground-truth calls in total. We set the batch size to 29, the number of epochs to 12, and the learning rate to 5e-5. The stats file contained the mean and standard deviation of the 29 samples used for fine-tuning, not of all 112 samples. The results after 12 epochs were:

[ 2025-08-16 04:42:32 ] ECOLÉ: Epoch: 12 | time in 24 minutes, 54 seconds
[ 2025-08-16 04:42:32 ] ECOLÉ: Loss: 0.1087(train) | Acc: 98.4%(train)| Nocall_prec: 99.5%(train)| Dup_prec: 20.4%(train)| Del_prec: 0.0%(train)| Nocall_recall: 98.9%(train)| Dup_recall: 34.4%(train)| Del_recall: 0.0%(train)

So the problem is that the fine-tuned model has 0.0% precision and recall for DEL, and the precision and recall for DUP are very low.
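For context on how accuracy can reach 98% while DEL precision and recall are 0.0%: when almost all exons are no-call, predicting no-call nearly everywhere already yields high accuracy. A minimal sketch computing overall accuracy and per-class precision/recall from a confusion matrix; the helper `per_class_metrics` is hypothetical and the counts are invented for illustration, not taken from these runs:

```python
from typing import Dict

# confusion[true_label][predicted_label] = number of exons
Confusion = Dict[str, Dict[str, int]]


def per_class_metrics(conf: Confusion) -> Dict[str, Dict[str, float]]:
    """Overall accuracy plus precision and recall for every class in the matrix."""
    classes = list(conf)
    total = sum(sum(row.values()) for row in conf.values())
    correct = sum(conf[c].get(c, 0) for c in classes)
    metrics: Dict[str, Dict[str, float]] = {"overall": {"accuracy": correct / total}}
    for c in classes:
        tp = conf[c].get(c, 0)
        predicted_as_c = sum(conf[t].get(c, 0) for t in classes)  # column sum
        truly_c = sum(conf[c].values())  # row sum
        metrics[c] = {
            # Report 0.0 when the denominator is empty instead of dividing by zero.
            "precision": tp / predicted_as_c if predicted_as_c else 0.0,
            "recall": tp / truly_c if truly_c else 0.0,
        }
    return metrics


# Invented counts for 10,000 exons, chosen only to illustrate the effect.
toy: Confusion = {
    "NOCALL": {"NOCALL": 9800, "DUP": 60, "DEL": 20},
    "DUP": {"NOCALL": 15, "DUP": 15, "DEL": 0},
    "DEL": {"NOCALL": 90, "DUP": 0, "DEL": 0},
}
for name, values in per_class_metrics(toy).items():
    print(name, {k: round(v, 3) for k, v in values.items()})
```

With these made-up numbers, accuracy is 98.15% while DEL precision and recall are both 0, the same pattern as in the logs; overall accuracy says little about the minority classes here.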

We then used another BED file, downloaded from UCSC, containing the coordinates of all exons in the hg38 genome, to capture all possible SVs from the matched sr-WGS and lr-WGS. This gave 238778 INS, 198716 DEL, and 7815 DUP ground-truth calls in total.
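If the hg38 exon file was exported from the UCSC Table Browser, the same exon typically appears once per transcript, so duplicated or overlapping intervals may count the same SV several times. One way to rule this out is to collapse the exons first, as `bedtools merge` does with its default settings (overlapping and book-ended intervals are merged). A minimal Python sketch, assuming a tab-separated BED file with 0-based half-open coordinates; the helpers `read_bed` and `merge_intervals` are hypothetical:

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def read_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse the first three BED columns, skipping blank, comment, track and browser lines."""
    out: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        out.append((fields[0], int(fields[1]), int(fields[2])))
    return out


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse duplicate, overlapping and book-ended intervals per chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


bed_lines = [
    "chr1\t100\t200\tNM_A_exon_0\n",
    "chr1\t150\t250\tNM_B_exon_0\n",  # overlaps the first exon (another transcript)
    "chr1\t250\t300\tNM_B_exon_1\n",  # book-ended with the previous interval
    "chr1\t400\t500\tNM_A_exon_1\n",
    "chr2\t100\t200\tNM_C_exon_0\n",
]
merged = merge_intervals(read_bed(bed_lines))
print(merged)
```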

However, we faced the same problem; the results were again not promising:

[ 2025-08-18 23:07:03 ] ECOLÉ: Epoch: 12 | time in 53 minutes, 11 seconds
[ 2025-08-18 23:07:03 ] ECOLÉ: Loss: 0.1148(train) | Acc: 98.0%(train)| Nocall_prec: 99.4%(train)| Dup_prec: 27.6%(train)| Del_prec: 0.0%(train)| Nocall_recall: 98.6%(train)| Dup_recall: 49.7%(train)| Del_recall: 0.0%(train)
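The logs only report no-call, DUP and DEL classes, so INS calls cannot become exon labels, and in the second run DEL calls far outnumber DUP calls yet DEL still scores 0.0%. A simple check is to count, per class, the exon labels that actually reach the training set. The sketch below assumes one overlap rule (any shared base labels the exon; exons hit by both a DEL and a DUP become a hypothetical `CONFLICT` class); how ecole itself builds its labels is not described here. It also computes the common "balanced" inverse-frequency class weights, total / (n_classes × count); whether the ecole training script accepts class weights is not stated in this issue. All helper names are hypothetical:

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def label_exons(exons: List[Interval], svs: List[Tuple[Interval, str]]) -> Dict[Interval, str]:
    """Label each exon 'DEL', 'DUP', 'NOCALL', or 'CONFLICT' (hit by both a DEL and a DUP).

    INS and any other SV type are ignored because they do not change exon copy number.
    Brute force, O(exons x SVs); genome-wide use would need a sorted sweep or interval tree.
    """
    labels: Dict[Interval, str] = {}
    for exon in exons:
        hits = {t for iv, t in svs if t in ("DEL", "DUP") and overlaps(exon, iv)}
        if not hits:
            labels[exon] = "NOCALL"
        elif len(hits) == 1:
            labels[exon] = next(iter(hits))
        else:
            labels[exon] = "CONFLICT"
    return labels


def inverse_frequency_weights(counts: Counter) -> Dict[str, float]:
    """total / (n_classes * count): a perfectly balanced label set gets weight 1.0 per class."""
    total = sum(counts.values())
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


# Toy exons and SV calls (not real data).
exons = [("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 500, 600), ("chr2", 100, 200)]
svs = [(("chr1", 150, 350), "DEL"), (("chr1", 520, 530), "INS"), (("chr2", 0, 1000), "DUP")]
labels = label_exons(exons, svs)
counts = Counter(labels.values())
print(counts)
print(inverse_frequency_weights(counts))
```

If a class count printed this way is zero or far lower than expected, the problem lies in label construction rather than in the training hyperparameters.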
