cath.superpose ssaps files optimization? #76

@tubiana

Dear all,

I'm working on challenging alignments (several cases with 1000+ structures).
Since cath.superpose checks whether the ssaps files already exist, I found a way to speed up the alignments: re-executing the cath.superpose command with the files in random order in the arguments (code example below, in case it is useful to someone).

But here's my question: I realised that the ssaps files are computed for both orderings of every pair:

(base) thibault@XXX [XXX]/ssaps $ ls -l | grep A1A4S6 | grep B1AVH7
-rw-r--r-- 1 thibault ansatt     3080 Aug 21 10:20 A1A4S6.pdbB1AVH7.pdb.list
-rw-r--r-- 1 thibault ansatt       62 Aug 21 10:20 A1A4S6.pdbB1AVH7.pdb.scores
-rw-r--r-- 1 thibault ansatt     3080 Aug 21 15:37 B1AVH7.pdbA1A4S6.pdb.list
-rw-r--r-- 1 thibault ansatt       62 Aug 21 15:37 B1AVH7.pdbA1A4S6.pdb.scores
(base) thibault@XXX [XXX]/ssaps $ cat A1A4S6.pdbB1AVH7.pdb.scores
A1A4S6.pdb  B1AVH7.pdb  108   99  85.49   97   89   15   3.34
(base) thibault@XXX [XXX]/ssaps $ cat B1AVH7.pdbA1A4S6.pdb.scores
B1AVH7.pdb  A1A4S6.pdb   99  108  85.49   97   89   15   3.34
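
To check whether this holds for other pairs too, the comparison can be scripted. The sketch below is only an illustration: `swap_pair_columns` and `mirrored_scores_match` are hypothetical helper names, and it assumes the `.scores` layout shown in the listing above, where columns 1-2 are the structure names and columns 3-4 follow the same order. It does not establish that the scores are identical for every pair.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a .scores line with columns 1-2 (names) and
# columns 3-4 swapped, so it can be compared with the mirrored file.
# Reassigning fields makes awk rebuild the line with single spaces.
swap_pair_columns() {
  awk '{ t = $1; $1 = $2; $2 = t; t = $3; $3 = $4; $4 = t; print }' "$1"
}

# Hypothetical helper: succeed (exit status 0) when the mirrored .scores
# file carries the same values once the pair order is accounted for.
mirrored_scores_match() {
  local forward=$1 reverse=$2
  # "$1 = $1" only normalises the whitespace of the mirrored line.
  [ "$(swap_pair_columns "$forward")" = "$(awk '{ $1 = $1; print }' "$reverse")" ]
}
```

For the pair above, `mirrored_scores_match A1A4S6.pdbB1AVH7.pdb.scores B1AVH7.pdbA1A4S6.pdb.scores && echo same` would print `same`.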

In some cases, I can end up with more than 10 million files in the same folder...
Is there a particular reason to generate both orderings of every pair? Maybe cath.superpose could gain in efficiency and storage if only one set of files were generated for each pair?
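
To give an idea of the scale, here is a rough count. It is a sketch under assumptions: `ssap_file_count` is a hypothetical helper, and it assumes two files (`.list` and `.scores`) per pair, no self-comparisons, and that both orderings of every pair end up on disk.

```shell
#!/usr/bin/env bash

# Hypothetical helper: number of files expected in the ssaps directory for
# n structures, assuming two files (.list and .scores) per pair.
# mode "ordered" counts (A,B) and (B,A) separately; "unordered" counts once.
ssap_file_count() {
  local n=$1 mode=${2:-ordered}
  local pairs=$(( n * (n - 1) ))
  if [ "$mode" = unordered ]; then
    pairs=$(( pairs / 2 ))
  fi
  echo $(( 2 * pairs ))
}

ssap_file_count 1000            # prints 1998000
ssap_file_count 1000 unordered  # prints 999000
```

Under these assumptions, keeping one ordering per pair would halve the number of files.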

Wishing you a nice day 🙂
Best regards,
Thibault.


Code example for running cath.superpose with the files in random order

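export CATH_TOOLS_PDB_PATH="$WORKDIR"
# Collect the --pdb-infile arguments in random order (sort -R shuffles the list).
pdbinfile=()
while IFS= read -r pdb
do
  pdbinfile+=(--pdb-infile "$pdb")
done < <(printf '%s\n' "$WORKDIR"/*.pdb | sort -R)
#echo "${pdbinfile[@]}"
# Same ssaps directory on every run, so existing ssaps files can be reused.
cath-superpose --do-the-ssaps ssaps --sup-to-pdb-files-dir output "${pdbinfile[@]}"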
