Tony Notes: Pre-grouping Clusters

Tony E Lewis edited this page May 30, 2017 · 4 revisions

cd-hit

cp /cath/people2/ucbcdal/dfx_funfam2013_data/shared/used/domseq_data_gene3d_12_31072013/persf/3.20.20.120.faa /tmp
cp /cath/people2/ucbcdal/dfx_funfam2013_data/shared/used/domseq_data_gene3d_12_31072013/persf/3.40.50.300.faa /tmp
cp /cath/people2/ucbcdal/dfx_funfam2013_data/shared/used/domseq_data_gene3d_12_31072013/persf/3.40.50.620.faa /tmp

cd /tmp

wget "https://github.com/weizhongli/cdhit/releases/download/V4.6.7/cd-hit-v4.6.7-2017-0501-Linux-binary.tar.gz"
tar -zxvf cd-hit-v4.6.7-2017-0501-Linux-binary.tar.gz

ls -1 *.faa | sed 's/\.faa$//' | xargs -I VAR cd-hit-v4.6.7-2017-0501/cd-hit -i VAR.faa -o VAR.nr64.out -c 0.64 -n 5
ls -1 *.faa | sed 's/\.faa$//' | xargs -I VAR cd-hit-v4.6.7-2017-0501/cd-hit -i VAR.faa -o VAR.nr90.out -c 0.90 -n 5
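
The two pipelines above could also be written as a single loop, which avoids the ls/sed/xargs round trip. This is only a sketch: the function name cluster_all and the CDHIT variable are made up for illustration, and it assumes thresholds are passed as two-digit percentages so that "0.$pct" forms a valid -c value.

```shell
# Path to the cd-hit binary unpacked above; override CDHIT to point elsewhere.
# (CDHIT and cluster_all are illustrative names, not part of cd-hit.)
CDHIT="${CDHIT:-cd-hit-v4.6.7-2017-0501/cd-hit}"

# Run cd-hit on every .faa file in the current directory, once per
# identity threshold given as a two-digit percentage (e.g. 64 or 90).
cluster_all() {
    for pct in "$@"; do
        for faa in *.faa; do
            [ -e "$faa" ] || continue    # no .faa files: skip the unexpanded glob
            base="${faa%.faa}"           # strip only the trailing .faa
            "$CDHIT" -i "$faa" -o "$base.nr$pct.out" -c "0.$pct" -n 5
        done
    done
}

# Usage: cluster_all 64 90
```

Calling cluster_all 64 90 in /tmp should issue the same six cd-hit runs as the two xargs lines, with the same output file names.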

grep -c '^>Cluster' *.clstr

Note: it appears that cd-hit can't go (much) below a 60% sequence identity threshold.

The grep gives the following cluster counts (file:count):

3.20.20.120.nr64.out.clstr:2185
3.20.20.120.nr90.out.clstr:7150
3.40.50.300.nr64.out.clstr:256484
3.40.50.300.nr90.out.clstr:619477
3.40.50.620.nr64.out.clstr:31661
3.40.50.620.nr90.out.clstr:82238
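
The counts above only give the number of clusters. To also see how many sequences went into them, something like the following could be used. It is a rough sketch that assumes the usual cd-hit .clstr layout (a ">Cluster N" header line followed by one line per member sequence); the function name clstr_summary is made up for illustration.

```shell
# For each .clstr file given, print a tab-separated line:
#   file, number of clusters, number of sequences, mean cluster size
clstr_summary() {
    for f in "$@"; do
        awk -v name="$f" '
            /^>Cluster / { clusters++; next }   # header line starts a new cluster
            NF           { members++ }          # any other non-empty line is a member
            END {
                mean = clusters ? members / clusters : 0
                printf "%s\t%d\t%d\t%.2f\n", name, clusters, members, mean
            }' "$f"
    done
}

# Usage: clstr_summary *.clstr
```

The first numeric column should agree with the grep counts, and the mean cluster size gives a quick feel for how much each threshold collapses a superfamily.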
