-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathREADME
More file actions
91 lines (64 loc) · 3.33 KB
/
README
File metadata and controls
91 lines (64 loc) · 3.33 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
ARCS - a genome assembler for 2nd-generation genome sequencing
==============
ARCS is a de novo assembler designed for the 2nd-generation genome sequencing.
ARCS first estimates the copy numbers of contigs, and then assembles and optimally
positions the identified nonrepeat contigs through linear programming to unravel
relative distance information among them. Then ARCS decomposes inexact repeats into
a collection of exact repeats interspersed with unique sequences that distinguish a
single repeat from others. These unique sequences act as bridges connecting neighboring
contigs and local scaffolds into longer scaffolds. ARCS solves the chimeric paired-end
reads by using a statistical model to iteratively filter out chimera read pairs. Finally,
ARCS uses a gap-filling method to merge contigs within each scaffold.
Experimental results demonstrate that on the real sequencing data of the E. coli genome,
ARCS generates scaffolds with an N50 of 132kbp, outperforming the state-of-art assembly tool
SOAP-denovo2 with N50 scaffold of 95kbp. On two additional sets up genome sequencing data,
from the D9 and D12 genomes, ARCS again assembles scaffolds longer than those assembled by
SOAP-denovo. Through case studies we demonstrate that the performance improvement can be
explained by the accurate identification of chimeric read pairs and the assembly of inexact
repeats.
The core algorithms are described in this paper:
http://bioinfo.ict.ac.cn/ARCS/
-------------
Compiling ARCS
ARCS dependencies:
-glpk (https://www.gnu.org/software/glpk/)
-boost (http://www.boost.org)
-log4cxx (http://logging.apache.org/log4cxx/)
If you cloned the repository from github, run autogen.sh from the src directory
to generate the configure file:
./autogen.sh
if all the dependencies have been installed in standard locations (like /usr) you
can run configure without any parameters then run make:
./configure
make
--------------
Installing ARCS
The execution of 'make install' will install arcs into /usr/local/bin/ by default. The install location can be specified by using the --prefix option to configure as below:
./configure --prefix=/home/arcs/ && make && make install
This command will copy arcs to /home/arcs/bin/arcs
-----------
Running ARCS
ARCS consists of serval subprograms, and a pipeline to run these subprograms. The descriptions of the subprograms can be listed by running "arcs -h".
Each program and subprogram will print a brief description and its usage instructions if the -h flag is used.
The major subprograms are:
* arcs preprocess
Filter and quality-trim reads
* arcs assemble
Assemble takes the output of the preprocess step and constructs contigs.
* copy_num_estimate
Estimate copy number of contigs.
* scaffold
Generate ordered sets of contigs using distance estimates.
* solveLP
Position the identified nonrepeat contigs through linear programming.
* remove_repeats
Revise the results of solveLP.
* gapfill
Fill intra-scaffold gaps.
An example to run arcs is provided under the directory of 'arcs/benchmarks'. It is recommended to follow this example to know how to run arcs.
-------
License
ARCS is licensed under GPLv3. See the COPYING file in the src directory.
-------
Credits
The software was developed by Renyu Zhang, Qing Xu, Chungong Yu, Bing Wang, Guozheng Wei, Yanbo Li, Ting Cheng, and Dongbo Bu.