From a genomic fasta file and its associated GFF, the program first scans the genome to retrieve all sequences
delimited by stop codons. Only sequences at least 60 nucleotides long are kept by default.
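
The stop-to-stop scan described above can be sketched as follows. This is a minimal illustration, not ORFMap's actual code: the function name `stop_to_stop_segments` is hypothetical, only the forward strand is scanned, the standard stop codons (TAA, TAG, TGA) are assumed, and the flanking stop codons are assumed not to count towards the 60-nucleotide minimum.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def stop_to_stop_segments(seq, min_len=60):
    """Yield (frame, start, end) for segments lying between two consecutive
    in-frame stop codons on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codons themselves
    are excluded from the segment (an assumption of this sketch).
    """
    seq = seq.upper()
    for frame in range(3):
        prev_stop = None  # position of the previous in-frame stop codon
        # Walk codon by codon in this reading frame.
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if prev_stop is not None and pos - (prev_stop + 3) >= min_len:
                    yield frame, prev_stop + 3, pos
                prev_stop = pos


if __name__ == "__main__":
    toy_genome = "TAA" + "A" * 60 + "TAG"
    print(list(stop_to_stop_segments(toy_genome)))
```

A real scan would also have to cover the three reverse-strand frames, for example by running the same function on the reverse complement.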

-Those so-called ORF sequences are then annotated depending upon GFF element type(s) used as a reference. The CDS element type is always used as a reference but others can be added. By default an ORF sequence has 5 possible annotations:
+Those so-called ORF sequences are then annotated depending upon GFF element type(s) used as a reference.
+The CDS element type is always used as a reference but others can be added.
+
+By default an ORF sequence has 5 possible annotations:

| ORF annotation | Condition |
| --- | --- |
@@ -21,9 +25,11 @@ Those so-called ORF sequences are then annotated depending upon GFF element type
| nc_ovp-CDS | if the ORF overlaps with a CDS in a different phase |
| nc_intergenic | if the ORF does not overlap with anything |

-**Note** that if an ORF sequence is tagged as 'c_CDS', this sequence is further processed to be cut at its 5' and 3' extremities that do not overlap with the CDS. If their length is above or equal to 60 nucleotides, then those subsequences can be assigned as nc_5-CDS and/or nc_3-CDS.
+**Note:**
+If an ORF sequence is tagged as 'c_CDS', it is further processed: its 5' and 3' extremities that do not overlap with the CDS are cut off. Each such subsequence that is at least 60 nucleotides long can then be assigned as nc_5-CDS and/or nc_3-CDS.
+<br></br>
+<br></br>

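
The trimming of a 'c_CDS' ORF described in the note above might look like the sketch below. It is only an illustration under stated assumptions: the function name `noncoding_flanks` is hypothetical, intervals are 0-based and end-exclusive, and on the minus strand the 5' and 3' sides are assumed to be swapped relative to genome coordinates. How ORFMap itself handles strands and coordinates is not shown here.

```python
def noncoding_flanks(orf, cds, strand="+", min_len=60):
    """Return the ORF extremities that do not overlap the CDS.

    `orf` and `cds` are (start, end) tuples, 0-based and end-exclusive.
    Only flanks of at least `min_len` nucleotides are kept.
    """
    orf_start, orf_end = orf
    cds_start, cds_end = cds
    # Parts of the ORF located before and after the CDS in genome coordinates.
    left = (orf_start, min(orf_end, cds_start))
    right = (max(orf_start, cds_end), orf_end)
    # Assumption: on the minus strand the 5' end is the right-hand side.
    five_prime, three_prime = (left, right) if strand == "+" else (right, left)
    flanks = {}
    if five_prime[1] - five_prime[0] >= min_len:
        flanks["nc_5-CDS"] = five_prime
    if three_prime[1] - three_prime[0] >= min_len:
        flanks["nc_3-CDS"] = three_prime
    return flanks


if __name__ == "__main__":
    # 90 nt before the CDS are kept, the 50 nt after it are too short.
    print(noncoding_flanks((0, 300), (90, 250)))  # {'nc_5-CDS': (0, 90)}
```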
-
In addition to the CDS type, the user can specify which GFF element type(s) are used as reference(s) to annotate ORF sequences. For instance, if the user adds the tRNA element type, ORF sequences can also be assigned as nc_ovp-tRNA if they overlap with a tRNA. Six assignments are then possible for an ORF sequence:

| ORF annotation | Condition |
@@ -36,93 +42,122 @@ The user can also specify what GFF element type(s) can be used as reference(s) t
| nc_intergenic | if the ORF does not overlap with anything |

**Note on default parameters**:
-* CDS in the only element type used as a reference to annotate ORF sequences.
+* CDS is the only element type used as a reference to annotate ORF sequences.
* the minimum length required to consider an ORF sequence is 60 nucleotides
* an ORF sequence is considered as overlapping with an element (e.g. CDS) if at least 70 % of its sequence overlaps with the element or if this element is totally included within the ORF sequence
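
The overlap rule in the last bullet can be written as a small predicate. This is a sketch under assumptions, not the program's implementation: the function name `is_overlapping` is hypothetical, intervals are 0-based and end-exclusive, and strand and phase are ignored.

```python
def is_overlapping(orf, element, co_ovp=0.7):
    """Return True if the ORF counts as overlapping the element.

    The ORF overlaps if at least `co_ovp` of its length is covered by the
    element, or if the element lies entirely inside the ORF.
    `orf` and `element` are (start, end) tuples with end > start.
    """
    orf_start, orf_end = orf
    elt_start, elt_end = element
    # Number of nucleotides shared by the two intervals (0 if disjoint).
    shared = max(0, min(orf_end, elt_end) - max(orf_start, elt_start))
    covered_fraction = shared / (orf_end - orf_start)
    element_inside_orf = orf_start <= elt_start and elt_end <= orf_end
    return element_inside_orf or covered_fraction >= co_ovp


if __name__ == "__main__":
    print(is_overlapping((0, 100), (20, 100)))   # 80 % of the ORF is covered
    print(is_overlapping((0, 300), (100, 150)))  # element fully inside the ORF
```

The second condition matters for long ORFs: a short element fully contained in the ORF covers far less than 70 % of it, yet the ORF is still treated as overlapping.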


-----------------------------------------
-Installation procedure from distribution
-----------------------------------------
+<h2><a name="installation">Installation</a></h2>

+### 1. Download and uncompress the latest release archive

-I. First steps
-------------
+#### Download the latest release
+Latest release:
+<https://github.com/nchenche/orfmap/releases/latest/>

-1. Uncompress and untar the package:
+#### Uncompress the archive
+If you downloaded:
+* the *.zip* file: ```unzip ORFMap-x.x.x.zip```
+* the *.tar.gz* file: ```tar -xzvf ORFMap-x.x.x.tar.gz```
+
+
+### 2. Create an isolated environment
+Although not strictly necessary, this step is highly recommended: it lets you work on different projects
+without conflicting library versions.
+
+#### Install virtualenv
+```bash
+python3 -m pip install virtualenv
+```

+#### Create a virtual python3 environment
```bash
-tar -xzvf orfmap-0.0.tgz
+virtualenv -p python3 my_env
```

-2. Go to the ORFMap directory
-
+#### Activate the created environment
```bash
-cd ORFMap-0.0
+source my_env/bin/activate
```

+Once activated, any Python library you install with pip is installed solely in this isolated environment.
+Every time you need to work with libraries installed in this environment (i.e. work on your project), you have
+to activate it.

-II. Install the package in a virtual environment (the recommended way to avoid dependency conflicts)
**Note**: once installed, you should be able to run orfmap (see below). When you no longer need it, you can deactivate or exit your virtual environment by running the following in the terminal:
+To see all options available:

```
-deactivate
+run_orfmap -h
```

-From this installation, everytime you'll want to use orfmap you'll need to activate your dedicated virtual environment.
| -type TYPE [TYPE ...]| Type feature(s) a flag is desired for ('CDS' is included by default). |
-| -o_include O_INCLUDE [O_INCLUDE ...]| Type feature(s) and/or Status attribute(s) desired to be written in the output (all by default). |
-| -o_exclude O_EXCLUDE [O_EXCLUDE ...]| Type feature(s) and/or Status attribute(s) desired to be excluded (None by default). |
-| -orf_len [ORF_LEN]| Minimum number of nucleotides required to define a sequence between two consecutive stop codons as an ORF sequence (60 nucleotides by default). |
-| -co_ovp [CO_OVP]| Cutoff defining the minimum CDS overlapping ORF fraction required to label an ORF as a CDS. By default, an ORF sequence will be tagged as a CDS if at least 70 per cent of its sequence overlaps with the CDS sequence. |
-| -out [OUT]| Output directory |
+In the case where an ORF sequence overlaps

-
-Except for the -fna and -gff arguments, which are mandatory, all others are optional.
-
-| Arguments | Default value |
-| --- | --- |
-| -type | CDS |
-| -o_include | 'all' |
-| -o_exclude | None |
-| -orf_len | 60 |
-| -co_ovp | 0.7 |
-| -out | './' |
+##### Use tRNA and snRNA elements as a reference to annotate ORF sequences:
**Note**: -o_include and -o_exclude take either feature types or a status attribute as arguments. Feature types have to be amongst the possible annotations for ORF sequences (e.g. c_CDS, nc_5-CDS, nc_intergenic...) while status attribute is either 'coding' or 'non-coding' ('coding' refers to c_CDS and 'non-coding' refers to the other ones).
+<em>Note</em>:
+<p>
+-o_include and -o_exclude take either feature types or a status attribute as arguments.
+Feature types have to be amongst the possible annotations for ORF sequences (e.g. c_CDS, nc_5-CDS, nc_intergenic...)
+while status attribute is either 'coding' or 'non-coding' ('coding' refers to c_CDS and 'non-coding' refers to the other ones).
+</p>

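
The selection logic behind -o_include and -o_exclude, as described in the note above, can be illustrated like this. It is a sketch only: the function name `is_written` is hypothetical, and the choice that an exclusion takes precedence over an inclusion is an assumption, not something stated in this README.

```python
def matches(orf_type, selectors):
    """True if `orf_type` is named directly or through a status attribute."""
    for selector in selectors:
        if selector == orf_type:
            return True
        if selector == "coding" and orf_type == "c_CDS":
            return True
        if selector == "non-coding" and orf_type != "c_CDS":
            return True
    return False


def is_written(orf_type, o_include=("all",), o_exclude=()):
    """Decide whether an ORF with annotation `orf_type` goes to the output.

    Defaults mirror the documented ones: everything included, nothing excluded.
    Assumption of this sketch: exclusion wins over inclusion.
    """
    included = "all" in o_include or matches(orf_type, o_include)
    return included and not matches(orf_type, o_exclude)


if __name__ == "__main__":
    for annotation in ("c_CDS", "nc_5-CDS", "nc_intergenic"):
        print(annotation, is_written(annotation, o_include=["non-coding"]))
```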

-
-This command will define ORF sequences if they are at least 50 nucleotides
+##### Assign ORF sequences if stop-to-stop length is at least 50 nucleotides:
This command will consider an ORF sequence as overlapping with an element (e.g. CDS) if at least 60 % of its sequence overlaps with the element or if this element is totally included within the ORF sequence
+##### Consider an ORF sequence as overlapping with any element if at least 60 % of its sequence overlaps with the element: