RiSER
RiSER is free to download and use. If you use RiSER or its code in your work, please acknowledge it by referring to its GitHub homepage: https://github.com/oicr-ibc/riser
This is important to us because grants are a significant source of funding for planning and implementing our projects. If you find RiSER useful in your research, feel free to let us know.
RiSER is brought to you by:
- Vincent Ferretti
- Ivan Borozan
- Stuart Watt
RiSER was originally developed by:
- Ivan Borozan
Tested on Ubuntu 12.04
R (2.14.1):
$ apt-get install r-base
Python (2.7.3) - the program assumes that Python is in /usr/bin/python
Perl (5.14.2)
samtools (0.1.18)
The following Perl module needs to be installed:
BioPerl:
$ apt-get install bioperl
The following Python modules need to be installed in the order shown below:
If you do not have pip installed, install it as shown below:
$ sudo apt-get install python-pip python-dev
numpy (1.6.2):
$ sudo pip install numpy
BioPython (1.6):
$ sudo pip install biopython
rpy2 (2.3.1):
$ sudo pip install rpy2
setuptools (1.0):
$ sudo easy_install -U distribute
Cython (0.17.4):
$ sudo pip install cython
pysam (0.6):
$ sudo pip install pysam
This version of RiSER has been tested under Linux (Ubuntu 12.04).
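After installing the modules above, a quick import check can confirm that Python sees them. The following is a minimal sketch and not part of RiSER: the function name `find_missing` is hypothetical, and the import names are assumptions based on how these packages are usually imported (for example, the `biopython` package is imported as `Bio`). It only checks that each module can be imported, not which version is installed.

```python
import importlib

# Import names for the Python modules listed above. These are assumptions:
# pip package names and import names can differ (biopython -> Bio).
REQUIRED_MODULES = ["numpy", "Bio", "rpy2", "Cython", "pysam"]


def find_missing(modules):
    """Return the names in `modules` that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed Python modules can be imported.")
```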
To install:
Option 1:
$ sudo apt-get install git
$ git clone https://github.com/oicr-ibc/riser.git
$ cd riser
Option 2:
$ wget https://github.com/oicr-ibc/riser/archive/master.zip
$ unzip master.zip
$ mv riser-master riser
$ cd riser
Done! No installation is required: all Python scripts are in $RISER_DIR/bin and should be compatible with your system.
The steps below assume you are working in the RiSER directory.
- Simulate data for a particular set of genomes.
The default config.ini file is in the config directory. To run RiSER you need to modify this initial .ini file. However, first run RiSER with the configuration file config/config_simulation.example, provided as an example of how to run the simulation:
$ cp config/config.ini config/config.ini.save
$ cp config/config_simulation.example config/config.ini
Edit the config.ini file.
Run the simulation script:
$ python bin/run_simulator.py
(i) Note: simulated results will be output to the directory specified in the config.ini file (see config_simulation.example). Also check whether you are running a 32-bit or 64-bit machine (see config_simulation.example under [aligners]).
(ii) Note: in the GenBank flat file, the 'gene' and 'CDS' entries under 'FEATURES', if both are present, each need to have an associated /db_xref="GeneID:XXXXX" qualifier.
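If you are unsure whether you are on a 32-bit or 64-bit setup, the sketch below reports the pointer width of the running Python interpreter. This is an illustration only: the function name `pointer_width_bits` is hypothetical, and the result describes the Python build, which usually but not always matches the machine (a 32-bit Python can run on a 64-bit system). On Linux, `uname -m` reports the machine architecture directly.

```python
import struct


def pointer_width_bits():
    """Return the pointer size of the running Python interpreter, in bits."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("This Python interpreter is %d-bit" % pointer_width_bits())
```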
- Compare the aligner's output to the truth file (from simulated data):
First run RiSER with the configuration file config/config_analysis.example, provided as an example of how to run the analysis:
$ cp config/config.ini config/config.ini.save
$ cp config/config_analysis.example config/config.ini
Edit the config.ini file.
Run the analysis script:
$ python bin/run_analysis.py
The summary of results will be output to the aligner's directory specified by the user in the config.ini file (see config_analysis.example).
In the example provided, results for the NC_001357.1 genome and the BFAST aligner will be output to:
examples/aligners/NC_001357.1/BFAST/simulated_transcripts_0.fa/Rdata_multi/
examples/aligners/NC_001357.1/BFAST/simulated_transcripts_10.fa/Rdata_multi/
Note that the results are output as R data files. To view them, launch R and load the results as shown below:
$ cd examples/aligners/NC_001357.1/BFAST/simulated_transcripts_0.fa/Rdata_multi/
In R, type:
> # To load the data
> load("aligner_stats.gzip")
>
> # To view the analysis statistics
> aligner_stats
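Both the simulation and the analysis steps begin with the same two cp commands: back up config.ini, then copy an example configuration over it. For anyone scripting repeated runs, that step could be sketched in Python as follows. The function name `activate_example_config` is hypothetical and not part of RiSER.

```python
import os
import shutil


def activate_example_config(config_dir, example_name):
    """Back up config.ini as config.ini.save, then copy the example over it.

    Returns the path of the active config.ini file.
    """
    active = os.path.join(config_dir, "config.ini")
    backup = os.path.join(config_dir, "config.ini.save")
    example = os.path.join(config_dir, example_name)
    # Only back up if there is an existing config.ini to preserve.
    if os.path.exists(active):
        shutil.copy(active, backup)
    shutil.copy(example, active)
    return active
```

For the analysis step this would be called as `activate_example_config("config", "config_analysis.example")` from the RiSER directory. Note that, like the cp commands, it overwrites any earlier config.ini.save.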
File format for user specified transcript files:
If a transcript file is specified by the user (see also examples/genomes/NC_001357.1_transcripts.txt), each row in the file should describe a single transcript, with tab-delimited columns in the order shown below:
transcript_id \t transcript_name \t genome_id \t strand \t transcript_START \t transcript_END \t transcript_START \t transcript_END \t numb_exons \t exons_START \t exons_END
where:
- transcript_id is a GI number or any other unique id
- genome_id is, e.g., a GenBank accession
- exons_START lists the START position of each exon, separated by commas
- exons_END lists the END position of each exon, separated by commas, in the same order as the START positions
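To check a user-specified transcript file before running RiSER, each row can be parsed and validated against the column order above. The sketch below is not RiSER's own parser. The names `Transcript`, `parse_int_list` and `parse_transcript_row` are hypothetical, the field names `start_2` and `end_2` are placeholders for the second START/END pair in the column list, and it assumes that all positions are integers and that a trailing comma in the exon lists should be tolerated.

```python
from collections import namedtuple

# Field names are illustrative; start_2/end_2 stand for the second
# START/END pair listed in the column description.
Transcript = namedtuple("Transcript", [
    "transcript_id", "transcript_name", "genome_id", "strand",
    "start", "end", "start_2", "end_2",
    "numb_exons", "exon_starts", "exon_ends",
])


def parse_int_list(field):
    """Split a comma-separated field into integers, ignoring empty items."""
    return [int(item) for item in field.split(",") if item.strip()]


def parse_transcript_row(line):
    """Parse one tab-delimited transcript row into a Transcript tuple."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 11:
        raise ValueError("expected 11 tab-delimited columns, got %d" % len(fields))
    numb_exons = int(fields[8])
    exon_starts = parse_int_list(fields[9])
    exon_ends = parse_int_list(fields[10])
    # The exon lists must agree with each other and with numb_exons.
    if not (len(exon_starts) == len(exon_ends) == numb_exons):
        raise ValueError("exon START/END lists do not match numb_exons")
    return Transcript(fields[0], fields[1], fields[2], fields[3],
                      int(fields[4]), int(fields[5]),
                      int(fields[6]), int(fields[7]),
                      numb_exons, exon_starts, exon_ends)
```

Calling `parse_transcript_row` on every line of the file raises a `ValueError` at the first malformed row, which makes column-count and exon-list mismatches easy to locate.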
More datasets are available on the wiki at: https://github.com/oicr-ibc/riser/wiki/Datasets.
Licensed under the GNU General Public License, Version 3.0. See LICENSE for more details.
Copyright 2013 The Ontario Institute for Cancer Research.
This project is supported by the Ontario Institute for Cancer Research (OICR) through funding provided by the government of Ontario, Canada.