1 Searching Biological Databases
There are dozens of biological databases on the Web, and many alternate web interfaces that provide access to the same sets of data. Which ones you use depends on your needs, but you should know which central repositories hold each data type, and how often the more peripheral databases you use synchronize with those central sources.
Although data repositories for new types of biological data are multiplying, we focus here on one established database: NCBI's GenBank, for DNA sequence data. Every database has its own deposition procedures, and for the newer datatypes these are not yet well established or are still changing rapidly. NCBI has a mature, automated, web-based deposition system that is not likely to change drastically in the near future.
NCBI, in cooperation with EMBL and other international organizations, provides the most complete collection of DNA sequence data in the world, as well as PubMed, a taxonomy database, and an alternate access point for protein sequence and structure data. The sequence collection, known as GenBank, may be accessed at http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Protein. NCBI maintains sequence data from every organism, every source, and every type of nucleotide sequence, from mRNA to cDNA clones to expressed sequence tags (ESTs) to high-throughput genome sequencing data and information about sequence polymorphisms. Users of the NCBI database need to be aware of the differences between these datatypes so that they can search the data set that's most appropriate for the work they're doing. The main sequence types that you'll encounter in a full GenBank search include mRNAs, cDNAs, and genomic DNA (including whole genomes). NCBI is also a repository for genome-scale sequence data from genome assembly projects, transcriptome projects, and just about every other effort utilizing genetic data.
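To make the point about choosing the right datatype concrete, here is a minimal sketch of how a search might be restricted to one sequence type. It only builds a query URL for NCBI's E-utilities ESearch service and does not contact the server. The helper name build_search_url and the MOLECULE_FILTERS table are hypothetical, introduced for illustration, and the property filters shown are assumptions that should be checked against the current Entrez help before use.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical mapping from sequence types to Entrez property filters.
# Assumption: verify each filter against the current Entrez documentation.
MOLECULE_FILTERS = {
    "mrna": "biomol_mrna[PROP]",
    "genomic": "biomol_genomic[PROP]",
}


def build_search_url(organism, molecule, retmax=20):
    """Return an ESearch URL limited to one organism and one molecule type."""
    # Combine an organism restriction with the molecule-type filter.
    term = f'"{organism}"[Organism] AND {MOLECULE_FILTERS[molecule]}'
    params = {"db": "nucleotide", "term": term, "retmax": retmax}
    # urlencode percent-escapes the quotes and brackets in the term.
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("Homo sapiens", "mrna"))
```

Swapping "mrna" for "genomic" changes only the filter, which is the practical reason to know which datatype a question calls for before searching.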