Contributed by David Wishart, Director, Canadian Bioinformatics
Help Desk, University of Alberta
Sundararaj S, Guo A, Habibi-Nazhad B, Rouani M, Stothard P, Ellison M,
Wishart DS. The CyberCell Database (CCDB): a comprehensive,
self-updating, relational database to coordinate and facilitate in
silico modeling of Escherichia coli. Nucleic Acids Res. 2004 Jan
1;32(1):D293-5.
When someone says the word "bioinformatics", most of us think of
sequences and sequence
databases. However, most bioinformaticians want us to think that
bioinformatics is more than just a field that studies protein and DNA
sequences. In fact, as we move beyond current efforts to sequence
organisms, many life scientists are also realizing that there is a
growing need to gather and catalog much more detailed molecular
biological information to complement our ever-expanding archives of
DNA sequence files. Major efforts are going into cataloging
structural information (PDB), protein interaction information (BIND),
protein localization and function (GO, SwissProt), polymorphisms and
mutations (dbSNP), associated genetic diseases (OMIM), metabolism
(WIT), transcription and promoter sequences (TRANSFAC) and so on.
Currently it is estimated that there are more than 1000
bioinformatics databases that have unique or "high quality"
molecular biological information. Unfortunately, these new databases,
which are aimed at enriching the amount of molecular biological
information available to researchers, are leading to a much more
diffuse bioinformatics "landscape". In other words, if you want
to find out everything known about a particular gene or pathway, you
are likely going to have to search through and consolidate
information from more than 30 different databases.
Fortunately, bioinformaticians are working on several solutions to
this problem of
database diffusion. One approach is to keep the databases separate
but to confederate and standardize their query engines. This is the
approach adopted by BioMOBY and SeqHound. Another approach is to
consolidate this information into a central repository using teams of
annotators and experts. This is the approach adopted by SwissProt
and BIND. Still other groups are trying to be much more specific by
focusing on creating organism-specific databases. This is the
approach adopted by FlyBase and EcoCyc. The trend toward building
organism-specific databases is growing rapidly as it seems to fill a
niche for many life scientists who tend to work on only one type of
organism. Nevertheless, one lingering shortcoming for all of these
database solutions is that they still lack the depth or
comprehensiveness that would allow the "one-stop shopping" that
many of us are still looking for.
The CyberCell Database (http://redpoll.pharmacy.ualberta.ca/CCDB)
is an example of a new kind of organism-specific database. The CCDB
is an E. coli-specific database that offers remarkably "deep"
coverage of just
about every facet of E. coli molecular biology. Rather than
the usual 10-20 data fields found for a given gene or protein, the
CCDB has more than 70 data fields for each gene, protein and compound
found in E. coli along with extensive textual and visual
descriptions of E. coli cellular components and substructures.
Rather than focusing only on the genes or proteins in E. coli,
the CCDB tries to bring together nearly all aspects of the genomic,
proteomic and metabolomic character of E. coli (strain K12)
under one roof.
The CyberCell database (CCDB) actually consists of four browsable
databases, as shown in Figure 1 below:
1) the main CyberCell database (CCDB, containing gene and protein
information), 2) the 3D structure database (CC3D, containing
information for structural proteomics), 3) the RNA database (CCRD,
containing tRNA and rRNA information), and 4) the metabolite database
(CCMD, containing metabolite information). Each of these
databases is accessible through hyperlinked buttons located at the
top of the CCDB homepage. All CCDB sub-databases are fully
web-enabled, permitting a wide variety of interactive browsing, search
and display operations.
Figure 1. Screenshot of the CyberCell home page
(http://redpoll.pharmacy.ualberta.ca/CCDB). The four browsable
database links appear just below the CyberCell banner as CCDB
(CyberCell database, containing gene and protein information), CC3D
(3D structure database), CCRD (the RNA database), and CCMD (the
metabolite database).
In addition to the usual data viewing and sorting features found in
most databases, CCDB also offers a local BLAST search (against E. coli
plus 4 other model organisms), a Boolean text search, a chemical
structure search utility and a relational data extraction tool. The
data extraction utility allows users to select one or more data
fields and to search for ranges, occurrences or partial occurrences
of words, strings or numbers. The data extractor uses clickable web
forms to easily construct complex SQL-like queries. For instance,
using a few mouse clicks it is relatively simple to find "all
occurrences of periplasmic proteins that are homologous to human
proteins that have a molecular weight > 32 kD and a cysteine
content less than 5%". The output from these queries is provided
in multiple formats including an HTML table format, a circular
chromosome applet view and a tab-delimited Excel format for
subsequent downloading, analysis or graphical display.
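To give a feel for the kind of query the data extractor assembles from a few mouse clicks, here is a minimal sketch in Python. It assumes a hypothetical `proteins` table with made-up column names and made-up rows; none of this is the actual CCDB schema or data, and the built-in sqlite3 module simply stands in for whatever engine runs behind the web forms.

```python
import sqlite3

# Hypothetical schema and rows for illustration only; the real CCDB
# field names and contents are not reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE proteins (
        name          TEXT,
        location      TEXT,
        human_homolog INTEGER,  -- 1 if a human homolog is recorded
        mol_weight_kd REAL,     -- molecular weight in kD
        cysteine_pct  REAL      -- cysteine content as a percentage
    )
    """
)
rows = [
    ("protA", "periplasm", 1, 45.2, 1.3),
    ("protB", "periplasm", 1, 28.0, 0.9),  # too light
    ("protC", "cytoplasm", 1, 60.1, 2.0),  # wrong compartment
    ("protD", "periplasm", 0, 51.7, 1.1),  # no human homolog
    ("protE", "periplasm", 1, 39.9, 6.4),  # too much cysteine
]
conn.executemany("INSERT INTO proteins VALUES (?, ?, ?, ?, ?)", rows)

# "Periplasmic proteins homologous to human proteins with a molecular
# weight > 32 kD and a cysteine content less than 5%"
query = """
    SELECT name, mol_weight_kd, cysteine_pct
    FROM proteins
    WHERE location = 'periplasm'
      AND human_homolog = 1
      AND mol_weight_kd > 32
      AND cysteine_pct < 5
    ORDER BY name
"""
hits = conn.execute(query).fetchall()

# Tab-delimited lines, similar in spirit to the downloadable output
tsv_lines = ["\t".join(str(value) for value in row) for row in hits]
print("\n".join(tsv_lines))
```

Each condition in the WHERE clause corresponds to one field-and-range choice a user would make in the web form, which is why a handful of clicks is enough to build a fairly complex query.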
So if you're someone who routinely uses or studies E. coli, or if you
simply want to learn more about microbes in general, check out
the CCDB at http://redpoll.pharmacy.ualberta.ca/CCDB.
If you are interested in database development, you may also find
that the CCDB offers a glimpse of what the next generation of
bioinformatics databases will soon offer.
Chapter 10. GelScape
Contributed by Paul Stothard, Bioinformatician, Canadian Bioinformatics
Help Desk, University of Alberta
Young N, Chang Z, Wishart DS. GelScape: a web-based server for
interactively annotating, manipulating, comparing and archiving 1D and
2D gel images. Bioinformatics. 2004 Apr 12;20(6):976-8. Epub 2004 Feb 5.
Do you run a lot of protein gels? Do you find
yourself searching for gel images from your previous experiments,
either for comparison purposes or when it is time to publish? Keeping
track of gel images can be a daunting task. There is help, however: a
new web-based tool called GelScape. GelScape stores your images in a
private database, and it allows you to add detailed annotations, so
that the information needed for interpretation is always available (Figure 1). But
the most exciting feature of GelScape is that it permits accurate gel
comparisons, through image morphing, annotation transfers, and image
overlays. GelScape can be used online at http://www.gelscape.ualberta.ca/,
or it can be installed and run locally. Because it is written in Java,
GelScape can run on any current computer with a web browser. The price
is right too: it's completely free! Commercial products with similar
features typically cost between $3000 and $5000. Additional
information, including a GelScape tutorial, is available at the above
website.
Figure 1. GelScape's Annotate&View Mode. This mode allows the end user
to annotate and
view spots on a gel. In the above 2D gel image, the ATP5B spot is
selected, as indicated by the green cross in the left pane, and
annotation about this spot (protein name, spot label,
Swiss-Prot/GenBank ID, spot volume, organism, cellular location, MW,
comments) appears in the right pane.