

Chapter 9. CyberCell Database

Contributed by David Wishart, Director, Canadian Bioinformatics Help Desk, University of Alberta

Sundararaj S, Guo A, Habibi-Nazhad B, Rouani M, Stothard P, Ellison M, Wishart DS. The CyberCell Database (CCDB): a comprehensive, self-updating, relational database to coordinate and facilitate in silico modeling of Escherichia coli. Nucleic Acids Res. 2004 Jan 1;32(1):D293-5.

When someone says the word "bioinformatics", most of us think of sequences and sequence databases. However, most bioinformaticians want us to think of bioinformatics as more than just a field that studies protein and DNA sequences. In fact, as we move beyond current efforts to sequence organisms, many life scientists are realizing that there is a growing need to gather and catalog much more detailed molecular biological information to complement our ever-expanding archives of DNA sequence files. Major efforts are going into cataloging structural information (PDB), protein interaction information (BIND), protein localization and function (GO, SwissProt), polymorphisms and mutations (dbSNP), associated genetic diseases (OMIM), metabolism (WIT), transcription and promoter sequences (TRANSFAC) and so on. It is currently estimated that there are more than 1000 bioinformatics databases holding unique or "high quality" molecular biological information. Unfortunately, these new databases, which are aimed at enriching the amount of molecular biological information available to researchers, are leading to a much more diffuse bioinformatics "landscape". In other words, if you want to find out everything known about a particular gene or pathway, you will likely have to search through and consolidate information from more than 30 different databases.

Fortunately, bioinformaticians are working on several solutions to this problem of database diffusion. One approach is to keep the databases separate but to federate and standardize their query engines. This is the approach adopted by BioMOBY and SeqHound. Another approach is to consolidate this information into a central repository using teams of annotators and experts. This is the approach adopted by SwissProt and BIND. Still other groups are trying to be much more specific by creating organism-specific databases. This is the approach adopted by FlyBase and EcoCyc. The trend toward building organism-specific databases is growing rapidly, as it seems to fill a niche for the many life scientists who tend to work on only one type of organism. Nevertheless, one lingering shortcoming of all these database solutions is that they still lack the depth or comprehensiveness that would allow the "one-stop shopping" that many of us are still looking for.

The CyberCell Database (http://redpoll.pharmacy.ualberta.ca/CCDB) is an example of a new kind of organism-specific database. The CCDB is an E. coli-specific database that offers remarkably "deep" coverage of just about every facet of E. coli molecular biology. Rather than the usual 10-20 data fields found for a given gene or protein, the CCDB has more than 70 data fields for each gene, protein and compound found in E. coli, along with extensive textual and visual descriptions of E. coli cellular components and substructures. Rather than focusing only on the genes or proteins in E. coli, the CCDB tries to bring together nearly all aspects of the genomic, proteomic and metabolomic character of E. coli (strain K12) under one roof.

The CyberCell Database (CCDB) actually consists of four browsable databases, as shown in Figure 1 below: 1) the main CyberCell database (CCDB, containing gene and protein information), 2) the 3D structure database (CC3D, containing information for structural proteomics), 3) the RNA database (CCRD, containing tRNA and rRNA information), and 4) the metabolite database (CCMD, containing metabolite information). Each of these databases is accessible through hyperlinked buttons located at the top of the CCDB homepage. All CCDB sub-databases are fully web enabled, permitting a wide variety of interactive browsing, search and display operations.

CyberCell Home Page

Figure 1. Screenshot of the CyberCell home page (http://redpoll.pharmacy.ualberta.ca/CCDB). The four browsable database links appear just below the CyberCell banner as CCDB (CyberCell database, containing gene and protein information), CC3D (3D structure database), CCRD (the RNA database), and CCMD (the metabolite database).

In addition to the usual data viewing and sorting features found in most databases, the CCDB also offers a local BLAST search (against E. coli plus 4 other model organisms), a Boolean text search, a chemical structure search utility and a relational data extraction tool. The data extraction utility allows users to select one or more data fields and to search for ranges, occurrences or partial occurrences of words, strings or numbers. The data extractor uses clickable web forms to construct complex SQL-like queries. For instance, with a few mouse clicks it is relatively simple to find "all occurrences of periplasmic proteins that are homologous to human proteins that have a molecular weight > 32 kD and a cysteine content less than 5%". The output from these queries is provided in multiple formats, including an HTML table format, a circular chromosome applet view and a tab-delimited Excel format for subsequent downloading, analysis or graphical display.
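To give a feel for the kind of SQL-like query the data extractor builds behind its web forms, here is a minimal Python sketch of the example above. It is only an illustration under stated assumptions: the table name, column names and the rows themselves are invented for this example and do not reflect the real CCDB schema or contents, and an in-memory SQLite database stands in for whatever database engine the CCDB actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical schema and toy rows: NOT the real CCDB schema or data.
# Columns: name, location, has_human_homolog, mol_weight_kda, cysteine_pct
TOY_ROWS = [
    ("toyA", "periplasm", 1, 45.2, 1.3),
    ("toyB", "periplasm", 1, 28.0, 0.9),  # excluded: weight not > 32 kD
    ("toyC", "cytoplasm", 1, 60.1, 2.0),  # excluded: not periplasmic
    ("toyD", "periplasm", 0, 51.7, 0.4),  # excluded: no human homolog
    ("toyE", "periplasm", 1, 39.9, 6.2),  # excluded: cysteine not < 5%
    ("toyF", "periplasm", 1, 33.5, 4.8),
]


def build_toy_db():
    """Create an in-memory database holding the invented example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE proteins ("
        "name TEXT, location TEXT, has_human_homolog INTEGER, "
        "mol_weight_kda REAL, cysteine_pct REAL)"
    )
    conn.executemany("INSERT INTO proteins VALUES (?, ?, ?, ?, ?)", TOY_ROWS)
    return conn


def extract(conn, location, min_weight_kda, max_cysteine_pct):
    """Combine a word match, a flag and two numeric ranges in one query."""
    query = """
        SELECT name, mol_weight_kda, cysteine_pct
        FROM proteins
        WHERE location = ?
          AND has_human_homolog = 1
          AND mol_weight_kda > ?
          AND cysteine_pct < ?
        ORDER BY name
    """
    params = (location, min_weight_kda, max_cysteine_pct)
    return conn.execute(query, params).fetchall()


def to_tab_delimited(rows, header):
    """Render rows as tab-delimited text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


conn = build_toy_db()
hits = extract(conn, "periplasm", 32, 5)
print(to_tab_delimited(hits, ["name", "mol_weight_kda", "cysteine_pct"]))
```

Each form field in the extractor corresponds roughly to one condition in the WHERE clause, which is why a handful of mouse clicks is enough to express a fairly elaborate question. The final step mirrors the tab-delimited output option mentioned above.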

So if you're someone who routinely uses or studies E. coli, or if you simply want to learn more about microbes in general, check out the CCDB at http://redpoll.pharmacy.ualberta.ca/CCDB. Those of you who are interested in database development may also find that the CCDB offers a glimpse of what the next generation of bioinformatics databases will soon offer.



Chapter 10. GelScape

Contributed by Paul Stothard, Bioinformatician, Canadian Bioinformatics Help Desk, University of Alberta

Young N, Chang Z, Wishart DS. GelScape: a web-based server for interactively annotating, manipulating, comparing and archiving 1D and 2D gel images. Bioinformatics. 2004 Apr 12;20(6):976-8. Epub 2004 Feb 5.

Do you run a lot of protein gels? Do you find yourself searching for gel images from your previous experiments, either for comparison purposes or when it is time to publish? Keeping track of gel images can be a daunting task. Help is available, however, in the form of a new web-based tool called GelScape. GelScape stores your images in a private database and allows you to add detailed annotations, so that the information needed for interpretation is always available (Figure 1). But the most exciting feature of GelScape is that it permits accurate gel comparisons through image morphing, annotation transfers and image overlays. GelScape can be used online at http://www.gelscape.ualberta.ca/, or it can be installed and run locally. Because it is written in Java, GelScape can run on any current computer with a web browser. The price is right too: it's completely free! Commercial products with similar features typically cost between $3000 and $5000. Additional information, including a GelScape tutorial, is available at the above website.

GelScape Screen Shot

Figure 1. GelScape's Annotate&View Mode. This mode allows the end user to annotate and view spots on a gel. In the above 2D gel image, the ATP5B spot is selected, as indicated by the green cross in the left pane, and annotation about this spot (protein name, spot label, Swiss-Prot/GenBank ID, spot volume, organism, cellular location, MW, comments) appears in the right pane.