Chapter 5. Osprey: Improving Spotted Oligo Microarrays

The Importance of Design Parameters

Contributed by Paul Gordon, Sun Center of Excellence for Visual Genomics, University of Calgary

Gordon PM, Sensen CW. Osprey: a comprehensive tool employing novel methods for the design of oligonucleotides for DNA sequencing and microarrays. Nucleic Acids Res. 2004 Sep 29;32(17):e133 [PubMed] [PDF].

It is still fairly common to design oligo PCR primers manually using the so-called Wallace Rule, which estimates melting temperature by summing 4°C for each G or C and 2°C for each A or T. More accurate formulae exist, however, that closely model the thermodynamics of nucleotide binding. The need to calculate large numbers of primers for genomic sequencing and the growing use of microarrays have led to the development of increasingly sophisticated algorithms to improve the automation of oligo design.
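As a minimal sketch of the Wallace Rule described above (the function name is chosen here for illustration), the calculation can be written in a few lines of Python. Note that it depends only on base composition, so two oligos with the same bases in a different order receive the same estimate.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate melting temperature (°C) with the Wallace Rule.

    Each G or C contributes 4°C and each A or T contributes 2°C.
    """
    seq = oligo.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("oligo must be a non-empty string of A, C, G and T")
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at


if __name__ == "__main__":
    # Same composition, different order: the rule cannot tell them apart.
    print(wallace_tm("GGGGGAAAAA"))  # 30
    print(wallace_tm("GAGAGAGAGA"))  # 30
```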

Non-target binding can render sequencing reactions unusable and give false mRNA expression level readings in microarrays. Bioinformaticians have approached this problem in several ways. Oligodb filters out low-complexity regions using dustn without checking whether the sequences are repeated. ProbeWiz explicitly disables filtering, while the manuals of other systems do not document this aspect of the computation. Most programs use a simple BLAST search to filter secondary binding based on percent mismatch, but this method has disadvantages: small sequence stretches with evenly spaced mismatches may not be found, owing to the heuristic nature of BLAST. Even if these methods could find all matches, the Sarani documentation shows that the duplex melting temperatures of two 20-base targets against the same oligo sequence can differ by 20°C when both targets have only two mismatches. In addition, high GC regions bind with much higher energy than low GC regions of similar length. The number of pairwise matches is therefore not necessarily a good measure of melting thermodynamics.

SantaLucia's "unified" free energy parameter model was derived by unifying previously described nearest neighbor (NN) methods, and is generally considered the best model yet of DNA binding thermodynamics for melting temperature and duplex stability. The NN model assumes that summing the interaction energies of adjacent nucleotides along a strand is the best predictor of the whole duplex's stability. Popular programs such as Primer3 are based on older NN models that often work, but can deviate significantly (6°C) from the unified model, causing problems for experiments like microarrays, where achieving a uniform melting temperature is very desirable.
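The summation that the NN model performs can be sketched as follows. This is an illustration rather than Osprey's code: the function names are invented here, and the free energy values (kcal/mol at 37°C) are those commonly tabulated for the 1998 unified parameter set, so they should be checked against the original publication before being relied upon.

```python
# Nearest-neighbour free energies (kcal/mol, 37°C), keyed by the 5'->3'
# dinucleotide on one strand. ASSUMPTION: values as commonly tabulated for
# the unified parameter set; verify against the published table before use.
NN_DG = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
INIT_GC = 0.98   # initiation term for a terminal G/C pair
INIT_AT = 1.03   # initiation term for a terminal A/T pair
SYMMETRY = 0.43  # correction for self-complementary duplexes

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def stack_dg(step: str) -> float:
    """Free energy of one dinucleotide step (e.g. 'TG' is looked up as 'CA')."""
    if step in NN_DG:
        return NN_DG[step]
    return NN_DG[reverse_complement(step)]


def duplex_dg(oligo: str) -> float:
    """Sum the stacking energies of adjacent bases, plus end and symmetry terms."""
    seq = oligo.upper()
    if len(seq) < 2 or set(seq) - set("ACGT"):
        raise ValueError("oligo must contain at least two of A, C, G, T")
    total = sum(stack_dg(seq[i:i + 2]) for i in range(len(seq) - 1))
    for end in (seq[0], seq[-1]):
        total += INIT_GC if end in "GC" else INIT_AT
    if seq == reverse_complement(seq):
        total += SYMMETRY
    return round(total, 2)


if __name__ == "__main__":
    print(duplex_dg("CGTTGA"))  # -5.35
```

Unlike a simple count of G/C and A/T bases, the result depends on the order of the bases, which is why GC-rich stretches and particular neighbor combinations contribute so unevenly to duplex stability.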

Osprey: Simplifying and Improving Large-Scale Design

Osprey is new software that includes automated techniques to provide higher quality oligos with less human intervention. It also introduces the novel use of Position Specific Scoring Matrices (PSSMs) to encode the free energy model, improving the specificity and sensitivity of oligo secondary binding searches.

With regard to Web interfaces for oligonucleotide design, there are many choices. Many of the most popular ones are wrappers around the ever-popular Primer3, and can have many parameters. SantaLucia provides a thermodynamics calculation service called HyTher (based on the unified model), which works for single sequences. Osprey attempts to simplify the process for the user: it uses the unified model thermodynamics, and most parameters are calculated automatically from the input sequence (such as the ideal oligo length for a given melting temperature, or vice versa, as in Figure 1 below).

Figure 1. Osprey Web interface for microarray oligo design.

Optimal Oligonucleotide Criteria

Osprey incorporates a series of fitness tests, in the following order: melting temperature, dimer potential, hairpin potential, and secondary (non-specific) binding. Ordered from computationally simple to computationally expensive, the tests filter out unsuitable candidates as quickly as possible.
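The idea of ordering tests from cheap to expensive can be sketched as a short-circuiting filter. Everything below is illustrative: the function names, the Wallace Rule stand-in for the melting temperature test, the temperature window and the toy dimer and hairpin tests are assumptions made for this example, not Osprey's actual checks. The expensive secondary binding search would be appended as the last entry of the list.

```python
from typing import Callable, Dict, Iterable, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
Check = Tuple[str, Callable[[str], bool]]


def tm_in_window(oligo: str, low: int = 56, high: int = 64) -> bool:
    """Cheap stand-in: Wallace Rule estimate inside a hypothetical window."""
    gc = oligo.count("G") + oligo.count("C")
    return low <= 4 * gc + 2 * (len(oligo) - gc) <= high


def no_3prime_self_dimer(oligo: str, stem: int = 4) -> bool:
    """Toy test: the 3' end must not be able to pair with the oligo itself."""
    tail = oligo[-stem:]
    return tail.translate(COMPLEMENT)[::-1] not in oligo


def no_hairpin(oligo: str, stem: int = 4, loop: int = 3) -> bool:
    """Toy test: no stem-length arm with a partner at least `loop` bases away."""
    for i in range(len(oligo) - 2 * stem - loop + 1):
        partner = oligo[i:i + stem].translate(COMPLEMENT)[::-1]
        if partner in oligo[i + stem + loop:]:
            return False
    return True


def screen(candidates: Iterable[str],
           checks: List[Check]) -> Tuple[List[str], Dict[str, str]]:
    """Apply checks in order; stop at the first failure for each candidate."""
    accepted: List[str] = []
    rejected: Dict[str, str] = {}
    for oligo in candidates:
        for name, passes in checks:
            if not passes(oligo):
                rejected[oligo] = name  # later, costlier checks never run
                break
        else:
            accepted.append(oligo)
    return accepted, rejected


# Cheapest first; a secondary binding search would be appended last.
CHECKS: List[Check] = [
    ("melting temperature", tm_in_window),
    ("dimer potential", no_3prime_self_dimer),
    ("hairpin potential", no_hairpin),
]
```

Because a candidate is dropped at its first failure, the costly database search only ever sees oligos that have already passed every inexpensive test.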

For prokaryotes, a random hexamer nucleotide mixture is typically used to prime reverse transcription of the sample mRNA into cDNA, placing little constraint on the location of oligonucleotide binding. For eukaryotic microarrays, a 3' hybridization site bias is maintained, since poly(dT) is used to prime reverse transcription starting at the poly(A) tail at the 3' end of the gene's mRNA. Checks for secondary binding are restricted to transcribed sequences in the genome.

A program from the popular Mfold package, quikfold, can be used to confirm the absence of significant secondary structure. Osprey is configured to use this check by default, similar to other oligo design programs including OligoArray.

Removing Sequence Redundancy Automatically

Using the MegaBLAST program from the BLAST package, all repetitive elements larger than a user-defined threshold are quickly (less than one minute for 3800 Sulfolobus genes) identified in the query sequences. For whole-genome analysis, the query file and the database are the same. If the user is iteratively searching for oligos using the "rejects" from a previous run, the database remains the whole genome, while the query consists of the sequences that do not yet have a suitable candidate. In either case, Osprey filters the query down to unique sequence, plus one copy of each repetitive section. This setup allows the secondary binding checks to be performed without interference from multi-copy elements. No user intervention or preprocessing of the dataset is required, facilitating the use of Osprey with redundant data derived from GenBank and other sources.
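The effect of this filtering step (unique sequence plus one copy of each repeat) can be illustrated with a toy k-mer version. This is a deliberately simplified stand-in and not how Osprey works: Osprey uses MegaBLAST, whereas the sketch below only detects exact repeats on one strand, and the function name and the parameter k (playing the role of the user-defined threshold) are assumptions made here.

```python
from typing import Dict


def mask_repeats(sequences: Dict[str, str], k: int = 12) -> Dict[str, str]:
    """Mask (with 'N') every stretch already seen in an earlier sequence.

    The first copy of a repeated element is kept intact; later copies in
    other sequences are masked. Repeats inside a single sequence and
    reverse-strand repeats are ignored in this simplified version.
    """
    seen = set()
    masked = {}
    for name, raw in sequences.items():
        seq = raw.upper()
        keep = [True] * len(seq)
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for i, kmer in enumerate(kmers):
            if kmer in seen:
                for j in range(i, i + k):
                    keep[j] = False
        seen.update(kmers)  # only now does this sequence count as "seen"
        masked[name] = "".join(b if ok else "N" for b, ok in zip(seq, keep))
    return masked


if __name__ == "__main__":
    repeat = "GGGGCCCCAAAATTTT"
    genes = {"geneA": "ACGTACGGTTCA" + repeat, "geneB": "TTGACCATGT" + repeat}
    for name, seq in mask_repeats(genes, k=8).items():
        print(name, seq)
```

In the example, geneA keeps its copy of the shared element while the copy in geneB is masked, so later secondary binding checks are not triggered by the duplicate.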

PSSMs: Improving Non-target Binding Checks

Osprey introduces a novel method of calculating and accelerating secondary binding checks using Position Specific Scoring Matrices (PSSMs). The models are compatible with the method established by Gribskov et al. Appropriately setting the position specific scores allows the raw profile score to encode the significant caloric (thermodynamic) values of the DNA binding. The interface available on the Osprey Web site uses the Genome Canada Bioinformatics Platform funded DeCypher bioinformatics accelerator to provide timely PSSM searches.

This representation reflects the thermodynamics of oligo duplexes, and compensates for dangling ends as well as interspersed mismatches and bulges. Such a search has an advantage over a BLAST-type search because, unlike in pairwise alignments, the match, mismatch and gap scores are context sensitive (following the NN model). It also overcomes inherent limitations of the alignment heuristic when dealing with short oligo sequences, such as missing DNA matches with gaps (duplex bulges) and interspersed mismatches. Because of these limitations, oligos for which heuristic pairwise alignment finds no apparent secondary binding may in fact show some when searched with profiles (increased sensitivity). Conversely, candidates rejected because they exceed a percentage similarity cutoff in BLAST may in fact not bind strongly to those sites when the NN thermodynamics are calculated (improved specificity).
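One way such an encoding might look is sketched below. It is an assumption-laden illustration, not Osprey's published scheme: each position's match score is taken as half of the stacking energies on either side of it, so that a perfect match sums to the total stacking energy, and every mismatch receives a single hypothetical penalty. Gaps, dangling ends and true neighbor-dependent mismatch scores are left out, and the free energy values are those commonly tabulated for the unified model (to be verified before use).

```python
from typing import Dict, List, Tuple

# ASSUMPTION: commonly tabulated unified NN free energies (kcal/mol, 37°C).
NN_DG = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MISMATCH_PENALTY = -1.0  # hypothetical flat penalty, for illustration only


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def stack_dg(step: str) -> float:
    return NN_DG[step] if step in NN_DG else NN_DG[reverse_complement(step)]


def build_profile(oligo: str) -> List[Dict[str, float]]:
    """One column per base of the site the oligo binds (its reverse complement)."""
    site = reverse_complement(oligo.upper())
    stacks = [stack_dg(site[i:i + 2]) for i in range(len(site) - 1)]
    profile = []
    for i, base in enumerate(site):
        left = stacks[i - 1] if i > 0 else 0.0
        right = stacks[i] if i < len(stacks) else 0.0
        match = -(left + right) / 2  # stabilising energy as a positive score
        profile.append({b: (match if b == base else MISMATCH_PENALTY)
                        for b in "ACGT"})
    return profile


def scan(profile: List[Dict[str, float]], target: str,
         threshold: float) -> List[Tuple[int, float]]:
    """Return (start, score) for every window scoring at or above threshold."""
    target = target.upper()
    n = len(profile)
    hits = []
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        score = sum(col.get(b, MISMATCH_PENALTY)
                    for col, b in zip(profile, window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits


if __name__ == "__main__":
    profile = build_profile("TCAACG")               # binds the site CGTTGA
    print(scan(profile, "AAACGTTGAAAA", 7.0))       # [(3, 7.36)]
```

Even in this reduced form, a mismatch in a strongly stacked position costs more score than one at a weakly stacked end, which a plain percent-identity cutoff cannot express.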


A manuscript detailing Osprey's implementation has been submitted for publication. Upon acceptance, the source code will be freely available to academic users. Please feel free to use the Osprey Web Interface (http://osprey.ucalgary.ca) and provide feedback on bugs and areas for improvement!

Chapter 6. BioMOBY

Contributed by Ian J. Forsythe, Canadian Bioinformatics Help Desk, University of Alberta

Wilkinson MD, Links M. BioMOBY: an open source biological web services proposal. Brief Bioinform. 2002 Dec;3(4):331-41 [PubMed].

Many biologists face a plethora of web service providers on a daily basis: NCBI BLAST, PubMed, SwissProt, PROSITE, PRINTS, ProDom, Pfam, TIGRFAM and SMART, to name a few. Most web service providers present one with colourful, graphics-rich entry forms, each with its own unique data input requirements. If one is lucky, one can locate some help pages that offer tips on how to use the entry forms. Often, it is unclear what types of services the provider offers or how to use them. Potentially even more difficult is working out how to use these services from within a computer program rather than through a web form, or how to take the output from one service provider and send it to another. Many of these steps can be automated using the power of MOBY, removing the frustration of having to work with customized, often changing interfaces from hundreds of different service providers.

MOBY simplifies the web services creation process by employing a simple application programming interface (API) and providing a central registry, called MOBY-Central, that informs users about all of the services that are available (Figure 1). Anyone who has an algorithm, a program, or information to provide can make it available using MOBY. Instead of having to create an elaborate web form, the provider just registers the service URL with MOBY and provides a MOBY-compliant interface. MOBY-Central is analogous to a phone book. If the user inquires about a certain data type (e.g. a GenBank accession number or a FASTA-formatted sequence), MOBY lists all of the services that accept this type of input. MOBY appears to have been designed with biologists in mind. MOBY objects consist of lightweight XML, providing a hierarchy of input/output objects that are particularly well suited to bioinformatics. Objects are provided for storing sequences, BLAST results, and other common biological data types. MOBY's extensibility allows biologists to add their own custom data types.
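The phone book analogy can be made concrete with a small in-memory model. This is not the BioMOBY API: the class and method names, the data type names and the example.org URLs below are all hypothetical, chosen only to show how a registry can answer the question "which services accept the data I have in hand?" while respecting a hierarchy of object types.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class Service:
    name: str
    url: str
    input_type: str
    output_type: str


class Registry:
    """Toy stand-in for a central registry of services and data types."""

    def __init__(self) -> None:
        self._parents: Dict[str, Optional[str]] = {}
        self._services: List[Service] = []

    def register_type(self, name: str, parent: Optional[str] = None) -> None:
        self._parents[name] = parent

    def register_service(self, service: Service) -> None:
        self._services.append(service)  # providers register once

    def _lineage(self, type_name: Optional[str]) -> Iterator[str]:
        # Walk from the given type up through its ancestors.
        while type_name is not None:
            yield type_name
            type_name = self._parents.get(type_name)

    def find_by_input(self, type_name: str) -> List[Service]:
        """Services whose input is this type or any of its ancestors."""
        accepted = set(self._lineage(type_name))
        return [s for s in self._services if s.input_type in accepted]


if __name__ == "__main__":
    registry = Registry()
    registry.register_type("Object")
    registry.register_type("GenericSequence", parent="Object")
    registry.register_type("DNASequence", parent="GenericSequence")
    registry.register_service(Service(
        "translate", "http://example.org/translate", "DNASequence", "GenericSequence"))
    registry.register_service(Service(
        "length", "http://example.org/length", "GenericSequence", "Object"))
    for service in registry.find_by_input("DNASequence"):
        print(service.name, service.url)
```

A client holding a DNA sequence is offered both services, because a service that accepts the more general sequence type can also consume the more specific one; a client holding only a generic sequence is offered just the second.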

BioMOBY is an international, Open Source web service integration project, sponsored in part by Genome Prairie and Genome Canada. It aims to create an architecture for the discovery and distribution of biological data across the internet. BioMOBY involves biological data hosts, biological data service providers, and computer programmers. BioMOBY seeks to integrate data from disparate biological data sources via a central registry, MOBY-Central. Mark Wilkinson, a National Bioinformatics Platform team member, leads the MOBY Services Branch of the project. Lincoln Stein and Damian Gessler lead the Semantic MOBY Branch of the project.

MOBY Overview

Figure 1. The dynamics of a MOBY transaction. Three computers are involved: MOBY-Central, MOBY-Server, and MOBY-Client. MOBY servers register their services (once) with MOBY-Central. A MOBY client has a piece of data in hand, and queries MOBY-Central for the services that are able to use that piece of data as input. MOBY-Central returns one or more Web Services Description Language (WSDL) service descriptions for the applicable services. The client then chooses one, sends its data to the service, and receives another form of data in return; the transaction is carried out according to the specifications in the WSDL document. The data passed is in the form of MOBY Objects: (generally) lightweight XML documents that conform to MOBY object descriptions. Not shown in this diagram is the ability to query MOBY-Central based on a service-type ontology or an output-type ontology, in addition to the input-type query shown.

Source: http://biomoby.org