BNFO 601 
Integrated Bioinformatics
Scenarios
Fall 2007 
Alignment of mystery sequence with known anthrax toxins

Scientific story (html)

In brief: You're analyzing a DNA sequence that you're convinced comes from the gene encoding the lethal factor of the Bacillus anthracis toxin... but Blast refuses to confirm your assessment! Who's wrong, and why?
Bioinformatic tools
Local pairwise sequence alignment
     Smith-Waterman algorithm for exact (guaranteed optimal) local alignments
     Modified (heuristic) Smith-Waterman algorithm for fast, approximate alignments
Scoring schemes for sequence alignment
Dissection of BlastN
     Standard program to find similarities between sequences or sets of sequences.
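A minimal Perl sketch of the idea behind SmithWaterman2.pl (Smith-Waterman with gaps) follows. It fills a two-dimensional score matrix, floors every cell at zero so that a local alignment can start anywhere, and traces back from the highest-scoring cell. The scoring scheme (match +2, mismatch -1, gap -2) and the subroutine name smith_waterman are assumptions for illustration, not taken from the course programs.

```perl
use strict;
use warnings;

# Hypothetical scoring scheme (assumed, not taken from SmithWaterman2.pl):
# match +2, mismatch -1, linear gap penalty -2 per gap position.
my ($MATCH, $MISMATCH, $GAP) = (2, -1, -2);

# Returns (best local score, aligned segment of $s, aligned segment of $t).
sub smith_waterman {
    my ($s, $t) = @_;
    my @p = split //, $s;
    my @q = split //, $t;
    my (@H, @ptr);                      # 2-D arrays: scores and traceback pointers
    my ($best, $bi, $bj) = (0, 0, 0);

    # Row 0 and column 0 stay at zero, so an alignment may start anywhere.
    for my $i (0 .. @p) { $H[$i][0] = 0; $ptr[$i][0] = '' }
    for my $j (0 .. @q) { $H[0][$j] = 0; $ptr[0][$j] = '' }

    for my $i (1 .. @p) {
        for my $j (1 .. @q) {
            my $diag = $H[$i-1][$j-1] + ($p[$i-1] eq $q[$j-1] ? $MATCH : $MISMATCH);
            my $up   = $H[$i-1][$j] + $GAP;     # gap inserted in $t
            my $left = $H[$i][$j-1] + $GAP;     # gap inserted in $s
            my ($score, $dir) = (0, '');        # zero floor: start a fresh alignment
            ($score, $dir) = ($diag, 'D') if $diag > $score;
            ($score, $dir) = ($up,   'U') if $up   > $score;
            ($score, $dir) = ($left, 'L') if $left > $score;
            $H[$i][$j]   = $score;
            $ptr[$i][$j] = $dir;
            ($best, $bi, $bj) = ($score, $i, $j) if $score > $best;
        }
    }

    # Trace back from the best cell until a zero cell (empty pointer) is reached.
    my ($x, $y) = ('', '');
    my ($i, $j) = ($bi, $bj);
    while ($ptr[$i][$j] ne '') {
        my $d = $ptr[$i][$j];
        if    ($d eq 'D') { $x = $p[$i-1] . $x; $y = $q[$j-1] . $y; $i--; $j-- }
        elsif ($d eq 'U') { $x = $p[$i-1] . $x; $y = '-' . $y;      $i-- }
        else              { $x = '-' . $x;      $y = $q[$j-1] . $y; $j-- }
    }
    return ($best, $x, $y);
}

my ($score, $x, $y) = smith_waterman('TTACGTGGA', 'CCACGAGGC');
print "score $score\n$x\n$y\n";   # prints score 9, then ACGTGG and ACGAGG
```

Because a gap costs more than a mismatch under these assumed scores, the example pair aligns without gaps; making $GAP less negative, or comparing sequences of different lengths such as ACGTACGT and ACGACGT, brings gaps into the best alignment.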
Molecular biology concepts: Nothing new

Perl focus: Two-dimensional arrays
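In Perl, a two-dimensional array is an array whose elements are references to other arrays, so $grid[$row][$col] reaches a single element. The sketch below stores a few hypothetical aligned sequences that way and takes the most common base in each column. It is only an illustration of the data structure; the sequences and the subroutine name consensus are made up and this is not the course's Consensus program.

```perl
use strict;
use warnings;

# Hypothetical aligned sequences: rows are sequences, columns are positions.
my @seqs = ('ACGTA', 'ACGTT', 'AGGTA', 'ACCTA');
my @grid = map { [ split //, $_ ] } @seqs;   # $grid[$row][$col] is one base

# Consensus: the most frequent base in each column.
sub consensus {
    my @rows  = @_;                          # each element is an array reference
    my $ncols = scalar @{ $rows[0] };        # number of columns in the first row
    my $result = '';
    for my $col (0 .. $ncols - 1) {
        my %count;
        $count{ $_->[$col] }++ for @rows;    # $_->[$col] is $rows[$r][$col]
        # Highest count wins; ties are broken alphabetically so the result is stable.
        my ($top) = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
        $result .= $top;
    }
    return $result;
}

print "base at row 2, column 1: $grid[2][1]\n";   # prints G
print "consensus: ", consensus(@grid), "\n";      # prints ACGTA
```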

Programs & Files

Blast - NCBI implementation (various flavors you can run online at the NCBI site)
How to run BlastN (nucleotide sequence compared to nucleotide database)
How to translate DNA sequences to protein
How to run BlastP (protein sequence compared to protein database)
How to run BlastX (DNA sequence translated in all reading frames compared to protein database)
How to run Pairwise Blast (individual sequence compared to individual sequence)
BlastN - Homegrown version
        We'll be using this simplified version of BlastN to investigate how BlastN works.
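The core of BlastN can be sketched in a few steps: index every short word of the query, look up each word of the subject in that index to find exact seed hits, then extend each seed without gaps until the running score falls too far below the best score seen (an X-drop rule). The word size of 4, the +1/-3 scores, the X-drop value of 5 and all subroutine names are assumptions chosen for a small example; they are not the settings of NCBI BlastN or of the homegrown version.

```perl
use strict;
use warnings;

# Assumed parameters for illustration only.
my ($W, $MATCH, $MISMATCH, $XDROP) = (4, 1, -3, 5);

# Step 1: index every word of length $W in the query.
sub index_words {
    my ($query) = @_;
    my %index;                                   # word => list of query positions
    for my $i (0 .. length($query) - $W) {
        push @{ $index{ substr($query, $i, $W) } }, $i;
    }
    return \%index;
}

# Step 2: scan the subject for exact word hits (seeds).
sub find_seeds {
    my ($index, $subject) = @_;
    my @seeds;                                   # each seed is [query_pos, subject_pos]
    for my $j (0 .. length($subject) - $W) {
        my $word = substr($subject, $j, $W);
        push @seeds, [ $_, $j ] for @{ $index->{$word} || [] };
    }
    return @seeds;
}

# Step 3: ungapped extension of one seed in both directions with an X-drop cutoff.
sub extend {
    my ($q, $s, $qi, $sj) = @_;
    my ($ql, $sl) = (length($q), length($s));

    my $score = $W * $MATCH;                     # the seed word itself matches exactly
    my ($best, $right) = ($score, 0);
    for (my $k = 0; $qi + $W + $k < $ql && $sj + $W + $k < $sl; $k++) {
        $score += substr($q, $qi + $W + $k, 1) eq substr($s, $sj + $W + $k, 1) ? $MATCH : $MISMATCH;
        ($best, $right) = ($score, $k + 1) if $score > $best;
        last if $best - $score > $XDROP;
    }

    $score = $best;                              # extend left from the best right end
    my $left = 0;
    for (my $k = 1; $qi - $k >= 0 && $sj - $k >= 0; $k++) {
        $score += substr($q, $qi - $k, 1) eq substr($s, $sj - $k, 1) ? $MATCH : $MISMATCH;
        ($best, $left) = ($score, $k) if $score > $best;
        last if $best - $score > $XDROP;
    }

    # (score, query start, subject start, length) of the high-scoring pair.
    return ($best, $qi - $left, $sj - $left, $W + $left + $right);
}

# Seed, extend, and drop duplicate high-scoring pairs found from different seeds.
sub blast_hits {
    my ($q, $s) = @_;
    my $index = index_words($q);
    my (%seen, @hsps);
    for my $seed (find_seeds($index, $s)) {
        my @hsp = extend($q, $s, @$seed);
        push @hsps, \@hsp unless $seen{"@hsp"}++;
    }
    return @hsps;
}

my @hsps = blast_hits('CCCCGATTACAGGGG', 'AAAAGATTACATTTT');
printf "score %d  query %d  subject %d  length %d\n", @$_ for @hsps;
```

In this example the four seeds (GATT, ATTA, TTAC and TACA) all lie on the same diagonal and extend to the same high-scoring pair, GATTACA with score 7, which is why duplicates are filtered through %seen. A real search would also stop extending when a word hit falls inside a pair already found, to save time.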

lef.txt    - the sequence of the B. anthracis lethal factor gene (lef)
DG47.txt   - the sequence of mystery PCR product DG47

Dotmatrix1.pl - Historical alignment program
Dotmatrix2.pl - Dotmatrix modified to allow inexact matches
SmithWaterman1.pl - Smith-Waterman algorithm with gaps disallowed
SmithWaterman2.pl - Smith-Waterman algorithm allowing gaps
Consensus - Program used in the notes to illustrate 2-dimensional arrays
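A dot matrix compares every position of one sequence with every position of the other and marks the pairs that match. The sketch below stores the marks in a 2-D array. A window of 1 base with a threshold of 1 gives exact single-base dots, in the spirit of Dotmatrix1.pl; a wider window with a lower threshold tolerates mismatches, which is one common way to allow inexact matches. The windowed filter and the subroutine names are assumptions and are not a description of how Dotmatrix1.pl or Dotmatrix2.pl are written.

```perl
use strict;
use warnings;

# $dots[$i][$j] is 1 when the window starting at $s[$i] and the window starting
# at $t[$j] share at least $min identical bases out of $win (assumed filter).
sub dot_matrix {
    my ($s, $t, $win, $min) = @_;
    my @dots;                                    # 2-D array of 0/1 marks
    for my $i (0 .. length($s) - $win) {
        for my $j (0 .. length($t) - $win) {
            my $matches = 0;
            for my $k (0 .. $win - 1) {
                $matches++ if substr($s, $i + $k, 1) eq substr($t, $j + $k, 1);
            }
            $dots[$i][$j] = $matches >= $min ? 1 : 0;
        }
    }
    return \@dots;
}

# Print one row per position of the first sequence: '*' for a dot, '.' otherwise.
sub print_dots {
    my ($dots) = @_;
    print join('', map { $_ ? '*' : '.' } @$_), "\n" for @$dots;
}

print_dots(dot_matrix('GATTACA', 'GATTACA', 1, 1));   # exact matches only
print "\n";
print_dots(dot_matrix('GATTACA', 'GACTACA', 3, 2));   # 2 of 3 bases must match
```

Raising the threshold toward the window size removes scattered noise dots but also hides diagonals interrupted by mismatches, which is the trade-off an inexact dot matrix has to balance.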
Notes
Introduction to Scenario: (Presentation)
DNA Sequence Alignment: (Notes) (Questionnaire) (Presentation) (Scoring Exercises)
BlastN and 2-dimensional arrays: (Notes) (Questionnaire)
Scoring and protein alignment: (Notes) (Presentation)
Problem Set: Just one for this scenario (PDF)