BNFO 601 | Scenarios | Fall 2010
Scientific story (html)
In brief: You hit on the idea of understanding the basis for pathogenesis by the deadly E. coli O157:H7 by comparing its total complement of proteins with that of the nonpathogenic strain E. coli K12. Unfortunately, the comparison nets you a file bigger than anything you could go through in a year. How can you extract the useful information from the file and put it in a form a human could understand?
Bioinformatic tools
Blast
Standard program to find similarities between sequences or sets of sequences.
Parsing program
Scans output, looking for items of interest as you define them. Outputs them to a separate file.
Perl focus: Pattern matching and extraction of strings through regular expressions
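To make the idea of scanning Blast output for items of interest concrete, here is a minimal sketch. It is written in Python rather than the Perl used in this course, and it is not the logic of BlastParser.pl. The report text, the sequence IDs, and the function name `queries_without_hits` are all made up for illustration, and the report is far simpler than real Blast output. The sketch assumes that each query section begins with a line starting `Query=` and that a section with no match contains the phrase `No hits found`.

```python
import re

# Hypothetical, heavily simplified stand-in for Blast text output.
# The IDs and descriptions are invented for this example.
SAMPLE_REPORT = """\
Query= Z0001 hypothetical protein
Sequences producing significant alignments:
b0001 thr operon leader peptide   45   2e-05

Query= Z0002 putative toxin
 ***** No hits found ******

Query= Z0003 transposase
Sequences producing significant alignments:
b1234 transposase   310   1e-85
"""

# Capture the first whitespace-delimited token after "Query=".
QUERY_RE = re.compile(r"^Query=\s*(\S+)")
# Assumed marker for a query that matched nothing in the database.
NO_HITS_RE = re.compile(r"No hits found")


def queries_without_hits(report_text):
    """Return the IDs of queries whose section reports no hits."""
    missing = []
    current = None  # ID of the query section being scanned
    for line in report_text.splitlines():
        match = QUERY_RE.match(line)
        if match:
            current = match.group(1)
        elif current is not None and NO_HITS_RE.search(line):
            missing.append(current)
    return missing


if __name__ == "__main__":
    # Print one ID per line; redirect to a file to keep the result.
    for query_id in queries_without_hits(SAMPLE_REPORT):
        print(query_id)
```

In this scenario, queries from the pathogenic strain with no counterpart in K12 are one plausible kind of "item of interest". The same loop could collect other items by changing the two patterns.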
Notes and papers
Molecular biology (PDF) (Questions)
- Perna, N. T., G. Plunkett, 3rd, et al. (2001). Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409(6819): 529-33.
- Hayashi, T., K. Makino, et al. (2001). Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res 8: 11-22.
Blast/Parsing program (html) (Questions)
Regular expressions (html) (Questions)
Problem Set - Molecular biology (PDF)
Programs
Blast (obtainable from NCBI site - see instructions on how to download Blast, set up a database, and run the program)
Most people run this program off of the web. The point of interest for now is learning how to download the program so that you can tailor it to your own purposes.
Protein databases (obtainable from TIGR-CMR site - see instructions on how to download and which set to download).
Files containing all proteins deduced from completed DNA sequences of E. coli strains, used by Blast.
Parsing program:
- BlastParser.pl - takes output from Blast, extracts information
- BlastParserNative.pl - same as above, but more as a Perl programmer would write it
- 71vsnps.txt - data the program was designed to handle
- EdVsK12s.txt - Small portion of expected output from BlastAll
Additional programs:
- matchtest.pl - Allows quick tests of regular expressions
- HTML_to_text.pl - Sample program using regular expressions
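As a rough illustration of what quickly testing a regular expression can look like, here is a small Python sketch. It is not the code of matchtest.pl; the function name `try_pattern` and the sample line are assumptions made for this example.

```python
import re


def try_pattern(pattern, text):
    """Return the captured groups as a tuple, or None if nothing matches.

    If the pattern has no capture groups, the whole match is returned
    as a one-element tuple.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.groups() if match.groups() else (match.group(0),)


if __name__ == "__main__":
    # Hypothetical line in the style of a Blast score line.
    line = " Score = 45.1 bits (105), Expect = 2e-05"
    print(try_pattern(r"Expect = (\S+)", line))
    print(try_pattern(r"Identities", line))
```

Trying a pattern on a single known line before running it over a large file makes it easier to see exactly which part of the line the pattern captures.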