Scientific story (HTML)
In brief: There must exist a gene in the cyanobacterium Anabaena that is regulated by nitrogen deprivation (through the DNA-binding protein NtcA) and whose product regulates differentiation (leading to N2-fixing heterocysts). You have in hand a reasonable collection of sequences known to bind NtcA, but you are not sure exactly which sequence features the protein recognizes. Your task is to extract as much information as possible from the known binding sequences and use it to scan the Anabaena genome for candidate binding sites.
Bioinformatic tools
Position-specific scoring matrices (PSSMs)
Molecular biology concepts: Nothing new
Identify the positions in sequence alignments that carry the most information, and use the base frequencies at those positions to characterize aligned motifs
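One common way to measure how much information an alignment column carries is its information content in bits, IC = 2 - H, where H = -Σ p(b) log2 p(b) over the four bases and a uniform 0.25 background is assumed. The Python sketch below computes this for each column of a gapless alignment and ranks the columns. The toy sequences are hypothetical illustrations, not real NtcA sites, and the course materials may define the measure differently (for example, with a non-uniform genomic background).

```python
import math
from collections import Counter

BASES = "ACGT"

def column_information(aligned, pseudocount=0.0):
    """Information content (bits) of each column of a gapless DNA alignment.

    Assumes a uniform background (0.25 per base), so IC = 2 - H.
    Characters other than A, C, G, T are ignored when counting.
    """
    ic = []
    for i in range(len(aligned[0])):
        counts = Counter(seq[i] for seq in aligned)
        total = sum(counts[b] for b in BASES) + 4 * pseudocount
        entropy = 0.0
        for b in BASES:
            p = (counts[b] + pseudocount) / total
            if p > 0:  # 0 * log2(0) is taken as 0
                entropy -= p * math.log2(p)
        ic.append(2.0 - entropy)
    return ic

# Hypothetical toy alignment (not real NtcA binding sites)
toy = ["GTAACA", "GTAACT", "GTTACA", "GTAAGA"]
ic = column_information(toy)
# Column indices ordered from most to least informative
ranked = sorted(range(len(ic)), key=lambda i: ic[i], reverse=True)
```

A fully conserved column scores the maximum of 2 bits, while a column split 3:1 between two bases scores about 1.19 bits, so the conserved columns sort to the front of `ranked`.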
Perl focus: Hashes; Sorting
Programs
FindMotif.pl - Constructs a PSSM from aligned sequences, scans the genome, and produces a list of the most plausible motifs
Data: Small set of aligned sequences (71NpNtSm.txt)
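The two steps FindMotif.pl performs (build a PSSM, then scan a genome and sort the hits) can be sketched in Python as shown below. This is not the course program. It assumes a common log-odds formulation: add pseudocounts to each column's base counts, divide by a uniform 0.25 background, and take log2. It also scans only the forward strand. Python dicts play the role that Perl hashes would play, one per position, mapping base to score. The names `build_pssm`, `scan`, and the toy data are hypothetical.

```python
import math

BASES = "ACGT"

def build_pssm(aligned, pseudocount=1.0, background=None):
    """Log-odds PSSM in bits: a list with one {base: score} dict per position.

    Assumes a uniform background unless another {base: freq} dict is given.
    """
    background = background or {b: 0.25 for b in BASES}
    n = len(aligned)
    pssm = []
    for i in range(len(aligned[0])):
        column = [seq[i] for seq in aligned]
        scores = {}
        for b in BASES:
            freq = (column.count(b) + pseudocount) / (n + 4 * pseudocount)
            scores[b] = math.log2(freq / background[b])
        pssm.append(scores)
    return pssm

def scan(genome, pssm, top=5):
    """Score every window on the forward strand; return the top hits, best first.

    Each hit is a (score, start, window) tuple. A real genome scan would
    also score the reverse complement.
    """
    width = len(pssm)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pssm[i][b] for i, b in enumerate(window))
        hits.append((score, start, window))
    hits.sort(key=lambda h: h[0], reverse=True)  # highest score first
    return hits[:top]

# Hypothetical toy data (not real NtcA sites or Anabaena sequence)
toy = ["GTAACA", "GTAACT", "GTTACA", "GTAAGA"]
hits = scan("TTTTGTAACATTTT", build_pssm(toy))
```

The pseudocount keeps a base that never appears in a column from receiving a score of negative infinity, which matters when only a small set of aligned sequences is available.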
MEME - Web-based program designed to find statistically overrepresented motifs in a collection of sequences.
Click on MEME - Submission form to use the program. Explore the other links to learn more about it.
Introduction to PSSM (PPT)
Information Theory (PPT)
In-class worksheet (DOC)
Position-specific scoring matrices (PDF) (Questionnaire)
PSSM program (PDF) (Questionnaire)
Problem Set: Just one for this scenario (HTML)