Scientific story (html)
In brief: You are trying to produce an enzyme in E. coli for an industrial process, but at high levels of expression, the protein precipitates in an inactive form. If you knew the three-dimensional structure of the protein, you might be able to predict which amino acids to change to prevent precipitation. You don't, but the structure of a moderately similar protein is available. How can you use the sparse similarity between the two proteins to suggest the unknown three-dimensional structure of the protein you're trying to overproduce?
Bioinformatic tools
Parsing of PDB files
Molecular biology concepts:
PDB (Protein Data Bank) files are the most common way of representing protein structural information. Many publicly available programs read them to visualize macromolecular structures.
Points of similarity between a protein of known structure and one whose structure is not known constrain the positioning of the dissimilar regions and permit the unknown structure to be approximated.
Transcriptional and translational gene fusions
Perl focus: Modules
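As a rough illustration of what parsing a PDB file involves, the sketch below pulls a few fixed-column fields out of an ATOM record. The subroutine name parse_atom_line is hypothetical and is not taken from ThreadProtein.pl, and the sample coordinates are made up. The column positions follow the standard PDB ATOM record layout.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ThreadProtein.pl): extract selected
# fixed-width fields from one ATOM or HETATM record of a PDB file.
# Returns a hash reference, or nothing if the line is another record type.
sub parse_atom_line {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ /^(?:ATOM  |HETATM)/;
    $line = sprintf '%-80s', $line;    # pad short lines so substr stays in range
    my %atom = (
        serial   => substr($line,  6, 5),   # columns 7-11
        name     => substr($line, 12, 4),   # columns 13-16
        res_name => substr($line, 17, 3),   # columns 18-20
        chain    => substr($line, 21, 1),   # column 22
        res_seq  => substr($line, 22, 4),   # columns 23-26
        x        => substr($line, 30, 8),   # columns 31-38
        y        => substr($line, 38, 8),   # columns 39-46
        z        => substr($line, 46, 8),   # columns 47-54
    );
    s/^\s+|\s+$//g for values %atom;   # trim padding from every field
    return \%atom;
}

# Build one sample record with made-up coordinates (illustration only).
my $sample = sprintf "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f",
    'ATOM', 1, ' CA', '', 'MET', 'A', 1, '', 11.104, 6.134, -6.504;

my $atom = parse_atom_line($sample);
print "$atom->{name} of $atom->{res_name} $atom->{res_seq} ",
      "(chain $atom->{chain}) at $atom->{x} $atom->{y} $atom->{z}\n";
```

Fields are taken by position with substr rather than by splitting on whitespace, because adjacent PDB fields can run together without a separating space.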
Presentations and notes
Notes: Overexpression of protein
Notes: Resources for protein threading
Notes: Modifying a program that threads a protein sequence through a structure
Notes: Hints for modifying the program
Programs
ThreadProtein.pl - Superimposes one protein sequence on the structure of another
FastA_module.pm - (Used by ThreadProtein) Reads FastA files
AA_module.pm - (Used by ThreadProtein) Interconverts name formats of amino acids
Problem Set 7: (pdf)
Data: UDPGD-mutants.txt, identifies amino acid residues of mutant UDP glucose dehydrogenase from M. loti
Data: File aligning UDPGD sequences from Streptococcus pyogenes and M. loti - Make it via Clustal (see below)
Data: PDB-formatted file containing coordinates for UDPGD from S. pyogenes - Find it
Visualize proteins in three dimensions with the Java applet-based Jmol package. Mac OS X users can use the alternative software package iMol.
For a web-based interface, FirstGlance in Jmol remains the best option.
Documentation: Jmol
Reference: Jmol Interactive Scripting
Data: 1GZX.pdb, contains coordinates for three-dimensional structure of oxygenated human hemoglobin
Clustal - Align two or more sequences (DNA or amino acid)
Clustal Omega - web-based Clustal hosted on EMBL-EBI (European Bioinformatics Institute) server
Prealigned Streptococcus and Mesorhizobium sequences - just in case.
Program: Protein_mol_weight.pl, used by Problem 7.3 as shell to test new AA_module
Data: List of molecular weights of amino acids, used by Problem 7.3
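The idea of superimposing one sequence on the structure of another can be sketched in a few lines: walk a pairwise alignment column by column and record which residue of the protein of unknown structure sits opposite each residue of the protein of known structure. This is a minimal sketch under stated assumptions, not the method used by ThreadProtein.pl. The subroutine name map_target_onto_template, the toy sequences, and the partial one-letter to three-letter table are all made up for illustration. It also assumes that template residues are numbered consecutively from 1, which real PDB files do not guarantee.

```perl
use strict;
use warnings;

# Partial one-letter to three-letter table, just enough for the toy example.
my %three_letter = (
    A => 'ALA', G => 'GLY', K => 'LYS', L => 'LEU',
    M => 'MET', S => 'SER', V => 'VAL',
);

# Hypothetical helper: given two aligned sequences (gaps written as '-'),
# return a hash reference mapping each template residue position (1-based,
# counting only non-gap template characters) to the target residue aligned
# with it. Columns with a gap in either sequence are skipped.
sub map_target_onto_template {
    my ($template_aln, $target_aln) = @_;
    die "Aligned sequences differ in length\n"
        unless length($template_aln) == length($target_aln);
    my %target_at;
    my $template_pos = 0;
    for my $col (0 .. length($template_aln) - 1) {
        my $t = substr($template_aln, $col, 1);
        my $q = substr($target_aln,   $col, 1);
        $template_pos++ if $t ne '-';          # advance along the known structure
        next if $t eq '-' or $q eq '-';        # no residue to place at this column
        $target_at{$template_pos} = $q;
    }
    return \%target_at;
}

# Toy alignment (made-up sequences): template on top, target below.
my $map = map_target_onto_template('MKV-LSG', 'MKLAL-G');
for my $pos (sort { $a <=> $b } keys %$map) {
    printf "template residue %d -> %s\n", $pos, $three_letter{ $map->{$pos} };
}
```

Template positions that face a gap in the target get no entry in the map, and target residues that face a gap in the template have no coordinates to borrow. Both cases are the dissimilar regions whose placement must be inferred from the surrounding points of similarity.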