Proteins are submicroscopic organic molecules that are the major products derived from genes. Experimental protein structure determination is a major research enterprise, and a well-recognised one. But what about calculating the structure of a protein in a computer? Yet another impressive paper from David Baker and his collaborators brings this idea another step closer.
Principles for designing ideal protein structures
Nobuyasu Koga, Rie Tatsumi-Koga, Gaohua Liu, Rong Xiao, Thomas B. Acton, Gaetano T. Montelione & David Baker
The units (amino acids) that make up a protein are encoded by its gene, and these units are connected head to tail in an uninterrupted chain. When the chain is folded up into a 3-D shape, as most real-life proteins are, a mixture of attractive forces between parts of the chain keeps it together for long periods of time (think weeks or longer).
Most proteins fold into a unique arrangement. Two proteins with the same sequence aren’t merely as similar as two peas in a pod: they are identical, and you couldn’t tell them apart. This means that there must be some strong rules that ensure that the fold is identical each time a given protein is synthesized.
We would like to know these folding rules, because we know the amino acid sequences of hundreds of thousands of proteins from gene sequencing, but the 3-D structures of only a few thousand unique examples. Also, the proteins we have structures for are biased towards particular types of proteins, and particularly towards those from bacteria, which are not necessarily the most interesting ones. Knowing the folding rules would enable us to build testable models of proteins (from any organism) in the computer, as well as opening the door to synthetic proteins that might be useful.
Some rules about the organization of proteins are quite well understood. The chain collapses into distinctive local structures, just as bricks can be built into walls or arches, and these local structures are packed together to make a domain. A protein might have only one domain, or many. Fairly weak bonds that share hydrogen atoms between parts of the chain are major players in forming these local structures, and equally in lashing them together. One such structure is the “alpha-helix” [wikipedia]. Here’s a cartoon.
Alpha-helices are spirals that always turn in the same direction: they are right-handed. Another local structure is a beta strand, which is a simple zigzag /\/\/\/\/\/ . These can align into flat sheets.
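To make "right-handed" concrete, here is a minimal sketch that places the C-alpha atoms of an idealized alpha-helix and checks which way it turns. The numbers (3.6 residues per turn, a 1.5 Å rise per residue, a 2.3 Å radius) are standard textbook values, not figures from the paper, and the function names are invented for this illustration.

```python
import math

# Textbook geometry of an ideal alpha-helix (assumed, not from the paper).
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5   # angstroms travelled along the helix axis per residue
CA_RADIUS = 2.3          # angstroms from the helix axis to each C-alpha atom


def ideal_helix(n_residues):
    """Return C-alpha coordinates (x, y, z) for an ideal helix along the z axis."""
    step = 2 * math.pi / RESIDUES_PER_TURN  # 100 degrees of rotation per residue
    return [
        (CA_RADIUS * math.cos(i * step),
         CA_RADIUS * math.sin(i * step),
         RISE_PER_RESIDUE * i)
        for i in range(n_residues)
    ]


def ca_distance(p, q):
    """Straight-line distance between two points, in angstroms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def handedness(p0, p1, p2, p3):
    """Signed triple product of three consecutive steps along the chain.

    Positive means the chain twists like a right-handed screw,
    negative means left-handed.
    """
    v1 = [b - a for a, b in zip(p0, p1)]
    v2 = [b - a for a, b in zip(p1, p2)]
    v3 = [b - a for a, b in zip(p2, p3)]
    cross = (
        v1[1] * v2[2] - v1[2] * v2[1],
        v1[2] * v2[0] - v1[0] * v2[2],
        v1[0] * v2[1] - v1[1] * v2[0],
    )
    return sum(c * v for c, v in zip(cross, v3))


coords = ideal_helix(8)
print("right-handed" if handedness(*coords[:4]) > 0 else "left-handed")
print(round(ca_distance(coords[0], coords[1]), 2), "angstroms between neighbours")
```

Flipping the sign of the rotation step would give the mirror-image, left-handed spiral with exactly the same spacing between neighbours, which is why handedness has to be checked with a signed quantity rather than with distances alone.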
The helices and strands, which are rod-like, are connected with “flexible” loops that allow the protein to fold back on itself. What the authors found (via simulations and comparison with known structures) is that loop lengths fix the packing of sequential local structure elements, in a sequence-independent manner. Surprising. That is, the loops determine folding topology: they define how the helices and strands pack. The dogma (that “structures” like strands and helices matter most, and that loops, being devoid of structure, are just in-between parts with no role) is inverted here. To be fair, loops have attracted interest, but their variable structures have been hard to predict, and sometimes they are hard to see in crystals, suggesting they have no fixed structure and thus little relevance.
The authors used their own Rosetta software to choose which amino acids to place at each position of a given template (that is, to design the sequence), and then tried to get bacteria to make artificial proteins from five classes of alpha-beta folds. These artificial proteins were tested stringently and, overall, a decent fraction of the designs succeeded, far more than would be expected from bad or random designs.
It would be wrong to assume that it’s a short step from here to the end of experimental structural biology. Designing large proteins from scratch or understanding how complex proteins form, especially how groups of proteins assemble, will require knowledge of more advanced rules. Just as importantly, proteins are not usually static structures; they are dynamic. We don’t have good ideas about how this dynamism is encoded. Designing proteins that catalyze chemical reactions (enzymes) or perform other exquisite functions, like pumping metabolites or drugs across a membrane or moving containers around within cells, requires more detail, but the Baker lab and others have handles on this too.
As more and more proteins are designed by computers according to simple rules, protein folding will become ever less controversial, letting us worry instead about how protein structure defines activity, which remains a vast and exciting challenge.