Gene function#

Now that we have identified potential protein sequences in our genome, we should try to identify their function. To this end, we first have to understand how an amino acid sequence encoded by a gene becomes a functional protein within a cell.

Protein structure#

The translated amino acid chain (polypeptide) represents only the primary protein structure. A single polypeptide folds into local structures (most commonly alpha-helices and beta-sheets) due to hydrogen bonds and van der Waals forces; these make up the secondary structure. Lastly, the secondary structures arrange themselves in 3-dimensional space to form the tertiary structure, which is necessary for protein function. Proteins with multiple subunits (e.g. DNA gyrase with subunits A/B) additionally have a quaternary structure, created by the interaction and arrangement of the subunits into a whole protein complex, but this is outside the scope of this course. The overall 3D structure of a protein typically corresponds to the global minimum of an energy function, which provides stability.

Conserved protein domains#

In addition, regions within the structure that are strictly necessary for protein function are often conserved, meaning they are similar between proteins that have the same function. This is because they are directly involved in the functioning of that protein. Common examples of such conserved protein domains are enzyme active sites, molecule-binding sites of transport proteins, and structural components of regulatory proteins. For a protein domain to be conserved, the amino acids at specific positions need to be conserved as well. Since many amino acid sequences contain one or more of these conserved regions, gene function and protein families can be predicted based on the principle of sequence homology.
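To make the idea of conserved positions concrete, here is a minimal sketch that scans protein sequences for one well-known conserved pattern, the ATP/GTP-binding P-loop (Walker A) motif, written in PROSITE notation as [AG]-x(4)-G-K-[ST]. The three sequences and the names `find_motif` and `toy_proteins` are hypothetical and invented for illustration; real domain searches typically use profile-based tools rather than a single regular expression.

```python
import re

# P-loop (Walker A) motif [AG]-x(4)-G-K-[ST] expressed as a regular expression:
# A or G, then any four residues, then G, K, and finally S or T.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_motif(sequence, pattern=P_LOOP):
    """Return a list of (0-based start, matched residues) for each motif hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# Hypothetical toy sequences (assumption: made up for this example,
# not taken from any real genome or database).
toy_proteins = {
    "protein_a": "MSLKAGPSGSGKSTLLRAI",
    "protein_b": "MTTEGLNGAGKTTVFNMLE",
    "protein_c": "MKKLLPTAAALVWQERTYI",
}

for name, seq in toy_proteins.items():
    hits = find_motif(seq)
    label = "possible nucleotide-binding site" if hits else "no P-loop found"
    print(name, hits, label)
```

Only the positions fixed by the pattern (the G, the K, and the S/T) have to match; the four `x` positions are free to vary, which mirrors how conservation applies to specific positions rather than the whole sequence.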

Homology-based inference#

If the function of a genetic element was determined experimentally in the past and we see that element again, or something that looks like it, we might suspect that it has the same function. Once we have seen enough examples of a particular element, we can work out a pattern or set of rules to predict the existence of that element in a completely new sequence. Broadly, this is known as function by homology or homology-based inference: two similar genes are assumed to share a function because they descend from a common evolutionary ancestor.
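The reasoning above can be sketched as a toy "best hit" annotation transfer. This is only an illustration under strong assumptions: the sequences are hypothetical and already aligned to equal length (gaps as `-`), similarity is plain percent identity, and the 40% cutoff is an arbitrary value chosen for the example. Real pipelines use alignment tools and curated databases instead.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def infer_function(query, references, min_identity=40.0):
    """Return (reference name, function, identity) of the best hit, or None.

    min_identity is an illustrative cutoff (assumption), not a recommended value.
    """
    best = None
    for name, (ref_seq, function) in references.items():
        identity = percent_identity(query, ref_seq)
        if identity >= min_identity and (best is None or identity > best[2]):
            best = (name, function, identity)
    return best


# Hypothetical reference "database": name -> (aligned sequence, known function).
references = {
    "ref_kinase": ("MKTAYLAKQR", "hypothetical kinase"),
    "ref_transporter": ("MSTPWIGHQL", "hypothetical transporter"),
}

print(infer_function("MKTAYIAKQR", references))  # best hit is ref_kinase
print(infer_function("WWWWWWWWWW", references))  # no hit above the cutoff
```

When no reference passes the cutoff the function returns `None`, reflecting that a sequence without a sufficiently similar, experimentally characterized relative cannot be annotated this way.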

Exercise 5.2#

  • What is the reason that we find similar structural regions (conserved domains) in different proteins with similar function?

Conserved domains are regions of a protein that are directly involved in the functioning of that protein, e.g., enzyme active sites. Because they are required for a specific function, different proteins with that function tend to contain the same domain.

  • What is the process of predicting the function of an unknown protein using sequence homology?

Since this unknown protein has not been tested experimentally, we compare the sequence encoding it to sequence databases that contain many sequences of known function. Similar sequences and shared patterns indicate the most probable function. This approach is known as homology-based inference.