Abstract by Mads Mørup Nygaard
Computational design of proteins and peptides
Knowledge of a protein’s structure is critical for a complete understanding of its function and interactions, as well as for the design of new potential drugs. The two most popular methods for solving atomic-resolution structures of proteins are X-ray crystallography and cryo-electron microscopy (cryo-EM). Each technique presents its own obstacles to obtaining the structure of a given protein in its relevant conformation. Cryo-EM requires a particle that is large enough, and has sufficiently distinct features, for the randomly oriented particles to be aligned during the data processing needed to obtain high-resolution structures. X-ray crystallography requires a crystal of the protein, and crystallization may lock the protein in an unfavorable conformational state.
Protein engineering has been applied successfully on numerous occasions to obtain atomic-resolution structures of proteins. The first part of my thesis focuses on two engineering approaches for solving protein structures, with a specific focus on G protein-coupled receptors (GPCRs). I) By searching the full Protein Data Bank, we find a number of fusion protein candidates with the potential to be fused into the reading frame of proteins. The fusion increases the particle size and adds distinct marks to the particle, enabling structural characterization by cryo-EM. II) By fusing a small peptide from the C-terminal end of the Gα protein into the reading frame of a GPCR, we demonstrate that the peptide binds in the intracellular transducer binding pocket of the GPCR. This might allow structural characterization of the active conformation of a GPCR by X-ray crystallography without binding the full G protein complex.
In the second part of my thesis, I focus on optimizing binding between peptides and their binding partners via substitutions to non-proteinogenic amino acids, using two different machine learning models. In the first model, a deep learning 3D convolutional neural network can be trained on protein structures to predict which amino acid fits a given environment, defined as the cavity left behind when that amino acid is removed. Typically, such models have been trained in a classification setting with the goal of predicting one of the 20 natural amino acids. Here, I expand the model by replacing the categorical output with a continuous one, corresponding to the latent space of a pre-trained molecular representation model. We show that the new model leads to meaningful perturbations over the natural amino acids and demonstrates performance comparable to the state of the art at reduced computational cost.
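The idea of a continuous output can be illustrated with a minimal sketch. It assumes, purely for illustration, that the network is trained with a mean-squared-error loss against the embedding of the true residue and that candidates are ranked by Euclidean distance to the predicted vector; the abstract does not state the loss or the decoding step. The names `latent_regression_loss`, `rank_by_latent_distance` and the two-dimensional embeddings are hypothetical placeholders, not values from a real molecular representation model.

```python
import math

def latent_regression_loss(predicted, target):
    """Mean squared error between a predicted latent vector and a target embedding."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def rank_by_latent_distance(predicted, embeddings):
    """Return candidate names sorted by Euclidean distance to the predicted vector."""
    return sorted(embeddings, key=lambda name: math.dist(predicted, embeddings[name]))

# Placeholder embeddings (invented numbers). In practice each candidate amino
# acid, natural or not, would be embedded by the pre-trained model.
embeddings = {
    "candidate_1": (0.0, 0.0),
    "candidate_2": (1.0, 0.0),
    "candidate_3": (0.0, 2.0),
}

# Stand-in for the network output for one residue environment.
predicted = (0.9, 0.1)

loss = latent_regression_loss(predicted, embeddings["candidate_2"])
ranking = rank_by_latent_distance(predicted, embeddings)
print(ranking)
```

Because any molecule that the representation model can embed becomes a possible candidate, the output is no longer tied to a fixed list of 20 classes.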
In the second model, I fit relative binding affinity values obtained in a deep mutational scan, covering the natural amino acids, of three peptides binding to the Sushi domain 1 of the GABAB1a receptor. I show that by encoding the amino acids with the z-scales, we are able to extrapolate the model to predict affinities for substitutions to the 67 non-proteinogenic amino acids covered by the z-scales. Eight non-proteinogenic amino acid substitutions suggested by the model were selected for in vitro binding affinity characterization. All eight peptides carrying the suggested substitutions showed better binding affinity than the wild type, and four were better than the previous best substitution to a natural amino acid in the same position. Together, the two models show that incorporating machine learning may be beneficial when optimizing peptides in a rational design approach.
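The extrapolation step can be sketched as follows, under strong simplifying assumptions: a single peptide position, three descriptors per residue, and an ordinary least-squares linear model. The abstract does not specify the model form, so this is not the method used in the thesis. All descriptor and affinity values below are invented placeholders, not published z-scale values or measured data, and the helper names (`solve`, `fit_linear`, `predict`) are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_linear(descriptors, targets):
    """Least-squares weights (intercept first) via the normal equations."""
    rows = [[1.0] + list(d) for d in descriptors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, descriptor):
    """Predicted relative affinity for one residue descriptor."""
    return weights[0] + sum(w * z for w, z in zip(weights[1:], descriptor))

# Placeholder training data: residues seen in the scan at one position.
train_descriptors = {
    "nat_A": (0.0, 0.0, 0.0), "nat_B": (1.0, 0.0, 0.0), "nat_C": (0.0, 1.0, 0.0),
    "nat_D": (0.0, 0.0, 1.0), "nat_E": (1.0, 1.0, 1.0),
}
train_affinity = {"nat_A": 0.5, "nat_B": 1.5, "nat_C": -1.5, "nat_D": 1.0, "nat_E": 0.0}

# Placeholder residues that were never measured but have descriptors.
unseen_descriptors = {"npaa_X": (2.0, -1.0, 0.0), "npaa_Y": (-1.0, 1.0, 1.0)}

names = list(train_descriptors)
weights = fit_linear([train_descriptors[n] for n in names],
                     [train_affinity[n] for n in names])
scores = {n: predict(weights, d) for n, d in unseen_descriptors.items()}
# Assumes a higher score means stronger predicted binding.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The point of the sketch is that the model never sees residue identities, only physicochemical descriptors, so any residue with known descriptors can be scored even if it was absent from the scan.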