Protein Loops in Tensorflow — A.I Bio Part 2

16th of March 2018

In the last post I talked about some of the basics of structural biology. I’m focusing on these annoying loops that form part of the antibody — the bits that do the work. My hunch is that neural networks might do a better job than the methods tried so far. But how do we even begin to approach this problem? How can we go from a list of amino acids to a full 3D structure?

Representing proteins

Amino acids have an amine and carboxyl group, with what is known as a sidechain that hangs off the side. I’m no chemist, so I’m sure someone out there will tell me what these words mean. But for our purposes, what we need to know is the amine and carboxyl groups form a backbone — a (polypeptide) chain if you will. It is this chain that we want to chase. For now, we don’t actually need to know the sidechain details.

Furthermore, we can represent this backbone using three dihedral angles, known as phi, psi and omega. Looking at the backbone we can count along each amino acid (known as a residue) and read off the atoms. If we start from the N-terminus — the free amine end of the chain — we get: nitrogen, carbon, carbon, nitrogen, carbon, carbon… and so on.

The first carbon atom we come across is known as the Alpha Carbon (Ca). The second is the Carboxyl Carbon (C, sometimes written C′). So for each residue, we have N-Ca-C, N-Ca-C, and repeat. If we know the distances and angles between these atoms we can calculate the dihedral angles. From these angles we can recreate the structure again if we need to.

A dihedral angle is essentially the angle between two planes that intersect. We take 4 atoms in our structure, find the two planes, then the angle between them. Phi is the twist around the bond between the Nitrogen and the Alpha Carbon, while Psi is the twist around the bond between the Alpha Carbon and the Carboxyl Carbon.
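To make that concrete, a dihedral can be computed directly from four atom positions. This is a minimal NumPy sketch (the function name and sign convention are my own, not from the post):

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four atom positions.

    It is the angle between the plane through p0, p1, p2 and the
    plane through p1, p2, p3, measured around the central p1-p2 bond.
    An eclipsed (cis) arrangement gives 0; a trans arrangement gives pi.
    """
    b0 = p0 - p1          # note the direction: from p1 back to p0
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    # atan2 gives a signed angle, unlike arccos of the plain dot product
    return np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))
```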

Omega is a bit of a special case. It is the twist around the peptide bond between the Carboxyl Carbon of one residue and the Nitrogen of the next, and it very rarely strays from around 180 degrees, hovering within roughly +/- 15 degrees of it except in special cases. As with so many things in biology, there is always an exception!

Structures in neural networks

There has been some interesting work of late in creating 3D objects with neural networks. As far as I can tell, there are two main approaches: combining existing structures and discretising the space. There are a few workshops and conferences, such as CVPR and 3DDL at NIPS, and some GitHub repositories like Synthesize3, 3D pose estimation and Deep Lung. So far the field seems fairly new, with most of the code on the bleeding edge.

Fortunately, we don’t have to do any of that. Angles can be represented as two numbers: the sine and cosine of the angle in question. Both of these functions range from -1 to 1 and can therefore fit nicely into the classic model of a neuron in our network. If we use tanh as opposed to something like a ReLU, we can cover this range quite comfortably (we can ignore the slight non-linearity problem here for now — we are hacking this somewhat!).
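For a single phi angle, the encoding looks like this (a minimal NumPy sketch; the angle value is just illustrative):

```python
import numpy as np

# Encode an angle as (sine, cosine): both components lie in [-1, 1],
# which matches the output range of a tanh unit exactly.
phi = np.radians(-65.3)                       # an illustrative phi angle
target = np.array([np.sin(phi), np.cos(phi)])

# Decoding later is unambiguous because atan2 keeps the sign and quadrant
recovered = np.degrees(np.arctan2(target[0], target[1]))
```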

NeRF wars!

So how do we go from angles back to a structure we can look at in order to do a comparison? One could do an awful lot of trig, but there is a more elegant algorithm called NeRF (and sn-NeRF). Annoyingly, the paper is behind the Wiley paywall, but the general gist of the algorithm is to start with 3 atoms, and place the next atom based on the positions of the previous three.

NeRF has two steps. Firstly, the atom is placed in a local reference frame, with a little trigonometry on the known bond lengths, bond angles and torsion angles. The second step builds a matrix that rotates the atom into the reference frame of the three previously placed atoms.

The paper leaves out two vital details! One thing it does not go into is that one should start from a Nitrogen at the N-terminus! I had been building from the C-terminus, and it didn’t work so well. I suspect this is one of those cases where the knowledge is simply assumed in the domain. Secondly, there is a cheeky minus sign in the X value of the second step.

I had a little help from James Phillips at the London Biohackspace, who took a look at my code and spotted the N-terminus issue! Always good if you can get another person to look over your code when you are stuck.

Here is the NeRF algorithm in full, in Python for those who might want it.
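The core per-atom placement step can be sketched as follows. This is my own minimal NumPy version with illustrative names, assuming the caller supplies idealised bond lengths and angles — not a drop-in implementation:

```python
import numpy as np

def place_atom(a, b, c, bond_length, bond_angle, torsion):
    """Place atom d after the three previously placed atoms a, b, c.

    bond_angle is the angle b-c-d and torsion is the dihedral a-b-c-d,
    both in radians.
    """
    # Step 1: position of d in the local reference frame
    d = np.array([
        -bond_length * np.cos(bond_angle),                  # the "cheeky" minus sign
        bond_length * np.sin(bond_angle) * np.cos(torsion),
        bond_length * np.sin(bond_angle) * np.sin(torsion),
    ])
    # Step 2: rotate the local frame onto the actual chain
    bc = c - b
    bc = bc / np.linalg.norm(bc)
    n = np.cross(b - a, bc)
    n = n / np.linalg.norm(n)
    m = np.stack([bc, np.cross(n, bc), n], axis=-1)  # columns form the frame
    return c + m @ d
```

Calling this repeatedly along the chain, three atoms at a time, rebuilds the whole backbone.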

NeRF isn’t perfect; it makes some assumptions that don’t always hold up. The bond distances it uses do vary in real structures, though the fixed values have been verified by a few experiments. Adding in this variation might be something I implement in the future, but for now, NeRF recreates the majority of structures quite well.

Next steps with our nets

With a representation of the chain as a series of angles, we can build several different kinds of networks that, when given a series of amino acids, produce a set of numbers ranging from -1 to 1. Recombining the sine and cosine, converting to degrees and applying NeRF, we get a structure that is very close to the original.
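Decoding the network outputs back into degrees is essentially a one-liner with arctan2 (the raw values below are illustrative stand-ins for real network outputs):

```python
import numpy as np

# Pretend these are tanh outputs: one (sin, cos) pair per torsion angle.
# They need not lie exactly on the unit circle; arctan2 still returns a
# well-defined angle in (-180, 180] degrees.
raw = np.array([[-0.91, 0.42],    # roughly a phi of about -65 degrees
                [0.55, -0.83]])   # an angle in the second quadrant
angles_deg = np.degrees(np.arctan2(raw[:, 0], raw[:, 1]))
```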

In the next blog post, I’ll look at the different kinds of networks, which work best and what pitfalls we need to avoid.

Freelance Research Software Engineer and Bioinformatics Student.
