Tips for a budding Masters Student in Bioinformatics

2nd of October 2018

So I’ve submitted the thesis, had the viva, got the result, and that’s the end of a two-year stint at bioinformatics. How did it go? Quite well, I think, but there are definitely some key lessons and outcomes that I feel are worth sharing.

I’ve uploaded all the code and methodology to my GitHub page so you can get a better look at what some of these neural networks look like in practice (hint: pretty messy, I’ll admit, but we can improve in the future!).

Check and double check your data

The plot above is called a Ramachandran plot, and it’s used a lot in structural biology. It plots the backbone dihedral angles phi (φ) and psi (ψ) of each residue against each other, and shows that certain combinations are much more common than others. Had I looked at this graph much earlier, alarm bells would have gone off: we probably had a lot of erroneous models in there, and perhaps we should have cut them out.
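Each point on a Ramachandran plot comes from two torsion angles per residue: phi uses the atoms C (previous residue), N, CA, C, and psi uses N, CA, C, N (next residue). Below is a minimal, dependency-free sketch of the dihedral calculation, assuming you have already parsed the backbone coordinates out of your models; the function names are my own, and libraries such as Biopython can compute these angles for you as well.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180, 180] defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalise the central bond vector so the projections below are exact.
    n = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / n for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    # The angle between the projections, with its sign taken from b1.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# For residue i (coordinates as (x, y, z) tuples):
#   phi = dihedral(C[i-1], N[i], CA[i], C[i])
#   psi = dihedral(N[i], CA[i], C[i], N[i+1])
```

Plotting phi against psi for every residue in the training set, before training anything, is a cheap way to spot models with implausible geometry.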

Redundancy was another big problem. Quite often, the same model would appear multiple times in a set under another name. Maybe it was the same structure at a different resolution? Maybe it was thought to be different at the time but was actually the same thing. When you are training neural networks, you don’t want them to start memorising examples because they’ve been shown the same datum over and over again.
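A crude redundancy filter can be sketched in a few lines. This is a greedy pass that keeps an entry only if its sequence is less similar than a threshold to everything already kept. It uses Python's `difflib` ratio as a rough stand-in for sequence identity, which is an assumption for illustration; dedicated clustering tools such as CD-HIT or MMseqs2 do this properly at scale. The `(name, sequence)` input format, the entry names and the 0.95 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def remove_redundant(entries, threshold=0.95):
    """Greedily keep (name, sequence) pairs that are < threshold similar
    to every pair kept so far. The first occurrence of a duplicate wins."""
    kept = []
    for name, seq in entries:
        # autojunk=False stops difflib ignoring common residues in long sequences.
        if all(SequenceMatcher(None, seq, k, autojunk=False).ratio() < threshold
               for _, k in kept):
            kept.append((name, seq))
    return kept


# Hypothetical entries: the second is an exact copy of the first under another name.
entries = [("model_a", "GSHMKLV"), ("model_b", "GSHMKLV"), ("model_c", "WYPQRTE")]
print([name for name, _ in remove_redundant(entries)])
```

The pairwise comparison is quadratic, so this is only sensible for small sets, but it is enough to catch the "same structure, different name" case before it leaks between training and test splits.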

Chase the niggles and scares

Reproducibility

Zenodo is a repository for storing code and datasets, with the aim of making projects more reproducible. I’ll be placing my LoopDB database there very soon. The code I’ve used can also be found there, along with instructions on how to train your own networks.

Oh, and don’t forget to mention or cite your friendly High Performance Computing (HPC) people :)

Be ok with negative results

This is what we did when we looked at scoring and the results got a little more interesting.

Choose the right tools for the job

Where it went a little wrong was using Python for everything. While Python is pretty much the go-to language for TensorFlow, using something like R for analysing the data might have been a lot faster: get Python to spit out data that R can easily spin around, plot, filter and digest. Maybe porting the best neural network to C and running it that way would have been even better? A good mix of tools, and using the right one for the right job, is an underrated skill, I think.
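One low-effort way to hand results from Python to R is to write them in "long" (tidy) form, one row per observation, which R's `read.csv` loads straight into a data frame ready for filtering and plotting. A minimal sketch using only the standard library; the `write_tidy` name, the column names and the example metrics are hypothetical.

```python
import csv
import io


def write_tidy(results, fh):
    """Write {model: {metric: value}} as long-format CSV: one row per
    (model, metric) pair, with the header model,metric,value."""
    writer = csv.DictWriter(fh, fieldnames=["model", "metric", "value"])
    writer.writeheader()
    for model, metrics in results.items():
        for metric, value in metrics.items():
            writer.writerow({"model": model, "metric": metric, "value": value})


if __name__ == "__main__":
    # For a real file: with open("results.csv", "w", newline="") as fh: ...
    buf = io.StringIO()
    write_tidy({"net_a": {"rmsd": 1.2, "loss": 0.3}}, buf)
    print(buf.getvalue())
```

On the R side, `df <- read.csv("results.csv")` is then all it takes to start slicing by model or metric.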

Of course, git is essential in any software development, but beyond that, getting a backup and versioning strategy sorted early on is a real plus. One thing I was fortunate enough to do was build a second PC with a GPU from spare parts that were being thrown away. This box could run all the experiments whilst I designed and tested new nets on the main machine. Keeping track of all these running experiments with Trello seemed to work quite well, especially once we began to use cloud services in addition to my home box.
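A board like Trello tracks experiments at a human level; a machine-readable log alongside it makes it easier to work out later which run happened where, with which settings. Here is a minimal sketch of an append-only JSON Lines log, one record per run, tagged with the hostname so runs on different machines can be told apart. The function names, fields and file name are my own assumptions, not part of any particular tool.

```python
import json
import socket
import time
from pathlib import Path


def make_record(config, status="started"):
    """Build one log record: when, on which machine, what state, what settings."""
    return {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "host": socket.gethostname(),
        "status": status,
        "config": config,
    }


def log_run(log_path, config, status="started"):
    """Append one JSON record per line to log_path and return the record."""
    record = make_record(config, status)
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


# Hypothetical usage at the start and end of a training script:
#   log_run("runs.jsonl", {"learning_rate": 0.001, "layers": 4})
#   log_run("runs.jsonl", {"learning_rate": 0.001, "layers": 4}, status="finished")
```

Keeping the log file itself under version control (or in the backup) ties the record of what ran back to the code that ran it.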

Zotero is essential for anyone doing this sort of thing. Honestly, I don’t know why anyone would use anything else. It’s free, exports to all major bibliography formats, has an excellent Android client you can use to read your papers in a comfy chair, and is easy to use. Forget Mendeley, EndNote and all that crap. And no, I don’t work for Zotero :P

Using a time-tracking application on my second machine helped too! It can be easy, if you are me, to lose track of time or overestimate how much time you’ve spent. I know a few freelance folks who track their time with Pomodoro timers and whatnot, but I use a program that lets me log time spent against particular projects.

Finally, turning off email, Twitter, Facebook and anything like that is a must! Keeping such distractions out, however you can, seemed to help me.

Other folks can really help

These communities might not always be in your lab or even your university or subject. I found that the London Biohackspace was really helpful in understanding a couple of algorithms I couldn’t quite get my head around.

When you are working remotely this can be really tricky, and it took its toll on me, I’d say. If I did this again, I’d probably seek out more local groups I could visit to talk about my work. I did try to give a talk about my work at a conference, but it came at a really bad time and I had to pull out. It’s a shame really, but it’s definitely a good idea to speak to other professionals when you can, even if it’s not something you enjoy doing. Even a little will help.

Setting up a routine with your supervisor is very important. Despite the distance and timezone difference, I was upfront with my supervisor and we spoke every two weeks without fail. Preparing a list of questions and thoughts before each meeting is a good idea, as supervisor time is quite valuable.

Save a lot of time for writing and write a lot

People have recommended Scrivener to me and I’ve tried it once. It’s pretty good, but for this project I went full-on LaTeX. Going forward, I’d suggest a combination of the two, where one organises one’s thoughts in Scrivener first, then uses LaTeX just for layout. Sadly, Scrivener doesn’t have a Linux client. One can get some of the way there by splitting the thesis into separate files pulled in with \include or \input in LaTeX, but it is a little clunky. I’m open to alternatives on this one.

Doodling and keeping notes in some form of notebook is a good thing, but I did find that I never really referred to them that often. I think perhaps some mind-mapping software might be called for in the future. Keeping everything in your head all at once is impossible. Never overlook a good whiteboard, an empty wall and a large stack of post-it notes!

Choose well and enjoy it!

Birkbeck is a good university if you find yourself wanting to change tack and study whilst working. Aside from this, the structural biology group is very good and if you decide to go down a similar route, I can’t recommend them enough.

You’ll have down times, you’ll have stresses and it won’t be easy at all. It can be a lonely place, but there are moments that are just glorious! It’s an adventure!

Freelance Research Software Engineer and Bioinformatics Student.
