In conversation with Prof. Sowdhamini

The Computational Approaches to Protein Science (CAPS) lab, led by Prof. R Sowdhamini, turned 25 last year. Over her research career, Prof. Sowdhamini has progressed from studying polypeptides to protein structures to genome databases. She has worked with protein folds, analyzed sequence data, worked with genome databases, and now works with medicinal plants. In a way, her research is an embodiment of biology across scales, the theme of NCBS. Join Dr. Vaishnavi Sridhar as she talks to Prof. R Sowdhamini.

Q: It has been 25 years since you joined NCBS. Tell us about your initial days at NCBS.

A: I started at NCBS on 27th June 1998. At that time, NCBS was still located at the Indian Institute of Science (IISc). I was given a room in the corridor where principal investigators would sit. We knew that this was a temporary arrangement and that we would move to the new campus, i.e., the current GKVK campus. On 6th December 1998, we moved to the new campus. There was only the Eastern Lab Complex (ELC) and the housing apartment Mallige at that time; the rest of the buildings came up later.

We were apprehensive about moving to the new campus. When we arrived, we felt isolated because our labs were far from each other, whereas at IISc they had been close together. However, the plus point was that we got individual labs and a lot of space. There were 12 faculty members and very few staff back then. Over time many young faculty joined, projects expanded, and different facilities came up. Even though we were growing and developing, we were still concerned that we might lose touch with the rest of the scientific community. To get around this, we hosted many meetings, conferences, and events.

My current office has been my office for the past 25 years. In terms of lab setup, we had little to ask for, as we were a computational lab, but we wanted only the best. We asked for a Silicon Graphics Octane machine, used for computing, coding and graphics, and the servers were assembled in-house. I would say that my research kick-started after we got the Wellcome Trust Grant in 2000. This grant was specifically for researchers who were trained outside the country and wanted to return.

Q. What prompted you to join NCBS?

I got lucky. As graduate students, we saw NCBS being set up. We knew that a biology institute with the Tata Institute of Fundamental Research (TIFR) as the parent was coming up.

It took me a lot of thinking, and I finally joined in 1998.

Q. What was your training in?

I was trained as a computational biologist. Computation has been my strong suit. I did my PhD at IISc under the guidance of Prof. P Balaram, who was an experimentalist but supported my 100% computation-based PhD. During my PhD, I studied how disulphide bonds can be incorporated in a protein. These bonds help stabilize proteins. I looked for the sites in a protein that could incorporate these bonds without disturbing the protein structure. This was the late 1980s, and protein engineering had just begun with the advent of site-directed mutagenesis. I also studied supersecondary structures.

My postdoc was initially at Birkbeck College, University of London, and then at the University of Cambridge with Prof. Tom Blundell, where my project was to classify globular proteins into families. This was a time before the genome sequencing era, so not many sequences were available. I used single protein sequences and applied my training as a computational biologist to understand trends across whole genomes.

I focused on conserved parts of protein sequences, or motifs, and used them to identify similar proteins. I was more interested in the structural motifs than in the motifs required for enzymatic activity. I classified globular proteins according to folds using my strong coding background. I wrote code to pick up these folds. The structural data increased over time, so we went for an automated approach. The idea was to get an evolutionary model and to look at proteins whose sequences had diverged over evolution but whose structure and function remained similar. The reasoning was that proteins with similar structures would have similar functions. We used this information to compile proteins into various families.
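To make the idea of using a conserved motif to pick out similar proteins concrete, here is a minimal sketch in Python. It is not the lab's code or method: the motif pattern, the sequence names, and the sequences themselves are invented for illustration, and a plain regular expression stands in for the more sophisticated models used in practice.

```python
import re

# Hypothetical motif, written as a regular expression for illustration only:
# Cys, any 2 residues, Cys, any 3 residues, then one of Leu/Ile/Val/Met.
MOTIF = re.compile(r"C.{2}C.{3}[LIVM]")

def find_motif(sequences, pattern=MOTIF):
    """Return {name: [start positions]} for sequences containing the motif.

    Positions are 0-based and matches are non-overlapping.
    """
    hits = {}
    for name, seq in sequences.items():
        positions = [m.start() for m in pattern.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits

# Made-up toy sequences (one-letter amino acid codes), not real proteins.
toy_sequences = {
    "protA": "MKTCAACGGGLKE",
    "protB": "MSTNPKPQRK",
    "protC": "GGCWWCPPPIAA",
}

hits = find_motif(toy_sequences)
for name, positions in hits.items():
    print(name, positions)
```

In this toy run, protA and protC share the motif even though the rest of their sequences differ, which is the sense in which a conserved segment can group proteins whose sequences have otherwise diverged.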

Q. What does your lab at NCBS work on? How has this diversified over time?

We take a three-tiered approach and look at protein sequence, structure and function. We look at protein sequence data to study diverse protein families. The aim is to understand how proteins retain their function, how they fold a certain way and how much change they can accommodate before their structure changes completely.

Our focus is now on specific protein families or superfamilies such as G-protein coupled receptors, methyltransferases, serine proteases and myosins. We are interested in genome-wide surveys, i.e., we search for conserved protein sequences within a genome using mathematical models. We are also interested in cross-genome surveys, i.e., we compare conserved protein sequences in different genomes to find trends, such as how the sequences are changing, what the differences are, which part of the sequence is preserved, and whether there are species with a sudden, specific change in sequence. The idea is to find a needle in a haystack. We use the data that we get to train the models and refine the code, which makes the programs even more robust.
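The cross-genome comparison described above can be illustrated with a small Python sketch. It assumes the sequences from different genomes have already been aligned to the same length without gaps, and it scores each position by how many sequences share the most common residue. The alignment is made up, and this simple majority score is only a stand-in for the mathematical models mentioned in the interview.

```python
from collections import Counter

def column_conservation(aligned):
    """Fraction of sequences sharing the most common residue at each column.

    Assumes pre-aligned, gap-free sequences of equal length.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to the same length")
    scores = []
    for column in zip(*aligned):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(aligned))
    return scores

def conserved_positions(aligned, threshold=1.0):
    """0-based columns whose conservation score is at least `threshold`."""
    scores = column_conservation(aligned)
    return [i for i, score in enumerate(scores) if score >= threshold]

# Made-up alignment: one hypothetical homologous segment from four genomes.
toy_alignment = [
    "MKCLAG",
    "MRCLSG",
    "MKCIAG",
    "MKCLTG",
]

print(column_conservation(toy_alignment))
print(conserved_positions(toy_alignment))
```

Columns that score 1.0 are the preserved part of the sequence, while low-scoring columns show where the genomes differ; a species that deviates at an otherwise conserved column would be the kind of sudden change worth a closer look.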

Picture 1: Celebrating 25 years of the CAPS lab. A cake depicting the major milestones of Prof. Sowdhamini’s lab. Picture credit: Dr. Swetha Raghavan

We also study the structures of proteins, using 3D modeling to visualize them and to observe where changes occur among protein structures from closely related species. In addition, we study how proteins could interact with other proteins, using a program that we developed, and how proteins interact with small molecules (potential therapeutics).

For the function part, we work mostly by collaborating with other labs. We do the computational work to predict the potential function of a protein or how mutating certain regions of a protein can affect function, and our collaborators do the experiments. Sometimes, my students also do experimental work.

In one such example of collaborative work, we applied to the Human Frontier Science Program, where we proposed to do the bioinformatics work, another lab would use biophysics to do the experiments, and a mathematical modeling lab would make sense of the data. I believe that collaborations lead to better questions.

About 10 years ago, I started working with medicinal plants. The aim is to understand how small molecules from plants interact with enzymes. We perform virtual screens to test which small molecule fits best with an enzyme, in order to find potential therapeutics.

Q. If we look at when you started vs now, how has the field of bioinformatics and computational biology evolved and how did that help with your research?

We predict how proteins form structures. When I started, we used to look at single sequences and do fold predictions. Eventually we moved to sequence-based predictions. Many tools and databases, such as the Basic Local Alignment Search Tool (BLAST), the Protein Data Bank (PDB) and many others, were just coming up. It was just the beginning. Over time these became free and improved a lot. At that time, sequences had just started becoming available, and we hardly had any large-scale sequence data. Since then, the repository of sequences has grown enormously. We do computational work, so for us, the more the merrier.

The technology and software have advanced as well. Originally, we used the Octane graphics system. Eventually, we started getting computer clusters to do high-throughput computing. These are basically stacks of computers that talk to each other. Each cluster has multiple computers, or nodes. There is also a master computer that commands the other computers in a cluster. We now have 7-8 clusters, all kept in a special room in SLC, as they need cooling. The growth and evolution of these clusters have helped us tackle large questions and analyze large datasets. Now we have 500 cores and are moving towards GPU clusters to do simulations to predict protein structure. In a nutshell, all systems have evolved. In fact, we can now use desktops to generate high-quality graphics.

There is a fun story related to the clusters. When the third cluster was brought in thirteen years ago, it was a huge deal. It was the largest in the country and was brought into the basement by a crane. There was a news article that covered this!

Picture 2: The 25-year-old Silicon Graphics Octane machine from 1999 (L) and the famous assembled SGI-SMP cluster from 2011 (R). Picture source: R. Sowdhamini

Q. What inspires you to keep going?

The environment at NCBS makes it easy to collaborate and venture into new directions. There is freedom to pursue your interests.