CHAINTWEAK: SAMPLING FROM THE NEIGHBOURHOOD OF A PROTEIN CONFORMATION
ROHIT SINGH and BONNIE BERGER∗†
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
E-mail: {rsingh, bab}@mit.edu
When searching for an optimal protein structure, it is often necessary to generate a set of structures similar to some base structure, e.g., within 4 Å Root Mean Square Deviation (RMSD). Current methods to do this are designed to produce only small deviations (< 0.1 Å RMSD) and are inefficient for larger deviations. The method proposed in this paper, ChainTweak, can generate conformations with larger deviations from the base much more efficiently. For example, in 18 seconds it can generate 100 backbone conformations, each within 1–4 Å RMSD of a given 45-residue conformation. Moreover, each conformation has correct bond lengths, angles and omega torsional angles; its phi-psi angles have energetically favorable values; and there are rarely any backbone steric clashes. The method uses the insight that loop closure techniques can be used to perform compensatory changes of dihedral angles so that only a part of the conformation is changed. Using decoys from the Decoys ‘R Us data set, we demonstrate that ChainTweak can be used to construct good decoys. It also provides a novel and intuitive way of analyzing the energy landscape of a protein. In addition, ChainTweak can improve the accuracy and performance of the loop modeling program RAPPER by an order of magnitude (1.1 min vs. 36 min for an 8-residue chain).
Availability & Supp. Info.: http://theory.csail.mit.edu/chaintweak
1. Introduction
A fundamental axiom of molecular biology is that the function of a protein is determined by its structure. In turn, most protein structure determination problems are, essentially, search problems. In some of these, e.g., homology modeling or protein re-design, the problem specification may restrict the search to the neighbourhood of some template structure. In other cases, restricting the search to the neighbourhoods of a set of candidate structures might just be a solution strategy (e.g., in the Rosetta⁴ method for ab-initio folding). Here, the neighbourhood of a structure is the set of structures similar to it. For example, the set of all structures within, say, 4 Å RMSD of a base structure could be defined as its neighbourhoodᵃ.
∗ Corresponding author.
† Also in the MIT Dept. of Mathematics.
ᵃ Of course, the size of the neighbourhood and, consequently, the exact choice of an RMSD threshold would depend on the problem instance and the size of the protein. Typically, for a 50-residue protein, two conformations within 2 Å of each other are considered almost identical. Thus, in this case, the threshold size should be ≥ 2 Å.

[Figure 1. Panel (a): Energy vs. Distance from native, marking Global, Neighbourhood and Local search; panel (b): conformations A, B, C, D; panel (c): sample ChainTweak output.]

Figure 1: (a) A cartoon illustrating the differences in space coverage between global, neighbourhood, and local search. Observe that local search techniques can only cover the basin of one local minimum. (b) A cartoon illustrating that changes in dihedral angles near the terminal regions of a chain (A) result in small perturbations (B and C), while changing an angle in the middle of the chain results in a very large perturbation (D). (c) Example output from ChainTweak. Ten conformations from the neighbourhood of a 32-residue protein structure (PDB: 1clv, chain I) were sampled and aligned with the original. The original structure is in black, the others are in gray.

Efficiently searching in the neighbourhood of a possible protein structure (conformation) is thus an important and frequently recurring problem. As the term "neighbourhood search" signifies, this search problem is different from global or local search problems (Fig. 1a), even though it has usually been studied as an extreme case of these. This paper focuses on the sampling component of this search problem and presents a method, ChainTweak, for efficiently and representatively sampling from a given neighbourhood.

Many different approaches to neighbourhood sampling have been tried. High-temperature Molecular Dynamics (MD) methods have been used to generate structures with 2–4 Å RMSD from the native¹. Methods based on discrete off-lattice models²,³ discretize the dihedral-angle space and try out different combinations. Similarly, in Monte Carlo (MC) search methods, various move-sets have been developed for making local moves. For example, fragment-swap MC in Rosetta⁴ relies on a database of polypeptide fragments to swap one fragment for another, as long as their ends match. Another set of approaches, such as torsional dynamics⁵ or the MC-based methods proposed by Ulmschneider & Jorgensen⁶ and Cahill et al.⁷, use geometric insights to perform such local modifications.

Our proposed neighbourhood sampling method, ChainTweak, has many advantages over existing methods. Rather than being closely tied to some search strategy (or an energy function), it is a stand-alone method that researchers can use as a black box, allowing them to focus on other parts of the search problem (e.g., energy function design⁸). Moreover, ChainTweak enables fast generation of ensembles (sets of conformations) centered around any given base conformation. The flexibility of ChainTweak enables novel applications (e.g., energy function analysis) and enhances the performance of existing applications (see Section 5).
1.1. Neighbourhood Sampling: The Right Representation?
Almost all neighbourhood sampling methods work by perturbing the base conformation's structure to generate conformations in its neighbourhood. To model the structure, these methods use either an all-atom model based on Cartesian coordinates or a model based on dihedral angles.

Most existing methods use the Cartesian coordinate model. With this model, however, an energy minimization step is needed to restore correct bond lengths/angles in the perturbed structures. Efficiency and convergence issues with this step limit the size of a single perturbation step (< 0.1 Å)⁹. Thus, only a small neighbourhood around the base can be explored. For larger deviations, successive perturb-and-minimize operations, using an MD-like approach¹, can be done. However, generating many MD trajectories to ensure representative sampling may become computationally expensive.

In contrast, representing the protein backbone by its dihedral angles offers distinct advantages. All conformations sampled from the neighbourhood will then have different dihedral angles but the same bond lengths/angles. Since the latter can always be set to their desired/ideal values, no minimization step is necessary. Hence, the restriction on small perturbation sizes is removed. However, modifying a dihedral angle at residue i changes the positions of all residues from i + 1 onwards. As a result, the perturbed structure may deviate so far from the base as to not be in the neighbourhood at all, especially if residue i is in the middle of the chain (Fig. 1b). This problem is the major stumbling block in using a dihedral-angle-based representation.

One way to solve this problem, e.g., in Torsional Dynamics (TD)⁵, is compensatory modification of multiple torsional angles such that the overall structural deviation is acceptably small. However, the differential-calculus-based methods used by TD algorithms work well only for small perturbations. Moreover, the sampling behavior is affected by the energy function chosen for the TD simulations. The reader might also notice the parallel here with the loop closure problem, where one needs to find short chains joining two fixed ends. Indeed, our proposed algorithm, ChainTweak, exploits this parallel.
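The lever-arm effect described above (a dihedral change at residue i moves everything from i + 1 onwards) can be checked numerically. The sketch below is an illustration under simplifying assumptions, not ChainTweak code: it rebuilds a chain of points from internal coordinates with the standard NeRF (Natural Extension Reference Frame) construction, using a hypothetical uniform bond length of 1.5 Å and bond angle of 111° instead of the distinct N, Cα and C geometry of a real backbone. Because every chain is built from the same three starting points, coordinate RMSD can be computed without superposition.

```python
import numpy as np

BOND = 1.5                   # hypothetical uniform bond length (Å)
ANGLE = np.deg2rad(111.0)    # hypothetical uniform bond angle

def place_next(a, b, c, bond, angle, torsion):
    """NeRF step: place point d so that |cd| = bond, angle(b, c, d) = angle
    and dihedral(a, b, c, d) = torsion."""
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)
    d_local = np.array([-bond * np.cos(angle),
                        bond * np.sin(angle) * np.cos(torsion),
                        bond * np.sin(angle) * np.sin(torsion)])
    return c + d_local[0] * bc + d_local[1] * m + d_local[2] * n

def build_chain(torsions, bond=BOND, angle=ANGLE):
    """Rebuild Cartesian coordinates from internal coordinates. The first three
    points are placed in a fixed frame, so all chains share one reference frame."""
    pts = [np.zeros(3), np.array([bond, 0.0, 0.0])]
    pts.append(pts[1] + bond * np.array([-np.cos(angle), np.sin(angle), 0.0]))
    for t in torsions:
        pts.append(place_next(pts[-3], pts[-2], pts[-1], bond, angle, t))
    return np.array(pts)

def rmsd_same_frame(x, y):
    """Coordinate RMSD without superposition (valid because frames coincide)."""
    return float(np.sqrt(np.mean(np.sum((x - y) ** 2, axis=1))))

base_torsions = np.full(57, np.deg2rad(50.0))   # 60 points, helix-like shape
base = build_chain(base_torsions)

def perturb(index, delta_deg=30.0):
    """Change a single torsion and rebuild the whole chain."""
    t = base_torsions.copy()
    t[index] += np.deg2rad(delta_deg)
    return build_chain(t)

near_end = perturb(len(base_torsions) - 1)   # last torsion: only the last point moves
middle = perturb(len(base_torsions) // 2)    # middle torsion: about half the chain moves

print("RMSD, terminal tweak:", round(rmsd_same_frame(base, near_end), 3))
print("RMSD, middle tweak:  ", round(rmsd_same_frame(base, middle), 3))
bonds = np.linalg.norm(np.diff(middle, axis=0), axis=1)
print("bond lengths preserved:", bool(np.allclose(bonds, BOND)))
```

Tweaking the last torsion moves only the final point, while the same 30° tweak in the middle swings roughly half the chain. In both cases the bond lengths are exactly the ones used to build the chain, which is why no minimization step is needed in this representation.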
1.2. Contributions
ChainTweak is an algorithm for efficiently sampling from the neighbourhood of a given base conformation. It generates a set of backbone conformations such that each new conformation has the following properties: it lies in a neighbourhood of the base; it has the terminal (first and last) residues fixed in the same relative positions as in the base; and it has bond lengths/angles set to their desired/ideal values. In Section 2 we describe a simple extension that allows the positions of the terminal residues to vary as well.

ChainTweak iteratively perturbs the base conformation using the dihedral angle representation. A sliding window approach is used to successively move some atoms by 0–2 Å while keeping all others fixed. Inside the window, loop closure methods are used to generate such perturbations. Moreover, residue-specific phi-psi angle preferences can be used to choose a perturbation.

We show that ChainTweak can explore large neighbourhoods efficiently. Given a conformation of a 45-residue protein, in 18 seconds it can generate 100 backbone conformations, each within 1–4 Å RMSD of the base. Moreover, by running ChainTweak for more iterations, larger neighbourhoods can be explored: for this protein, a conformation with an RMSD of 12 Å from the base can be found. In contrast, after 18 seconds, an MD simulation (run using TINKER⁹) produces a single conformation for the same protein (with 0.91 Å RMSD). Even theoretically, ChainTweak's running time is asymptotically optimal: linear in the length of the chain and the number of samples desired.

We also describe some applications of ChainTweak (Section 4.2). It improves upon the performance of some existing applications (decoy generation and ab-initio loop modeling using RAPPER) and also enables novel applications (intuitive analysis of energy functions).
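The sliding-window step summarized above ends with a choice among the window conformations proposed by loop closure, biased by phi-psi preferences. The sketch below shows one way such a biased choice could be made. It assumes each candidate is given as a list of (phi, psi) pairs in degrees, and the boxes in `preference` are a crude, hypothetical stand-in for a real Ramachandran preference map, not the map used by ChainTweak. The names `preference`, `window_weight` and `pick_window_conformation` are illustrative.

```python
import random

def preference(phi, psi):
    """Hypothetical coarse phi/psi preference (degrees); NOT ChainTweak's map."""
    if -160 <= phi <= -45 and -70 <= psi <= -15:    # roughly alpha-helical
        return 1.0
    if -180 <= phi <= -45 and 90 <= psi <= 180:     # roughly beta-sheet
        return 1.0
    if 45 <= phi <= 90 and 0 <= psi <= 90:          # left-handed region
        return 0.2
    return 0.01                                     # disfavoured, not forbidden

def window_weight(candidate):
    """Product of per-residue preferences over the residues in the window."""
    w = 1.0
    for phi, psi in candidate:
        w *= preference(phi, psi)
    return w

def pick_window_conformation(candidates, rng=random):
    """Randomly pick one loop-closure solution, biased toward favourable phi/psi."""
    weights = [window_weight(c) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Usage with two made-up candidate solutions for a 3-residue window.
rng = random.Random(0)
candidates = [
    [(-60, -45), (-65, -40), (-63, -42)],   # all helical: weight 1.0
    [(-120, 130), (-60, -45), (170, 60)],   # last residue disfavoured: weight 0.01
]
picks = [pick_window_conformation(candidates, rng) for _ in range(1000)]
print(sum(p is candidates[0] for p in picks), "of 1000 picks were the helical candidate")
```

Giving disfavoured regions a small nonzero weight, rather than zero, keeps unusual solutions reachable occasionally; a residue-specific scheme would swap in a different `preference` function per residue type.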
2. Algorithm
Here we present the algorithm ChainTweak, which has the following input and output:

Input: A single backbone conformation C₀ described by its bond lengths, bond angles and dihedral angles.

Output: N conformations such that the RMSDs of these conformations w.r.t. C₀ roughly follow a desired distribution. For example, half of the output conformations are 0–2 Å RMSD from the base while the rest are 2–4 Å RMSD from the base. For each output conformation, the bond lengths, bond angles and the relative positions of the end-residues are the same as in C₀.

The initial restriction on preserving the relative positions of the end-residues can be adapted for flexible chain ends by pre-processing C₀ to produce a set of conformations with randomly sampled values for the dihedral angles at the end-residues. Recall that modifying dihedral angles at the ends only results in local structure changes (Fig. 1b). Each of these conformations then becomes the input to a separate ChainTweak instance.
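The output specification is phrased in terms of RMSD to C₀. This section does not spell out the RMSD routine, so the sketch below assumes the usual definition for comparing conformations: RMSD after optimal rigid-body superposition, computed with the Kabsch algorithm. The helper `bin_counts` (a hypothetical name) tallies RMSD values into bins such as the 0–2 Å and 2–4 Å ranges of the example, so a generated set can be compared against a target distribution. The coordinates are random points, not a protein.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (n, 3) coordinate arrays after optimal superposition.

    Both sets are centred, the best rotation is taken from the SVD of the
    3x3 covariance matrix (Kabsch algorithm), and the fitted RMSD is returned.
    """
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so that R is a proper rotation (no reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def bin_counts(rmsds, edges):
    """Count RMSD values per bin [edges[i], edges[i+1]); the last bin is closed."""
    counts, _ = np.histogram(rmsds, bins=edges)
    return counts.tolist()

rng = np.random.default_rng(0)
base = rng.normal(scale=5.0, size=(45, 3))           # random stand-in coordinates

theta = 0.7                                           # arbitrary rigid-body motion
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
moved = base @ rot.T + np.array([1.0, 2.0, 3.0])
noisy = base + rng.normal(scale=1.0, size=base.shape)

print(kabsch_rmsd(moved, base))    # ~0: a rigid motion is not a deviation
print(kabsch_rmsd(noisy, base))
print(bin_counts([0.5, 1.5, 2.5, 3.9], [0.0, 2.0, 4.0]))   # [2, 2]
```

Measuring RMSD after superposition means a conformation that is merely rotated or translated counts as identical to the base, which matches the intent of a neighbourhood defined by structural similarity.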
Observe that by iteratively setting each output conformation as the input of a new ChainTweak problem, more solutions can be found for the original ChainTweak problem. Also, the problem can be solved recursively by splitting the input chain into two sub-chains and concatenating the respective solutions. We do this until we have a chain small enough to be solved using loop-closure techniques. The pre-processing step (moving the chain ends) mentioned previously is required only at the top level of recursion, i.e., for the full-length chain.

The loop closure problem was informally discussed by Robert Diamond¹⁴ and was formally defined by Go and Scheraga¹⁵. The input in such a problem is the relative position of the fixed residues (anchors) at the two ends, and the goal is to find different possible conformations for a polypeptide chain of length m joining the fixed ends. For a problem instance with 6 unknown dihedral angles, i.e., 6 degrees of freedom (DOFs), the maximum number of possible solutions is 16. With more DOFs, the number of solutions is infinite. In the 6-DOF case, Manocha et al.¹⁶ applied inverse kinematics techniques from robotics to numerically generate all possible 16 solutions. More recently, Wedemeyer and Scheraga¹⁷ and Coutsias et al.¹⁸ have also presented analytic solutions for the 6-DOF problem. ChainTweak can use any of these as a subroutine (Algorithm 3 in Supp. Info.).

ChainTweak iteratively calls the subroutine SlideWin (Algorithm 2 in Supp. Info.). Given a starting backbone conformation, SlideWin finds a new backbone conformation using a sliding window approach (Fig. 2a). A window of 3 residues (9 points) is chosen. After fixing 3 points on both ends, this results in a 6-DOF loop closure problem. We use Manocha et al.'s algorithm when omega angles are unrestricted and Coutsias et al.'s algorithm when omega angles have to be restricted to particular values (say, 180°). A wrapper around these routines (LoopClsr6, Algorithm 3) suggests up to 15 alternative conformations for the conformation inside the window. Of these, we randomly select one conformation, biasing our choice towards a conformation that has phi-psi angles in favorable/acceptable regions of the Ramachandran Plot (Fig. 3). Residue and secondary structure information can thus be encoded by designing appropriate phi-psi preference maps. A single iteration of SlideWin moves each residue by about 0.5–1.5 Å.

ChainTweak (Algorithm 1 in Supp. Info.) iteratively applies SlideWin K times to achieve a much larger deviation from the starting conformation; the output conformations of one iteration form the input for the next. Between iterations, some conformations may be pruned out, depending on their RMSD from the original structure. The exact pruning policy is described by
