OxCompBio/tutorials/Docking/docking-2020.html at master · IAlibay/OxCompBio · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>Introduction to in silico docking</title>

<link rel="stylesheet" href="css/dock.css">
</head><body>

<h1>Session 4: Introduction to <i>in silico</i> docking</h1>
<b>Phil Biggin</b>  Previous input from Greg Ross and Ranjit Vijayan<br><a href="mailto:philip.biggin@bioch.ox.ac.uk">philip.biggin@bioch.ox.ac.uk</a>
<P>

</p><h2>Table of Contents</h2>
<p>
</p><ol>
<li><a href="#intro">Introduction</a></li>
<li><a href="#install">Installing the packages locally</a></li>
<li><a href="#data">Obtaining the data files</a></li>
<li><a href="#exercise1">Exercise 1: Visualising the protein</a></li>
<li><a href="#exercise2">Exercise 2: Preparing a ligand and protein for docking</a></li>
<li><a href="#exercise3">Exercise 3: Docking indinavir into HIV-1 protease</a></li>
<li><a href="#exercise4">Exercise 4: Docking nelfinavir into HIV-1 protease</a></li>
<li><a href="#exercise5">Exercise 5: Generating the electrostatic surface of the protein</a></li>
<li><a href="#discussion">Concluding remarks</a></li>
<li><a href="#references">References</a></li>
<li><a href="#appendixI">Appendix I: Useful links/resources</a></li>
<li><a href="#appendixII">Appendix II: Useful Unix/Linux commands</a></li>
</ol>

<ol>
<h2><li><a name="intro" id="intro">Introduction</a></li></h2>
<h3>Why is docking important?</h3>
<p>The purpose of this practical is to give a flavour of how one could dock small flexible molecules to a protein structure. Such <i>in silico</i> methods are extremely useful for both finding potential binding sites and also to discover and/or engineer new molecules that could bind to a known site. This is a multi-billion dollar industry. Virtual screening and blind docking are often employed in an attempt to discover new medicines.</p>

<h3>A case study: HIV-1 protease</h3>
<p>The HIV-1 protease <a href="#references">[1]</a> is an enzyme that is vital for the replication of HIV. It cleaves newly formed polypeptide chains at appropriate locations so that they form functional proteins. Hence, drugs that target this protein could be vital for suppressing viral replication. A handful of drugs - called HIV-1 protease inhibitors (saquinavir, ritonavir, indinavir, nelfinavir, etc.) <a href="#references">[2]</a> - are currently commercially available that inhibit the function of this protein, by binding in catalytic site that binds the polypeptide.
<table align="center" border="0">
<tr><td align="center">
<img src="images/hiv-protease.png" alt="HIV-1 protease" width="500"><br>
HIV-1 Protease
</td></tr>
</table>
</p>

<h3>What does this practical cover?</h3>
<p>We will first look at the protein structure (PDB ID: 1HSG) <a href="#references">[3]</a> used in the related molecular dynamics practical (but this practical is completely stand alone) in a bit more detail. We will then try to dock a couple of drug molecules into the binding site to see how well docking can reproduce the binding pose. We will also generate the electrostatic surface of the protein to study a bit more how the drug interacts with the protein.
</p>
<P>
Anything that starts with the symbol <tt>%</tt> should be run in a command line terminal/command prompt and <tt>PyMOL&gt;</tt> should be run in a PyMOL command line.  Very little knowledge is assumed and this really is an undergraduate level introduction to using these tools and docking.  If you are trying to do this on your own locally, then the hardest step is likely to be installing all the relevant software parts!
</p>

<h2><li><a name="install" id="install">Installing the packages locally</a></li></h2>

<h3>Getting the relevant software</h3>
<p>
This practical is written to be as agnostic to platform as possible and as such it should be possible to run this on your laptop.
We use the docking program known as Autodock Vina (or just Vina)  (Trott et al, 2009) as it is freely available, very fast, reasonably accurate and is widely used.
There other tools we need to use are apbs and pymol.   <P>

If you are on a linux box you may like to also try using AutoDockTools, which is an
excellent set of tools for preparing docking runs, but does not work on all platforms (including Mac Catalina at the time of writing) and can be a little bit tricky
to install.  Because of that, we use show/use a different combination of alternative tools that should work regardless of your operating system, but if you are able to use it, AutoDockTools is very powerful and has a nice GUI.<P>

Most of software necessary can be very simply installed under the conda framework.  Conda is available on windows, linux and mac.
<ol type="i" class="roman">
<li>This is the first step if you have not already installed conda (or miniconda) on your laptop - Rather than repeating the instructions here, simply head over to <a href="https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html">https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html</a> and pick
up the relevant instructions for your operating system.<P>

<li>Next, install all necessary packages with conda and activate the new environment using
<pre>
  % conda env create -f oxdocking-env.yml
  % conda activate OxDocking
</pre>

<li>Next we need to install a package called ADFR - head over to this link to obtain and install the relevant one for your operating system <a href="https://ccsb.scripps.edu/adfr/downloads/">https://ccsb.scripps.edu/adfr/downloads/</A><P>

<li> If you are on a mac or linux then you can install vina via conda like this as well:
<PRE>
  % conda activate OxDocking
  % conda install -c bioconda autodock-vina
</pre>

If you are using windows, you will have to install vina and autodock tools separately:<P>

 Full install instructions for vina are here - (including how to build from the source) are here- <a href="http://vina.scripps.edu/manual.html">http://vina.scripps.edu/manual.html</a>)<P>

<li>Optional:  For all operating systems AutoDockTools has to be downloaded and installed separately.  You can obtain and install it from here - its part of the mgltools package: <a href="http://mgltools.scripps.edu/downloads">http://mgltools.scripps.edu/downloads</a><P>

	<li> Windows users you will have to add the locations of the executables for vina, ADFR_install/bin/reduce, ADFR_install/bin/prepare_receptor and adt.bat to you PATH - google how to add things to your path if you don't know how to do this - in windows 10 there is a gui that allows you to browse and select paths.
<P>

</ol>

<h2><li><a name="data" id="data">Obtaining the data files and checking it all works</a></li></h2>

Now we have all the packages installed we should just check through that all of the following tools are working.

<pre>
% cd ~/Desktop
% vina --version
% apbs --version
% pdb2pqr --help
% pymol -c
% obabel -H
% ADFR_install/bin/reduce -h
% ADFR_install/bin/prepare_receptor -h
% adt -h (optional if you want to use ADT which can replace obabel and ADFR steps)
</pre>


<ol type="i" class="roman">
<li> Now lets get all the data files you need for this practical - they can be found here: <br><a
href="http://sbcb.bioch.ox.ac.uk/phil/teaching/docking-2020.tar.gz">http://sbcb.bioch.ox.ac.uk/phil/teaching/docking-2020.tar.gz</a>. Download and save it to your Desktop. (<i>Note:</i> To do this, you might have to right click the link, select "<b>Save Link As...</b>" and make sure the folder in "<b>Save in folder</b>" is <b>Desktop</b>. Click <b>Save</b>)<P>

<li> On your Desktop, right click the downloaded file <tt>docking-2020.tar.gz</tt> and select <b>Extract here</b>. This should create a directory called <tt>dock-prac</tt> on your Desktop.  Alternatively, copy the archive to your desktop from where it was downloaded to  and then do "tar xf docking-2020.tar.gz" (note this will work in windows 10 now directly - older windows will probably require winzip or equivalent to unpack).
</ol>

<h2><li><a name="exercise1" id="exercise1">Exercise 1: Visualising the protein</a></li></h2>
<p>In this practical, we will use the structure of the HIV-1 protease used in the molecular dynamics practical - PDB ID: <b>1HSG</b>. This is a 2&Aring; resolution X-ray crystal structure of HIV-1 protease with a bound drug molecule <a href="http://en.wikipedia.org/wiki/Indinavir" taget="new"><b>indinavir</b></a>. We will use <tt>pymol</tt> to view the protein, the binding site and the drug molecule. <tt>pymol</tt> options/controls that you learned in the homology modelling practical will be handy here. An introduction to PyMOL can be found <a href="http://pymolwiki.org/index.php/Practical_Pymol_for_Beginners" target="new">here</a>.
</p>
<table align="center" border="0">
<tr><td align="center">
<img src="images/indinavir.png" alt="Indinavir" width="250"><br>
Indinavir (Source: http://en.wikipedia.org/wiki/Indinavir)
</td></tr>
</table>
<br>
<ol type="i" class="roman">
<li>In a terminal (open a new terminal if you don't have one open already), change to the docking practical directory and activate the conda environment you created in section 2.
<pre>
% cd ~/Desktop/dock-prac
% conda activate OxPython
% mkdir 1HSG
% cd 1HSG
</pre>
<li>Download the protease structure in PDB format (PDB ID: 1HSG) from the Protein Data Bank (<a href="http://www.pdb.org">http://www.pdb.org</a>). Search using the PDB ID 1HSG. On the results page, click <b>Download Files</b> and select <b>PDB File (Text)</b>. Save it in <tt>~/Desktop/dock-prac/1HSG</tt>. If the internet is down you can get a copy from <tt>~/Desktop/dock-prac/backup/data/1HSG.pdb</tt>. (<i>Note:</i> Once again you might have to right click, select <b>Save Link As...</b> and browse to <tt>/home/biocomp/Desktop/dock-prac/1HSG</tt>. Make sure you set the filename <b>Name</b> to 1HSG.pdb. Click <b>Save</b>)
<li>Load the structure into <tt>pymol</tt>. You should see the protein structure displayed as lines and water molecules as little red crosses.
<pre>
% pymol 1HSG.pdb &amp;
</pre>
<li>Lets start by hiding everything. Click the <b>H</b> (for hide) next to <b>all</b> in PyMOL's <b>Object Control Panel</b> (the panel on the right with buttons <b>A</b>ctions, <b>H</b>ide, <b>S</b>how, <b>L</b>abel and <b>C</b>olour) and select <b>everything</b>. The screen should be clear now.
<li>Now show (<b>S</b>) the protein (<b>1HSG</b>) using the <b>cartoon</b> representation and colour (<b>C</b>) by chain to show that it is a homodimer.
<li>Let's select the ligand indinavir and show it as sticks. In the PyMOL command line (with a "<tt>PyMOL &gt;_</tt>") type
<pre>
PyMOL&gt; select indinavir, resn MK1
</pre>
<tt>resn MK1</tt> selects the residue MK1, which is indinavir. You should now have an object in the object control panel called <b>(indinavir)</b>. Display it as sticks. Click anywhere in the display screen to unselect the selected atoms.
<li>Now, rotate (left mouse button), zoom (right mouse button) and move (middle mouse button) the molecule to get an idea where the binding site is. You will need to know where it is for the next exercise .
<li>Water molecules have the residue name HOH. Select and display all water molecules as red spheres. If you think the spheres are too big, type
<pre>
PyMOL&gt; set sphere_mode, 4
PyMOL&gt; set sphere_scale, 0.4
</pre>
<li><div id="question">Q: Water molecules normally have 3 atoms. Why do we see just one atom per water molecule?</div>
<li><div id="question">Q: There is a conserved water molecule in the binding site. Can you identify this water molecule?</div>
<li><div id="question">Q: Can you think of a way in which the ligand could enter the binding site?</div>
<li>If you would like to save a nice image of the protein, ray trace it and save it as an image file.
<pre>
PyMOL&gt; bg_color white
PyMOL&gt; set depth_cue,0
PyMOL&gt; ray
PyMOL&gt; png protein.png
</pre>
<li>Close PyMOL by typing <tt>quit</tt> in PyMOL command window or clicking the <b>x</b> in the PyMOL Tcl/Tk GUI window.
<li>Using <tt>eog</tt>, you can visualise the image you just generated.
<pre>
% eog protein.png
</pre>
</ol>

<h2><li><a name="exercise2" id="exercise2">Exercise 2: Preparing the protein and ligand for docking</a></li></h2>
<p>Docking algorithms require each atom to have a charge and an atom type that describes its properties. However, the PDB structure lacks these. So, we have to prep the protein and ligand files to include these values along with the atomic coordinates. Furthermore, for flexible ligand docking, we should also define ligand bonds that are rotatable. All this will be done in a tool called AutoDock Tools (<tt>autodocktools</tt>).</p>

<h3>Prepare the protein</h3>
<ol type="i" class="roman">
<li>The PDB file (<tt>1HSG.pdb</tt>) contains protein, ligand and water oxygen atoms. First we have to extract just the protein atoms, which are lines that start with the keyword <b>ATOM</b>. Each protein chain is terminated with a line that starts with <b>TER</b>. (If you would like to confirm this, open <tt>1HSG.pdb</tt> in a text editor and scroll through the text.)
<pre>
% egrep "^(ATOM|TER)" 1HSG.pdb > 1HSG_protein.pdb  [If you are on linux or mac]
% findstr "^ATOM ^TER" 1HSG.pdb > 1HSG_protein.pdb  [If you are on windows]
</pre>

<li>Proteins obtained from the PDB don't have hydrogens (unless the resolution is sub Angstrom or the structure is obtained by neutron diffraction).  So the first step is to add hydrogens, which we will do using ADFR:

<pre>
% ADFR_install/bin/reduce 1HSG_protein.pdb > 1HSG_protein_H.pdb &amp;
</pre>
<pre>
% ADFR_install/bin/prepare_receptor  -r 1HSG_protein_H.pdb -o 1HSG_protein.pdbqt &amp;
</pre>
You can skip the next paragraph into "Prepare the Ligand"

</ol>
<h3>Prepare the protein using AutoDockTools</h3>
<ol type="i" class="roman">
You can also do the previous steps with AutoDockTools
<pre>
% adt &amp;
</pre>
<li>Load the protein using <b>File &gt; Read Molecule</b>. Select <tt>1HSG_protein.pdb</tt>. Click <b>Open</b>.<br>
<b>Note:</b> In ADT, you can translate the molecule by clicking and holding down the right mouse button while moving the mouse, rotate by clicking and holding down the middle button and zoom in/out by using the scroll wheel of the mouse.
<li>Bonds and atoms are shown in white. For better visualisation, colour the structure by atom type - <b>Color &gt; By Atom Type</b>. Click <b>All Geometries</b> and then <b>OK</b>.
<li><div id="question">Q: Can you locate the binding site visually?</div>
<li>Crystal structures normally lack hydrogen atoms. <span id="question">Q: Why?</span> However, hydrogen atoms, or more specifically polar hydrogen atoms are required for appropriate treatment of electrostatics during docking. Add hydrogen atom to the structure using - <b>Edit &gt; Hydrogen &gt; Add</b>. Click <b>OK</b>. You should see a lot of white dashes where the hydrogens were added.
<li>Now we need to get ADT to assign charges and atom type to each atom in the protein. We do this with <b>Grid &gt; Macromolecule &gt; Choose...</b>. Choose <b>1HSG_protein</b> in the popup window and click <b>Select Molecule</b>. ADT will merge non-polar hydrogens, assign charges and prompt you to save the macromolecule. Click <b>Save</b>. This will create a file called <tt>1HSG_protein.pdbqt</tt> in the current folder. Open this in a text editor and look at the last two columns - these should be the charge and atom type for each atom. <div id="question">Q: Look at the charges. Does it make sense?</div>
<li>Whilst we are here, we can also demonstrate how AutoDockTools can be used to define grid box that determines the 3D search space were ligand docking will be attempted. Remember the binding site that you observed in one of the earlier steps. If we do not know the binding site, we will either define a box that encloses the whole protein or perhaps a specific region of the protein. In this case, to speed up the docking process, and because we know where it is, we will define a search space that encloses the known binding site.
<li>To define the box, use <b>Grid &gt; Grid Box...</b>. This will draw a box with opposite faces coloured in red, green and blue. Fiddle with the dials and see how you can enclose regions of the protein. In this instance we will use a <b>Spacing (angstrom)</b> of <b>1&Aring;</b> (this is essentially a scaling factor). So set this dial to <b>1.000</b>. So that we all get consistent results, let us set the <b>(x, y, z) center</b> as <b>(16, 25, 4)</b> and the <b>number of points in (x, y, z)-dimension</b> as <b>(30, 30, 30)</b>. Make a note of these values. We will need it later. Close the <b>Grid Options</b> dialog by clicking <b>File &gt; Close w/out saving</b>.
<li>That is all we need to do with the protein file. Delete it from the display using <b>Edit &gt; Delete &gt; Delete Molecule</b> and select <tt>1HSG_protein</tt>. Click <b>Delete Molecule</b> and <b>CONTINUE</b>.
</ol>


<h3>Prepare the ligand</h3>
<ol type="i" class="roman">
<li>Like the protein, the ligand lacks hydrogen atoms. We need to add hydrogen atoms and also define rotatable bonds that will be used for flexible docking.
<li>First, extract the ligand atoms from the PDB. As mentioned in Exercise 1, the ligand residue name for <b>indinavir</b> in the PDB file is <b>MK1</b> and the lines start with the keyword <b>HETATM</b> for heteroatoms. In a terminal type
<pre>
% grep "^HETATM.*MK1" 1HSG.pdb &gt; indinavir.pdb (on windows: findstr "^HETATM.*MK1" 1HSG.pdb > indinavir.pdb)

% obabel -ipdb indinavir.pdb -opdbqt > indinavir.pdbqt
</pre>
</ol>

<h3>Prepare the ligand with AutoDockTools</h3>
<ol type="i" class="roman">
<li>Follow these instructions if you are using AutoDockTools
<li>First, extract the ligand atoms from the PDB. As mentioned in Exercise 1, the ligand residue name for <b>indinavir</b> in the PDB file is <b>MK1</b> and the lines start with the keyword <b>HETATM</b> for heteroatoms. In a terminal type
<pre>
% grep "^HETATM.*MK1" 1HSG.pdb &gt; indinavir.pdb  (on windows:  findstr "^HETATM.*MK1" 1HSG.pdb > indinavir.pdb)
</pre>
<li>Load the ligand structure into ADT using <b>File &gt; Read Molecule</b> and select <tt>indinavir.pdb</tt>
<li>Again, colour by atom type. <b>Color &gt; By Atom Type</b>. Click <b>All Geometries</b> and then <b>OK</b>.
<li>Now we have to add polar hydrogen atoms. Add all hydrogen atoms initially. Non polar hydrogens will be merged in the next step. <b>Edit &gt; Hydrogens &gt; Add</b>. Select <b>All Hydrogens</b> and click <b>OK</b>.
<li>Define this as the <u>ligand</u> in ADT so that ADT assigns partial charges and sets rotatable ligand bonds using <b>Ligand &gt; Input &gt; Choose...</b>. Select <tt>indinavir</tt> and click <b>Select Molecule for AutoDock4</b>. You should see a message that confirms that non-polar hydrogens have been merged, charges added and rotatable bond detected. Click <b>OK</b>. The ligand will now have only polar hydrogens.
<li>To check the rotatable bonds detected by ADT, go to <b>Ligand &gt; Torsion Tree &gt; Choose Torsions....</b> You should see 14 rotatable bonds.
<li><div id="question">Q: What does the three colours - green, magenta and red  - mean? Does it make sense?</div>
<li>Click <b>Done</b> when you are done.
<li>Save the ligand file in PDBQT format. Do this using <b>Ligand &gt; Output &gt; Save as PDBQT...</b>. Click <b>Save</b> to save the file as <tt>indinavir.pdbqt</tt>.
<li>Quit ADT using <b>File &gt; Exit</b> and then <b>OK</b>.
</ol>

<h3>Prepare a docking configuration file</h3>

<ol type="i" class="roman">
<li>Before we can perform the actual docking, we need to create an input file that defines the protein, ligand and the search parameters. We will create the input file in a text editor.
<li>If you have used Unix/Linux before, open your favourite text editor - <tt>vi</tt>, <tt>emacs</tt>, <tt>nano</tt> - or use a GUI based editor called <tt>gedit</tt>.
<pre>
% gedit &amp; (on windows, could use notepad)
</pre>
<li>The input file should look something like:
<pre>
receptor = 1HSG_protein.pdbqt
ligand = indinavir.pdbqt

num_modes = 50

out = all.pdbqt

center_x = XX
center_y = XX
center_z = XX

size_x = XX
size_y = XX
size_z = XX

seed = 2009
</pre>
This defines your protein (<tt>receptor</tt>), ligand, number of docking modes to generate (<tt>num_modes</tt>). All the docked modes will be collated in a file defined by <tt>out</tt> (<tt>all.pdbqt</tt>). You should replace the <b>XX</b> with the center of your 3D search space (<tt>center_x/y/z</tt>) and the size of the box (<tt>size_x/y/z</tt>) that you defined in the "Prepare the protein" section.
<li>Save the file as <tt>config.txt</tt> in the folder containing the protein and ligand <tt>.pdbqt</tt> files.
</ol>
<br>

<h2><li><a name="exercise3" id="exercise3">Exercise 3: Docking indinavir into HIV-1 protease</a></li></h2>
<ol type="i" class="roman">
<li>For this practical, we will use a program called <b>Autodock Vina</b> <a href="#references">[4]</a> for docking. Autodock Vina is a fast docking algorithm that requires minimal user intervention. We will run it from a terminal.
<li>Make sure you are in the folder containing <tt>1HSG_protein.pdbqt</tt>, <tt>indinavir.pdbqt</tt> and <tt>config.txt</tt>
<li>Run <tt>vina</tt> to perform the docking. We will keep a log of all the program output in a file <tt>log.txt</tt>
<pre>
% vina --config config.txt --log log.txt
</pre>
This will take a few minutes depending on how fast your computer is. While you wait, if you are interested, read  <a href="#references">[1]</a> for a review on HIV-1 protease structure, function and drug discovery.
<li>Once the run is complete, you should have a two files <tt>all.pdbqt</tt>, which contains all the docked modes, and <tt>log.txt</tt>, which contains a table of calculated affinities based on AutoDock Vina's scoring function <a href="#references">[4]</a>. The best docked mode, according to AutoDock Vina, is the first entry in <tt>all.pdbqt</tt>.
<li>Lets visualise the docks and compare it to the crystal conformation of the ligand. Load the protein, the extracted ligand and the docks into <tt>pymol</tt>.
<pre>
% pymol 1HSG_protein.pdb indinavir.pdb all.pdbqt &amp;
</pre>
<li>Begin by hiding everything. Now, display the <b>protein</b> in <b>cartoon</b> representation, <b>indinavir</b> in <b>stick</b> representation and the <b>docked conformations</b> as <b>sticks</b> as well.
<li>Cycle through the docks by clicking the playback control buttons on the lower right hand corner of the window.
<li><div id="question">Q: Qualitatively, how good are the docks?</div>
<li><div id="question">Q: Is the crystal binding mode reproduced? If not or if there is a difference, what could be the reason? Is it the best  conformation according to AutoDock Vina?</div>
<li>Close PyMOL.
 </ol>

<h2><li><a name="exercise4" id="exercise4">Exercise 4: Docking nelfinavir into HIV-1 protease</a></li></h2>
<ol type="i" class="roman">
<li>In the previous exercise, AutoDock Vina should have found a near native docked conformation for the drug <b>indinavir</b>. In this exercise, we will try to dock another drug <a href="http://en.wikipedia.org/wiki/Nelfinavir" target="new"><b>nelfinavir</b></a> into the protein structure 1HSG broadly along the lines of how one would dock a small molecule into a target structure. This approach could be refined and extended to the "<a href="http://en.wikipedia.org/wiki/Virtual_screening" target="new">virtual screening</a>" approach.
<table align="center" border="0">
<tr><td align="center">
<img src="images/nelfinavir.png" alt="Indinavir" width="250"><br>
Nelfinavir (Source: http://en.wikipedia.org/wiki/Nelfinavir)
</td></tr>
</table>
<br>
<li>Start by creating a new directory for this docking run. Your current directory should now be: <tt>~/Desktop/dock-prac/1HSG</tt>.
<pre>
% cd ~/Desktop/dock-prac
% mkdir 1HSG_nelfinavir
% cd 1HSG_nelfinavir
</pre>
<li>Since we will be using the same protein structure to dock <tt>nelfinavir</tt>, copy all the protein related files that you created in "Prepare the protein" section.
<pre>
% cp ~/Desktop/dock-prac/1HSG/1HSG* .
</pre>
<li>We need a structure of <tt>nelfinavir</tt>. There should be a copy in <tt>~/Desktop/dock-prac/backup/nelfinavir.pdb</tt>. Copy it to the current directory
<pre>
% cp ~/Desktop/dock-prac/backup/data/nelfinavir.pdb .
</pre>
<li>Prep the ligand file and generate the PDBQT file - <tt>nelfinavir.pdbqt</tt> - using the steps use in the "<b>Prepare the ligand</b>" section.
<li>Since the binding site is the same, copy the docking config file - <tt>config.txt</tt> - from the previous exercise to the current directory.
<pre>
% cp ~/Desktop/dock-prac/1HSG/config.txt .
</pre>
<li>You have to change one entry in the configuration file. Make the change.
<li>Now you should have everything required by AutoDock Vina to generate a series of docking modes.
<pre>
% vina --config config.txt --log log.txt
</pre>
This will take a few minutes to run...
<li>Load the protein and the docked structures and show the protein as cartoon and the docks as sticks?
<pre>
% pymol 1HSG_protein.pdb all.pdbqt &
</pre>
<li><div id="question">Q: Qualitatively, how good are the docks?</div>
<li><div id="question">Q: Is it docked in the binding site?</div>
<li><div id="question">Q: Is the best docked conformation of <tt>nelfinavir</tt> similar to <tt>indinavir</tt>?</div> <li><div id="question">Q: Based on the literature, what is common among this class of HIV-1 protease inhibitors?</div>
<li><div id="question">Q: How would you check whether the conformation makes sense?</div>
<li><div id="question">Q: How would you look at it from a "virtual screening" perspective?</div>
<li>Fortunately, in this instance, there are crystal structures of <tt>nelfinavir</tt> bound to HIV-1 protease inhibitor that you can use to validate your docking results.
<li>Use the Protein Data Bank and download the PDB file for <b>1OHR</b>. Save it in the current directory - <tt>~/Desktop/dock-prac/1HSG_nelfinavir</tt>. (If the internet is down, there should be a copy in <tt>~/Desktop/dock-prac/backup/data/1OHR.pdb</tt>).
<li>Load this structure into PyMOL using <b>File &gt; Open...</b>. Select <b>1OHR.pdb</b> and click <b>OK</b>.
<li>Hide all representations of <b>1OHR</b> using <b>H</b> and then <b>everything</b>. Now, display <b>1OHR</b> as <b>cartoon</b> and <b>colour</b> by <b>chain</b>.
<li>You may have observed that the two HIV-1 protease structures are in different orientations in space. To make sense of the docking results, you have to align the two structures. You can use the PyMOL <tt>align</tt> command to do this. In the PyMOL command line, type:
<pre>
PyMOL&gt; align 1OHR, 1HSG_protein
</pre>
<b>1OHR</b> should now be overlaid on <b>1HSG_protein</b>
<br><i>Note:</i> A copy of the aligned structure can be found in <tt>~/Desktop/dock-prac/backup/data/1OHR_aligned.pdb</tt> if the <tt>align</tt> command does not work.<br>

<li><div id="question">Q: What is the RMSD of the alignment? Is it qualitatively and quantitatively good?</div>
<li>You may have observed that moving the structure around the window is a bit difficult since the origin of the view has been altered when you loaded <tt>1OHR.pdb</tt>. To reset it, try:
<pre>
PyMOL&gt; reset
</pre>
<li><b>nelfinavir</b> has the residue name <b>1UN</b> in <b>1OHR</b>. Select and display it as sticks.
<pre>
PyMOL&gt; select nelfinavir, 1OHR and resn 1UN
</pre>
<b>C</b>olour it <b>by element</b> and pick any colour combination.
<li><div id="question">Q: Is the "best mode" generated by AutoDock Vina similar to the crystal conformation of <tt>nelfinavir</tt>? If not, why not?</div>
<li><span id="question">Q: How about the "second best mode"? Why do you think this is the case?</span> (Hint: Look at the affinities reported in the log file)
<li><div id="question">Q: In this instance we assumed that chain A of 1HSG corresponds to chain A of 1OHR. Is this true always?</div>
<li>Let us align chain A of 1OHR to chain B of 1HSG.
<pre>
PyMOL&gt; align 1OHR and chain A, 1HSG_protein and chain B
</pre>
<li><div id="question">Q: Compare the first two AutoDock Vina docks with the crystal conformation. What does this tell you? What could be a reason for this observation?</div>
<li>Close PyMOL.
</ol>
<br>

<!--
*** This sort of works - puts the peptide in the right place at least but not convincingly correctly possibly because of peptide construction - might not get right charges?. Perhaps at some point I should refine this...

<h2><li><a name="exercise5" id="exercise5">Exercise 5: Docking a design dipeptide into HIV-1 protease</a></li></h2>
The HIV-1 protease cleaves the peptide bond of newly formed polypeptide chains at specific recognition sites. Many of the HIV-1 inhibitors and synthetic peptide blockers make use of this property to bind and inhibit the protease's catalytic action. Now, let us do something quite speculative. Let us try to create a dipeptide - <b>Tyr-Pro</b> - and dock it into the protease structure to see whether there is anything common to the dipeptide and the inhibitor poses.
<ol type="i" class="roman">
<li>Create a new directory and copy the protein files and also the ligand file for comparison.
<pre>
% cd ~/Desktop/dock-prac
% mkdir 1HSG_dipeptide
% cd 1HSG_dipeptide
% cp ../1HSG/1HSG* .
% cp ../1HSG/indinavir.pdbqt .
% cp ../1HSG/config.txt .
</pre>
<li>Open PyMOL and create the dipeptide structure.
<pre>
% pymol
</pre>
Go to <b>Build &gt; Residue</b>. Click <b>Tyrosine</b>. Repeat and reselect <b>Proline</b>. <li>Save this structure using <b>File &gt; Save</b>, select <b>pkmol</b> and click <b>OK</b>. Save it as <tt>YP.pdb</tt>
<li>Quit PyMOL.
<li>Launch ADT and load <tt>YP.pdb</tt> using <b>File &gt; Read Molecule</b>.
<li>Follow the steps in Exercise 4 to add hydrogens, define it as a ligand and save the PDBQT file as <tt>yp.pdbqt</tt>
<li>Modify <tt>config.txt</tt> and run AutoDock Vina.
<pre>
% vina --config config.txt --log log.txt
</pre>
This will take a few minutes to run...
<li>Once done, launch load the protein, ligand and the dipeptide docks into PyMOL
<pre>
% pymol 1HSG_protein.pdb indinavir.pdb all.pdbqt
</pre>
<li><div id="question">Do you see any similarities between <tt>indinavir</tt> and the docked dipeptide structures?</div>
</ol>
<br>

http://www.jbc.org/content/263/34/17905.abstract

-->

<h2><li><a name="exercise5" id="exercise5">Exercise 5: Generating the electrostatic surface of the protein</a></li></h2>
By now you might have realised that electrostatics play a very important role in docking molecules. The purpose of this exercise is to generate and view the surface of the HIV-1 protease structure and to see how a ligand interacts with this.
<ol type="i" class="roman">
<li>As we found out earlier, PDB structures do not contain partial charges for the atoms in the structure. Partial charges are crucial for electrostatic calculations. Hence, we need to add that to a PDB file.
<li>Create a new directory for this exercise and copy the protein structure - <tt>1HSG_protein.pdb</tt> into this folder.
<pre>
% mkdir ~/Desktop/dock-prac/1HSG_APBS
% cd ~/Desktop/dock-prac/1HSG_APBS
% cp ~/Desktop/dock-prac/1HSG/1HSG_protein.pdb .    (on windows:  copy ..\1HSG_protein.pdb .)
% cp ~/Desktop/dock-prac/1HSG/indinavir.pdbqt .     (on windows:  copy ..\indinavir.pdbqt .)
</pre>
<!--
<li>Assign partial charges and atomic radii using a program called <b>PDB2PQR</b>. We will use the  <b>AMBER</b> forcefield to assign partial charges and atomic radii.
<pre>
% pdb2pqr --ff=amber 1HSG_protein.pdb 1HSG_protein.pqr
</pre>
You should now have a file call <tt>1HSG_protein.pqr</tt>. Open it in a text editor and verify that the last two columns have partial charge and radius for each atom.
-->
<li>PyMOL has a nice plugin for computing and viewing electrostatic surfaces generated using a program called <b>Adaptive Poisson-Boltzmann Solver</b> (APBS) <a href="#references">[5]</a>.  The exact implementation of this plugin will depend a little on the PyMOl installation you have - here we assume you have the one from schrodinger.

<li>Open <tt>1HSG_protein.pdb</tt> in PyMOL.
<pre>
% pymol 1HSG_protein.pdb &
</pre>
<li>Launch the PyMOL APBS plugin using <b>Plugin &gt; APBS Electrostatics</b>.
<li>On the tab titled <b>Main</b> you should see a button at the bottom that says <b>Run</b>.  Hit that and wait a couple of seconds for it generate a surface with the electrostatics contoured onto it.  After it finishes, it will ask if you want to keep the dialogue - say no and it will close.

<li>Load the ligand <tt>indinavir</tt> into PyMOL and display it in stick representation.  You should see it sticking in a nice clearly defined pocket that is coloured red.
<pre>
PyMOL&gt; load indinavir.pdbqt
</pre>

<li><div id="question">Q: Show/Hide the electrostatic surface and compare with the exposed protein atoms. Does it match up?</div>
<li><div id="question">Q: Does anything on the protein surface (in the binding site) stand out?</div>
<li><div id="question">Q: Why is the base of the binding site largely negative? How is it related to catalysis in this case?</div>
<li><div id="question">Q: Does the protein surface charge (or lack thereof) correspond to complementary regions in the ligand?</div>
</ol>

<h2><li><a name="discussion" id="discussion">Concluding remarks</a></li></h2>
In this practical, we have looked at how small molecules could be docked into a protein. This approach is widely used to detect binding sites and also to screen a library of small molecules to find potential drugs that could bind to a known binding site. Scoring functions play an important role in identifying a "good dock" and as such remains an area of active research. Several other considerations are worth noting including water molecules and ions in the binding site, flexibility of binding site residues, etc. Some docking programs (including AutoDock Vina) allow you to define a subset of flexible sidechains. Whilst this permits binding site rearrangement to accommodate distinct ligands, the computational search space increases many fold. Defining water molecules that are important in the binding site also remains an area of active research. In this practical, we have considered only protein-ligand docking. Protein-protein docking is also widely used, but not considered here due to the significantly higher search space that has to be considered. As the number of published and unpublished protein structures continue to grow, <i>in silico</i> docking will continue to play an important role in the drug discovery process.


<h2><li><a name="references" id="references">References</a></li></h2>
<ol>
<li>Brik A, Wong CH. "<a href="http://dx.doi.org/10.1039/b208248a" target="new">HIV-1 protease: mechanism and drug discovery</a>". Org. Biomol. Chem. (2003) 1:5–14.
<li>Wensing AM, van Maarseveen NM, Nijhuis M. "<a href="http://dx.doi.org/10.1016/j.antiviral.2009.10.003" target="new">Fifteen years of HIV Protease Inhibitors: raising the barrier to resistance.</a>" Antiviral Res. (2009) Online in advance of print.
<li>Chen Z, Li Y, Chen E, Hall DL, Darke PL, Culberson C, Shafer JA, Kuo LC. "<a href="http://www.jbc.org/content/269/42/26344.long">Crystal structure at 1.9-A resolution of human immunodeficiency virus (HIV) II protease complexed with L-735,524, an orally bioavailable inhibitor of the HIV proteases.</a>" J. Biol. Chem. (1994) 269:26344-26348.
<li>Trott O, Olson AJ. "<a href="http://dx.doi.org/10.1002/jcc.21334" target="new">AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading</a>", J.  Comput. Chem. (2009)
<li>Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA. "<a href="http://dx.doi.org/10.1073/pnas.181342398" target="new">Electrostatics of nanosystems: application to microtubules and the ribosome.</a>" Proc. Natl. Acad. Sci. (2001) 98:10037-10041.
</ol>

<h2><li><a name="appendixI" id="appendixI">Appendix I: Useful links/resources</a></li></h2>
<table>
<tbody><tr><td width="150">PyMOL</td><td><a href="http://www.pymolwiki.org/index.php/Main_Page" target="new">http://www.pymolwiki.org/index.php/Main_Page</a>  (might need to install with conda install -c samoturk pymol)</td>
</tr><tr><td>AutoDock Tools</td><td><a href="http://autodock.scripps.edu" target="new">http://autodock.scripps.edu</a></td>
</tr><tr><td>AutoDock Vina</td><td><a href="http://vina.scripps.edu" target="new">http://vina.scripps.edu</a></td>
</tr><tr><td>APBS</td><td><a href="http://www.poissonboltzmann.org/apbs" target="new">http://www.poissonboltzmann.org/apbs</a></td>
</tr></tbody></table>


<h2><li><a name="appendixII" id="appendixII">Appendix II: Basic Linux commands</a></li></h2>
<table>
<tbody><tr><td width="150"><tt>ls -lrt</tt></td><td>provides a "long" list of all files in the current directory in reverse order of time.</td>
</tr><tr><td><tt>cd dir</tt></td><td>         change directory to the directory 'dir'</td>
</tr><tr><td><tt>pwd</tt></td><td>            print the current working directory on the screen</td>
</tr><tr><td><tt>rm file</tt></td><td>                delete (remove) 'file'</td>
</tr><tr><td><tt>mv file newfile</tt></td><td> rename file to newfile</td>
</tr><tr><td><tt>cat file</tt></td><td>       print the contents of file to the screen</td>
</tr><tr><td><tt>more  file</tt></td><td>      print the contents of file to the screen but with more navigation possible.</td>
</tr></tbody></table>
</body>
</html>