Computational Chemistry

Category: Tutorials

Obtention of the Reaction Mechanism Along a Complex Reaction Coordinate

Potential of Mean Force (PMF)

Adaptive Finite Temperature String Method in Collective Variables.

In this tutorial, we will explore an efficient and practical approach to acquiring the free energy profile along a reaction coordinate for complex chemical reactions, eliminating the requirement for multidimensional free energy surfaces. By simply providing a guessed path for the chemical reaction, the string method will effectively identify the closest minimal free energy path within the specified degrees of freedom in chemical space.  Also, you don’t need to have identified the exact position of the reactants and product structures for this method, as the string method is able to find them.

Before we begin, I highly recommend checking out the tutorial written by Kirill Zinovjev in his GitHub repository. This tutorial is specifically designed to replicate the reaction mechanism proposed in the original paper of this methodology. The system under consideration is small enough to be tested within a time frame of less than half an hour on a single cluster node. Furthermore, in that tutorial, you will gain insights into how the method operates when employing a combination of bonds and angles as collective variables (CVs). As for our particular case, we will solely be utilizing distances as our CVs.

0. Introduction

Roughly speaking, a reaction mechanism provides a logical and detailed description of the steps involved in transforming a given set of reactants into another set known as products. This transformation is associated with an energy landscape, and the reaction predominantly takes place along the minimal free energy path (MFEP) that connects the regions of reactants and products. These regions are separated by their corresponding transition state.

Exploring the free energy landscape and accurately describing a chemical reaction becomes more challenging as the number of involved coordinates increases. Initially, when only a single coordinate, such as forming or breaking a bond, is considered, the task is relatively simple from a computational standpoint. However, during the course of a chemical reaction, multiple coordinates can serve as Collective Variables (CVs) to describe the process. These can include bond lengths, bond orders, bond angles, dihedrals, or changes in atom hybridizations.

The complexity arises when a larger number of coordinates are needed to fully explore the free energy landscape and provide an accurate description of the chemical reaction. In such cases, bidimensional or even three-dimensional free energy surfaces become necessary. In real systems, the number of required coordinates can reach unmanageable levels, resulting in significant computational demands. This challenge stems from constructing a multidimensional free energy surface to represent the reaction, with computational costs growing exponentially alongside the number of involved coordinates. This phenomenon is commonly referred to as the curse of dimensionality.

The problem becomes more tractable when we shift our focus from the entire free energy surface to the region of interest known as the minimal free energy path (MFEP). With the string method, we can incorporate any number of Collective Variables (CVs) to obtain the MFEP without sacrificing efficiency. However, the bottleneck in this approach is similar to other methodologies like QM/MM, which involves considerations of the QM region’s size and the chosen level of theory to describe it.

Accordingly to the Transition State Theory, the activation free energy (\({\Delta{G{^{\ddagger}}}}\))(that is the highest energy along the path) is related to the experimental Inhibition rate constant (\(K_r\)) by the following expression:

\(K_r=\frac{K_BT}{h}e^{(-\frac{\Delta{G{^{\ddagger}}}}{RT})}\)

Where:

\({K_r}\) has s-1 units,

\({K_B}\) is the Boltzmann constant (\({1.38×10^{-23}J\cdot K^{-1}}\)),

\({T}\) is the temperature in Kelvin,

\({h}\) is the Planck constant (\({6.63×10^{-34}J\cdot s}\)),

\(\Delta{G{^{\ddagger}}}\) is the activation free energy (\({J\cdot mol^{-1}} \)) and

\(R\) is the gases constant (\({8.314J\cdot mol^{-1} K^{-1}}\)).

 

If the \({\Delta{G{^{\ddagger}}}}\) along this path agrees with the measures in the lab (\(K_r\)), the reaction mechanism proposed is likely describing what is happening at atomic level.

1. Method

The reaction primarily occurs along a specific path in the multidimensional space of reaction coordinates, known as the “reaction tube.” This forms the foundation of the path-based approach method, which effectively addresses the challenge of describing the reaction mechanism using multiple variables.

Assuming that this process takes place within a narrow region encompassing most of the reactive trajectories connecting reactants and products, the progress of the reaction can be represented by a single path Collective Variable (CV) without losing essential information. This path-CV describes the system’s advancement along the minimal free energy path (MFEP) and serves as the reaction coordinate (RC) for studying the process.

It’s important to note that a string calculation involves two stages. The first stage entails the evolution of the string to locate the MFEP. The second stage involves defining the path-CV, often denoted as ‘s’. Once this path is established, an Umbrella Sampling calculation is conducted along this coordinate to obtain the Free Energy Profile of the reaction (also known as the Potential of Mean Force or PMF). To move from the first to the second stage requires stopping the evolution of the string once it has found the MFEP, and this will be explained in further detail later.

I encourage you to read the publication on this methodology to gain a broader perspective of its implementation https://pubs.acs.org/doi/10.1021/acs.jpca.7b10842

Adaptive string method implementation in sander (AmberTools)

To run the simulations in parallel, the method was implemented in multisander. Therefore, the necessary changes must be made in the source code. Once these changes have been completed, reconfigure and reinstall.

The complete procedure for the implementation can be found in the afore mentioned GitHub repository of Kirill Zinovjev.

Now with the string implemented in your Amber installation it is time to get the MFEP for our chemical reaction.

2. Chemical reaction

Since this tutorial is aimed at intermediate-advanced Amber users, I will skip the system preparation process and the protocol for running classical molecular dynamics simulations. However, if you are not familiar with these basics, you can first learn them from Amber introductory tutorial.

In this tutorial you will find the way to obtain the minimal free energy pathway for the mechanism of inhibition of the SARS-CoV-2 main protease by the inhibitor PF-00835231 (see doi.org/10.1002/anie.202110027).This reaction is a perfect example to show how the string method can deal with a complex collective variable. Here the reaction coordinate is obtained as a combination of 7 CVs (see Fig. 1).

Fig. 1 Set of Collective variables used to describe the reaction coordinate

This proposal begins by exploring the formation of the Cys/His ion-pair intermediate, with the proton transfer process described by CV7 and CV6. Once the ion-pair structure is established, the cysteine-sulfur undergoes a nucleophilic attack on the inhibitor’s carbonyl carbon (CV1). This sulfur attack is concerted with the double proton transfer: from the His to the hydroxyl group (CV6 and CV5), and from the hydroxyl group (CV4) to the oxygen in the carbonyl group (CV3).

Next, we provide a brief overview of each file used to run the string method in Amber (please note that this tutorial assumes the use of a customized Amber installation, as described in Section 1). As you progress through this tutorial, each of these files will be explained in detail.

3. Files

      • CVs : contains the information about the atoms involved in every collective variable that form the set of CVs.
      • guess : contains the initial path that roughly describes the reaction mechanism.
      • in : this is a template to be used as input for every string node.
      • in.sh : is a bash scripts to create the Amber’s string.groupfile and the input file for every node.
      • string.groupfile: the group file needed to run multiple independent simulations in Amber.
      • STRING : this file enables the usage of the string method in the Amber calculation. It specifies the path to the results folder and the location of the guess file.
      • string.sh: this is an example to submit the simulations in a machine using the slurm workload manager.
      • STOP_STRING: this file is manually created once the string has found the MFEP. It contains the time interval (in simulation steps) during which the force constants and reaction coordinate “s” are averaged. These averaged values are then used to generate the final_parameters.dat file, which contains the necessary information for obtaining the PMF. (This file will be described in detail in the section 4)

To begin with, we require at least two structures—one for the reactants and one for the products—described with the same Hamiltonian. It is crucial that these structures are pre-equilibrated at the QM/MM level of theory before usage. While it is a general recommendation in molecular dynamic simulations, having a well-equilibrated system provides better control over the overall process. Ideally, a structure for each intermediate is also necessary. This facilitates the generation of structures for the string nodes during the preparation step.

All files are Amber files:

Then, we define the set of CVs that describe the reaction progress.  Here we have a CV set with 7 distances.

3.1. CVs

To understand this file, note that the first line specifies the number of collective variables describing the chemical reaction. Each collective variable is represented by a separate $COLVAR block. The COLVAR_type keyword indicates the type of variable, such as “BOND,” “ANGLE,” “DIHEDRAL,” or “PPLANE.” The atoms keyword includes the Amber indices of the atoms associated with the specific COLVAR_type. For distances, two indexes are used; for angles, three indexes; and for dihedrals and planes, four indexes are utilized. Additionally, comments after the indexes will help to identify the variable being described (optional). Finally, in some $COLVAR blocks, there may be an optional box keyword that constrains the collective variable within a specified range, typically to prevent the string from getting lost in diffusion processes.

7
$COLVAR
COLVAR_type = "BOND"
atoms = 2242, 4674      !S····C=O
box = 1.3, 4.0
$END

$COLVAR
COLVAR_type = "BOND"
atoms = 4674, 4675      !C=O
$END

$COLVAR
COLVAR_type = "BOND"
atoms = 4675,   4678    !C=O····HO
$END

$COLVAR
COLVAR_type = "BOND"
atoms = 4678,  4677     !H-O inh
$END

$COLVAR
COLVAR_type = "BOND"
atoms = 4677,   618     !HO····HEHis
$END

$COLVAR
COLVAR_type = "BOND"
atoms = 618, 617        !His.HE-NE.His
$END

$COLVAR
COLVAR_type = "BOND"
atoms = 2242, 618       !S····HEHiS
box = 1.0, 4.0
$END

Next we have to define roughly the reaction mechanism pathway.

3.2. guess

In this file we will propose our hypothesis for the reaction mechanism pathway:

5
        3.20    1.20    3.00    1.00    3.20    2.00    1.30
        2.80    1.25    2.85    1.00    2.60    1.50    1.30
        2.40    1.30    2.70    1.00    2.05    1.00    2.20
        2.00    1.35    1.40    1.21    1.40    1.00    3.00
        2.00    1.40    1.00    2.53    1.00    1.80    3.00

The first line indicates the number of steps used to describe the reaction, which is five in this case. Following that, each subsequent line represents the values of the collective variables (CVs) at each step. Each column corresponds to a specific CV, and each row represents a different step.

Typically, the values for the CVs in the reactants, products, or any stable intermediate steps are obtained by averaging their values from previous molecular dynamics simulations at those specific steps. While it is recommended, this approach is not mandatory. If these values are obtained in this manner, the ends of the string can be fixed, meaning their values remain constant throughout the simulation. This can be specified in the STRING file.

Fig. 2 Change in the collective variables from reactants to products

The first step in the reaction mechanism according to the initial guess shown in Figure 2 is the proton transfer from Cysteine to Histidine (CV7····CV6). The H-S bond (grey line) only starts to be broken when the H is close enough to be transferred to the \(N_\epsilon\) in the histidine (orange line).. We call this intermediate the Ion Pair (IP).

After the IP is fully formed, the proton in the hydroxymethyl group of the inhibitor begins to approach the oxygen in the carbonyl group (represented by the green and yellow lines corresponding to CV4 and CV3, respectively). Simultaneously, the recently transferred proton undergoes another transfer, this time from histidine to the oxygen atom (depicted by the orange and purple lines denoting CV6 and CV5). The nucleophilic attack of the cysteine sulfur on the carbonyl carbon occurs in coordination with these proton transfers (represented by the red line corresponding to CV1).

3.3. in

This is a standard input file for Amber. It does not contain any information specific to the string method. The string method is activated only when the STRING file is present in the working folder. The “ig = XXXXX” value will be replaced with a random value during the calculation, ensuring that the calculation can be replicated if needed.

QM/MM MD
&cntrl
  imin      = 0,        ! Minimization off
  ntf       = 1,        ! Bond interactions involving H-atoms omitted, use with NTC=1
  temp0     = 300.0,    ! Reference temperature at which the system is to be kept
  tempi     = 300.0,    ! Initial temperature
  ntt       = 3,        ! Switch for temperature scaling. Use Langevin dynamics with the collision frequency given by gamma_ln
  irest     = 0,        ! restart a simualtion
  ntx       = 1,        ! Reading the coordinates and velocity from the  rst file
  ntb       = 1,        ! Volume constant
  cut       = 8.0,     ! Non bonded cut
  gamma_ln  = 5.0,      ! Collision frequency
  nstlim    = 1000000,    ! Steps, for 10 ps
  dt        = 0.001,    ! Every step of 1 fs
  ntpr      = 50,      ! Print the progress every ntpr steps
  ntwx      = 50,      ! Every ntwx steps, the coordinates will be written to the mdcrd file.
  ntwr      = 50,      ! Every ntwr steps during dynamics, the restrt file will be written
  ntxo      = 1,        ! Format of the final coordinates,
  ifqnt     = 1,        ! Switch on QM/MM coupled potential
  ig        = XXXXX
/
 &qmmm 
 qmmask ='@609-620, 2239-2242, 4654-4658, 4674-4681', !index 608 to 619 2238 to 2241 4653 to 4657 4673 to 4680 
 qmcharge=0, 
 spin=1, 
 qm_theory='DFTB3', 
 writepdb = 1, !QM region will be written on the very first step to the file qmmm_region.pdb 
 qmcut=8
/

3.4. in.sh

This script generates the group file necessary for the calculation. It is in this file that the structures of the reactants and products will be specified.

In the calculation, the first half of the string nodes will use the structure of the reactants as their starting point, while the second half of the nodes will use the structure of the products. In this particular example, 48 string nodes will be utilized, with each node running on a separate CPU.

execute it by typing:

sh in.sh 48

in.sh file content:

#!/bin/bash

# If groupfile exists, remove it
if [ -f string.groupfile ];
then
  rm string.groupfile
fi

PARM=parameters.prmtop  # path to topology file
REACT=reactants.rst   # path to reactants structure
PROD=products.rst     # path to products structure

# Number of string nodes is provided as command line argument
nodes=$1
for i in $(seq 1 $nodes); do
  # generate sander input for node $i 
  sed "s/XXXXX/$RANDOM/g" in > $i.in

  # use reactants structure for the first half of the nodes and products for the rest
  if [ $i -le $((nodes/2)) ];
  then
    crd=$REACT
  else
    crd=$PROD
  fi

  # write out the sander arguments for node $i to the groupfile
  echo "-O -rem 0 -i $i.in -o $i.out -c $crd -r $i.rst -x $i.nc -inf $i.mdinfo -p $PARM" >> string.groupfile
done
echo "N REPLICAS  = $i"

 

After executing this script with the number of nodes specified as a parameter (in this case, 48), it will create 48 input files, one for each simulation. Additionally, it will generate the string.groupfile .

3.5. string.groupfile

Each line in this file will submit a simulation for each node.

-O -rem 0 -i 1.in -o 1.out -c reactants.rst -r 1.rst -x 1.nc -inf 1.mdinfo -p parameters.prmtop
-O -rem 0 -i 2.in -o 2.out -c reactants.rst -r 2.rst -x 2.nc -inf 2.mdinfo -p parameters.prmtop
.
.
.
-O -rem 0 -i 47.in -o 47.out -c products.rst -r 47.rst -x 47.nc -inf 47.mdinfo -p parameters.prmtop
-O -rem 0 -i 48.in -o 48.out -c products.rst -r 48.rst -x 48.nc -inf 48.mdinfo -p parameters.prmtop

3.6 STRING

This is the most crucial file. Without it Amber will run restrain-free molecular dynamic simulations and the string dynamics will not be performed.

$STRINGSETUP
dir = "results/"
remove_z_bias = .false.
fix_ends = .false.
guess_file = "guess"
preparation_steps = 2000            
CV_file= "CVs"
REX_period = 50

The “dir” directory is where the files containing the string information will be saved. If “remove_z_bias” is set to “.true.”, it will remove the orthogonal bias once the potential of mean force (PMF) calculation starts. By setting “fix_ends” to “.false.”, the final and initial nodes are allowed to move towards their minima. The “preparation_steps” keyword determines the number of steps used to increase the force constant value for each collective variable (CV) from the starting structure to the configuration given in the guess file. The “CV_file” contains the path to the file to the CVs file. The “REX_period” parameter specifies the frequency at which replica exchange will be attempted during the string or PMF phase.

Here are the default values for the mentioned keywords (note that these values are not mandatory and can be customized):

  • preparation_steps = The steps that will be used at the beginning of the calculation to increase the value of the force constant for every CV from the starting structure to the configuration in the guess file. Default = 1000
  • REX_period = The default value is 50, indicating that replica exchange will be attempted every 50 steps during the string or potential of mean force (PMF) phase.

Additionally, there are other optional keywords that can be included to modify the frequency at which the string writes certain outputs:

  • output_period = The default value is 100, specifying that information or files will be printed in the results folders every 100 steps.
  • buffer_size = the interval range used to compute the PMFs along the string and the frequency of integration along the PMF phase. Default = 2000

If the string have already converged and you want to submit only the PMF phase, you must include the following keyword in the STRING file:

  • only_PMF = perform the PMF stage only and not the string evolution

3.7 string.sh

The content of this file will depend on the architecture of your cluster machine (for example, in a SLURM environment, here it was named “string.sh“). Regardless of the specific details, the file should include the following essential information:

#!/bin/bash
export I_MPI_COMPATIBILITY=3

mkdir results

mpiexec sander.MPI -ng 48 -groupfile string.groupfile

At this point if everything goes well your string simulation must be running and if it had already finished the preparation step phase we can start analysis how it is searching for the MFEP of our chemical reaction.

4. STRING ANALYSIS

First things first. Check that the string is running, inside the 1.out you should find  something like this

---String method------------------
Number of CVs                    7
Number of nodes                 48
Node                             1
----------------------------------

Also, in the results folder the files 1.dat to 48.dat should have been created.

As long as nothing is written in the .dat files the calculation will be running the preparation steps. That means the force constants for every CV in every node are increasing to reach the optimal value before starting the search for the minimal free energy path. In this way the next analysis can be made only once the phase of the preparation steps has been finished, 2000 step in this example. Nevertheless in order to gain a real insight about how the string is evolving it would be better to wait at least until the first preliminary .PMF be written, 2000.PMF in this case.

Inside the results folder are stored all the files to be used to check the evolution of the string or the PMF obtained during the PMF phase.

At any moment of the simulation you can check the integrity of the structures by checking if the “vlimit exceeded” message appears in some .out file using the next command:

for i in {1..48};do echo ${i}; grep 'vlimit exceeded' ${i}.out | tail -n 1;done

If there is something wrong with a structure, check the .rst corresponding to the number printed above the vlimit exceeded message. If nothing is printed means that the structures are OK.

If everything is OK, the string is looking for the MFEP.  We can move now to the results folder to check the evolution of the string.

4.1 Convergence

To check if the string is still moving and observe its convergence you can plot the ‘convergence.dat’ file stored in the results folder using a tool like gnuplot using the following command:

plot 'convergence.dat' w l

This plot will provide visual insights into whether the string has found the minimal free energy path (MFEP). It is expected to display a pattern similar to the following example:

The convergence of the string to its minimal free energy path (MFEP) can be determined by observing the reaction coordinate over time. If there are no significant changes in the reaction coordinate, and it remains fluctuating near a specific value (e.g., 0.1), it indicates that the string has converged. In this specific case, this behavior has been observed for 3000 steps (equivalent to 3 picoseconds).

Once the string has converged, you can proceed to the potential of mean force (PMF) phase by creating the “STOP_STRING” file within the results folder. For instance, in this case, the “STOP_STRING” file will contain only two lines, with the numbers 3000 and 5000, representing the region where the convergence remains close to 0.1. If everything is in order, the required final files for running the PMF phase will be generated after the subsequent simulation state, including files such as “0_final.string,” “final_parameters.dat,” “1_final.dat,” and so on.

4.2 Visualization of the changes in the CVs during the convergence of the string

While the string is progressing towards its minimal free energy path (MFEP), we can observe the evolution of the collective variables (CVs) throughout the process. Every 100 steps, a “step_number.string” file will be generated. These files have similar characteristics to the guess file, where each column represents a CV, and each line corresponds to the value of that CV for each node of the string.

Here is an example illustrating the evolution of CV1. The initial guess for CV1 is depicted in bold purple. The 0.string file contains interpolated values for each string node, providing more accurate representations. The values of CV1 gradually change as the string converges from the initial guess to the final 5000.string file, which is illustrated by the bold blue/cyan line.

plot '0.string' u 1 w l lw 2 title 'initial guess CV1'
replot for [i=200:4800:200] ''.i.'.string' u 1 w l notitle
replot '5000.string' u 1 w l lw 2 title 'converged CV1'

Now the same can be made for every CV, or if you prefer you can plot all at once with the next gnuplot command.

plot for [i=3000:5000:500] for [j=1:7] ''.i.'.string' u j w l notitle

To avoid cluttering the plot, only the values of the collective variables (CVs) taken every 500 steps in the converged region are plotted. This selective plotting helps to maintain clarity and prevents overcrowding of the graph when all data points are plotted simultaneously.

4.3 Replica exchange and structure sorting

To ensure that the structures are sorted back to their initial positions as replica exchange is being performed, follow these steps:

  1. Create a folder to store the sorted structures.
  2. Inside that folder, execute the following Python script:
#!/usr/bin/env python3

import os

if os.path.exists('../results/plot_final.REX'):
    plot = 'plot_final.REX'
else:
    plot = 'plot.REX'

with open(f'../results/{plot}', 'r') as f:
    idx = f.readlines()[-1].split()[1:]

for i in range(len(idx)):
    print(f'{i+1} is now {idx[i]}')
    os.system('cp ../' + str(i+1) + '.rst ' + idx[i] + '.rst')

With the structures sorted, you can use VMD to examine how the reaction is occurring loading the visual.vmd file by typing  in the bash console:

vmd -e visual.vmd

Please take into account that the visual file needs to be placed in the same folder as the parameters.prmtop file. Within this folder, make sure to include a ‘sorted’ folder containing the restart files.

additionally if you have problems using the VMD visualisation file you can  follow these steps:

  1. Load the “parameters.prmtop” file in VMD.
  2. Open the VMD TkConsole.
  3. Type the following command in the TkConsole to add the structures corresponding to every string node:
for {set i 1} {$i < 49} {incr i} {
mol addfile path/to/sorted/structures/${i}.rst type rst7 waitfor -1
}

Replace “path/to/sorted/structures” with the path to the folder where you stored the sorted structures. This command will loop through each string node and add the corresponding structure to the VMD display.

Once you with your structures loaded, you will be able to visualize the structures corresponding to each string node in VMD, allowing you to observe the reaction’s progression.

This an example of what you will observe, your visualization will look less smooth.

To assess the performance of replica exchange, you can plot the data from the plot.REX file (or plot_final.REX during the PMF phase) using gnuplot or similar tools. Here’s an example command to plot the file:

plot for [i=2:49] 'plot.REX' u 1:i w l lw 2 notitle

Please note that white spots in the plot indicate areas where the sampling could be improved. If the structures are OK, these white spots will disappear as the simulation progresses and more sampling is performed.

Moving forward, you can examine the preliminary potential of mean force (PMF) files that have been written. These files contain valuable information about the energy landscape and the reaction pathway. By analyzing the PMF files, you can gain insights into the thermodynamics and kinetics of the chemical reaction.

4.4 Plotting the .PMF files.

During the evolution of the string, within the “results” folder, a “buffer_size.PMF” file will be generated every buffer_size steps. These files provide a rough approximation of the potential of mean force (PMF) and will continue to change as the simulation progresses. It is important to note that these files should be used cautiously and only to obtain a preliminary understanding of the thermodynamics and kinetics of the chemical reaction. For more accurate and reliable results, it is recommended to wait for the final PMF files generated at the end of the simulation.

Inside the results folder you can plot those preliminary files using gnuplot:

plot for [i=2000:4000:2000] ''.i.'.PMF' u 1:6 w l lw 3 title ''.i.' steps'

Preliminary PMFs

If your string has converged and you are in the final PMF stage, a path collective variable (path-CV) is defined using the MFEP that was found. At this point, you will begin to observe the “buffer_size_FINAL.PMF” files. These files contain information about the PMF obtained along the reaction coordinate. They provide valuable insights into the energy landscape and the thermodynamics of the chemical reaction. Analyzing these files will allow you to examine the energetics and barriers along the reaction pathway.

You can plot the “buffer_size_FINAL.PMF” files in the same way as you plotted the preliminary PMF files during the string phase.

It is crucial

to note that this tutorial has provided a basic understanding of the string method and its implementation. Further exploration, experimentation, and refinement are encouraged to adapt the approach to specific systems and research goals. The string method offers a powerful tool to unravel the reaction mechanisms, thermodynamics, and kinetics of complex chemical reactions, paving the way for advancements in various fields, including computational chemistry, drug discovery, and materials science.

Possible problems and solutions

P1: There are no minima in reactants and products region in the PMF

S1: Check if there is a CV trying to go to a further distance than the one in the ‘box’ defined in the CVs file. If so, increase the box range and resubmit the string using now as initial guess the values in the last XXXX.string files written. Also, you can use as initial structures the last structures obtained from the previous calculation. Remember, they have to be sorted previously in case you used replica exchange.

S2: Check that there are not a restraints interfering the evolution of CVs freely.

S3: Force the Collective Variables (CVs) to take higher or lower values than they should in the reactant and product regions, and fix the ends of the string.

Making movies in VMD

Introduction

This tutorial describes an easy way for making videos in VMD capturing the screen with Kazam on Linux, QuickTime Player on mac or Xbox Game Bar on windows.

As you have seen, I created a smooth transition between different points of view of the system. I then played the animation of the reaction mechanism until the formation of the Transition State (TS) structure, where another green molecule appears. After that, there’s a transition between different points of view until returning to the point where this transition began. Once there, the green molecule disappears, and the reaction continues. When the reaction finishes, a final transition is played to return to the initial point of view.

To make this video I used the view_change_render.tcl  script to create the transitions between selected points of view.

In order to replicate this video I am sharing the files I used:

Tutorial

Loading Files

If you have your own files you can skip this part.

  • Open VMD from the file-containing folder using the command line in the bash console. If you are using Windows, once in VMD, go to Extensions -> Tk Console and type the path where you downloaded the files (something like cd D:/Users/Carlos/Downloads/ ¯\_(ツ)_/¯ )

For Windows and M1 chip Mac users, the netcdf file will not work. Decompress the trajectory.rar file so that every *.rst file is in the same folder as visual_11a_scan.vmd.

  • Load the files using the visual_11a_scan.vmd file. (File -> Load Visualisation state .. visual_11a_scan.vmd)
    • This is not necessary for creating transitions, points of view, or even the video. This file simply loads the structures into VMD and then displays the atom selection as shown in the video.
  • For Linux or Intel Mac users, I recommend using the netcdf file (though you can use the rst files if you prefer). Load it in the VMD main console. Right-click on the molecule with ID=0 (11aIP.prmtop), then select “Load data into molecule”, browse and load the reaction_11a_3clpro.nc.

Creating Visualisation Points of View

First in the Tk console load the view_change_render.tcl  by typing:

source view_change_render.tcl

Save the viewpoints between which you want to create transitions using the command: save_vp 1, save_vp 2, save_vp 3, save_vp 4, save_vp 5, and so on.

This is entirely up to you. These are the viewpoints I selected for making this video.

  • save_vp 1

  • save_vp 2

  • save_vp 3

Once the viewpoints are created/saved, it’s time to create the entire animation. In the Tk console, type:

source movie.tcl.

This command will run the script that animates the transitions and plays the reaction.

Now That everything is ready, we can save this animation as an mp4 file.

Saving the Animation as mp4

  • Linux

For me, the most convenient way to do this is using Kazam. It can be downloaded from the software store, at least in Ubuntu.

    • Open Kazam.
    • Select screen cast and the recording method. The “Area” option works best for me.
    • Press capture.
    • Source the the movie.tcl file in the TK console of VMD  when it starts recording.
    • To stop recording, go to the top-right corner of the screen, click on the video camera icon, and select “Finish Recording.”
  • MacOS

I prefer using QuickTime Player for this, as the recorded videos can be easily shared among Apple users without encountering codec problems, which sometimes occur with other software.

    • Open QuickTime Player.
    • Right click on the icon that appeared in the dock, select “New Screen Recording.”
    • Select the area of the window you want to capture , and then click “Record.”
    • Source the the movie.tcl file in the TK console of VMD  when it starts recording.
    • To stop the recording, go to the top right corner of the screen and click the white circle with a black square inside.
    • The video will appear immediately; you can save it using the QuickTime Player menu (File/Save…) or trim it before saving (Edit/Trim).
  • Windows

There are many options on Windows, but the default option works well. This is accomplished using the shortcut “Windows key + alt + R” (Xbox Game Bar shortcut).

    • Source the the movie.tcl file in the TK console of VMD.
    • Press “Windows  key + alt + R” to start recording.
    • To stop recording, click on the blue circle with a white square inside.
    • The recordings will appear in the folder “Videos/Captures”.

movie.tcl Script explanation

Go to the first frame with:

animate goto 0

Then go to the first visualisation point created with:

retrieve_vp 1

Create a smooth transition from viewpoint 1 to vp 2 (150 represents the number of frames).

move_vp 1 2 150

Iterate using a for loop from reactants to the transition state structure.

for {set i 0} {$i < 32} {incr i} { 
animate goto $i 
display update 
}

Make another smooth transition, this time from vp 2 to vp 1.

move_vp 2 1 150

To make the green ligand appear, use:

mol on 1

Move smoothly from vp 1 to vp 3.

move_vp 1 3 150

Then from vp 3 to vp 2

move_vp 3 2 150

To make the green ligand disappear, use:

mol off 1

Iterate again using a for loop from the TS structure to the products.

for {set i 31} {$i < 64} {incr i} { 
animate goto $i 
display update 
}

and finally return to the initial point of view using:

move_vp 2 1

If you are interested in the topic of this video, I encourage you to read our research study. The aim of this study was not only to clarify the reaction mechanism between the inhibitor 11a identified as a potent inhibitor, but also to highlight how the transition state involved in this reaction resembles the crystallographic structure of one of the inhibitors developed by Pfizer (PF-00835231). It is currently undergoing human trials, and we hope it will soon become a drug that helps stop COVID-19.

I hope this tutorial has been helpful. I would appreciate any comments you have to improve this tutorial.

Amber Custom Residue Parameterization

L-tert-Leucine

This is a tutorial for intermediate level users. To comprehend it, you will need to know how to:

  • Build a system to run molecular dynamics (MD) with Amber.
  • Submit calculations using Gaussian.

This tutorial was designed as an alternative (or as a supplement) to the official Amber tutorials for parameterizing a custom residue. It summarizes information from three Amber tutorials:

I invested a considerable amount of time following Amber’s recipes to parameterize custom residues. I achieved my goal by combining parts of them to create my own protocol. I hope it also helps you. As I mentioned, this is a recipe; there is no technical or theoretical information included. That part is well explained in the Amber tutorials.

This is not the only or the best method, but it is one that works.

This tutorial can be viewed from a bird’s eye perspective by advanced users who are just looking for a few lines or commands to remember (or to tweak something in their regular workflow). Alternatively, beginners can follow along at a slower pace by reading the basics of what is being done. If additional information is needed, links to explanations will be included.

I) Residue Structure Selection

To calculate the partial atomic charges, we will need a 3D representation of our non-standard residue. For this, we can:

  1. Build it using, Gaussview, Maestro or any other software you are familiar with.
  2. Obtain the atomic coordinates from a PDB file or any other source that includes 3D information about the molecule.

In this case, the residue to be parameterized is L-tert-Leucine, which I refer to as LTL. However, the side chain can be anything you desire.

Keep in mind:

In a amino acid C=O is the tail and the NH is the head. (Fig. 1)

When naming your custom residues, use three or four capital letters, ensuring that the name hasn’t already been used in the names of Amber force field residues.

The residue is capped using NME and ACE groups, which has a purpose beyond maintaining the unit charge of the amino acid. When a CH3 is bonded to the pre-tail (C=O) or post-head (NH) atoms, Antechamber will recognize the C atom as an SP3 alpha-carbon and the Hs as alpha hydrogens. So, by using these capping groups, Antechamber will create the parameters for the backbone of our L-tert-Leucine (angle and dihedral values). This way, the residue will automatically bond to the previous and next residues in the chain. Please continue reading for further details.

Fig. 1 Capped L-tert-Leucine ACE-LTL-NME

II) Geometry Optimization

To acquire the parameters for our custom residue, its geometry needs to be optimized. This optimized geometry will be used to compute the electrostatic potential of the system. This procedure will be conducted in vacuo using B3LYP and a 6-31G* basis set.

B3LYP geometry optimization input

LTL_opt.com

Submit the Calculation as usual:

g16  LTL_opt.com

Examine the log file (LTL_opt.log) in GaussView, or your preferred software, to confirm that everything is in order..

III) Electrostatic potential Charges calculation (ESP-charges)

Now, with the system in its optimized geometry, a Hartree-Fock (HF) calculation must be performed to obtain the Electrostatic Potential.

The procedure used in the development of the Amber force field to acquire the partial atom charges for every residue will be employed here. In other words, the electrostatic potential will be obtained in the gas phase using a HF/6-31G* QM calculation because “Conveniently the error in the HF/6-31G* calculation is close to the difference between the charge distribution in gas phase and that in solution “

Remember, do not change the QM level of theory when deriving charges. Use the same QM level that was used to derive the charges in the Amber forcefield.

LTL_hf.com

 

Submit:

g16  LTL_hf.com

 

Output: LTL_hf.log

IV) Restrained Fitting of the Partial Charges to the Electrostatic Potentials (RESP Calculation) 

The electrostatic potential must be in a format that the resp program can understand. Run the following command to extract the electrostatic potential from the Gaussian output using espgen:

submit:

espgen -i LTL_hf.log -o esp.dat

Output: esp.dat

Now, we need to derive the partial charges for every atom in our residue, fitted to the ESP. Additionally, we must fix the values of the partial charges of the atoms in the ACE and NME residues to their corresponding values in the Amber force field. (Fig. 2)

Fig 2. ACE y NME partial charges in the amber forcefield

 

Once you have the esp.dat file, you will need two additional files to instruct the RESP program which atoms will maintain a given partial charge value, and for which ones the charge will be calculated.

The first is the resp.in file. This serves as the input for this calculation (the file here contains some comments that will help you to understand it better). The description of this file isn’t included in the Amber manual, but you can read about it here: : http://ambermd.org/tutorials/advanced/tutorial1/section1.htm

resp.in (with comments, delete them before using the file)

CAUTION: Pay attention to the spaces, and blank lines at the end of the resp.in file, as it is sensitive to these.

The Second file is the resp.qin file. This is where the charges of the NME and ACE are provided to the RESP calculation.

resp.qin

This file has the following characteristics: each value is a partial charge, each line contains 8 entries (first line corresponds to atoms 1 to 8, the second line to atoms 9 to 16, and so on), with 10 characters for each entry. A new value is read subsequently. Each value here corresponds to an atom in the resp.in file. Thus, if a value here is different from 0.0000000, a -1 should be written in the second column for that atom in the resp.in file. Comparing the resp.in and the resp.qin files can enhance your understanding.

submit:

resp -O -i resp.in -o resp.out -p resp.pch -t resp.chg\
-q resp.qin -e esp.dat

Output: resp.out, resp.pch, resp.chg

If you encounter the “At line 403 of file resp.F (unit = 5, file = ‘resp.in’)” error, you may need to update your AmberTools. This issue was resolved in Amber 18 but not in Amber 16. Another possible error is related to the end-of-file during read. In such a case, include a blank line at the end of the resp.in file.

Another common error eported in the comments of this tutorial is:

At line 386 of file resp.F (unit = 5, file = ‘resp.in’)
Fortran runtime error: Bad value during floating point read

As Adrian Garcia pointed out in his response, the “number of the molecule” in the resp.in file is read as a floating point, while it’s written as an integer in the file. For him, changing 1 to 1.0000 solved the problem. Thanks for sharing your solutions to this issue.

In the resp.chg file are now the partial charges for each atom in the custom residue.

VI) .ac File Generation or Starting to Speak Amber Language

Here, we’ll associate the calculated partial charges with each atom in the previously optimized structure. We’ll also assign a name to the residue using Amber’s nomenclature. This is all accomplished in a single file, LTL.ac.

antechamber -fi gout -i LTL_hf.log -bk LTL -fo ac\ 
-o LTL.ac -c rc -cf resp.chg -at amber

 

output : LTL.ac

VII) Creating the Residue for Amber Recognition

Now, we have to remove the capping groups from the custom residue and turn it into a new residue using prepgen. The main chain file contains the information needed for this. LTL.mc

The names used in the main chain file (.mc) are the same as those in the .ac file.

prepgen -i LTL.ac -o LTL.prepin -m LTL.mc -rn LTL

The final step is renaming the atom names in the .prepin file to match the atom names in the PDB file, if for some reason they differ.

output: LTL.prepin

In the 4-Hydroxyl-Proline (PR4) tutorial, the final step simply involves loading the .prepin file to obtain a system ready for simulation. However, if you’ve modified the backbone or are attempting to model a highly modified N or C terminal, prepgen may not generate the appropriate atom types for these unique atoms. Thus, you’ll need to generate a file with the atom types in the GAFF AND another with the atom types in the ff14SB. The next section demonstrates how to accomplish this when necessary. For a “standard custom residue”, we’re done.

VII.2) Generating the Parameters Files for Special Cases 

By “special cases”, I refer to residues that possess a bond, angle, dihedral, or functional group not present in the protein Amber force field. As such, these parameters must be obtained using the General Amber Force Field (GAFF). To obtain them, execute:

parmchk2 -i LTL.ac -f ac -o LTL_gaff.frcmod

This command yields the file: LTL_gaff.frcmod, which needs to be included in the next step before loading the “LTL_ff14SB.frcmod” parameters. This ensures that the ff14SB parameters take precedence over GAFF whenever applicable.. 

You’ll know if you need to perform this step if, after completing the following step, you see lines with the message “ATTN: needs revision”:

parmchk2 -i LTL.ac -f ac -o LTL_ff14SB.frcmod -a Y -p $AMBERHOME/dat/leap/parm/parm10.dat

VIII) Finally, Putting It All Together

open tleap by typing: tleap

once inside, build a test system typing the following lines

loadamberprep LTL.prepin
source leaprc.protein.ff14SB
loadamberparams LTL_gaff.frcmod !if you obtained lines with the ATTN message in the ff14SB file
loadamberparams LTL_ff14SB.frcmod !after having deleted the lines with the ATTN message aa = sequence {ACE LTL NME} saveAmberParm aa aa.prmtop aa.rst7

If everything is OK, now the system is ready to run the simulation. If this isn’t the case, please leave a comment to improve this tutorial.

If you need help with a specific topic, let me know in the comments.