QUELO User Manual
Table of Contents
An Introduction to QSP Life
1.1 Understanding our software
The software you are about to use, QSP Life, represents a novel approach to carrying out calculations relevant to pharmaceutical discovery that integrates quantum mechanics (QM) to a degree never before possible. This software provides simple access to state-of-the-art QM tools that have been carefully optimized for use on expansive cloud resources and makes it possible to move beyond the limitations of traditional tools that use classical force fields. The QSP Life platform provides access to QSimulate’s QUELO, a state-of-the-art platform that makes possible–for the first time–Free Energy Perturbation (FEP) calculations using a QM representation of the ligand and surrounding
binding site. If desired, QUELO can also be used to run FEP calculations with only a traditional classical force field, taking advantage of cloud resources so that anyone can run these calculations without investing in local computer resources.
Calculations are set up via an easy-to-use portal. The calculations are performed on the backend using the resources of a Cloud provider, that grants high throughput capabilities to anyone, no matter what your local computing situation might be. QSP Life is implemented using the Software as a Service (SaaS) paradigm. In this paradigm, you interact with the program through a standard browser, such as Google Chrome, and all calculations are carried out externally. This paradigm has become increasingly popular over the past years for several reasons, including:
- There is no need to install software locally.
- The software is always up-to-date.
- You can take advantage of massive compute resources available through the Cloud without the need to build out specialized hardware locally.
- The tools can be run on any platform that supports a standard browser, including any laptop, a tablet, or even a phone.
- You do not need to be physically at work to run the software, you only need a web browser and an Internet connection.
- If you log off, or if your local computer crashes, is lost, or is taken down, your data and jobs are not affected.
- Open a standard browser.
- Navigate to the unique URL that has been created for your organization:
- https://(YOUR_ORGANIZATION).qsimulate.com
- Replace (YOUR_ORGANIZATION) with the shorthand name QSimulate has provided to you.
- Enter your login credentials (provided once you purchase the software).
- Upload your data, choose what you would like to calculate, and run.
- Once finished, the results from a run can be downloaded via the platform.
Logging in to QSP Life
QSP Life is run through a standard Internet Browser. This platform has been developed and tested using the Google Chrome browser and that is the browser we recommend. However, you should be able to run from any modern browser (Chrome/Edge/Firefox/Safari/etc.). If you encounter any unexpected display issues using a non-Chrome browser, it is recommended you try using Chrome to check if this resolves the issue.
You may run your browser on any hardware that supports a browser. This includes laptops, tablets, and even phones (although the limited size of a phone screen will render the experience less than ideal). Because all calculations are run on the cloud, the CPU and memory requirements for your local hardware are nominal. If your hardware is typically capable of running a browser without issue, it will work with our software.
To access the platform, visit the following URL:
https://(YOUR_ORGANIZATION).qsimulate.com
where (YOUR_ORGANIZATION) is replaced with the shorthand name that QSimulate will have provided to you
2.1 Account creation & logging in
When you visit the landing page described above, you will be presented with a dialog for logging in:
If this is your first usage of the platform, click on Sign up to create your account. You will be prompted to enter your information (full name, e-mail, password) as well as a token:
The token is a unique key code that has been provided to the primary contact individual at your organization. Only the person(s) with the token can create new accounts. Either the contact person will share the token with you, or else that person will create an account for you.
Note: the token will only allow the use of e-mail addresses from the associated institution. For example, if the token was provided to the institution whose domain is “AAA.com”, then accounts can only be set up that correspond to e-mail addresses of the form name@AAA.com. This token is only required the first time a new account (for a new e-mail) is created.
Tokens are only provided after an agreement is in place between your institution and QSimulate.
Once your account has been created, you can log in using the credentials you specified for the account when you created it: e-mail and password.
2.2 Forgotten password
If you have forgotten your password, you can recover it by clicking on “I forgot my password,” which will ask for your name and the e-mail address you specified when you set up the account. Your recovery credentials will be sent to that e-mail.
2.3 Troubleshooting
If, after reading this section, you continue to encounter problems creating your account or logging in, you can contact us at the QSimulate support portal:
https://qsimulate-ticket.atlassian.net/servicedesk/customer/portals
QSP Life User Interface
3.1 The Task Table
Once you log in, QSP Life will present you with the following interface. This is the Task Table, which lists all of the calculations you have set up through the platform. Each time you want to set up a new calculation, you start by adding a new Task (for which you supply a name), and that Task will appear in this table.
If this is your first time using the platform, this list will be blank, as shown below:
Once you have created one or more Tasks (calculations), they will be saved and will populate the table when you return to the Task Table page, which you do at any time by clicking on the word “Home” in the upper left part of the page:
3.1.1 Task Table Contents
Within the table, for each Task you have set up, the following information is provided:
- Name: The name you assigned to the Task when you created it. You can rename a task, if desired, using the “Rename” button to the top right.
- Status: The status of the Task.
- Staged: The FEP simulation options and input are still in the process of being defined, and the Task has not yet been submitted.
- Running: The FEP Task is currently running
- Stopped: The FEP Task was stopped by the user after it had been started, and before completion. Stopping and restarting of jobs is supported by buttons accessible through the page for the Task
- Complete: The FEP Task has been run and completed successfully.
- Failed: The FEP Task was run, but failed for some reason. Details on the reason for the failure can be found on the page for the Task.
- Updated: The time and date when the Task was last updated
- Created: The time and date when the Task was first created
All columns of the Task table can be sorted by clicking on the column header. Clicking on the Name and Status headers once will sort in ascending alphabetic order, and clicking on them again will reverse the sort. Clicking on the Updated and Created headers once will sort in the order Newest → Oldest. Clicking on them again will reverse the order of the sort.
3.1.2 Choosing a Task
Clicking on any of the Tasks in the list will take you to a page where you can either set up and run the job (if the Status
is Staged or Stopped) or else look at results and/or error issues (if the status is Running/Complete/Failed). The contents of the Task page itself will be described in the chapter on FEP.
3.1.3 Task Renaming
In front of each Task name is a check box. If you check the box in front of a single name, the Rename button at the top right of the table will become active, and allows you to rename that Task. Note that two tasks cannot have the exact same name.
3.1.4 Task Deleting
You can Delete one or more Tasks by selecting the check box(es) in front of the Tasks, and then selecting the Delete button at the upper right. The panel supports the multi-selection of boxes for the Delete operation. If you click one box, then Shift+Click a second box, all boxes between the first box and the second box will be simultaneously selected. When you click the Delete button, you will be presented with a confirmatory dialog to ensure you want to perform the Delete. Note that once Tasks are deleted, they cannot be recovered.
3.1.5 Task Cloning
You can “clone” one or more Tasks by selecting the check box(es) in front of the Tasks, and then selecting the Clone button at the upper right. The panel supports multi-selection of boxes for the Cloning Operation. When you clone a task, a copy of the task is created with the same name, pre-pended by the words “Copy of.” An example is shown below. A cloned Task copies over all the user input data and options. Once a cloned job is created, the user is free to modify the inputs of the job. For example, you could clone a calculation that was run using a “classical” force field, then change the option to use the “QM” force field. Or you could modify the input data set, or modify the mutation map.
Once you create a cloned Task, it is entirely independent from the Task it was cloned from. It can independently be run, renamed or deleted.
3.1.6 Creating a New Task
Below the Task Table, you will find the dialog that allows you to create a new Task. To add a new Task (calculation), type the name you want to give that task into the text entry region next to “Name” and then click the Add button. Task names must be unique. If you attempt to add a Task with the exact same name as a task that already exists, you will get the error “Task name already used.” When you add a new Task, the setup page for that Task will open immediately.
3.2 Navigation And Menu Buttons
Above the Task Table, you will find the Navigation display at the upper left (next to the QSP Life logo). And you will
find the Menu button at the upper-right.
3.2.1 Navigation
Navigation is displayed to the upper left of the panel, and starts with the name “Home.” If you are within a Task page, then the navigation will show as
Home > Task_Name
Clicking on the word Home from any page will take you back to the Task List view (the same view as when you logged in).
3.2.2 Documentation
Clicking on the Documentation button will bring you to an online version of the documentation for this platform.
3.2.3 Settings and Expert Mode
The settings button (shown above) opens a dialog where you can change various settings for your account: Display user name, password, and turn on Expert Mode:
- Email: In the settings menu, you can see the email address associated with your account, but this email address cannot be changed.
- New Name: You can change the display name associated with your account. This name does not affect your login.
- Password: You can change your password through this dialog. Enter your old password and new password. You must adhere to the password rules: 8-or-more characters, both lower and uppercase letters required, and your password must also include at least one number.
- Expert Mode: A key feature in this settings panel is the ability to enable “Expert Mode.” Expert Mode, which is turned off by default, will display a variety of additional options in the FEP setup panel. These options are not required for default behavior, but may be useful to advanced users. These options include the ability to modify the timestep, the amount of equilibration sampling, the extent of the periodic water box, the timestep, and the method used to derive classical force field charges. Note that if you turn on Expert Mode, it will revert to “off” once you log out of your current session.
3.2.4 Logout
If you click the logout button, you will be logged out of the platform and returned to the Login screen. (Note: A user will also be automatically logged out after a certain period of inactivity.)
Free Energy Perturbation (FEP) using QUELO
4.1 Overview
This panel provides the user with the ability to run relative free energy calculations, using the free energy perturbation (FEP) approach. Features of this panel include:
- Ability to run FEP using a classic molecular mechanics (MM) force field
- Ability to run FEP using a combined quantum mechanics (QM) /molecular mechanics (QM/MM) force field
- Both the bound ligand and interacting protein residues in the binding region can be treated using QM
- Up to several hundred atoms can be treated at the QM level without any sacrifices in sampling or the FEP
approach
- Fully automated: A simple-to-use interface replaces complicated setup and workflow for the user
From the user standpoint, performing FEP using a QM/MM force field is nearly identical to performing FEP using a traditional classical MM force field. The only difference is that in the QM/MM case, the user needs to define the region of the system that will be treated using QM. This is simply done using options provided through the interface.
The FEP calculation is fully automated to make these calculations simple to carry out. Among the Tasks that are automatically handled on the backend when a calculation workflow is started are:
- Parameterization for MM regions
- Partitioning between the QM and MM regions (for QM/MM simulations)
- Creation of a periodic solvation lattice
- Superposition of ligands to a bound reference ligand
- Creation of an optimized matrix of calculations to run to calculate relative free energies for the full input set
- Analysis and compilation of results
- Parallelized distribution of Tasks
Below, the QUELO interface is described in detail.
4.2 The FEP Task List
When you enter the platform, you will be presented with the FEP Task List, a list of calculations (Tasks) that you have previously set up and/or run, as well as a dialog to create a new Task. Clicking on a Task will bring you to the setup/results page for that Task.
For more details on the FEP Task List, see the chapter “QSP Life User Interface.”
4.3 Expert Mode
A number of options (described below) are only shown in Expert Mode. The options shown in (default) Standard Mode are sufficient for most users to run a reliable simulation. If you need access to Expert Mode, that is accomplished via a toggle in the User Settings panel.
For more details on enabling Expert Mode, see the chapter “QSP Life User Interface.”
4.4 File input specification
When you click on a Task with Staged status in the FEP Task table, you enter the setup dialogs for that Task. At the top of the setup, you will need to specify the input files that are used to run the calculation. These files provide structural information on the protein receptor, as well as a set of ligands bound to the receptor for which free energy calculations will be carried out.
Note that after you submit a calculation to run, in some cases you may decide you wish to add additional ligands to the
compute set. This is possible, and the process for this is described in its own section (Adding Additional Ligands to a
Computed Set) later in this document.
An FEP calculation will calculate the relative free energies of binding of a set of ligands to a common protein receptor. Limitations of a standard FEP implementation require that both that the ligands be broadly similar to one another, and that we calculate relative binding energies among those ligands rather than absolute binding free energies.
To run a calculation, the following components must be defined by the user, using the Browse/Upload dialogs in this section of the panel.
4.4.1 Errors, Warnings, and Deleting an Input Structure
After an input file is uploaded, the contents of that file will be summarized below the Upload button. The summary will include the name, the SMILES string (except for the protein receptor), and an Information field, which will be populated with an informative message when there is a warning or error related to the upload. In the case of an error/warning, in addition to a message in the Information field, the NAME and SMILES strings will be colored red (rather than black) so that it is readily apparent when there is an issue with the input. There is also a red “X” at the right of each summary line, which allows you to delete the uploaded molecule, if desired.
4.4.2 Protein Receptor
This is the common protein receptor to which all the ligands that will be scored are binding. It is assumed that all ligands bind to the same receptor site, and that a single input conformation of the receptor binding site is suitable for all ligands.
The input format of the receptor is “PDB.” Care should be taken to ensure that the input PDB file is suitable for the calculation. See the chapter “Preparing a PDB file for use with QUELO” for detailed information on how this file must be processed before using it with QUELO. QUELO is capable of performing some processing Tasks to prepare a PDB file for use, but other Tasks may need to be carried out by the user before importing the file.
Issues that will need to be addressed before import include
- Removal of symmetry replicates from the unit cell
- Addition of missing loops near the binding region
- Ensuring that the residue naming convention is consistent with that expected by QUELO
- Possible removal of unnecessary cofactor molecules
- Possible identification of structurally important waters and removal of other waters
A properly prepared input protein receptor file is critical to obtaining high-quality results, and the user should read the PDB preparation chapter carefully.
4.4.3 Reference Ligand
The user must ensure that at least one ligand is pre-oriented properly in the binding site. For example, the user may obtain this ligand from a crystal structure of the complex or from modeling. This ligand must be supplied with 3D coordinates (SDF or MOL2), and it will be used as the reference binding conformation for any other ligands that need to be reoriented (see below). The reference ligand must include all atoms, including hydrogens.
Whether you use an SDF or MOL2 format reference ligand input structure, the structure must include a full description of topology information. This requirement ensures that the program properly resolves any structural ambiguities. If you wish to use a bound ligand in a protein/ligand complex structure as your reference ligand, you will need to create a file that contains only the ligand structure and upload it here, separately, to use it as the reference ligand, and you will need to convert the format of the ligand to SDF or MOL2.
The user must specify a name for the reference ligand that is being uploaded in the appropriate box (to the left of the
Upload button). If this name is not supplied, the Upload button will not be activated.
4.4.4 All Ligands (excluding reference)
The user can specify any number of additional ligands for which FEP calculations will be performed. These ligands are provided as 3D structures in either SDF or MOL2 format.
You can supply more than one input file of input molecules. After one file is imported, clicking Browse and then Upload for additional files will append the molecules in the subsequent files to the bottom of the list.
For each molecule in the input, a Maximum Common Substructure (MCS) alignment with the reference ligand will be carried out to align the 3D structure with the reference ligand.
After the input file for this section is uploaded and parsed, the names and canonicalized SMILES representations for all the ligands will be shown in a table below the input dialog line. For each input structure, the name is taken from the input file (if provided) or is otherwise the LNNN name auto-generated by the platform. If you click on the red X at the end of any input ligand line, that ligand will be removed from the input (a confirmatory pop-up will appear when you click on the X to ensure you don’t inadvertently delete a structure). Additionally, you will find a red “X” in bold font at the top of the column of red X buttons. If you wish to delete all the added ligands in this section at once, click
on this bolder X.
Note that no two ligands can be identical. If identical ligands are found in the input, these will be annotated as errors in the Information field and shown in red.
You must remove (using the red X) all problematic ligands before you can proceed with the calculation.
Expert Mode Options for this section are highlighted in the sample figure by red boxes. These options are only presented when you have enabled Expert Mode. They are described below:
4.4.5 Generate all stereoisomers for additional ligands with undetermined chiral centers (Expert Mode)
- Ticbox checked: For input structures (other than the reference), if there are atoms with ambiguous stereochemistry, all possible stereoisomers will be generated.
- Ticbox not checked: If an input structure with ambiguous stereochemistry is encountered, it will result in an error, which will be reported to the user in the Information field for that ligand.
4.4.6 Use 3D Alignment (Expert Mode)
Default: ON Every additional ligand input needs to be aligned on the reference ligand before an FEP calculation can proceed. (An exception is when the user specifies that all ligands have been pre-aligned before input, see the options section). The default approach to alignment uses a topological approach to maximum common subsstructure (MCS). Better alignement can often be obtain using MCS metrics based on 3D structural information. If you enable this option, 3D structural information is used in the MCS alignment.
4.4.7 Use Star Map (Expert Mode)
Default: OFF By default, a mutation map that identifies closed mutation cycles will be generated. Such a map can be beneficial by both providing better estimates of calculation errors and by reducing certain errors. However, there is a significant computational cost to such a mutaiton map (with typical maps increasing the number of mutations carried out by a factor of two or more). Sometimes, for efficiency, speed, or cost, a user may wish to instead carry out a “Star Map” calculation, where all mutations are connected to common residue by a single mutation (edge). The total number of mutations carried in this case is N-1, where N is the number of ligands. The center of the star map will be the reference ligand, as specifed during input.
4.4.8 Max #Charge-Changing Mutations (Expert Mode)
Defines the maximum number of connections between ligand nodes in the mutation map that have differening formal charges. The default is a maximum of two connections. This value is described in more detail in the description of the mutation map below.
4.4.9 2D Ligand Viewer
A 2D representation of any ligand that has been imported can be viewed by clicking on the row for that ligand.
4.4.10 Rotate Bond (Expert Mode)
In the 2D ligand viewer panel (above), if the user is in Expert mode, it is possible to define a bond for rotation. The rotatable bond, if defined, is a bond for which both the input geometry and a second geometry, rotated 180 degrees about that bond from the input geometry, will be used during the free energy workflow. (If the Rotate Bond values are left blank, this rotation will not be performed). Note that rotations about bonds in the molecule are naturally possible during the molecular dynamics simulation. But for some molecules, there may be a bond for which the rotational barrier is high and so the molecule will not be able to sample sufficiently between conformers separated by that high barrier. In situations where it is ambiguous which conformer is more likely, the Rotate Bond function can be useful.
Start and end are set to the atom numbers (see in 2D plot) defining the rotatable bond. Rotation will be performed with the atoms on the “End” side of the bond moving, relative to the fixed positions on the “Start” side. A second conformed is generated by a rotation of 180 degrees about the rotatable bond. This results in two geometries: The starting (input) geometry, and a second geometry reflecting a 180-degree rotation about the selected bond.
The Mutation Map (below) will be generated as normal. But when a rotatable bond has been defined for a molecule, all mutations (edges) to/from the node corresponding to that molecule will be carried out twice, once to the starting conformer, and a second time to the 180-degree rotated conformer. Subsequently, the two free energies of mutation for each edge connecting to the node with two conformers will be Boltzmann averaged.
For a molecule where an additional conformer has been defined using the Rotate Bond option, two molecules will subsequently be associated with the molecule (node). The first retains the name as read from the input ligand file (or auto-assigned by the program if no name was given), INPUT_NAME. The second will be named INPUT_NAME-alt. For example, in the case where rotable bonds are defined for two ligands with the names “ref” and “6e”, the mutations table (described in a section below) would list the mutations list as:
4.5 Mutation Map
Once the file uploads are complete, you can create the mutation map. The mutation map creates a set of connections (known as mutations or “edges”) among the ligands that define the set of FEP calculations that will be used to evaluate the relative free energies of binding for the set of ligands. The Mutation Map must be determined before continuing with the calculation. Once all the input files have been supplied, press the Generate Mutation Map button that appears on the bottom right of the File Uploads section. An automated iterative optimization is carried out to define the mutation map. If the user wants to make changes to the map generated, that is possible using the editing tools (see Mutation Map Editing Below).
The mutation map has three goals
- Evaluate similarity among the ligands, and create mutation paths between ligands that are closely related (for which relative free energy calculations tend to converge more readily)
- Ensure that there is at least one pathway that connects every input ligand to the other input ligands
- Add redundant mutation paths that can help reduce the overall error in the calculated free energies
A sample map, for eighteen input ligands, is shown above. The map consists of nodes (shown as circles) and edges (lines that connect the nodes). Each node corresponds to one of the input ligands as defined by the user. The character string next to each node is the name of the ligand (see the Name field of the ligands table). Each edge is a mutation that will be carried out by the program. The width of the edge reflects the similarity, with thicker edges being more similar, and thin edges being less similar. If you move the cursor over any edge, a number will appear that provides a similarity score, which can range from 0.0 (completely dissimilar) to 1.0 (identical).
You can enlarge/reduce the size of the map using the scroll wheel of your mouse. Enlarging the local map is sometimes useful to better see the varying widths of the edges, or to better read the names next to the nodes.
Very low similarity indices (especially 0.0-0.4) should be examined carefully to ensure that the ligands connected by the edge are acceptably similar to make it possible to determine a reliable free energy. Note that there will usually be more edges (free energy mutations to calculate) than input ligands. As noted, the additional edges are to reduce the overall error in the net calculated relative free energies.
Hovering the cursor over any node will present the 2D representation of the ligand corresponding to that node.
The user should evaluate the mutation map carefully before proceeding, ensuring both that the edges make sense, and that the 2D representations are as you expect for the input ligands. Once you are satisfied with both, you can move on to specifying Simulation Options and then running the simulation.
4.5.1 Mutation Map Progress Bar
While the calculations are being performed to produce the mutation map are being carried out, a progress bar will appear at the top of the Mutation Map section of the panel. When the mutation map determination is finished, the progress bar will disappear and the map will be shown.
4.5.2 Mutation Maps with Formal Charge Changes
In the case where all the ligands have the same formal charge (e.g. +1, 0, -1, etc.), then the mutation map is generated as described above. If the ligand set includes ligands with a mixture of formal charges, then in addition to topological and conformational similarity, we must also consider the charge change in optimizing the mutation map. In this case, the process followed is:
- Separate ligands into sets based on their formal net charge
- Generate individual mutation maps for each group of formal net charges
- Also generate a mutation matrix for all ligands, ignoring formal charge (full set)
- Edges from the full set that connect the charge-based sets are introduced (on the basis of similarity and molecular weight) to ensure that there is edge connectivity for all nodes in the map (regardless of charge)
- By default, the maximum number of edges connecting nodes with differing charges is 2. This default can be changed in Expert Mode. In Expert Mode, the field where you can change that value appears to the left of the Generate Mutations button
Changing the Maximum Number of Charge Changing Mutations will have no effect if all ligands have the same formal
net charge.
4.5.3 Mutation Map Editing
Using the editing tools, it is possible for the user to clean up the map presentation on the screen and add or remove edges (mutations) from the map, as described below.
- Prettify: This button will automatically adjust the positions of the nodes on the screen to improve the visual presentation. Use this if nodes are overlapped, or if you have adjusted the map edges via editing and want a better view. The Prettify button does not change the underlying simulation matrix. Clicking this button won’t change what mutations will be carried out–it merely provides a nicer presentation of the matrix on the screen.
- Download: This button allows you to download the current view of the mutation map in the .SVG format.
- Edit: This button puts the mutation map in Edit mode. In Edit mode, you can modify the edges (lines) that connect the nodes (ligands) together. These define the set of mutations that will be carried out. If you click on the Edit Edge Button, two additional buttons, “+” and “-” appear at the lower right region of the region. Click on the “Done” button at any time to save the edits you have made and go back into View mode. You can click on Edit again, after “Done”, to make more edits, if desired.
- Stop: The Edit button changes to the Stop button when you enter Edit mode. Clicking on the Stop button commits any changes you have made during editing, and returns to the View mode.
To remove an edge, click on the center of that edge. The color of the edge will turn red. Then click on the “-” button to remove that edge. If you decide you do not want to delete that edge after clicking on it, simply click on another element in the map (another edge or a node), and the previously-selected edge will be de-selected. (You can also click on the Stop Editing button to achieve the same thing).
To add an edge, you need to click on two nodes you want to connect, then click on the “+” button to add the edge. The order in which you selected the nodes is important: The mutation direction will be from the first node you select to the second node you select. When you select a node, it will turn red. (If you want to de-select that node, just click it a second time). You cannot add an edge to two nodes that are already connected by an edge. When you click on the first node in Edit mode (Node_1), that becomes the root node for all additions in this edit session. Clicking on a second node (Node_2) then on the “+” button adds an edge between those two nodes (Node_1-Node_2). Then, clicking on a third node (Node_3) and the “+” button will add another edge between the first node and the third (Node_1-Node_3). If you wish to pick a different root node for editing, click the “Done” button, then click “Edit” again to start a new set of edits. (The previous edits are saved and retained when you click Done).
When editing edges, you need to ensure that every node is somehow connected to every other node through a pathway of edges. More generally, to reduce the net statistical errors in the calculated binding energies, every node should be part of at least one closed loop.
If you create singletons (nodes with no connections) or islands (a group of nodes connected to each other but not connected by any edge to other another island of nodes), the automated workflow will fail. If the ligand set is such that there is no obvious way to connect certain nodes (because islands of nodes are too dissimilar to each other), you may need to set up multiple simulations, one for each island of nodes.
When you are satisfied, click the “Stop Editing” button. If you make a mistake and wish to return to the default automatically generated map, you can re-generate the map by clicking the “Generate Mutation Map” button.
If you make a mistake during editing and wish to revert to the starting map, click on the Generate Mutation Map button at the bottom of the file input section. This will delete the current mutation map and replace it with one auto-generated from the input files.
4.5.4 Star Mutation Maps (Expert Mode)
If the user has chosen the “Generate Star Map” option in the previous section, then instead of the closed cycle map described above, a star map is generated, where one edge connects the reference ligand to every other ligand in the set, as shown below:
This simple map requires the minimum number of mutation edges to be calculated and can be useful to save time or compute resources, although at the possible expense of increased overall error. The reference ligand, as specified during input, appears at the center of the star map. All other functions (Prettify/Download/Edit) operate as described for the standard Mutation Map, above.
4.6 Adding Additional Ligands to a Computed Set
On occasion, after starting or completing a simulation, you may decide you wish to add more ligands to the set. While you could run an entirely new simulation specifying the larger set, if you have already run calculations for some members of the expanded set, you may wish, instead, to simply add the extra mutations to those already evaluated to save on computational cost.
You can add additional ligands to a calculation that has either been paused (stopped) or else a calculation that has finished. You cannot add ligands to a job that is in “queued” or “running” status. If you have submitted a job, and the job has not yet been completed, and wish to add additional ligands, click on the Stop button, wait for the status to change to Stopped, add the ligands, and then Restart the job.
To add ligands to a stopped or completed job, you simply upload additional ligands via the All ligands (excluding reference) dialog, as described above under File Input Specification. The additional ligands will appear in the list of uploaded structures that appear under the upload dialog, at the end of the list.
After uploading additional ligands, if you subsequently wish to delete any of the added ligands, you can delete them from the list using the red “X” buttons on each line of the uploaded ligands set. (Be careful not to delete ligands from the original set; if you delete one of those ligands, any calculations related to that ligand will be deleted from the project and cannot be recovered without uploading that ligand again and re-running).
Once you have uploaded the additional ligands, you will see a pop-up that warns you that you will need to regenerate the mutations map, using the Generate Mutation Map button. Clicking on this button will generate a new map that includes the additional ligands, but retains the map that was generated prior to adding the additional ligands. Edges and nodes that include the newly added ligands will be shown in blue. Below, as an example, is the original map generated for a set of ligands, and the updated map that was generated after adding additional ligands. Note that the map for the original ligands may appear slightly different in the new map, but the connections (edges) will be the same. In this example, note that the edge between ligands 17 and 1q is drawn pointing downward in the original map, and it is drawn pointing upward in the map with the additional ligands. This has no effect on the calculation but aids in presenting a clear view of the map with the added vertices/edges.
By forcing the original map to be retained, the number of additional edges that is generated through this process is minimized and computational cost is kept in check. The resulting map can be edited just as described for the original map, and you can both add and remove edges. Note that if you attempt to remove an edge for a calculation that was already carried out in the first part of the calculation (before adding ligands), the mutation for that edge will be permanently removed and cannot be retrieved. A warning will appear to avoid inadvertently deleting edges for the previous calculations.
Note, also, that because the original map (from before adding additional ligands) is retained in any expanded map that results from adding additional ligands, it is likely that the overall map that is generated when adding ligands to a pre-existing calculation will be different from the map that would be generated if the full set of ligands (from the two steps) had been specified from the beginning. The original map generation attempts to optimize similarity connections while limiting a total number of edges. The resulting edges can and will differ depending on the set of ligands provided. Typically, the edge structure of a calculation performed without adding additional ligands later will be overall better
optimized. That said, the map generated in two steps will still connect all vertices with at least one edge.
Some other requirements when adding additional ligands:
- Do not change the map type (standard vs star map). Whatever option you chose originally should be maintained.
- Do not change the Simulation Options (below).
After you have uploaded the additional ligands and generated the map (and edited the resulting map, if desired), you can click on the Resume button (below) to restart the simulation for the revised map/ligand set.
The upside of simply adding to an existing calculation is that you reduce the number of new edges that need to be calculated. The downside is that in order to reduce the number of edges added, the resulting map may not be as well optimized as if you had run everything at once.
4.7 Simulation Options
The options displayed will depend on whether you are running in Standard Mode (default) or in Expert Mode. As noted earlier, Expert Mode, if desired, is turned on in the User Options panel. The standard options will be described first. Expert Mode options will be described in the subsequent section. You will only see one of the FEP Options sets, depending on the status of the Expert selection toggle.
4.7.1 The Overall FEP Workflow
The overall FEP workflow, which places many of the options described in the following sections into context, is shown in the following diagram. Here, you can see how the pre-FEP MD equilibrated structure feeds into each lambda simulation, where it is used as the starting point for additional equilibration and then data collection.
4.8 FEP Options (Standard)
A limited number of options are provided here. Reasonable defaults are provided for most options related to FEP and are not shown here. The remaining options include:
- FEP type: Defines whether this FEP simulation will use only a classical MM force field, or whether this simulation will incorporate a quantum region around the ligand, and use a QM/MM force field.
- Classical: Use a classical force field for the entire system
- Quantum: Use a QM representation for the ligand and surrounding residues and a QM/MM force field.
- #windows: Number of windows that connect the two ligands for which a free energy difference is calculated.
Convergence is improved by breaking up the free energy simulation into a set of more similar lambda intermediates and then summing the free energies for those intermediate states to get the total free energy. The number of lambda points connecting the endpoints is #windows+1. The default is #windows = 12 for Classical, and #windows = 13 for Quantum. For Quantum, the extra window is inserted between the two lambda points near the endpoint to improve convergence. - Simulation time (ns): The amount of molecular dynamics (MD) sampling done for each FEP window to obtain a converged value of the quantity that determines the free energy change for the corresponding lambda window. The default value is 5.0ns for a Classical simulation, and 2.0ns for a Quantum (QM/MM) simulation.
- Reset to Default: Reset all options to their defaults.
- Start Simulation: This option will start the FEP simulation. This option only appears for FEP_Type=Classical.
For FEP_Type=quantum, additional options related to the quantum region need to be set after system preparation. If you need to set the calculation to pause after the Preparation stage for a classical simulation, tick the box next to Stop after preparation (found above this button). - Start Preparation: This option will start FEP preparation. This option only appears for FEP_Type=Quantum.
For FEP_Type=classical, stopping after preparation is not typically necessary and the preparation and simulation steps are combined. - Stop after preparation: Pauses the workflow after the preparation step. This is the default (and can’t be changed) for FEP_Type=quantum, because the user must subsequently define the quantum region. For FEP_type=classical, the user can force a pause in the workflow by clicking this box.
4.9 FEP Options (Expert)
If you have enabled the Expert Mode in the User Settings panel, then a larger number of FEP options are accessible.
These options will appear below the Mutation Map.
- FEP type: Defines whether this FEP simulation will use only a classical MM force field, or whether this simulation will incorporate a quantum region around the ligand, and use a QM/MM force field.
- Classical: Use a classical force field for the entire system
- Quantum: Use a QM representation for the ligand and surrounding residues, and a QM/MM force field.
- Pre-aligned Ligands: By default, additional ligand that is input (except the reference ligand) will be automatically reoriented to best overlay the reference ligand, using MCS. If you tick this box, then no reorientation will be performed for additional ligands. This option requires that the user ensure they have been pre-oriented to the reference ligand. If the user chooses this option and fails to pre-orient their ligands before importing them, the resulting calculation can fail to converge or fail entirely. With this option toggled on, the single and dual topology regions of a pair of molecules are determined by comparing atomic coordinates. Any atoms closer than 0.1 Angstrom will be considered part of the single topology region.
- Timestep Length (ps): The integration timestep in picoseconds. Default is 0.002ps. Generally, you should not exceed this value, but lower values are acceptable (though lower values will require more computational timesteps to achieve the same effective length of simulation).
- Random Number Seed: A random number that will be used to create a chain of random numbers. These random numbers are used to set the initial velocities for the system. Using a different random number will create a different set of velocities and a different overall trajectory. A value of “-1” uses the default random number seed, which is based on the current Unix time when the job is submitted. Submitting two jobs with the default value of -1 and no other differences would generate two trajectories that vary due to their stochastically-generated initial velocities. If you wish to be able to exactly reproduce the same results when running the same job twice, you need to specify a (non-default) random number seed. E.g. running the calculation twice with a random number seed of 32177 would give identical results.
- Pre-FEP Equilibration (ns): The total amount of MD equilibration to perform before starting the FEP runs, in nanoseconds. This value does not include any MD requested by Heating Duration and Pressurization Duration (below). The default is 0.25ns (250ps). This value also does not include any additional QM/MM equilibration (below) that is performed in the case of a Quantum simulation.
- Heating Duration (ns): The temperature will be ramped up from the initial temperature of 0K to a target temperature of 300K over the specified number of nanoseconds of the simulation. This Heating Duration ramping time is not included in the total Pre-FEP Equilibration time (above). The default is 0.1ns (100ps). If no temperature ramping is desired, then set the Heating Duration to 0.00.
- Pressurization Duration (ns): The pressure will be ramped up from the value calculated from the input system to the Target Pressure of 1atm over the specified number of nanoseconds of simulation. This Pressurization Duration ramping time is not included in the total Pre-FEP Equilibration time (above). The default is 0.1ns (100ps). If no pressure ramping is desired, then set this value to 0.00.
- QM/MM equilibration (ns): For a QM/MM simulation, additional pre-FEP equilibration is generally applied to more thoroughly allow the system to reflect the force field. The QM/MM equilibration is performed using the QM/MM force field that will also be used for the FEP calculation. QM/MM equilibration is not applied for “Classical” mode. For QM/MM, total equilibration is therefore: (Pre-FEP Equilibration) + (QM/MM equilibration) The default is 1.0ns.
The temperature heating ramp is performed before the pressurization ramp, after which the Pre-FEP Equilibration MD is performed. Total FEP Pre-Equilibration simulation time will be:
(Heating Duration) + (Pressurization Duration) + (Pre-FEP Equilibration) + (QM/MM Equilibration).
(Default is 450ps total for a classical simulation, 1,450ps for a QM/MM simulation).
The remaining options define the number of intermediate “lambda” windows, and the amounts of equilibration and data collection performed at each window. The defaults are set to widely-accepted values used in literature studies. A full free energy calculation consists of two separate calculations, one for the ligand bound to the receptor, and a second for the ligand free in solution. The values defined here will be used for both separate calculations.
The FEP simulation at every lambda point starts with the equilibrated system as generated according to the Pre-Equilibration Options above. Additional equilibration is performed at that lambda point, using a potential energy function that reflects that specific lambda point. The amount of additional equilibration to perform is defined by the “Equilibration (ns)” parameter below.
- #windows: Number of windows that connect the two ligands for which a free energy difference is being calculated. Convergence is improved by breaking up the free energy simulation into a set of more-similar lambda intermediates and then summing the free energies for those intermediate sets to get the total free energy. The number of lambda points connecting the endpoints is #windows+1. The default is #windows = 12 for Classical, and #windows = 13 for Quantum. For quantum, the extra window is inserted between the two lambda points near the endpoint to improve convergence.
- Random Number Seed: A random number that will be used to create a chain of random numbers. These random numbers are used to set the initial velocities for the MD used in generating the FEP statistics. Using a different random number will create a different set of velocities and a different overall trajectory. A value of “-1” uses the default random number seed.
- Equilibration (ns): The number of nanoseconds of MD equilibration to perform at each lambda point before data collection starts. The default is 0.5 ns.
- Simulation time (ns): The amount of molecular dynamics (MD) sampling to perform for each FEP window to obtain a converged potential energy trajectory used to determine the corresponding free energy change for that window. The default value is 5.0ns for a Classical simulation, and 2.0ns for a Quantum (QM/MM) simulation.
- Reset to Default: Reset all options to their defaults.
- Start Simulation: This option will start the FEP simulation. This option only appears for FEP_Type=Classical. For FEP_Type=quantum, additional options related to the quantum region need to be set after system preparation. If you need to set the calculation to pause after the Preparation stage for a classical simulation, tick the box next to Stop after preparation (found above this button).
- Start Preparation: This option will start FEP preparation. This option only appears for FEP_Type=Quantum. For FEP_Type=classical, stopping after preparation is not typically necessary, and the preparation and simulation steps are combined.
- Stop after preparation: Pauses the workflow after the preparation step. This is the default (and can’t be changed) for FEP_Type=quantum, because the user must subsequently define the quantum region. For FEP_type=classical, the user can force a pause in the workflow by clicking this box.
4.10 FEP Preparation
4.10.1 Ligand Alignment
The first stage in preparation is to accurately align the ligands to allow the automated determination of topological correspondences required for FEP calculations. At this stage, parameter assignments for the MM region and other preparatory work is performed. The progress of this preparation portion of the workflow is indicated by a progress bar, as shown.
4.10.2 QM Region Specification
The QM region specification options are described in more detail in the subsequent chapter, “QM Region Selection.” Here, a simple description of the options is provided.
If you have set FEP Type = Quantum in the FEP options, then an additional panel will appear below the FEP Options section. If you are running an FEP Type = Classical simulation, the entire system is treated using MM, and this section is hidden.
In this section, the user can specify the extent of the region to be treated at the quantum mechanics level. The MD calculations are carried out using a QM/MM implementation, where the core residues (the ligand and chosen surrounding residues) are treated with the higher accuracy QM approach, and the remainder of the system is treated using a classical molecular mechanics force field. Note that explicit water solvents and ions are always treated using MM.
For QM/MM simulations, the ligand must always be included in the QM region. The ligand is represented by the keyword “ligand” (lowercase, no quotes) in the definition box.
Residues included in the QM region are filtered by the maximum number of atoms in that region. The maximum atoms filter is applied when generating an automated definition of the QM region (Suggest a QM region), along with a cutoff distance of 5 Angstroms. The distance between the ligand and a protein residue is defined as the shortest distance of any atom of the ligand to any atom of that residue. If that distance is less than 5 Angstroms, the residue is a candidate for being included in the suggested QM region.
- Region Max Atoms: The maximum number of atoms that will be included in the QM region if the region is automatically generated using the Suggest a QM Region button (below). The atom cutoff is implemented on a residue basis, so that any residue is either entirely within the QM region, or it is excluded from that region. The default is 100 atoms. Increasing the number of atoms in the QM region will increase the computation required for the simulation to finish.
- Selection Algorithm: [Simple/Optimal] This selection applies when using the automated “Suggest a QM Region button (below). In both cases, an initial set of protein residue candidates for inclusion in the QM region is generated based on a cutoff distance of 5A from the ligand. A ranking value is assigned to every residue candidate, and then residues are removed from the candidate pool, based on rank, until the Region Maximum Atoms limit is met.
- Simple: The ranking value for each protein residue in the candidate set is based on the minimum distance of any atom of the protein residue to any atom of the ligand. A higher ranking value corresponds to a larger distance. Residues are removed from the candidate pool in order of ranking (high to low).
- Optimal: The ranking value for each protein residue in the candidate set is based on both the minimum distance of any atom of the protein residue to any atom of the ligand and whether the protein residue makes specific close-distance polar interactions with the ligand. A higher ranking value corresponds to a residue making less significant interactions. Residues are removed from the candidate pool in order of ranking (high to low).
- Region Max Atoms (Charge)
The atom maximum here applies only if there are mutations between residues that have differing net formal charges. If there are no mutations between nodes with differing formal charges, then this option will be greyed out. The QM region for mutations where a formal charge change occurs is selected via a different approach than the remainder of the mutations. For formal charge change mutations, the chosen QM region is informed by QM-based polarizability calculations on the binding site. Typically, a somewhat larger maximum number of atoms is required for these mutations. With this option, you can set the maximum number of atoms for charge change mutations (or accept the default). - Algorithm (Charge)
The algorithm selector here applies only if there are mutations between residues that have differing net formal charges. If there are no mutations between nodes with differing formal charges, then this option will be greyed out. As with the standard Selection Algorithm, there are two options, Optimal (default) and Simple. The operation of these options is described above, under “Selection Algorithm”. - The text input area: In the middle of this region, you will find a text input box. You can specify residues that are to be included in the QM region in this text box using the format described in the chapter “QM Region Specification.” This box can be populated either manually, by writing in the definitions on your own, or else automatically by pressing the “Suggest a QM Region” button. The ligand must always be included in the QM region, using the definition “ligand” (lowercase, no quotes). If you use the Select a QM Region button, the ligand will appear at the top of the suggestion text automatically.
- Suggest a QM region: Auto-populates the text input area with a set of residues that meet the criteria described Selection Algorithm (above). The user can modify the list once it is generated (adding or subtracting residues).
- Validate: This button allows the user to validate the set of residues that are specified in the text input area above the button. Validate will ensure the residue definitions correspond to residues in the input PDB file. Validate will also add capping residues, if necessary, to ensure proper QM/MM boundaries for the calculation. If you use the Suggest a QM Region button and do not modify the contents of the definition box, the contents are automatically validated, and the Validate button is inactive. Only if you change the contents of the definition box will the Validate button be active.
Visualize: Creates a pop-up window that shows the selected QM-region residues in the context of the protein/ligand complex. The reference ligand will be shown in blue, bound to the protein. The protein residues included in the QM region will be shown in red. You can rotate the view using your mouse (left button) or translate the view with the mouse (right button). Double-click on any atom to reset that atom as the center for rotations. If you hover the mouse over any atom, information on the atom and the associated residue (name, number) will pop up. To close the pop-up window, click the “X” near the top left of the box.
- QM Region Summary: To the right of this section, you will find the QM Region Summary, which is populated after you validate the QM region contents. (The definition is auto-validated, and this region is auto-populated if you use the Suggest a QM Region button). In this section, you will find:
- Status: Whether the contents of the definition box are properly validated. The possibilities are “Validated” or “Invalid.” If the contents are Invalid (there is a problem) or if they are valid but a warning was issued, the status will be shown in red, and you can ascertain more information by hovering the cursor over the question
mark. - Number of link atoms: Number of hydrogen atoms that were added to address unfulfilled valencies that arose from using separated protein residues. These are required for the QM simulation.
- Number of total QM atoms: Total atoms in the QM region, excluding the ligand itself. Based on the reference ligand.
- Total QM charges: Total QM-determined charge in the QM region (ligand + other residues), based on the reference ligand.
- Status: Whether the contents of the definition box are properly validated. The possibilities are “Validated” or “Invalid.” If the contents are Invalid (there is a problem) or if they are valid but a warning was issued, the status will be shown in red, and you can ascertain more information by hovering the cursor over the question
4.10.3 Start Simulation
Once you are satisfied with your quantum region selections, you can start the simulation using the Start Simulation button at the bottom right of this part of the panel.
4.11 Simulation Status
This portion of the panel provides the status of the calculation once it has been started using the Start Simulation button. It also provides the ability to stop and restart a calculation that has been submitted.
- Stop: Stop a calculation that was previously submitted and is in progress. A stopped calculation is saved in the cloud storage associated with your account and can be restarted later, using the “Run” command.
- Run: Run a previously Stopped job. If the job has not previously been started using “Start Simulation,” then the Run button has the same effect as “Start Simulation,” i.e., it will begin the calculation.
- Light Grey: That segment of the calculation workflow has not been run.
- Dark Blue + Light Grey: That segment of the workflow is in progress and the dark blue portion of the segment reflects a progress bar.
- Dark Grey: That segment of the workflow has been completed.
- Red: That segment of the workflow is completed with errors.
Further down in this portion of the page, the status is provided at a more detailed level, showing the progress for every ligand mutation (every mutation represented by an edge in the mutation map. The names of the starting and ending ligands for each mutation are shown in the table, along with a four-segmented progress bar. The four segments for the progress bar represent Preparation/Equilibration/Production/Analysis. Each segment is colored as described above.
4.12 Results
Below the “Simulation Status” section, you will find the results of your calculation. The information in this section will be auto-populated when the simulations requested have been completed. The data is provided in four sections:
4.12.1 Result for Each Mutation
One result will appear in this table for every edge in the Mutation Map. The edges (lines) represent pairs of ligands between which free energy calculations will be carried out. The number of connections (edges) will typically be larger than the number of input ligands. The relative free energy changes for all the edges will be calculated. Performing more than the bare minimum number of changes required to connect the ligands helps provide more reliable net free energy difference values and associated errors in the Free Energy Results table (below). Note that information from the redundant pathways are not used in the Mutations results table, which provides energy values and errors determined from only the single mutation in each row.
All columns of the Result for Each Mutation table are sortable. Clicking on the header once will sort low-to-high. Clicking on the header a second time will sort high-to-low.
For each mutation that was run, the following will be reported:
- Mutation: Sequence number for the mutation calculation, starting from one.
- Start Ligand: The name of the starting ligand for the mutation. The name is either taken from the user input ligand file or, if not present in that file, assigned by the program as LNNN, where “NNN” is the counter for ligands without user-supplied names, which starts at 1. These names are auto-assigned, if necessary, for ligands supplied in the “All ligands (except reference)” input. For example, the first ligand read in without a name would be L001, and the tenth ligand read in without a name would be L010.
- End Ligand: The name of the ending ligand for the mutation. The name is obtained in the same fashion as for Start Ligand.
- ddG: The computed free energy of binding (ddG) (in kcal/mol) for the specified mutation between the Start and End Ligands. ddG is calculated as the difference [dG(complex) - dG(ligand)], which provides the net free energy of binding via the thermodynamic cycle relationship.
- dG(complex): The free energy (kcal/mol) calculated for the mutation between the Start and End ligands in the ligand/protein complex.
- dG(ligand): The free energy (kcal/mol) calculated for the mutation between the Start and End ligands calculated free in solution.
Below the Mutations table, an Export button is provided.
- Export table: Download the data in the Mutations table as a CSV-formatted file.
4.12.2 Accumulated free energy and convergence plots
If you select any row of the Status for Each Mutation table, then two graphs will appear as a pop-up.
- Accumulated free energy: This graph presents the accumulated ddG free energy differences versus lambda for the transformation selected in the Mutations table. The lambda values for the physical endpoints are always 0 and 1. Values are presented for the net ddG = [dG(complex) - dG(ligand)] at each value of lambda. Rolling the cursor over any point of the plot will show the (x,y) values for that point.
- Convergence with respect to sampling: This graph presents how the calculated net free energy for the mutation would vary if we used only a subset of the collected data, up to the maximum amount of data collected. From this curve, one can infer how well converged the results may be, and whether the sampling was sufficient. Rolling the cursor over any point of the plot will show the (x,y) values for that point.
In addition, 2D representations of the ligands corresponding to the two endpoints of the mutation are presented at the top of the pop-up window.
4.12.3 Free Energy Results
In this table, the net relative free energies for all the input ligands are presented. The net free energies are evaluated by averaging the energies for multiple paths (edges) that connect the ligand to the reference ligand. The Sampling Error represent the maximum error for any non-redundant closed loop of edges that includes the ligand.
All columns of the Free Energy Results table are sortable. Clicking on the header once will sort low-to-high. Clicking on the header a second time will sort high-to-low.
- Ligand: The name of the ligand. The name is either taken from the user input ligand file or, if not present in that file, assigned by the program as LNNN.
- ddG: The free energy of binding for the ligand (kcal/mol), relative to the reference ligand. The reference ligand is arbitrarily assigned a free energy of binding of 0.00. Relative free energy values ddG are calculated as ddG = [dG(complex) = dG(ligand)].
- Sampling Error: Sampling Error: Sampling error (kcal/mol) is estimated as the maximum error for any nonredundant closed loop that includes the ligand. If a mutation is not part of any closed loop defined by edges, then the estimated error is obtained from the error of the nearest ligand on a non-redundant closed loop. If there are no closed loops (i.e., a Task with a single mutation), the error is obtained from a combination of the statistical analysis carried out during the BAR processing for that mutation, plus the hysteresis for the calculation (energies calculated in ascending and descending direction for each lambda window). The sampling error is to be considered a lower bound.
Clicking on any row of the Free Energy Results table will show the 2D structure of the ligand for that row, on the right-hand side of the panel.
4.12.4 Result Download
At the bottom of the panel, you will find the Result Download button. You can use this button to download a ZIP format archive, named BATCH_NAME_raw_outputs.zip. (BATCH_NAME is the name you have given this batch calculation; For example, if your Batch was named FEP1, then the file would be named FEP1_raw_outputs.zip). Within this archive, you will find the following directories:
- ligands subdirectory: Contains a set of subdirectories with the input ligand structures, as provided by the user.
- receptor subdirectory: Contains the input protein receptor structure, as provided by the user.
- ligand_output subdirectory: Contains a CSV format file with the calculated free energies, one line per input ligand.
- mutations subdirectory: Contains several entities.
- mutation_results.csv is the calculated relative free energies for each of the mutations (edges) defined by the mutations map, in CSV format.
- specific mutations subdirectories: Each of these mutation subdirectories has a name of the format ligand1__ligand2, where ligand1 and ligand2 are the user-supplied (or program auto-generated) names. There is one such directory for every edge in the mutations map. Data within ligand1__ligand2 subdirectory is for the FEP calculation between nodes ligand1 and ligand2.
Within each of these mutation (edge) ligand1__ligand2 subdirectories you will find
bound (directory)
unbound (directory)
convergence_trajectory.csv
lambda_trajetory.csv
The two CSV files contain the data used to produce the convergence and accumulated free energy vs. lambda plots that are presented when you click on any of the mutations in the GUI.
Within both the bound and unbound subdirectories will find a series of additional subdirectories, one for each lambda window used for that FEP calculation, each named
windowNNN
where N ranges from 001 to the number windows used. Within each of these subdirectories, you will find a PDB format file (“final.pdb”) that corresponds to the entire system (ligand/protein/waters/cofactors/ions) on the final step of the MD sampling trajectory used for that window. This file can be read into any standard molecular display program.
In both the bound and unbound subdirectories, at the top level of that subdirectory, you will also find the file “initial.pdb”, which is the starting structure of the entire system (ligand/protein/waters/cofactors/ions) at the start of that window calculation.
Preparing a PDB File for use with QUELO
5.1 Introduction
The most common file format used for large molecule coordinates is “PDB.” The PDB format has undergone several revisions over the years, but the fundamental elements have not changed. PDB format is a fixed format, with a set of descriptors for each atom line in specific columns. A detailed description of the format is available from the Protein Data Bank: PDB format.
Protein structure files downloaded from the PDB repository will generally be in this format (with a suffix of .PDB or .ent). Generally, macromolecular structures determined by all common structure determination approaches (X-ray crystallography, Cryo-EM, NMR) are deposited in this format.
5.2 Preparing a PDB file for QUELO
While the general format of these files is standardized, if you wish to use a PDB (or ENT) file for computation, it must generally be cleaned up before the first use. Clean-up of a PDB file that may need to be performed may include (and may not be limited to) addressing the following:
- Removal of multiple replicates of the protein in the unit cell due to space-group symmetry
- Removal of unnecessary cofactors
- Removal of bulk waters
- Removal of unsupported metal/cation/anions (see “Supported atom types” below)
- Adding missing hydrogens
- Determining the protonation state of residues where multiple protonation states are possible at physiological pH (e.g., His)
- Determining the proper rotamers of rotatable sidechains that can make hydrogen bonds
- Addition of non-hydrogen atoms that may be missing in the structure (due to insufficient resolution or conformational variability)
- Ensuring residue names are compatible with the computational platform
- Addressing residues with non-standard names
- Ensuring that disulfide bonds are properly reflected (using CYX for the parent residue names)
- Ensuring that the PDB format is adhered to (the required information for each line is provided in the correct columns).
How a user chooses to address these various questions will depend on the particular system. While some general-purpose routines can be developed to tackle some of these questions, in general, the user is urged to spend some time ensuring that they are starting with a model of the protein receptor (and cofactors, if any) that is sufficient for their needs. A little time spent properly assessing and cleaning up the starting structure can pay significant dividends in terms of better computational results.
QUELO expects that the PDB input file adheres to some rules in terms of cleanup and suitability. Some additional automated cleanup tools are integrated into the QUELO workflow that can address other issues. The preparation that QUELO can perform/automate within the workflow is described below, following the general information for PDB files.
5.3 Understanding the PDB file (basics)
Every line in a PDB file starts with a line-type designator in the first six columns. Protein atoms typically start with ATOM in this field. Non-protein atoms often use HETATM in this field, although some files will use ATOM for these atoms as well.
A sample PDB format line is shown below:
A description of the contents of the various fields in a typical PDB file line is given below. Note that some of these elements are not used in many files, and some of these fields, in particular the fields that follow the x/y/z coordinate fields, are sometimes used to hold other information–although this is not so common for files that are deposited with the PDB.
The fields that QUELO expects to find for each record are shown in the table above. The alternate conformer signifier, if found, specifies that there are two or more conformations that contribute to the atom at that position. In this case, QUELO will automatically take the first position. The fields indicated as not Required by QUELO can still be specified but won’t necessarily be used.
5.4 Protein residue naming convention used by QUELO
The formal protonation/charge state for each protein residue is inferred from the residue name. It is important that you prepare your protein with this in mind. Protein structures obtained from sources like the PDB do not always adhere to these conventions. For most standard amino acid residues, there is only one variant that exists at physiological pH. But for a few residues–ASP, GLU, LYS, and HIS–multiple charge states can exist within the range of physiological pH, and depending on the local environment. The residue name convention used by QUELO (and atom naming conventions associated with each residue type) follow the conventions below:
5.5 Non-canonical amino acids recognized by QUELO
In addition to the 20 standard amino acids (and their charge variants) that are recognized by QUELO, as described in the previous section, QUELO also recognizes a variety of phosphorylated amino acids, which are sometimes found in protein structures. Phosphorylated amino acids that are recognized are shown in the table below. The PO4 group can exist as either PO4 (2-) or HPO4 (1-), which accounts for the two charge variations of each residue type. (For the phosphohistidines, the group is N-PO3 or N-HPO3). You must ensure you use the naming convention shown:
5.6 Reserved Ligand Residue Name
QUELO reserves the residue name UNL for the ligand residue. The user should not use this residue name in the input PDB file.
5.7 Supported Atom Types
At present, QUELO is designed to deal with all types of standard organic molecules, but there are limitations on what metal atoms and ions can be included:
- Organic molecules: All standard organic atom types that are found in proteins, ligands, and cofactors are supported (H, C, N, O, P, S, Cl, Br, F, I, etc.)
- Metal atoms: QUELO supports all metals that appear in the Protein Data Bank at least a few times. The atom name of the metal must match the residue name. Note that iron and copper have more than one possible charge state; pay particular care when choosing their residue/atom names and element names. Here is a list of the supported metals, alongside their required names and charges:
- Solvent ions: QUELO reads in Na+ and Cl- ions from the input PDB file. Additional Na+ and Cl- ions will also be generated to match physiological concentration and neutralize the simulated system.
The Na+ residue must have the residue name Na+ to be recognized. The Cl- residue must have the residue name Cl- to be recognized. - Waters: The program supports water molecules with either of the following residue names: WAT or HOH.
5.8 Automated processing that can be performed by QUELO
Below is a list of the PDB file processing that QUELO can automate. This is a subset of the types of processing that may be necessary for a PDB file, and the user may need to do additional processing before uploading the file to QUELO. Similarly, if the user uses another program to fully preprocess the input PDB file, then QUELO will skip the processing stages that are unnecessary.
The processing QUELO will perform, if necessary:
- Hydrogen atoms will be added to the protein structure. If any hydrogens are found on the input PDB protein structure, they will be removed and replaced. Hydrogens are added on the basis of residue name, using the conventions described above.
- Adding missing side chain atoms. Note that this only works if all the corresponding backbone heavy atoms for the corresponding residues are present.
- Adding terminating/capping residues ACE (Acetyl) and NME (N-methyl). To avoid issues with naming consistency, if ACE and NME capping groups are found in the input receptor PDB file, they will be removed and replaced.
- Chain breaks: If the residue numbers within a chain are discontinuous, it is assumed that there is a gap/break in the chain, and the residues to the sides of that gap will be capped using ACE/NME
- Missing loops: A missing loop cannot be added using QUELO. It will be treated as a chain break, and treated as described above. If the missing loop is critical for proper simulation, you should use a tool that allows prediction and insertion of the loop before using QUELO.
- Solvent/ion residues of the following types will be recognized and processed: NA+, CL-, WAT, HOH. If you are retaining waters in the input PDB structure (generally, if any waters are retained, only a small number of structurally important waters should be retained), be sure that they are within the Solvent Buffer Distance of the protein, as specified in the advanced input options. The Solvent Buffer Distance defines the width of the solvent box, in terms of the minimum distance between dimensions of the box and the closest atom of the protein/ligand system. Waters in the input PDB file that fall outside the box defined by Solvent Buffer Distance will be removed.
- Cofactors (residues not otherwise recognized) that are included in the PDB file can be processed and included in the simulation. Rules for specifying cofactors are provided below.
5.9 Rules for Specifying Cofactors
Cofactors are a special class of molecules when it comes to processing a PDB file. Residue names are matched to identify proteins, certain ions (NA+, CL-), waters (HOH or WAT), and metal atoms (ZN2+). Any residues not otherwise recognized are termed “cofactors.”
The user is responsible for cleaning up the input PDB file before uploading it into the platform. Details of what this means are described earlier in this chapter. One task for the user is to remove any cofactors that should not be included for the simulation.
If the user has decided that certain cofactors are important to include in the simulation, then the following rules must be adhered to to ensure that the cofactor molecules are properly processed by the QUELO platform.
- Contiguous atoms sharing the same residue number and chain are considered part of the same cofactor
- Note that this means that if the cofactor initially consists of multiple residues, both the residue names and residue numbers must be modified so that all atoms are part of the same residue
- Cofactors cannot have a name that is the same as those of standard recognized residues (standard amino acids, etc.)
- Cofactors cannot contain atoms with the names C, N, CA, or O. These names are reserved for amino acids. Inclusion of atoms with these names in a cofactor will raise an error.
- Cofactors must have reasonable geometries and must have all atoms (including hydrogens) present.
- There should be CONECT records within the PDB file that define the topology (bonds) for each cofactor. If these are not provided, it is assumed that all bonds are single. In addition, if CONECT records are not provided, there is the possibility that the topology may not be correctly interpreted.
- Atoms assigned a formal charge in the molecule need to be identified with special atom types in fields 77-80 (see PDB file fields in the previous section). Two special atom types are recognized in this context:
- “O1-”: Charged oxygen in carboxylates, etc.
- “N1+”: Charged nitrogen in amines, etc.
The atomic charges are still determined via fitting of the results from quantum mechanics for atoms specified as
having a formal charge. These designations merely ensure the proper distribution
QM Region Selection
Running a QM/MM simulation requires the specification of the set of atoms that will be treated using a quantum mechanical representation. The selected atoms are treated using QM, and the remaining atoms in the system are treated using a classical molecular mechanics (MM) force field.
Significant increases in the size of the QM region will necessarily increase the computing cost and lengthen the turnaround time for a calculation. For this reason, one generally identifies a QM region just large enough to contain the most significant interactions of the binding region with the bound ligand. QM regions of 100-200 or fewer atoms tend to reflect a good compromise, although one can increase the QM region to up to 300 or so atoms and still obtain turnaround on a timescale that is acceptable by modern drug discovery standards.
Atoms are selected on a full residue basis, with the exception of certain backbone atoms on adjoining protein residues, which can be included to improve stability.
6.1 QM Region Specification
The QM region can either be generated automatically, or else the user can define the region by specifying exactly the residue set they wish to be included in this region. No matter which option is chosen, the mutating ligand must always be included in the QM region.
In the case of auto-generated QM region residues, a set of criteria are applied:
- CUTOFF_DISTANCE: The CUTOFF_DISTANCE is fixed at 5 Angstroms and cannot be modified by the user.
The cutoff distance is used to automatically identify residues that are eligible for further consideration as part of the QM region. The CUTOFF_DISTANCE is not used if the user is specifying the QM region manually, via residue definitions in the text input box (below). A search is performed to identify residues for which at least one atom is <= the cutoff distance from at least one atom of the mutating ligand. Any residue that meets this search criterion will pass the cutoff selection filter. Selection is performed on a full-residue basis (if one or more atoms of the residue are close enough, the entire residue is included). The cutoff distance is applied to both the protein receptor and any cofactors that may have been included in the input PDB file. Water molecules in the input PDB file are not included in the cutoff. - MAXIMUM_ATOMS: The maximum number of atoms that will be included in the QM region when auto-selection is being carried out. If the number of atoms selected for auto-select is greater than MAXIMUM_ATOMS, the selection will be trimmed (on a residue basis) until the MAXIMUM_ATOMS criterion is met.
Note that if you are in Expert Mode, there is an atom maximum threshold, which is applied only to mutations
between residues that have differing net formal charges. The QM region for mutations where a formal charge
change occurs is selected via a different approach than the remainder of the mutations–one based on assessing
the polarizability of the binding region. Typically, a somewhat larger maximum number of atoms is required for
these mutations. A default value of 130 is applied, but if you are in Expert mode, you can change this. This
option appears below the QM Region Summary - Selection_Algorithm=Simple: Filtering to define the QM region is first performed based on CUTOFF_DISTANCE. If the number of atoms in the filtered group is greater than MAXIMUM_ATOMS, the minimum distance of each residue in the filtered region to the mutating ligand will be determined, and the list will be sorted from largest to smallest. Residues will be removed from the list in order, starting with the largest minimum distance, until the MAXIMUM_ATOMS criterion is met.
- Selection_Algorithm=Optimal: Residues that are within CUTOFF_DISTANCE are evaluated for whether they are making specific high-value interactions with the ligand. A Significance Index, based on whether it is making specific high-value interactions, with be determined for each residue. Residues are removed from the list of residues within the cutoff distance in reverse order of Significance Index (least_significant -> most_significant) until the MAXIMUM_ATOMS filter is met. The Significance Index for each residue is determined by a combination of
- Distance to the mutating ligand (closer is better)
- Reduced significance for residues that present only aliphatic or non-polar sidechains to the mutating ligand
- If two (non-ligand) residues in the Optimal set are also involved in a salt bridge with one another, both residues will be given the same significance index (the greater of the values for the two residues)
- The Residue Definition Text Box: In the middle of this section, you will find a text box. In this box, the definitions of the residues that will be included in the QM region are provided. As noted, you must always include the ligand in the QM region definition. To do this, use the keyword “ligand” (all lowercase and no quotes).
The Residue Definition Text Box can be populated either automatically (using the “Suggest a QM Region” button
(below)), or else the user can provide a series of residue definitions “by hand”.
The user can specify the following residue types to be part of the QM region:- Protein residues (amino acids, capping groups)
- Cofactors
Waters (HOH or WAT) cannot be specified as part of the QM region.
The syntax for specifying atoms in the SELECT mode is:
RES_NAME:RES_CHAIN:RES_NUMBER-TYPE
There is a single exception to the above syntax, and that is the specification for the interacting ligand, which uses the keyword ligand, alone on a line.
Examples:
ligand
PHE:A:42-all
TYR: :109-sidechain- RES_NAME: 3-character residue name. The name must exactly match the name in the input PDB. Case is ignored.
- RES_CHAIN: Chain identifier that exactly matches that in the input PDB file. If there is no chain ID in the input file, leave this blank. Case is ignored.
- RES_NUMBER: The residue number. It must exactly match the residue number in the input PDB file.
- TYPE
- all: Include all atoms in the residue (this is the only allowable option for cofactors and for non-standard amino acids).
- backbone: Include only backbone atoms (C, O, CA, HA, N, H)
- sidechain: Include only sidechain atoms (everything but the backbone)
- backboneC: Include only the backbone atoms (C, O, CA, HA). Using this TYPE option requires that this is not a terminal residue, that the next residue in the protein backbone is also included in the SELECT set, and that the following residue is specified with TYPE “all” or “backbone”
- backboneN: Include only the backbone atoms (CA, HA, N, H). Using this TYPE option requires that this is not a terminal residue, that the preceding residue in the protein backbone is also included in the SELECT set, and that the preceding residue is specified with TYPE “all” or “backbone”
- Suggest a QM Region: This button will auto-populate the Residue Definition Text Box, based on the criteria (CUTOFF_DISTANCE, MAXIMUM_ATOMS, Selection_Algorithm) described above. After clicking this button, the Residue Definition Text Box will contain a series of residue identifiers. The user is free to modify the auto-generated list, removing or adding to the list, as desired.
- Validate: Validation reads the list of residue specifiers in the Residue Definition Text Box and ensures that the definitions are valid for the input PDB file. Validation is automatically performed if the text box is filled using the Suggest a QM Region button. The Validate button is only active when new or changed definitions are included in the Residue Definition Text Box that has not already been validated. Functions of Validation, which must be carried out before running the simulation, include:
- Ensure every residue definition matches a residue in the input PDB file
- Add capping groups, if necessary, to ensure usable QM/MM boundaries
- Determine the total number of atoms specified by the residues included in the QM region
- A user-supplied or modified definition of the QM region (i.e. one not generated using the Suggest a QM Region button) will not be subject to the Maximum Atoms filter. If you modify the region, after validation, be sure to note the total number of atoms in the QM region (QM Region Summary, below) and make sure it is reasonable.
- Visualize: Creates a pop-up window that shows the selected QM-region residues in the context of the protein/ligand complex. The reference ligand will be shown in blue, bound to the protein. The protein residues included in the QM region will be shown in red. You can rotate the view using your mouse (left button) or translate the view with the mouse (right button). Double-click on any atom to reset that atom as the center for rotations. If you hover the mouse over any atom, information on the atom and the associated residue (name,
number) will pop up. To close the pop-up window, click the “X” near the top left of the box. - QM Region Summary: To the right of this section, you will find the QM Region Summary, which is populated after you validate the QM region contents. (The definition is auto-validated, and this region is auto-populated if you use the Suggest a QM Region button). In this section you will find:
- Status: Whether the contents of the definition box properly validated. The possibilities are “Validated” or “Invalid.” If the contents are Invalid (there is a problem) or if they are valid but a warning was issued, the status will be shown in red, and you can ascertain more information by hovering the cursor over the question mark.
- Number of link atoms: Number of hydrogen atoms that were added to address unfulfilled valencies that arose from using separated protein residues. These are required for the QM simulation.
- Number of total QM atoms: Total atoms in the QM region, excluding the ligand itself. Based on the reference ligand.
- Total QM charges: Total QM-determined charge in the QM region (ligand + other residues), based on the reference ligand.
QUELO Background / Whitepaper
7.1 Introduction
QUELO revolutionizes computational drug discovery, breaking through the barriers of the previous generation of technology and delivering for the first time a truly integrated solution that combines the accuracy of quantum mechanics (QM) with free energy perturbation (FEP). The platform performs hybrid quantum/classical free energy calculations without any compromises to the QM theory, is easy to use, and delivers results at unprecedented speed and cost. This document provides an overview of the QUELO FEP platform and its core capabilities. This includes how QM has been seamlessly integrated with molecular mechanics (MM), as well as an overview of the unique software design that
powers the platform.
7.2 Core Capabilities
QUELO delivers the accuracy of QM without sacrificing functionality, features or usability from standard MM-based FEP. QUELO QM/MM limits the QM representation to the ligands and residues near the binding site, while using a standard (and computationally less demanding) MM approach for the remainder of the system. The software that carries out the QM calculations, QSimulate’s proprietary QM engine, is written from the ground up to maximize efficiency, to take advantage of modern cloud compute structures, and to optimize compatibility with the host molecular mechanics platform. QUELO reliably predicts binding affinities for traditionally challenging systems like highly polarizable
functional groups, charged ligands, metalloprotein targets, and covalently binding drugs.
The platform offers the same user experience as classical FEP thanks to a fully automated, intuitive front-end portal. All normal controls and options that a user might specify for MM-based FEP remain the same for our implementation (integration approach, temperature and pressure coupling methods, lambda window distribution, sampling at each window, etc.) The only additional option that the user needs to define is the extent of the region treated with QM, which is facilitated by providing the radial distance of residues to consider from the ligand binding site and the maximum number of atoms to include.
QUELO achieves an acceptable turnaround time and cost through QSimulate’s powerful QM engine. The unique and efficient parallelization for both the MM and QM parts of the engine eliminates the historical cost and time hurdles presented by past iterations of QM technology. The QM calculation time (wall clock) is as small as 0.1 second per time step, allowing QM/MM molecular dynamics throughput of up to a nanosecond a day. Through QUELO, QM-based free energy calculations that take conformational sampling into account are finally affordable - surpassing both classical FEP and static, QM-based endpoint methods.
7.3 Platform Implementation Details
QUELO fully automates the complicated process of relative, hybrid topology QM/MM-FEP from end to end. In this form of FEP, binding affinities for a set of ligands to a receptor are calculated from the relative difference in binding affinities (mutational ddG) of a closed series of alchemical mutations (called a mutational map). Mutational ddG are calculated as the difference in free energy between two FEP simulations: one in the receptor-bound state and the other in the unbound state. Each mutation utilizes a hybrid topology that scales interactions and energies of only the changing atoms. During FEP, QM calculations are handled by QSimulate’s proprietary QM engine, while GENESIS is used as the MM engine. The user has a choice controlling the level of theory employed: QM can be executed with two state of the art semiempirical methods, GFN2-xTB or GFN-xTB, while MM is performed using the Amber force field. To obtain binding affinities, the platform prepares input topologies, performs QM-based FEP, and provides the user with a detailed analysis of the results.
QUELO conducts the tedious and error-prone task of preparing input topologies for relative QM/MM-FEP calculations. The platform first generates a mutational map, which is optimized to (1) minimize the structural differences between ligands in each mutation and (2) include all ligands within closed networks to reduce error. Input topologies are then generated for each mutation. Ligands are aligned based on their maximum common substructure to the reference ligand provided by the user to yield an initial receptor-bound structure, and then realigned to each other for each mutation - consistent with a hybrid FEP topology. The user-provided receptor structure is cleaned to remove unwanted crystallographic artifacts and patch certain missing atoms. For each mutation, the mutating ligands are then combined, along with the receptor for the bound topology, and an appropriately-sized solvent box is generated. Finally, force field parameters are generated for the bound and unbound states of each mutation, and the inputs are ready for QUELO to perform QM/MM-FEP.
QM has been carefully integrated into FEP to obtain reliable structures and energies without instabilities or artifacts. Each timestep comprises an MD integration step and requires a pair of QM force calculations and energy evaluations, one for each ligand. The QM region used in the force and energy calculations is cut to user specifications, with the excluded MM region treated through electrostatic embedding to account for polarization. MM charges are used in the construction of one-electron terms in the Hamiltonian of the QM region. Direct space fast multipole expansion is used for consistency with the periodic boundary conditions of the MM code. To avoid instabilities in the MD trajectory, the platform relies on a soft-core potential for the atoms of the QM region, and the interactions between the QM and MM regions. To maintain the integrity of the molecular topology around the first and final FEP windows, a chaperone formalism tethered to MM is enlisted, which scales with the value of in the quantum region. A correction for the inclusion of the chaperone is later applied in post-processing. The QM/MM implementation used with FEP is algorithmically and scientifically stable, as demonstrated by NVE simulations (Figure 1).
The platform conveniently prepares a full analysis of the results for the user. BAR is performed to obtain the ddG for each mutation. Using the mutational map, a cycle closure correction (CCC) is applied to each mutation to correct for hysteresis and determine error estimates. Finally, network analysis extracts the binding affinities from the mutational ddG. The analysis suite printed out for the user comprises: thermodynamic values (binding affinities, mutational ddG, the dG of the bound and unbound mutations) and associated errors, and both a convergence plot over sampling time of the ddG and a plot of the free energy as a function of (or FEP window) for each mutation (Figure 2). All of the data in the displayed tables and graphs, as well as relevant structures from each simulation and full trajectories of the energies, can be downloaded by the user if they wish to perform further analyses.
7.4 Summary
QSimulate’s QUELO platform for QM/MM FEP is a first-of-its-kind breakthrough technology, revolutionizing computational drug discovery. Pushing past the previous limitations for classical FEP, QUELO delivers the accuracy of QM at the binding region with full sampling. This new paradigm extends computational R&D to new drug classes, such as drugs that bind to metalloproteins, drugs that include halogen atoms, drugs that act covalently, and drugs where polarization effects are critical.
Free Energy Perturbation (FEP) using QUELO: A Simple Tutorial
8.1 Overview
This chapter provides a guide to getting started with the QUELO panel. It is intended as a step-by-step introduction to running a sample calculation in the simplest manner. Additional options and details are described in the previous chapters.
In this tutorial, we will set up and run a short Classical FEP simulation. This tutorial also describes the simple changes that would be associated with running the same simulation using a Quantum (QM/MM) approach, although that is not run here.
The QUELO panel makes it very simple for a user to run a set of Free Energy Perturbatation (FEP) calculations for a
ligand/protein system. The calculation can be run with two types of force field:
- Classical: Using a classical, traditional, molecule mechanics (MM) force field (AMBER).
- Quantum: Using a QM/MM force field. The bound ligand and its environment are represented using quantum mechanics. The remainder of the system is represented using a MM force field.
The complex workflow that is required is performed automatically by the platform. The user is only responsible for providing the following information. (See the chapters “Preparing a PDB File for use with QUELO” and “Free Energy Perturbation (FEP) Using QUELO” for more details):
- Protein Receptor Structure: A PDB formatted file that contains the protein structure. The structure should be cleaned up by the user before use so that the only molecules that remain in the file are
- The protein receptor (Remove symmetry replicates if necessary)
- Any co-factors you wish to include in the simulation (remove all others)
- Structural waters (if any)
- NA+ or CL- ions
- ZN metals (only if part of a Zinc Finger)
- Reference Ligand: A ligand structure (in MOL2 or SDF format) that is properly bound in the binding site, and will be used to align all other ligands to the binding site before running the calculation
- Binding Ligands: The remaining ligands that will be binding to the protein receptor, in SMILES format. The relative free energy of binding for each member of the binding set, relative to the reference ligand, will be calculated by the platform.
For this tutorial, you will be using a set of three files that have been supplied for use with the tutorial:
- Protein_Receptor_CDK2.pdb
- Reference_Ligand_1q.sdf
- Ligands_Small_Set_CDK2.smi
The demo will describe how to run the panel in Standard mode. There is also an Advanced mode that provides additional
options that an expert might wish to address. Advanced mode is described in detail in the “Free Energy Perturbation
(FEP) Using QUELO,” but will not be discussed here.
8.2 The FEP Task List
When you enter the platform, you will be presented with the FEP Task List, a list of calculations (Tasks) that you have previously set up and/or run, as well as a dialog to create a new Task. Clicking on a task will bring you to the setup/results page for that Task.
If this is the first time you have used the platform, your Task list will be empty and will appear as shown.
To create a new task for this tutorial, specify a name in the blank field next to name (in the example, shown by the red arrow, we have used the name FEP_tutorial). Then click the Add button, shown by the red arrow to the right. Once you add the new task, the panel will immediately open to allow you to set up and run the calculation. If you wish to navigate back to the task list, click the word “Home” at the top of the panel, to the right of the QSP Life logo, and you will be taken to the table, which, if this is the first job you have added, will look like this:
From the task table, just click on the FEP_Tutorial Task line to open the setup panel again.
8.3 File input specification
In this section of setup, you will specify the input files with the protein receptor, the reference ligand, and all remaining ligands among which relative free energies will be evaluated. This is the only input the user needs to provide in most cases–platform chosen defaults are sufficient for everything else.
For each file class, click on the Browse button to identify the file you wish to use on your local computer, then click on Upload to import that file. Checks will be performed during the upload process to ensure that the file is suitably formatted. Sample files for each of these inputs have been provided to you for this tutorial, as noted in the Overview section.
Note that you must supply a name for the reference ligand before you can perform the upload. The name can be any alphanumeric string. You must supply a name even if the file you are importing contains a name.
Once you uploaded the files, this portion of the panel should look as follows:
Note that the input ligand structures are all referenced now by both the corresponding SMILES strings and by a name. Because all the input SMILES file for input in this example contain names, those are used. If names were not provided, default names would have been provided. A red “X” appears at the end of each input molecule, and if you click on that X, the corresponding molecule will be deleted from the input. Additionally, you will find a red “X” in a bolder font at the top of the column of red X buttons for the set of “All ligands”. If you wish to delete all the added ligands in this section at once, click on this bolder X.
The option “Generate all stereoisomers for additional ligands with undetermined chiral centers” is provided for cases where the chirality in the input SMILES is not specifically defined. Generally, this should be avoided. This is not a problem with the tutorial data set, and we do not use this option here.
8.4 Mutation Map
Once the file uploads are complete, you can create the mutation map. The mutation map creates a set of connections among the ligands that define the set of FEP calculations that will be used to evaluate the relative free energies of binding for the set of ligands. The Mutation Map must be determined before continuing with the calculation. Once all the input files have been supplied, press the Generate Mutation Map button that appears on the bottom right of the File Uploads section. An automated iterative optimization is carried out to define the mutation map.
The mutation map has three goals
- Evaluate similarity among the ligands, and create mutation paths between ligands that are closely related (for which relative free energy calculations tend to converge more readily)
- Ensure that there is at least one pathway that connects every input ligand to the other input ligands
- Add redundant mutation paths that can help reduce the overall error in the calculated free energies
The map generated for this Tutorial includes four ligands–the Reference ligand and the three additional ligands. The map consists of nodes (shown as circles) and edges (lines that connect the nodes). Each node corresponds to one of the input ligands as defined by the user. The character string next to each node is the name of the ligand (see the Name field of the ligands table). Each edge is a mutation that will be carried out by the program. The width of the edge reflects the similarity, with thicker edges being more similar, and thin edges being less similar. If you move the cursor over any edge, a number will appear that provides a similarity score, which can range from 0.0 (completely dissimilar) to 1.0 (identical).
Hovering the cursor over any node will present the 2D representation of the ligand corresponding to that node.
8.4.1 Mutation Map Editing
Using the editing tools, it is possible to clean up the map presentation on the screen, and to add or remove edges (mutations) from the map. For this tutorial, we won’t be changing the map. In addition, the Prettify button can be used to adjust the positions of the nodes on the screen to improve the visual presentation. The map in this example is very simple, so the changes with Prettify will be nominal. If you want to set up an example where Prettify effects bigger changes, use the left cursor of your mouse to drag the i9 node so that it effectively overlaps the iy ligand, then click Prettify, the representation will bounce back into a clean-looking square.
8.5 Simulation Options
Next to the Mutation Map, you will find the Simulation Options. Here, you can choose the FEP force field type (Classical or Quantum [QM/MM]), the number of sampling windows in lambda, and the sampling simulation time at each window in ns.
If you choose Classical FEP, you can start the job from this point. If you choose Quantum FEP, then you will need to set up the region of the system that will be treated using quantum mechanics in the next part of the panel.
The panel for the Classical and Quantum options is shown below, and you will notice that with the Quantum option, the Start Simulation button does not appear (because you need to set the QM region), and that the default amount of sampling changes.
For this tutorial, we are going to perform much less sampling than would normally be performed for a production run, since the purpose of this tutorial is simply to get familiar with the panel. Set the Simulation Time (ns) to 1.0, as shown.
Note that there are a number of additional options available to the advanced user, accessible by setting the Expert Mode in settings. We will not describe these in this tutorial, but they are described in detail in the chapter “Free Energy Perturbation (FEP) Using QUELO”.
8.6 QM Region Specification (Not Shown for Classical Simulations)
The QM region specification only appears if you are running a Quantum calculation. If you are running Classical, you do not need to specify the QM region and this panel is hidden. The tutorial is focused on running a Classical simulation, and you will not see this part of the panel. But if you wanted to run a Quantum simulation, this is the one significant change in setup, so it is described here for reference. You can safely skip this section if you are running the Classical demo.
Here, you specify what residues within the system are treated using a QM representation. The default workflow here is typically sufficient, and it is what we will use. If you need to specify specific residues to include, that is also possible, and is described in detail in the chapter “QM Region Selection.”
You must always include the ligand in the QM region, using the special specification keyword “ligand” (all lowercase, no quotes). Here you are setting the remaining residues to be included in that region. The default uses the “Optimal” selection algorithm, which identifies residues making specific polar interactions with the ligand, and prioritizes including those. If there is a capacity to include other residues in the vicinity of the ligand without exceeding the Region Max Atoms, then those will be included on the basis of distance from the ligand (closer ligands first).
The default workflow is accomplished by first pressing the “Suggest a QM Region” button, then clicking on “Start Simulation.” If you wish to modify the contents of the QM region, you can edit the text in the definitions box in the center, using the syntax described in the “QM Region Selection” chapter. If you do modify the contents of this box, be sure to then click on the Validate button to ensure the contents are correct.
If you wish to visualize what the chosen QM region looks like, you can click the Visualize button, and a pop-up with the 3D structure will be shown. It is not required you do this, but it is good practice to take a look before running a substantial calculation. If you choose to visualize, the reference ligand will be shown in blue, bound to the protein. The protein residues included in the QM region will be shown in red. You can rotate the view using your mouse (left button) or translate the view with the mouse (right button). Double-click on any atom to reset that atom as the center for rotations. Hover over any atom to see the residue name and number. To close the pop-up window, click the “X” near the top left of the box.
8.7 Start Simulation
You will start the simulation either using the button below the FEP options (Classical) or to the right of the QM Region Selection (Quantum). Once you start the simulation, information in the Simulation Status region will start to populate.
8.8 Simulation Status
This portion of the panel provides the status of the calculation once it has been started using the Start Simulation button. It also provides the ability to stop and restart a calculation that has been submitted.
- Stop: Stop a calculation that was previously submitted and is in progress. A stopped calculation is saved in the cloud storage associated with your account, and can be restarted later, using the “Run” command.
- Run: Run a previously Stopped job. If the job has not previously been started using “Start Simulation,” then the Run button has the same effect as “Start Simulation,” i.e., it will begin the calculation.
For this tutorial, you will probably not use these buttons. But if you realize you have made a mistake, or wish to stop the job for another reason, these are the buttons that will let you do that.
Below, and also to the right of the control buttons you will find information about the status of your job. The total estimated virtual CPU usage (vCPU) is given, as are progress bars for each of the stages of the simulation: Preparation/Equilibration/Production/Analysis. Color coding is as follows:
- Light Grey: That segment of the calculation workflow has not been run.
- Dark Blue + Light Grey: That segment of the workflow is in progress, and the dark blue portion of the segment reflects a progress bar.
- Dark Grey: That segment of the workflow has been completed.
- Red: That segment of the workflow was completed with errors.
Further down in this portion of the page, the status is provided at a more detailed level, showing the progress for every ligand mutation (every mutation represented by an edge in the mutation map. The names of the starting and ending ligands for each mutation are shown in the table, along with a four-segmented progress bar. The four segments for the progress bar represent Preparation/Equilibration/Production/Analysis. Each segment is colored as described above.
8.9 Results
Below the “Simulation Status” section, you will find the results of your calculation. The information in this section will be auto-populated when the simulations requested have been completed. The data is provided in four sections:
Note that because initial velocities for the simulations are assigned randomly, the results you obtain if you run this tutorial will not be exactly the same as those shown. This is expected.
8.9.1 Result for Each Mutation
One result will appear in this table for every edge in the Mutation Map. The relative free energy changes for all the edges will be calculated. Performing more than the bare minimum number of changes required to connect the ligands helps provide more reliable net free energy difference values and associated errors in the Free Energy Results table (below).
For each mutation that was run, the following will be reported. Note that because the initial velocities for molecular dynamics are assigned stochastically, the results you get will not match these exactly. For converged simulations, though, the results for different simulations should be similar:
- Mutation: Sequence number for the mutation calculation, starting from one.
- Start Ligand: The name of the starting ligand for the mutation.
- End Ligand: The name of the ending ligand for the mutation. for Start Ligand.
- ddG: The computed free energy of binding (ddG) (in kcal/mol) for the specified mutation between the Start and End Ligands. ddG is calculated as the difference [dG(complex) - dG(ligand)], which provides the net free energy of binding via the thermodynamic cycle relationship.
- Sampling Error: Sampling error (kcal/mol) is estimated from statistical analysis carried out during the BAR analysis for the specified mutation.
- dG(complex): The free energy (kcal/mol) calculated for the mutation between the Start and End ligands in the ligand/protein complex.
- dG(ligand): The free energy (kcal/mol) calculated for the mutation between the Start and End ligands calculated free in solution.
Below the Mutations table, an Export button is provided:
- Export table: Download the data in the Mutations table as a CSV-formatted file.
The table can be sorted by any column. If you select a row, the two graphs above the table (described below) will be populated with data reflecting that simulation.
8.9.2 Accumulated free energy and convergence plots
Clicking on any row of the Mutations table will pop up two plots.
- Accumulated free energy as a function of lambda: This graph presents the accumulated ddG free energy differences versus lambda for the transformation selected in the Mutations table. The lambda values for the physical endpoints are always 0 and 1.
- Free energy convergence versus sampling: This graph presents the estimated convergence in ddG for the transformation chosen in the Mutations table. The values in this graph provide a sense of how increasing the amount of sampling performed at each of the lambda points affects the overall convergence of the calculation.
In addition, 2D representations of the ligands corresponding to the two endpoints of the mutation are presented at the top of the pop-up window.
8.9.3 Free energy results
In this table, the net relative free energies for all the input ligands are presented. The net free energies are evaluated by averaging the energies for multiple paths (edges) that connect the ligand to the reference ligand. The Sampling Errors represent the maximum error for any non-redundant closed loop of edges that includes the ligand.
- Ligand: The name of the ligand. The name is taken from the user input ligand file.
- ddG: The free energy of binding for the ligand (kcal/mol), relative to the reference ligand. The reference ligan is arbitrarily assigned a free energy of binding of 0.00. Relative free energy values ddG are calculated as ddG = [dG(complex) = dG(ligand)].
- Sampling Error: Sampling error (kcal/mol) from the closed pathway of edges.