
PROTCLEAN: THE PROTEIN PREPARATION TOOL

1.1 Overview

For many of the functionalities provided by the QSimulate platform, including QUELO FEP and QuValent, you will need to specify a protein receptor. These panels include some automated functionality for preparing an input PDB protein structure (assigning parameter types, adding hydrogens, removing bulk water, etc.). However, for many PDB structures obtained from the Protein Data Bank or elsewhere, additional cleanup and correction may be required.

The purpose of the ProtClean tool is to carry out these additional tasks. Note that you are not required to run ProtClean; if you would prefer to use other protein preparation tools, that is acceptable. The only requirement is that the PDB structure you provide as input to QUELO or QuValent has been properly prepared, whether with ProtClean or another program.

The primary features of ProtClean are its ability to perform the following tasks:

  • Add any missing hydrogens
  • Determine the protonation states of residues in the protein where protonation can vary near physiological pH (HIS, LYS, ARG, ASP, GLU)
  • Add any protein structure (loops, side chains) that is missing from the PDB file
  • General cleanup/rationalization of residue IDs and atom numbers to be consistent with the requirements of QUELO/QuValent
  • Allow the user to select which elements (chains, cofactors) are to be retained for the calculation. In particular, this feature allows the user to remove unnecessary symmetry replicates or binding partners.
  • Optionally, add an explicit atom model of the membrane, for membrane-bound proteins
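Why protonation states must be assigned residue by residue can be illustrated with the Henderson-Hasselbalch equation, which gives the fraction of a titratable group that is protonated at a given pH. The sketch below assumes approximate textbook model-compound pKa values; these are not ProtClean parameters, ProtClean's own protonation method is not described here, and pKa values inside a folded protein can shift substantially.

```python
# Approximate model-compound pKa values for titratable side chains.
# These are textbook estimates (an assumption), not ProtClean parameters;
# the local protein environment can shift them by several pH units.
MODEL_PKA = {
    "ASP": 3.9,
    "GLU": 4.1,
    "HIS": 6.0,
    "LYS": 10.5,
    "ARG": 12.5,
}


def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of the group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def dominant_state(resname: str, ph: float = 7.4) -> str:
    """Return 'protonated' or 'deprotonated' for the majority species."""
    f = fraction_protonated(MODEL_PKA[resname], ph)
    return "protonated" if f >= 0.5 else "deprotonated"


if __name__ == "__main__":
    for res, pka in MODEL_PKA.items():
        f = fraction_protonated(pka, 7.4)
        print(f"{res}: {f:.3f} protonated at pH 7.4 -> {dominant_state(res)}")
```

HIS is the most sensitive case: with a pKa close to physiological pH, a modest environmental shift changes its dominant state, and neutral HIS additionally has two tautomers (HID/HIE in Amber naming) to choose between.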

Missing protein structure is added using a locally implemented version of AlphaFold, which provides reliable structure generation for many proteins. Because it is implemented locally, structural information remains securely behind the AWS-provided security wall. Missing structure (if any) is identified from the SEQRES records in the input PDB file.
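The comparison between SEQRES records and the residues that actually have coordinates can be sketched as follows. This is a minimal, hypothetical illustration, not ProtClean's implementation: it only counts residues per chain, does not align sequences or locate gaps, and ignores modified residues stored as HETATM records.

```python
from collections import defaultdict


def seqres_by_chain(pdb_lines):
    """Collect SEQRES residue names per chain (PDB column 12 and columns 20-70)."""
    seqres = defaultdict(list)
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            chain = line[11]
            seqres[chain].extend(line[19:70].split())
    return seqres


def observed_residues_by_chain(pdb_lines):
    """Collect the distinct residues that have ATOM coordinates, per chain."""
    observed = defaultdict(list)
    seen = set()
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain = line[21]
            # A residue is identified by chain, resSeq (cols 23-26) and iCode (col 27)
            key = (chain, line[22:26].strip(), line[26].strip())
            if key not in seen:
                seen.add(key)
                observed[chain].append(line[17:20].strip())
    return observed


def missing_residue_counts(pdb_lines):
    """Number of SEQRES residues per chain that lack ATOM coordinates."""
    expected = seqres_by_chain(pdb_lines)
    observed = observed_residues_by_chain(pdb_lines)
    return {chain: len(expected[chain]) - len(observed.get(chain, []))
            for chain in expected}
```

A nonzero count indicates that part of the chain (for example, a disordered loop) was not resolved in the experimental structure and would need to be modeled.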

Below, the ProtClean interface is described in more detail.

ProtClean has been developed to robustly handle PDB-format-compliant structures, such as those downloaded from the Protein Data Bank (RCSB) website or generated by software that adheres to the PDB format rules. Note that non-compliant PDB files can trigger errors during processing; these errors will be reported to the user.
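One reason format compliance matters: PDB ATOM/HETATM records are fixed-column, not whitespace-delimited, so adjacent fields can run together (for example, when a coordinate is large and negative). The sketch below parses a record by column position following the PDB format specification; it is an illustration only, not ProtClean's parser.

```python
def parse_atom_record(line: str) -> dict:
    """Parse an ATOM/HETATM record using the fixed PDB column layout.

    Slices are 0-based Python indices for the 1-based PDB columns noted.
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")
    return {
        "record": record,
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "alt_loc": line[16].strip(),      # column 17
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain_id": line[21].strip(),     # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "i_code": line[26].strip(),       # column 27
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
    }
```

Splitting such a line on whitespace would merge the y and z coordinates into a single token, whereas column slicing recovers both values.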

 

1.2 The ProtClean Task List

When you enter the platform, you will be presented with the ProtClean Task List, a list of calculations (Tasks) that you have previously set up and/or run, as well as a dialog to create a new Task. Clicking on a Task will bring you to the setup/results page for that Task.

For more details on the Task List, see the chapter “QSP Life User Interface.”

 

1.3 Expert Mode

A small number of options (described below) are only shown in Expert Mode. The options shown in the default Standard Mode are sufficient for most users to run a reliable simulation. If you need access to the additional options, Expert Mode can be enabled via a toggle in the User Settings panel.


For more details on enabling Expert Mode, see the chapter “QSP Life User Interface.”

 

1.4 File Input Specification

When you click on a Task with New status in the ProtClean Task table, you enter the setup dialogs for that Task. At the top of the setup, you will need to specify the input PDB file that you wish to prepare. This file should contain the protein receptor you will subsequently be using in your calculation (QUELO or QuValent). This file may also contain various cofactors, waters, and other proteins (and nucleic acids) that may interact with the protein of interest. For example, a PDB file obtained from the Protein Data Bank will often contain multiple symmetry-related replicates that appear in the unit cell (but are not critical to understanding binding), cofactors that are included only to help with crystallization, etc.


 

Click on the Browse button to use your browser’s navigation dialog to select the input PDB format file to be used. Click on the Upload button to process the PDB file. A blue bar will appear on the next line and indicate when the upload is complete.

 

1.4.1 Membrane Receptor

This toggle box appears below the PDB file input dialog. If the protein you are preparing is membrane-bound (e.g., a GPCR), you may wish to create an explicit atomic membrane model for the protein. To do so, toggle this option on. A new section of the panel will then appear below the Options section, where you can specify the type of lipid bilayer to use and where the protein should be located relative to the bilayer (see the Membrane Equilibration description below). If your protein is not membrane-bound, or you do not wish to use a membrane model, leave this box unchecked.

Note that using a lipid bilayer will result in a calculation that is more costly to run.

Equilibrating the requested membrane receptor takes some time, and can require a few hours of preparation time.

Also note that ProtClean cannot handle input structures that already include a full atomic lipid bilayer, so you should not use ProtClean for such structures. They should instead be input directly into QUELO/QuValent, and the names of the bilayer atoms must follow the Lipid17 or Lipid21 convention.

1.5 Options

 

1.5.1 The Inclusion Table

Once the PDB file is uploaded, the contents of the file will be analyzed and displayed in a table in the Options section. In this table, you will find a list of all the chains in the file and the contents of those chains.


There are two user-adjustable columns in this table:

  • Select: This toggle appears in the fourth column and can be used to include or exclude an entire chain from the prepared PDB output file. By default, all chains are included. If you deselect a chain, all elements of the chain (protein residues, water, cofactors) will be excluded from the output, and the chain will turn grey in the 3D visualizer (see below). If you include a chain, all protein and water residues are included; you can choose to include or exclude cofactors (which are excluded by default).
  • Edit: The Edit button appears in the Cofactors (fifth) column. This button is active if any cofactors were identified for that chain. If the Edit button is not selectable, it is because no cofactors were identified for the corresponding chain.

1.5.2 The Edit Cofactors Dialog

Clicking on the Edit button for a chain where cofactors were identified will present a dialog where the user can choose cofactors to include via toggles. By default, all cofactors are excluded. Note that metals and other ions are, like waters, not shown in this dialog (but are retained in the final structure).

The toggle in the third column, Select, can be used to keep the cofactor in the prepared output. For example, in the following, Cofactor P32, residue 400, would be kept for Chain B:

[Figure: Edit Cofactors dialog with cofactor P32, residue 400, selected for Chain B]

If you click on any of the rows in the Cofactor Inclusion dialog, the visualizer on the left will zoom in on that cofactor. For example, clicking on the P32 row zooms in as shown below:

[Figure: visualizer zoomed in on cofactor P32]

Mouse actions on the contents of the visualizer in the Cofactors Dialog are the same as described for the 3D Visualizer (below).
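The inclusion rules described for the Inclusion Table and the Edit Cofactors dialog can be summarized as a per-record filter. This is a hypothetical sketch, not ProtClean code: the residue-name sets for waters and ions are small illustrative assumptions, and modified amino acids stored as HETATM records would be misclassified as cofactors here.

```python
# Hypothetical, deliberately incomplete name sets used only for illustration.
WATER_NAMES = {"HOH", "WAT"}
ION_NAMES = {"NA", "CL", "K", "MG", "CA", "ZN", "MN", "FE"}  # "CA" = calcium ion


def keep_record(line, selected_chains, kept_cofactors):
    """Decide whether one ATOM/HETATM line survives the inclusion rules.

    kept_cofactors: set of (chain_id, res_name, res_seq) toggled on by the user.
    """
    chain = line[21]
    if chain not in selected_chains:
        return False  # deselected chain: drop everything in it
    if line.startswith("ATOM"):
        return True  # polymer (protein/nucleic acid) residues are kept
    res_name = line[17:20].strip()
    if res_name in WATER_NAMES or res_name in ION_NAMES:
        return True  # waters and ions are retained for included chains
    res_seq = int(line[22:26])
    # Any other HETATM group is treated as a cofactor: excluded unless selected
    return (chain, res_name, res_seq) in kept_cofactors


def filter_structure(lines, selected_chains, kept_cofactors=frozenset()):
    """Return the ATOM/HETATM lines that pass the inclusion rules."""
    return [line for line in lines
            if line.startswith(("ATOM", "HETATM"))
            and keep_record(line, selected_chains, kept_cofactors)]
```

For example, keeping only Chain B with cofactor P32 (residue 400) selected corresponds to `filter_structure(lines, {"B"}, {("B", "P32", 400)})`.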

 

1.5.3 The 3D Visualizer

On the right-hand side of the Options portion of the panel, you will find a 3D visualizer of the full system. The protein chains are shown using a cartoon representation. Waters and cofactors (if any) are not shown. Each chain is automatically colored using a different color. If a chain is de-selected (using the Inclusion Table toggle), the color of the corresponding chain will change to grey in the 3D visualizer.

Below is an example: Four crystallographic replicate chains are in the input PDB file. We have deselected all but the “B” chain, and the resulting view (with all but the “B” chain in grey) is shown at the right:

[Figure: Inclusion Table with only Chain B selected; the other three chains are shown in grey in the 3D visualizer]

Mouse actions available in the 3D visualizer:

  • Left-click Plus Drag: Rotates the contents of the visualizer
  • Scroll-wheel: Zoom in/out
  • Double-click: Will move the residue you click upon to the center of the box and reset the center of rotation to that residue.
  • Right-click Plus Drag: Moves the contents of the visualizer in the (x/y) plane.
  • Hold down the scroll wheel and move the mouse: Adjust the clipping plane

 

1.5.4 Start Preparation

If you have not requested the generation of an explicit membrane, then the Start Preparation button will appear at the bottom right of the Options section of the panel. Clicking this button will start the protein preparation.

If you have requested an explicit membrane, then this button will instead appear at the bottom right of the next section, Membrane Equilibration.

 

1.6 Membrane Equilibration

This section of the panel only appears if you have toggled the “Membrane Receptor” in the protein input section. In this case, this section of the panel allows you to specify the type of lipid bilayer to create, and where the protein should be situated in the lipid bilayer.


There are two user adjustments in this section of the panel.

  • Lipid: Specify the type of lipid bilayer to use. Several options are provided:
    • POPC: Phosphatidylcholine phospholipids. Asymmetric carbon tails (one 16-carbon, one 18-carbon)
    • POPE: Phosphatidylethanolamine phospholipids. Asymmetric carbon tails (one 16-carbon, one 18-carbon)
    • DOPC: Phosphatidylcholine phospholipids. Symmetric 18-carbon tails
    • DOPE: Phosphatidylethanolamine phospholipids. Symmetric 18-carbon tails
    • DLPC: Phosphatidylcholine phospholipids. Symmetric 12-carbon tails
    • DLPE: Phosphatidylethanolamine phospholipids. Symmetric 12-carbon tails
  • Bilayer position (slider): Using the slider, the user can adjust the position of the bilayer via z-translation. If the membrane-bound protein was obtained from the OPM (Orientations of Proteins in Membranes) database, it will typically already be aligned so that the default slider position (centered on z=0) is optimal. If the protein structure comes from another source, it will likely be necessary to adjust the position here. In that case, it is also critical to ensure that the protein is oriented so that its membrane-normal axis lies along the z-axis, perpendicular to the x/y plane of the rectangular “box” defined by the lipid bilayer.
    The residues of the protein in this viewer are colored either yellow (hydrophobic) or blue (hydrophilic). When adjusting the slider, try to maximize the amount of yellow and minimize the amount of blue within the bilayer.
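The slider heuristic above (maximize yellow, minimize blue inside the bilayer) can be expressed as a simple scan over z-offsets. This is a hypothetical sketch, not what ProtClean computes: the hydrophobic/hydrophilic residue classification and the 15 Å half-thickness of the hydrophobic slab are illustrative assumptions, and it presumes the protein is already oriented with its membrane normal along z.

```python
# Illustrative residue classification (an assumption, not ProtClean's coloring).
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "CYS"}
HYDROPHILIC = {"ARG", "LYS", "ASP", "GLU", "HIS", "ASN", "GLN", "SER", "THR"}


def slab_score(residues, offset, half_thickness=15.0):
    """+1 per hydrophobic and -1 per hydrophilic residue inside the slab.

    residues: list of (res_name, z) pairs, e.g. one per C-alpha atom.
    The slab spans offset - half_thickness .. offset + half_thickness.
    """
    score = 0
    for res_name, z in residues:
        if abs(z - offset) <= half_thickness:
            if res_name in HYDROPHOBIC:
                score += 1
            elif res_name in HYDROPHILIC:
                score -= 1
    return score


def best_offset(residues, z_min=-20.0, z_max=20.0, step=0.5, half_thickness=15.0):
    """Scan bilayer z-offsets and return the first one with the highest score."""
    n_steps = int(round((z_max - z_min) / step))
    offsets = [z_min + i * step for i in range(n_steps + 1)]
    return max(offsets, key=lambda off: slab_score(residues, off, half_thickness))
```

A visual check in the viewer remains important, since a single score cannot capture interfacial residues or an incorrectly tilted protein.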
The chosen bilayer will be included at the specified position. The starting model is an equilibrated all-atom bilayer generated using the Amber Lipid17 force field. Atoms of the bilayer that would overlap atoms of the protein are omitted, and equilibration is performed on the resulting model. Some additional specifics of the bilayer model appear below the Lipid selector on the left side of the panel. These are informational only; the user cannot change them directly (they update when the Lipid model or the slider position is changed).
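Omitting bilayer atoms that overlap the protein can be sketched as a clash check. This is a minimal illustration under stated assumptions: the 1.5 Å cutoff is hypothetical, whole lipid residues are removed (rather than individual atoms) so that no broken lipid molecules remain, and the brute-force double loop stands in for the spatial grid a production tool would use.

```python
def remove_overlapping_lipids(lipid_atoms, protein_coords, cutoff=1.5):
    """Drop every lipid residue that has any atom within `cutoff` Å of the protein.

    lipid_atoms: list of (residue_id, (x, y, z)) tuples.
    protein_coords: list of (x, y, z) tuples.
    Returns the lipid atoms that survive, in their original order.
    """
    cutoff_sq = cutoff * cutoff  # compare squared distances to avoid sqrt
    clashing = set()
    for res_id, (x, y, z) in lipid_atoms:
        if res_id in clashing:
            continue  # residue already marked for removal
        for px, py, pz in protein_coords:
            if (x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2 < cutoff_sq:
                clashing.add(res_id)
                break
    return [atom for atom in lipid_atoms if atom[0] not in clashing]
```

Removing lipids leaves voids around the protein, which is one reason the resulting model must be equilibrated before use.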

For more information about the OPM and about the atomic membrane bilayers, see

 

https://opm.phar.umich.edu/about#methods_and_definitions

https://pubs.acs.org/doi/epdf/10.1021/ct4010307 

Note that membrane equilibration can be somewhat time-consuming, possibly taking a few hours to complete.

 

1.6.1 Start Preparation

If you are generating an explicit membrane, then the Start Preparation button will appear at the bottom right of the Membrane Equilibration portion of the panel.

Clicking this button will start the protein preparation.

 

1.7 Simulation Status

This portion of the panel provides the status of the calculation once it has been started using the Start Preparation button. It also provides the ability to stop and restart a preparation that has been submitted.


  • Stop: Stop a calculation that was previously submitted and is in progress. A stopped calculation is saved in the cloud storage associated with your account and can be restarted later using the Resume button.
  • Resume: Resume a previously Stopped job.

Below, and also to the right of the control buttons, you will find information about the status of your job. The total estimated virtual CPU usage (vCPU) is given, as is an overall progress bar.

Most of the steps of protein preparation are relatively fast. The step that takes the vast majority of the time is the prediction of missing structural elements. This calculation is performed using a local implementation of AlphaFold, an advanced machine-learning method that is trained on a vast amount of protein structure and sequence-variation data and in many cases can predict missing structure very accurately. AlphaFold can sometimes take as much as an hour to run, although it is faster for many systems.

 

1.8 Results

Below the “Simulation Status” section, you will find the results of your calculation. This section is populated automatically once the preparation has completed.


1.8.1 3D Results Viewer

The 3D viewer in the results portion of the panel allows the user to visualize the resulting prepared structure, and to compare it to the input structure. Three buttons appear in the visualization space:

  • Original: This is the input structure for the chain(s) that were selected
  • Output: This is the output structure, including the protein and (if requested) the membrane.
  • Overlay: Overlays the input structure with the output structure to help visualize any changes that were made. For example, if missing residues were added to the structure during preparation, the overlay will make these apparent.
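The residues the Overlay view highlights as new can also be listed by comparing the input and output files. This hypothetical sketch assumes that residue numbering of the retained chains is unchanged between input and output; since ProtClean may renumber residues during cleanup, a sequence alignment would be needed if the numbering differs.

```python
def residue_keys(pdb_lines):
    """Set of (chain, resSeq, iCode) for every residue with ATOM records."""
    return {
        (line[21], int(line[22:26]), line[26].strip())
        for line in pdb_lines
        if line.startswith("ATOM")
    }


def added_residues(original_lines, output_lines):
    """Residues present in the prepared output but absent from the input."""
    return sorted(residue_keys(output_lines) - residue_keys(original_lines))
```

Applied to an input missing residue 3 of chain A and an output in which it was modeled, this returns `[("A", 3, "")]`.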

Mouse actions are as described in the 3D Visualizer section (above).

 

1.8.2 PDB Download

This button allows the user to download the resulting prepared protein system in PDB format. The downloaded file can subsequently be input to QUELO or QuValent. The prepared protein will include the elements chosen in the Inclusion Table, and any missing residues of the protein will have been generated using AlphaFold. It will also contain the atomic coordinates of the membrane, if one was generated.