ClusPro Help
| The ClusPro Web Server software on this system is operational. It is currently being used to participate in the CAPRI competition as the first fully automated Protein-Protein Docking web server. |
|
The ClusPro Algorithm:

***Note that some crystal structures have more than one crystal in the unit cell. In these cases, it is imperative to specify the chain(s) of only one of the crystals.***

The user also has the option to upload the PDB files from a local machine.

DOCKING

Docking with DOT is currently done using DOT 1.0 with a 1 Å grid spacing (which is adjusted automatically to 2 Å for very large proteins). During the docking with DOT, only the shape complementarity function is used; we do not incorporate any electrostatics into the docking portion. Here, the top 20,000 structures are retained.

Docking with ZDOCK is done using ZDOCK v.2.3, and the scoring function is Pairwise Shape Complementarity + Desolvation + Electrostatics. The top 2,000 conformations are retained in a ZDOCK run.

ENERGY + FILTERING

The complexes generated from the docking step are then rapidly screened, and the desolvation and electrostatic energies are calculated. ClusPro retains 2,000 structures from the docking stage, regardless of the docking software used. By default, these are the top 1,500 conformations by electrostatic energy and the top 500 conformations by desolvation energy.

PAIRWISE BINDING SITE RMSD CALCULATION

For each of the 2,000 ligand conformations retained in the filtering stage, we identify the residues of that ligand that have at least one atom within 10 Å of any receptor atom. These residues are considered the binding site of that ligand conformation. The RMSD of that binding site is then calculated between that particular conformation and each of the other retained conformations. This yields a 2000x2000 matrix R; it is important to note that, in general, Rij != Rji.

For example, suppose the binding site of ligand.1 is defined by residues 20-50 and 100-135. We extract only the C-alpha atoms of these residues from each of the 2,000 ligands and calculate the RMSD over them. The value computed against ligand.2 would be put into R12. If ligand.2 has its binding site defined by residues 40-95, the same calculation is done using those residues.
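The asymmetry of R can be illustrated with a minimal Python sketch. It assumes that every conformation is the same ligand given as a mapping from residue number to a list of atom coordinates whose first entry is the C-alpha atom, that the receptor is a plain list of atom coordinates, and that all conformations share the docking frame so that no superposition is needed. The names `binding_site`, `site_rmsd` and `rmsd_matrix` are hypothetical and are not part of ClusPro.

```python
import math

CONTACT_CUTOFF = 10.0  # Angstroms, the contact distance stated in the text


def binding_site(ligand, receptor_atoms, cutoff=CONTACT_CUTOFF):
    """Residues with at least one atom within `cutoff` of any receptor atom."""
    site = []
    for resnum, atoms in ligand.items():
        if any(math.dist(a, r) <= cutoff for a in atoms for r in receptor_atoms):
            site.append(resnum)
    return sorted(site)


def site_rmsd(reference, other, site):
    """C-alpha RMSD between two conformations over the residues in `site`.

    Assumption: atoms[0] of each residue is its C-alpha atom.
    """
    if not site:
        # No contacting residues: treat as infinitely far (an assumption).
        return float("inf")
    sq = sum(math.dist(reference[r][0], other[r][0]) ** 2 for r in site)
    return math.sqrt(sq / len(site))


def rmsd_matrix(ligands, receptor_atoms):
    """Row i uses the binding site of conformation i, so R[i][j] may differ from R[j][i]."""
    n = len(ligands)
    R = [[0.0] * n for _ in range(n)]
    for i, ref in enumerate(ligands):
        site = binding_site(ref, receptor_atoms)
        for j, other in enumerate(ligands):
            if i != j:
                R[i][j] = site_rmsd(ref, other, site)
    return R
```

Because each row is computed over the binding site of its own reference conformation, two conformations that contact the receptor through different residues produce different values for `R[i][j]` and `R[j][i]`.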
The value computed using the binding site of ligand.2 would be put into R21. As you can see, Rij does not equal Rji, because different residues are taken into account in each case.

CLUSTERING

The clustering is done using a greedy algorithm. Here, we assume that the native binding site is a broad, deep free energy well, while local minima are scattered throughout the free energy landscape. The number of structures in each energy well should, therefore, be proportional to the size of the well, validating our use of a greedy clustering algorithm.

First, in our 2000x2000 RMSD matrix, we find the structure that has the most neighbors within the clustering radius, and call this the first cluster center. We then remove all of those neighbors from the RMSD matrix and place them in the first cluster. We then go back to the RMSD matrix, find the remaining structure with the highest number of neighbors, call that the second cluster center, and so on.

The models returned by ClusPro are ranked according to the size of the cluster, based on our assumption of how the free energy wells are populated.

HOMO-MULTIMERIC DOCKING

The same docking and filtering steps are used as in the standard version of ClusPro. However, for each of the decoys generated, we calculate whether or not it fits the criteria for being symmetric. That is, we calculate the rotation and translation matrix between the receptor and the docked ligand conformation. We then move the docked ligand by this same transformation to generate a third structure. This process continues until N+1 structures are generated. A ligand is considered symmetric if the 1st and (N+1)th structures have less than an 8 Å RMSD and there are no significant overlaps between the 1st and Nth structures.

The top symmetric structures are then clustered amongst themselves to eliminate any possible redundant structures. The structure with the tightest symmetry (i.e., the lowest RMSD between structures 1 and N+1) is chosen as the cluster center.
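The symmetry test for a docked ligand can be sketched in pure Python as follows. The sketch assumes that the rigid transform from receptor to docked ligand is already available as a 3x3 rotation matrix `rot` and a translation vector `trans`, and that all subunits share the same atom ordering. The function names, and the clash distance and allowed clash count that stand in for "no significant overlaps", are hypothetical; the text above specifies only the 8 Å RMSD threshold.

```python
import math

RMSD_CUTOFF = 8.0  # Angstroms, the closure threshold stated in the text


def apply_transform(rot, trans, coords):
    """Apply x -> rot * x + trans to every coordinate."""
    return [
        tuple(sum(rot[r][c] * p[c] for c in range(3)) + trans[r] for r in range(3))
        for p in coords
    ]


def rmsd(a, b):
    """RMSD between two equally ordered coordinate lists, without superposition."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def count_clashes(a, b, clash_dist):
    """Number of atom pairs closer than `clash_dist` (hypothetical overlap measure)."""
    return sum(1 for p in a for q in b if math.dist(p, q) < clash_dist)


def is_symmetric(receptor, rot, trans, n, clash_dist=3.0, max_clashes=0):
    """Build structures 1..N+1 by repeating the receptor-to-ligand transform."""
    structures = [receptor]
    for _ in range(n):
        structures.append(apply_transform(rot, trans, structures[-1]))
    # Closure: the (N+1)th structure should land back on the 1st.
    closure = rmsd(structures[0], structures[n])
    # Overlap: the 1st and Nth structures should not interpenetrate.
    clashes = count_clashes(structures[0], structures[n - 1], clash_dist)
    return closure < RMSD_CUTOFF and clashes <= max_clashes
```

For a trimer (`n=3`), a rotation of 120 degrees about an axis brings the fourth structure back onto the first, so the closure RMSD is near zero and the test passes as long as the subunits do not clash.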
The cluster centers are then clustered against the top 2000 energetically favorable conformations. The number of energetically favorable neighbors is then used as the score for each symmetric cluster center, and the cluster centers are ranked according to these scores. ClusPro is able to discriminate between these special cases of tetramers and hexamers and the regular N-mer formations.

For a detailed description of the algorithm, please view our paper in the Journal of Structural Biology (currently In Press).

Methodology developed by CJ Camacho, DW Gatchell and S Vajda.

For Software References, please cite:

Thank you for using ClusPro |

