UofT PSMDB: Description

PSMDB
The Protein - Small-Molecule DataBase

Lilien Lab
Department of Computer Science
Centre for Cellular and Biomolecular Research
University of Toronto

Description

The database is constructed using the following steps:

1. The latest PDB release is filtered using the annotations provided by the PDBsum database. A complex is selected if its structure was determined by X-ray crystallography at a resolution better than 2 Å, at least one chain is longer than 50 amino acids, and the bound ligand is correctly annotated as HETATM.
2. Remove ligands with fewer than 7 (or 13) heavy atoms or with a molecular weight greater than 800 Da.
3. Remove ligands that are covalently bound to the target protein. If a complex contains multiple ligands, retain the non-covalent ones.
4. Calculate the pairwise sequence similarity between all pairs of proteins (bl2seq). Calculate the pairwise Tanimoto coefficient of the Daylight fingerprints between all pairs of ligands.
5. Construct similarity matrices (protein vs. protein and ligand vs. ligand) and convert them into binary matrices using a 25% sequence similarity threshold for proteins and a 0.85 Tanimoto coefficient threshold for ligands.
6. Multiply the protein and ligand similarity matrices (dot product).
7. Construct a graph in which every vertex corresponds to a complex, and an edge connects two vertices if the similarity between the corresponding complexes is higher than the threshold (i.e. there is a '1' in the corresponding entry of the binary similarity matrix).
8. Select a maximal set of vertices (complexes) such that no two vertices in the set share an edge. That is, select a maximal set of complexes in which no two selected complexes are considered similar.
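
The similarity and selection steps (pairwise ligand similarity through to the final selection) can be illustrated with a small sketch. It is not the Lilien Lab implementation and rests on several assumptions: fingerprints are represented as plain Python sets of on-bit positions rather than Daylight fingerprints, the protein similarity matrix is supplied directly rather than computed with bl2seq, the matrix multiplication is read as element-wise (which keeps the result binary, as the graph construction requires), thresholds are applied inclusively, and a simple greedy heuristic stands in for whatever selection procedure the database actually uses. All function names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def binarize(matrix, threshold):
    """Return a 0/1 matrix with 1 where the similarity reaches the threshold.

    Assumption: the comparison is inclusive (>=).
    """
    return [[1 if value >= threshold else 0 for value in row] for row in matrix]


def combine(protein_bin, ligand_bin):
    """Element-wise product (assumed reading of the multiplication step):
    two complexes count as similar only if both their proteins and their
    ligands are similar."""
    return [[p * l for p, l in zip(p_row, l_row)]
            for p_row, l_row in zip(protein_bin, ligand_bin)]


def select_non_redundant(similar):
    """Greedy maximal independent set: repeatedly keep the complex with the
    fewest remaining similar neighbours, then discard those neighbours."""
    n = len(similar)
    neighbours = {i: {j for j in range(n) if j != i and similar[i][j]}
                  for i in range(n)}
    remaining = set(range(n))
    selected = []
    while remaining:
        vertex = min(remaining,
                     key=lambda i: (len(neighbours[i] & remaining), i))
        selected.append(vertex)
        remaining -= neighbours[vertex] | {vertex}
    return sorted(selected)


# Toy data for four complexes (values invented for illustration only).
protein_similarity = [
    [100, 90, 10, 30],
    [90, 100, 12, 20],
    [10, 12, 100, 15],
    [30, 20, 15, 100],
]
fingerprints = [set(range(1, 8)), set(range(1, 9)), set(range(1, 8)), {20, 21}]
ligand_similarity = [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]

complex_similarity = combine(binarize(protein_similarity, 25),
                             binarize(ligand_similarity, 0.85))
non_redundant = select_non_redundant(complex_similarity)
print(non_redundant)  # [0, 2, 3]
```

In this toy example only complexes 0 and 1 are similar in both protein and ligand, so one of them is dropped. The greedy result is maximal (no further complex can be added) but is not guaranteed to be the largest possible set, since finding a maximum independent set is NP-hard in general.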

On the Downloads page we provide non-redundant lists that combine the following parameters:

Minimum number of heavy atoms: 7 or 13 (see Step 2).
Protein sequence identity threshold: 25%, 50% or 90% (see Step 5).
Tanimoto coefficient threshold for fingerprint similarity: 0.85 or 0.7 (see Step 5).
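
As a quick check of how many lists these parameters imply, the sketch below enumerates every combination. It assumes that all combinations are offered, which this page does not state explicitly; the dictionary keys are hypothetical names chosen for illustration.

```python
from itertools import product

MIN_HEAVY_ATOMS = (7, 13)
SEQUENCE_IDENTITY_PERCENT = (25, 50, 90)
TANIMOTO_THRESHOLDS = (0.85, 0.7)

# One dictionary per parameter combination (key names are hypothetical).
parameter_sets = [
    {"min_heavy_atoms": atoms, "sequence_identity": identity, "tanimoto": cutoff}
    for atoms, identity, cutoff in product(
        MIN_HEAVY_ATOMS, SEQUENCE_IDENTITY_PERCENT, TANIMOTO_THRESHOLDS
    )
]

print(len(parameter_sets))  # 12
```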

Additional files: All complexes used for the non-redundant selection (Step 8 above) are split into two files. One file contains the protein structures without the bound ligands, in PDB format. The other contains all non-covalently bound ligands, in SDF format.
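
The separation of protein and ligand records can be sketched as follows. This is only an illustration of splitting a PDB-format text by record type, not the procedure used to build the PSMDB files: the function name is hypothetical, the coordinates are invented, and the sketch neither converts ligands to SDF nor applies the size and covalent-bond filters described above.

```python
def split_pdb_records(pdb_text):
    """Split PDB-format text into protein lines and hetero-atom lines.

    The record type is read from the first six columns of each line.
    Records other than ATOM, TER, END and HETATM are ignored here.
    """
    protein_lines, hetero_lines = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "HETATM":
            hetero_lines.append(line)
        elif record in {"ATOM", "TER", "END"}:
            protein_lines.append(line)
    return protein_lines, hetero_lines


# Invented example records, for illustration only.
sample = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
    "TER",
    "HETATM 1500  C1  BEN A 301      10.000  12.000  14.000  1.00 20.00           C",
    "END",
])

protein, ligands = split_pdb_records(sample)
print(len(protein), len(ligands))  # 4 1
```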
