PSMDB
The Protein - Small-Molecule DataBase
Lilien Lab
Department of Computer Science
Centre for Cellular and Biomolecular Research
University of Toronto
Description
The database is constructed using the following steps:
1. The latest PDB release is filtered using the annotation provided by the PDBsum database. A complex is selected if its structure was determined by X-ray crystallography at a resolution better than 2 Å, at least one of its chains is longer than 50 amino acids, and the bound ligand is correctly annotated as a HETATM record.
2. Remove ligands with fewer than 7 (or 13) heavy atoms or with a molecular weight greater than 800 Da.
3. Remove ligands that are covalently bound to the target protein. When a complex contains multiple ligands, retain only the non-covalently bound ones.
4. Calculate the pairwise sequence similarity between all pairs of proteins (using bl2seq), and the pairwise Tanimoto coefficient of the Daylight fingerprints between all pairs of ligands.
5. Construct the similarity matrices (protein vs. protein and ligand vs. ligand) and convert them into binary matrices using a 25% sequence similarity threshold for proteins and a 0.85 Tanimoto coefficient threshold for ligands.
6. Multiply the binary protein and ligand similarity matrices element-wise, so that an entry is '1' only when both the proteins and the ligands of the two complexes are similar.
7. Construct a graph in which every vertex corresponds to a complex, and an edge connects two vertices if the similarity between the corresponding complexes is higher than the threshold (i.e., there is a '1' in the corresponding entry of the binary similarity matrix).
8. Select a maximal set of vertices (complexes) such that no two vertices in the set are connected by an edge. That is, select a maximal set of complexes in which no two selected complexes are considered similar.
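The similarity and selection steps (4 to 8) can be sketched in Python as below. This is a toy illustration under stated assumptions, not the PSMDB code: ligand fingerprints are modeled as sets of on-bit indices rather than real Daylight fingerprints, protein similarities are supplied as a made-up matrix instead of being computed with bl2seq, the "dot product" of the matrices is read as the element-wise product (the reading that keeps the result binary), and a strict ">" is used for "higher than the threshold". The selection uses a simple greedy heuristic that returns a maximal (not necessarily maximum) independent set; the description does not say which algorithm PSMDB actually uses. All function names and data are hypothetical.

```python
# Toy sketch of Steps 4-8; all names and data are hypothetical.


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two fingerprints stored as sets of on-bits."""
    union = len(fp_a | fp_b)
    # Two empty fingerprints have no defined coefficient; 0.0 is an arbitrary convention here.
    return len(fp_a & fp_b) / union if union else 0.0


def binarize(sim: list[list[float]], threshold: float) -> list[list[int]]:
    """1 where two *different* complexes are strictly above the threshold, else 0 (diagonal stays 0)."""
    n = len(sim)
    return [[1 if i != j and sim[i][j] > threshold else 0 for j in range(n)] for i in range(n)]


def similar_complexes(protein_sim: list[list[float]], fingerprints: list[set[int]],
                      p_thr: float = 0.25, l_thr: float = 0.85) -> list[list[int]]:
    """Complex-level adjacency: element-wise product (a logical AND) of the two binary matrices."""
    n = len(fingerprints)
    ligand_sim = [[tanimoto(fingerprints[i], fingerprints[j]) for j in range(n)] for i in range(n)]
    p_bin = binarize(protein_sim, p_thr)
    l_bin = binarize(ligand_sim, l_thr)
    return [[p_bin[i][j] * l_bin[i][j] for j in range(n)] for i in range(n)]


def greedy_independent_set(adj: list[list[int]]) -> list[int]:
    """Greedy maximal (not necessarily maximum) independent set, visiting low-degree vertices first."""
    n = len(adj)
    blocked: set[int] = set()
    chosen: list[int] = []
    for v in sorted(range(n), key=lambda v: sum(adj[v])):
        if v in blocked:
            continue
        chosen.append(v)
        blocked.add(v)
        blocked.update(u for u in range(n) if adj[v][u])  # neighbours can no longer be chosen
    return sorted(chosen)


# Made-up example with four complexes; protein similarities are fractions (0.25 == 25%).
protein_sim = [
    [1.0, 0.9, 0.1, 0.3],
    [0.9, 1.0, 0.1, 0.2],
    [0.1, 0.1, 1.0, 0.1],
    [0.3, 0.2, 0.1, 1.0],
]
fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 3, 4}, {10, 11, 12}]

adj = similar_complexes(protein_sim, fingerprints)
print(greedy_independent_set(adj))  # [0, 2, 3]
```

In this example complexes 0 and 1 have similar proteins and identical ligands, so only one of them survives; complex 3 has a protein similar to that of complex 0 but a dissimilar ligand, so it is kept. Visiting low-degree vertices first is a common heuristic for retaining more complexes; finding an exact maximum independent set is NP-hard in general.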
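Variants of the database are built with the following parameter values:
- Minimum ligand size: 7 or 13 heavy atoms (see Step 2).
- Protein sequence identity threshold: 25%, 50% or 90% (see Step 5).
- Ligand fingerprint similarity threshold (Tanimoto coefficient): 0.85 or 0.7 (see Step 5).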
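The filters of Steps 1 to 3, combined with the two heavy-atom cut-offs listed above, might be sketched as follows. The record type is a hypothetical simplification with a single ligand per complex (real entries may carry several ligands, and PDBsum's annotation format is not reproduced); the field names, the "cpx" identifiers and all values are made up. "X-RAY DIFFRACTION" is used as the method label, as in PDB mmCIF files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical per-complex record; not PDBsum's or PSMDB's actual schema."""
    entry_id: str
    method: str
    resolution: float               # in angstroms
    chain_lengths: tuple[int, ...]  # residues per protein chain
    ligand_heavy_atoms: int
    ligand_mw: float                # in daltons
    ligand_covalent: bool           # covalently bound to the protein?


def passes_filters(e: Entry, min_heavy_atoms: int = 7, max_mw: float = 800.0) -> bool:
    """Steps 1-3: X-ray better than 2 A, a chain > 50 residues, ligand size/weight limits, non-covalent."""
    return (e.method == "X-RAY DIFFRACTION"
            and e.resolution < 2.0
            and max(e.chain_lengths) > 50
            and e.ligand_heavy_atoms >= min_heavy_atoms  # "fewer than N heavy atoms" are removed
            and e.ligand_mw <= max_mw                    # "greater than 800 Da" is removed
            and not e.ligand_covalent)


entries = [
    Entry("cpx1", "X-RAY DIFFRACTION", 1.8, (120,), 10, 150.0, False),     # small ligand
    Entry("cpx2", "X-RAY DIFFRACTION", 1.5, (45, 210), 20, 300.0, False),  # passes everything
    Entry("cpx3", "SOLUTION NMR", float("nan"), (90,), 20, 300.0, False),  # not X-ray
    Entry("cpx4", "X-RAY DIFFRACTION", 2.4, (150,), 20, 300.0, False),     # resolution too low
    Entry("cpx5", "X-RAY DIFFRACTION", 1.2, (30,), 20, 300.0, False),      # chains too short
    Entry("cpx6", "X-RAY DIFFRACTION", 1.9, (300,), 60, 950.0, False),     # ligand too heavy
    Entry("cpx7", "X-RAY DIFFRACTION", 1.6, (200,), 15, 250.0, True),      # covalent ligand
]

for min_ha in (7, 13):  # the two heavy-atom cut-offs
    print(min_ha, [e.entry_id for e in entries if passes_filters(e, min_ha)])
```

With the 7-atom cut-off both cpx1 and cpx2 are kept; raising it to 13 drops cpx1, whose ligand has only 10 heavy atoms.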