Lilien Lab
Department of Computer Science
Centre for Cellular and Biomolecular Research
University of Toronto

Protein and small-molecule similarity space
The images below show how a subset of proteins in the PDB (high-resolution X-ray structures of complexes) is distributed by sequence similarity. Each protein is drawn as a blue point, and the distance between two points reflects the sequence similarity of the corresponding proteins: the more similar the sequences, the closer the points. Clumps (clusters) of points indicate groups of proteins with highly similar sequences. Proteins in a cluster are predicted to have highly similar structures.

The first image shows the full set of complexes of proteins with small molecules (high-resolution, X-ray), arranged by pairwise protein sequence similarity. The second image shows the non-redundant set (25% similarity threshold), and the third image shows a randomly selected set of similar size. Comparing the second and third images indicates that our selection of non-redundant proteins represents protein sequence space more evenly than a random subset.
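How the non-redundant set was built is not described here. As a rough illustration only, the sketch below assumes one common approach: a greedy filter that keeps a protein only if its similarity to every protein already kept is below the threshold. The function names, the protein labels and the similarity values are all hypothetical.

```python
def select_non_redundant(ids, similarity, threshold=0.25):
    """Greedily keep proteins that are dissimilar to everything kept so far.

    ids: protein identifiers, visited in the order given (hypothetical labels).
    similarity: function (a, b) -> pairwise sequence similarity in [0, 1].
    threshold: a candidate is kept only if its similarity to every
               already-kept protein is strictly below this value.
    """
    kept = []
    for candidate in ids:
        if all(similarity(candidate, k) < threshold for k in kept):
            kept.append(candidate)
    return kept


# Toy pairwise similarities (made-up values, for illustration only).
toy = {
    frozenset(("A", "B")): 0.90,
    frozenset(("A", "C")): 0.10,
    frozenset(("B", "C")): 0.15,
    frozenset(("A", "D")): 0.20,
    frozenset(("B", "D")): 0.22,
    frozenset(("C", "D")): 0.60,
}


def toy_similarity(a, b):
    """Look up the toy similarity of two labels; identical labels score 1."""
    return 1.0 if a == b else toy[frozenset((a, b))]


representatives = select_non_redundant(["A", "B", "C", "D"], toy_similarity)
```

With these toy values, B is dropped because it is too similar to A, and D is dropped because it is too similar to C. A greedy filter depends on the visiting order, so a different ordering of the same proteins can yield a different representative set.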


Images were produced using the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm.
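The software and settings used for the images are not stated. The sketch below shows one way such an embedding could be computed, assuming scikit-learn's `TSNE` and a pairwise similarity matrix with values in [0, 1]. The conversion d = 1 - s, the helper names and the parameter values are assumptions made for illustration.

```python
def similarity_to_distance(sim):
    """Convert a symmetric similarity matrix (values in [0, 1]) to distances.

    Uses d = 1 - s, an assumed conversion; the diagonal is forced to 0.
    """
    n = len(sim)
    return [[0.0 if i == j else 1.0 - sim[i][j] for j in range(n)]
            for i in range(n)]


def embed(dist, perplexity=5.0, seed=0):
    """Project a precomputed distance matrix to 2-D with scikit-learn's t-SNE."""
    # Imported here so the helper above works without NumPy or scikit-learn.
    import numpy as np
    from sklearn.manifold import TSNE

    tsne = TSNE(
        n_components=2,
        metric="precomputed",   # dist holds distances, not feature vectors
        init="random",          # PCA initialisation needs feature vectors
        perplexity=perplexity,  # must be smaller than the number of proteins
        random_state=seed,      # fixed seed for a repeatable layout
    )
    return tsne.fit_transform(np.asarray(dist, dtype=float))
```

Calling `embed(similarity_to_distance(sim))` on an n-by-n similarity matrix returns an n-by-2 array of point coordinates, one row per protein, which can then be drawn as a scatter plot. t-SNE mainly preserves local neighbourhoods, so clusters are meaningful but distances between far-apart clusters should be read with caution.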