A generalized knowledge‐based discriminatory function for biomolecular interactions

B Bernard, R Samudrala - Proteins: Structure, Function, and …, 2009 - Wiley Online Library
B Bernard, R Samudrala
Proteins: Structure, Function, and Bioinformatics, 2009 - Wiley Online Library
Abstract
Several novel and established knowledge‐based discriminatory function formulations and reference state derivations have been evaluated to identify parameter sets capable of distinguishing native and near‐native biomolecular interactions from incorrect ones. We developed the r·m·r function, a novel atomic‐level radial distribution function with a mean reference state that averages over all pairwise atom types from a reduced atom type composition, using experimentally determined intermolecular complexes in the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB) as the information sources. We demonstrate that r·m·r had the best discriminatory accuracy and power for protein‐small molecule and protein‐DNA interactions, regardless of whether the native complex was included in or excluded from the test set. The superior performance of the r·m·r discriminatory function compared with seventeen alternative functions evaluated on publicly available test sets for protein‐small molecule and protein‐DNA interactions indicated that the function was not overoptimized through back‐testing on a single class of biomolecular interactions. The initial success of the reduced composition and the superior performance with the CSD as the distribution set over the PDB imply that further improvements in the function and greater generality are possible by deriving probabilities from subsets of the CSD, using structures that consist of only the atom types to be considered for given biomolecular interactions. The method is available as a web server module at http://protinfo.compbio.washington.edu. Proteins 2009. © 2008 Wiley‐Liss, Inc.
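The abstract does not give the r·m·r formula, so the following is only an illustrative sketch of the general idea it describes: a distance-binned, knowledge-based score whose reference state is the mean of the per-pair distance distributions over all atom-type pairs. The function names (`build_potential`, `score_complex`), the negative log-ratio form, the 8 Å cutoff, the bin count, and the pseudocount are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict


def pair_key(a, b):
    """Order-independent key for an atom-type pair."""
    return tuple(sorted((a, b)))


def build_potential(contacts, r_max=8.0, n_bins=8, pseudocount=1.0):
    """Build a score table from (type_a, type_b, distance) observations.

    Hypothetical sketch: every parameter default here is an assumption.
    """
    width = r_max / n_bins
    # Pseudocounts keep every bin non-zero so the logarithm is defined.
    counts = defaultdict(lambda: [pseudocount] * n_bins)
    for a, b, r in contacts:
        if 0.0 <= r < r_max:
            idx = min(int(r / width), n_bins - 1)
            counts[pair_key(a, b)][idx] += 1.0
    if not counts:
        raise ValueError("no contacts within r_max")

    # Per-pair distance distribution, normalised over the bins.
    freqs = {k: [c / sum(v) for c in v] for k, v in counts.items()}

    # Mean reference state: average the per-pair distributions over
    # all observed atom-type pairs, bin by bin.
    ref = [sum(f[i] for f in freqs.values()) / len(freqs)
           for i in range(n_bins)]

    # Negative log-ratio of observed to reference frequency.
    table = {k: [-math.log(f[i] / ref[i]) for i in range(n_bins)]
             for k, f in freqs.items()}
    return table, width


def score_complex(table, width, contacts):
    """Sum the table entries over a complex's intermolecular contacts."""
    total = 0.0
    for a, b, r in contacts:
        bins = table.get(pair_key(a, b))
        idx = int(r / width)
        if bins is not None and 0 <= idx < len(bins):
            total += bins[idx]
    return total


# Toy training data: N-O contacts cluster near 2.9 Å, C-C near 6.5 Å.
training = [("N", "O", 2.9)] * 10 + [("C", "C", 6.5)] * 10
table, width = build_potential(training)

native = score_complex(table, width, [("N", "O", 2.9)])
decoy = score_complex(table, width, [("O", "N", 6.5)])
```

Under these assumptions a lower score means a more native-like contact pattern, so `native` comes out below `decoy` on the toy data. Because the observed and reference frequencies share the same distance bins, any shell-volume normalisation would cancel in the ratio, which is why the sketch omits it.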