Scan2S: increasing the precision of PROSITE pattern motifs using secondary structure constraints. Academic Article

Overview

abstract

  • Sequence signature databases such as PROSITE, which include protein pattern motifs indicative of a protein's function, are widely used for function prediction studies, cellular localization annotation, and sequence classification. Correct annotation relies on high precision of the motifs. We present a new and general approach for increasing the precision of established protein pattern motifs by including secondary structure constraints (SSCs). We use Scan2S, the first sequence motif-scanning program to optionally include SSCs, to augment PROSITE pattern motifs. The constraints were derived from either the DSSP secondary structure assignment or the PSIPRED predictions for PROSITE-documented true positive hits. The secondary structure-augmented motifs were scanned against all SwissProt sequences, for which secondary structure predictions were precalculated. Against this dataset, motifs with PSIPRED-derived SSCs exhibited improved performance over motifs with DSSP-derived constraints. The precision of 763 of the 782 PSIPRED-augmented motifs remained unchanged or increased compared to the original motifs; 26 motifs showed an absolute precision increase of 10-30%. We provide the complete set of augmented motifs and the Scan2S program at http://physiology.med.cornell.edu/go/scan2s. Our results suggest a general protocol for increasing the precision of protein pattern detection via the inclusion of SSCs.
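The idea in the abstract can be illustrated with a minimal Python sketch: a pattern hit is kept only if the secondary structure over the matched residues also satisfies a constraint. This is not the Scan2S implementation or its motif syntax. The function names (`prosite_to_regex`, `scan`, `precision`), the encoding of an SSC as a regular expression over a three-state H/E/C string, and the toy sequence are all assumptions made for illustration. Only basic PROSITE pattern elements are handled (`x`, `[..]`, `{..}`, and repeat counts); the anchors `<` and `>` are not.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element such as x(2,4) into its core and repeat bounds.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

def scan(sequence, ss, pattern, ssc=None):
    """Return (start, end) spans of pattern hits in `sequence`.

    `ss` is a per-residue secondary structure string of the same length.
    `ssc` is a hypothetical constraint: a regex that the secondary
    structure of the matched span must fully match for the hit to count.
    """
    regex = prosite_to_regex(pattern)
    hits = []
    # A lookahead lets overlapping hits be reported as well.
    for m in re.finditer("(?=(" + regex + "))", sequence):
        start = m.start()
        end = start + len(m.group(1))
        if ssc is None or re.fullmatch(ssc, ss[start:end]):
            hits.append((start, end))
    return hits

def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)

# Toy data (invented): N-glycosylation pattern PS00001 on a short sequence.
sequence = "MKNGSAQNATLL"
ss       = "CCCCCCHHHHHH"   # C = coil, H = helix, E = strand
pattern  = "N-{P}-[ST]-{P}"

unconstrained = scan(sequence, ss, pattern)
coil_only = scan(sequence, ss, pattern, ssc="C+")
```

In this toy case the unconstrained scan reports two hits, while requiring the matched residues to be entirely coil keeps only the first. If the discarded hit were a false positive, precision as computed by `precision` would rise, which is the effect the abstract describes for the augmented motifs.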

publication date

  • September 1, 2008

Research

keywords

  • Databases, Protein
  • Protein Structure, Secondary
  • Sequence Analysis, Protein
  • Software

Identity

Scopus Document Identifier

  • 51349160068

Digital Object Identifier (DOI)

  • 10.1002/prot.22008

PubMed ID

  • 18320586

Additional Document Info

volume

  • 72

issue

  • 4