Mining sequential patterns for protein fold recognition (Journal article)

Exarchos, T. P.; Papaloukas, C.; Lampros, C.; Fotiadis, D. I.

Protein data contain discriminative patterns that can be used in many beneficial applications if they are defined correctly. In this work, sequential pattern mining (SPM) is utilized for sequence-based fold recognition. Protein classification in terms of fold recognition plays an important role in computational protein analysis, since it can contribute to determining the function of a protein whose structure is unknown. Specifically, one of the most efficient SPM algorithms, cSPADE, is employed to analyze protein sequences. A classifier then uses the extracted sequential patterns to assign each protein to the appropriate fold category. For training and evaluating the proposed method, we used protein sequences from the Protein Data Bank and the annotation of the SCOP database. The method exhibited an overall accuracy of 25% in a classification problem with 36 candidate categories. The classification performance reaches up to 56% when the five most probable protein folds are considered. (C) 2007 Elsevier Inc. All rights reserved.
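
The two ideas the abstract relies on, the support of a sequential pattern and accuracy over the k most probable folds, can be illustrated with a minimal Python sketch. This is not cSPADE and not the authors' classifier: the function names (`contains_pattern`, `support`, `top_k_accuracy`), the treatment of a pattern as an ordered, possibly gapped run of residues, and the per-fold score dictionaries are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def contains_pattern(sequence: str, pattern: str) -> bool:
    """Return True if `pattern` occurs in `sequence` as an ordered,
    possibly gapped, subsequence (illustrative definition only)."""
    it = iter(sequence)
    # `residue in it` consumes the iterator up to the first match,
    # so later residues can only match further along the sequence.
    return all(residue in it for residue in pattern)


def support(pattern: str, sequences: Sequence[str]) -> float:
    """Fraction of sequences that contain the pattern."""
    if not sequences:
        return 0.0
    return sum(contains_pattern(s, pattern) for s in sequences) / len(sequences)


def top_k_accuracy(scores: List[Dict[str, float]], labels: List[str], k: int) -> float:
    """Fraction of proteins whose true fold is among the k highest-scoring folds.

    `scores` holds one hypothetical {fold_name: score} mapping per protein.
    """
    if not labels:
        return 0.0
    hits = 0
    for fold_scores, true_fold in zip(scores, labels):
        # Rank fold names from highest to lowest score.
        ranked = sorted(fold_scores, key=fold_scores.get, reverse=True)
        if true_fold in ranked[:k]:
            hits += 1
    return hits / len(labels)
```

With k = 1 the last function gives ordinary accuracy, and with k = 5 it gives the kind of "five most probable folds" figure the abstract reports. A pattern with high support inside one fold and low support elsewhere is the sort of discriminative feature a classifier could use.
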
Institution and School/Department of submitter: University of Ioannina. School of Sciences and Technologies. Department of Biological Applications and Technologies
Keywords: data mining, sequential patterns, fold recognition, hidden Markov models, support vector machines, structure prediction, structural class, neural networks, amino acid, classification, sequences, discovery, accuracy
ISSN: 1532-0464
Link: <Go to ISI>://000255260200014
Appears in Collections: Articles in scientific journals (Open)

Files in This Item:
File: Exarchos-2008-Mining sequential pa.pdf
Size: 496.34 kB
Format: Adobe PDF

