Please use this identifier to cite or link to this item: https://olympias.lib.uoi.gr/jspui/handle/123456789/11064
Full metadata record
DC Field | Value | Language
dc.contributor.author | Kalogeratos, A. | en
dc.contributor.author | Likas, A. | en
dc.date.accessioned | 2015-11-24T17:02:31Z | -
dc.date.available | 2015-11-24T17:02:31Z | -
dc.identifier.issn | 0169-023X | -
dc.identifier.uri | https://olympias.lib.uoi.gr/jspui/handle/123456789/11064 | -
dc.rights | Default Licence | -
dc.subject | clustering methods | en
dc.subject | document clustering | en
dc.subject | text mining | en
dc.subject | term selection | en
dc.subject | subspace clustering | en
dc.subject | algorithm | en
dc.subject | model | en
dc.title | Document clustering using synthetic cluster prototypes | en
heal.type | journalArticle | -
heal.type.en | Journal article | en
heal.type.el | Άρθρο Περιοδικού [Journal article] | el
heal.identifier.primary | DOI 10.1016/j.datak.2010.12.002 | -
heal.language | en | -
heal.access | campus | -
heal.recordProvider | Πανεπιστήμιο Ιωαννίνων. Σχολή Θετικών Επιστημών. Τμήμα Μηχανικών Ηλεκτρονικών Υπολογιστών και Πληροφορικής [University of Ioannina. School of Sciences. Department of Computer Engineering and Informatics] | el
heal.publicationDate | 2011 | -
heal.abstract | The use of centroids as prototypes for clustering text documents with the k-means family of methods is not always the best choice for representing text clusters due to the high dimensionality, sparsity, and low quality of text data. Especially for the cases where we seek clusters with small number of objects, the use of centroids may lead to poor solutions near the bad initial conditions. To overcome this problem, we propose the idea of synthetic cluster prototype that is computed by first selecting a subset of cluster objects (instances), then computing the representative of these objects and finally selecting important features. In this spirit, we introduce the MedoidKNN synthetic prototype that favors the representation of the dominant class in a cluster. These synthetic cluster prototypes are incorporated into the generic spherical k-means procedure leading to a robust clustering method called k-synthetic prototypes (k-sp). Comparative experimental evaluation demonstrates the robustness of the approach especially for small datasets and clusters overlapping in many dimensions and its superior performance against traditional and subspace clustering methods. (c) 2010 Elsevier B.V. All rights reserved. | en
heal.journalName | Data & Knowledge Engineering | en
heal.journalType | peer reviewed | -
heal.fullTextAvailability | TRUE | -
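
The abstract in this record outlines the synthetic prototype in three steps: select a subset of the cluster's objects, compute a representative of that subset, then keep only the important features. The sketch below is one plausible reading of those steps and is not taken from the paper. The function name `synthetic_prototype`, the parameters `k_neighbors` and `n_features`, the choice of medoid by total cosine similarity, the use of a plain mean as the representative, and the "largest weights" feature selection are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def synthetic_prototype(cluster, k_neighbors, n_features):
    """Hypothetical MedoidKNN-style prototype for one cluster of documents.

    cluster:     list of term-weight vectors (lists of floats, same length)
    k_neighbors: assumed size of the object subset around the medoid
    n_features:  assumed number of term weights kept in the prototype
    """
    docs = [normalize(d) for d in cluster]

    # Step 1a (assumption): the medoid is the document with the highest
    # total cosine similarity to all documents in the cluster.
    totals = [sum(dot(d, other) for other in docs) for d in docs]
    medoid = docs[totals.index(max(totals))]

    # Step 1b (assumption): the subset is the medoid plus its nearest
    # neighbours inside the cluster, k_neighbors documents in total.
    ranked = sorted(docs, key=lambda d: dot(medoid, d), reverse=True)
    subset = ranked[:k_neighbors]

    # Step 2 (assumption): the representative is the mean of the subset.
    dim = len(medoid)
    mean = [sum(d[j] for d in subset) / len(subset) for j in range(dim)]

    # Step 3 (assumption): keep the n_features largest weights, zero the
    # rest, and rescale to unit length as spherical k-means expects.
    keep = set(sorted(range(dim), key=lambda j: mean[j], reverse=True)[:n_features])
    sparse = [mean[j] if j in keep else 0.0 for j in range(dim)]
    return normalize(sparse)
```

In a spherical k-means loop, a function like this would stand in for the centroid update of each cluster, while the assignment step would still use cosine similarity between documents and prototypes.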
Appears in Collections: Άρθρα σε επιστημονικά περιοδικά (Ανοικτά) [Articles in scientific journals (Open)]

Files in This Item:
File | Description | Size | Format
Kalogeratos-2011-Document clustering.pdf | | 1.69 MB | Adobe PDF


This item is licensed under a Creative Commons License.