Document clustering using synthetic cluster prototypes

Kalogeratos, A.; Likas, A.

Please use this identifier to cite or link to this item: https://olympias.lib.uoi.gr/jspui/handle/123456789/11064

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kalogeratos, A.	en
dc.contributor.author	Likas, A.	en
dc.date.accessioned	2015-11-24T17:02:31Z	-
dc.date.available	2015-11-24T17:02:31Z	-
dc.identifier.issn	0169-023X	-
dc.identifier.uri	https://olympias.lib.uoi.gr/jspui/handle/123456789/11064	-
dc.rights	Default Licence	-
dc.subject	clustering methods	en
dc.subject	document clustering	en
dc.subject	text mining	en
dc.subject	term selection	en
dc.subject	subspace clustering	en
dc.subject	algorithm	en
dc.subject	model	en
dc.title	Document clustering using synthetic cluster prototypes	en
heal.type	journalArticle	-
heal.type.en	Journal article	en
heal.type.el	Journal article (Greek: Άρθρο Περιοδικού)	el
heal.identifier.primary	DOI 10.1016/j.datak.2010.12.002	-
heal.language	en	-
heal.access	campus	-
heal.recordProvider	University of Ioannina. School of Sciences. Department of Computer Science and Engineering	en
heal.publicationDate	2011	-
heal.abstract	The use of centroids as prototypes for clustering text documents with the k-means family of methods is not always the best choice for representing text clusters, owing to the high dimensionality, sparsity, and low quality of text data. Especially when we seek clusters with a small number of objects, the use of centroids may lead to poor solutions near bad initial conditions. To overcome this problem, we propose the idea of a synthetic cluster prototype, which is computed by first selecting a subset of cluster objects (instances), then computing the representative of these objects, and finally selecting important features. In this spirit, we introduce the MedoidKNN synthetic prototype, which favors the representation of the dominant class in a cluster. These synthetic cluster prototypes are incorporated into the generic spherical k-means procedure, leading to a robust clustering method called k-synthetic prototypes (k-sp). Comparative experimental evaluation demonstrates the robustness of the approach, especially for small datasets and for clusters that overlap in many dimensions, and its superior performance against traditional and subspace clustering methods. (c) 2010 Elsevier B.V. All rights reserved.	en
heal.journalName	Data & Knowledge Engineering	en
heal.journalType	peer reviewed	-
heal.fullTextAvailability	TRUE	-
Appears in Collections:	Articles in scientific journals (Open)
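The abstract names three steps for building a synthetic prototype (select a subset of cluster objects, compute their representative, select important features) and plugs the result into spherical k-means. The record does not define MedoidKNN beyond its name, so the following is only a minimal sketch under loudly stated assumptions: the instance subset is taken to be the cluster medoid (under cosine similarity) plus its nearest neighbours, the representative is the mean of that subset, and feature selection keeps the `n_terms` largest weights. The function names, the parameters `n_neighbors`, `n_terms` and `init`, and the toy data are all hypothetical; this is not the authors' k-sp implementation.

```python
import numpy as np


def l2_normalize(X):
    """Scale each row to unit length, as spherical k-means assumes."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return X / norms


def synthetic_prototype(Xc, n_neighbors=3, n_terms=2):
    """Hypothetical MedoidKNN-style prototype for one cluster.

    Xc holds the cluster's unit-normalized document vectors (one per row).
    The three steps follow the abstract; the concrete choices are assumptions.
    """
    S = Xc @ Xc.T  # pairwise cosine similarities within the cluster
    medoid = int(np.argmax(S.sum(axis=1)))  # largest total similarity to the cluster
    # Step 1 (assumed): instance selection = medoid plus its nearest neighbours.
    sims = S[medoid].copy()
    sims[medoid] = -np.inf  # exclude the medoid from its own neighbour list
    k_nn = min(n_neighbors, len(Xc) - 1)
    subset = np.concatenate(([medoid], np.argsort(-sims)[:k_nn]))
    # Step 2 (assumed): the representative is the mean of the selected objects.
    rep = Xc[subset].mean(axis=0)
    # Step 3 (assumed): feature selection keeps only the n_terms largest weights.
    proto = np.zeros_like(rep)
    keep = np.argsort(-rep)[:n_terms]
    proto[keep] = rep[keep]
    norm = np.linalg.norm(proto)
    return proto / norm if norm > 0 else proto


def k_synthetic_prototypes(X, k, init=None, n_iter=20, seed=0, **proto_kw):
    """Spherical k-means loop with centroids replaced by synthetic prototypes."""
    X = l2_normalize(np.asarray(X, dtype=float))
    if init is None:  # default: k distinct random documents as initial prototypes
        init = np.random.default_rng(seed).choice(len(X), size=k, replace=False)
    P = X[np.asarray(init)].copy()
    labels = None
    for _ in range(n_iter):
        new_labels = np.argmax(X @ P.T, axis=1)  # assign by cosine similarity
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old prototype
                P[j] = synthetic_prototype(members, **proto_kw)
    return labels, P


# Toy term-frequency matrix: documents 0-2 are dominated by terms 0-1,
# documents 3-5 by terms 2-3.
docs = np.array([
    [3, 1, 0, 0, 0],
    [2, 2, 0, 0, 1],
    [4, 1, 1, 0, 0],
    [0, 0, 3, 2, 0],
    [0, 1, 2, 3, 0],
    [0, 0, 1, 4, 1],
])
labels, protos = k_synthetic_prototypes(docs, k=2, init=[0, 3], n_neighbors=1, n_terms=2)
print(labels)  # -> [0 0 0 1 1 1]
```

Only the prototype update differs from plain spherical k-means: replacing `synthetic_prototype(members)` with the normalized mean of `members` recovers the usual centroid (concept vector), which makes it easy to compare the two prototype choices on the same data.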


Files in This Item:

File	Description	Size	Format
Kalogeratos-2011-Document clustering.pdf		1.69 MB	Adobe PDF


This item is licensed under a Creative Commons License

Repository of UOI "Olympias"