Μέθοδοι ομαδοποίησης βασισμένες σε στατιστικό έλεγχο της μονοτροπικότητας των δεδομένων

Χαμάλης, Θεόφιλος

Please use this identifier to cite or link to this item: https://olympias.lib.uoi.gr/jspui/handle/123456789/30303

Full metadata record

DC Field	Value	Language
dc.contributor.author	Χαμάλης, Θεόφιλος	el
dc.date.accessioned	2020-11-16T11:41:07Z	-
dc.date.available	2020-11-16T11:41:07Z	-
dc.identifier.uri	https://olympias.lib.uoi.gr/jspui/handle/123456789/30303	-
dc.identifier.uri	http://dx.doi.org/10.26268/heal.uoi.10191	-
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject	Αλγόριθμοι υπολογιστών	el
dc.subject	Computer algorithms	en
dc.title	Μέθοδοι ομαδοποίησης βασισμένες σε στατιστικό έλεγχο της μονοτροπικότητας των δεδομένων	el
dc.title	Clustering methods based on statistical testing of the unimodality of the data	en
heal.type	masterThesis	-
heal.type.en	Master thesis	en
heal.type.el	Μεταπτυχιακή εργασία	el
heal.classification	Αλγόριθμοι υπολογιστών	-
heal.dateAvailable	2020-11-16T11:42:08Z	-
heal.language	el	-
heal.access	free	-
heal.recordProvider	Πανεπιστήμιο Ιωαννίνων. Σχολή Θετικών Επιστημών. Τμήμα Μηχανικών Η/Υ & Πληροφορικής	el
heal.publicationDate	2015	-
heal.bibliographicCitation	Βιβλιογραφία: σ. 156-160	el
heal.abstract	Η ομαδοποίηση αποτελεί ένα από τα σημαντικότερα αντικείμενα της μηχανικής μάθησης και της εξόρυξης δεδομένων λόγω του πλήθους εφαρμογών της σε προβλήματα ανάλυσης δεδομένων. Ένα βασικό ζήτημα κατά την ομαδοποίηση ενός συνόλου δεδομένων σχετίζεται με την εκτίμηση του αριθμού των ομάδων, ο οποίος συνήθως δεν είναι γνωστός εκ των προτέρων. Μια τεχνική που έχει προταθεί για το πρόβλημα αυτό είναι ο αλγόριθμος dip-means, o οποίος προτείνει μια μεθοδολογία (κριτήριο) για τον στατιστικό έλεγχο της μονοτροπικότητας ενός συνόλου δεδομένων και χρησιμοποιεί τη μεθοδολογία αυτή για ομαδοποίηση με αυξητικό τρόπο: στην αρχή όλα τα δεδομένα ανήκουν στην ίδια ομάδα και σε κάθε βήμα διασπώνται οι ομάδες που δεν είναι μονοτροπικές σύμφωνα με το κριτήριο. Στην εργασία αυτή καταρχήν προτείνεται μια παραλλαγή του αλγορίθμου dip-means η οποία ονομάζεται pdip-means (projected dip-means) και η οποία τροποποιεί το κριτήριο ελέγχου μονοτροπικότητας ώστε, αντί να εξετάζονται για μονοτροπικότητα οι γραμμές του πίνακα αποστάσεων μεταξύ των δεδομένων μιας ομάδας, να ελέγχονται για μονοτροπικότητα οι μονοδιάστατες προβολές των δεδομένων της ομάδας σε διάφορες κατευθύνσεις που καθορίζονται ντετερμινιστικά (π.χ. PCA) ή τυχαία (π.χ. Random Projections). Στη συνέχεια παρουσιάζεται μια συσσωρευτική (agglomerative) μεθοδολογία ομαδοποίησης βασισμένη στον έλεγχο μονοτροπικότητας των δεδομένων μιας ομάδας. Η μέθοδος αυτή (agglodip) ξεκινά με πολλές αρχικές ομάδες. Σε κάθε βήμα ενώνονται δύο από τις ομάδες σε μια, εάν από τη συνένωσή τους προκύπτει μια μονοτροπική ομάδα σύμφωνα με το κριτήριο ελέγχου της μονοτροπικότητας. Η διαδικασία τερματίζεται όταν δεν υπάρχουν πλέον ομάδες που η συνένωσή τους να δίνει μια νέα μονοτροπική ομάδα. Με βάση αυτή τη βασική ιδέα προτείνονται κάποιες εναλλακτικές προσεγγίσεις. Μια από αυτές επιταχύνει τον έλεγχο μονοτροπικότητας κατά τη συνένωση δύο ομάδων εκμεταλλευόμενη τα κεντροειδή των δύο ομάδων. Μια δεύτερη μετασχηματίζει το πρόβλημα επαναληπτικής συνένωσης των ομάδων του αρχικού συνόλου σε πρόβλημα εύρεσης των συνεκτικών συνιστωσών ενός γραφήματος. Τέλος προτείνεται και μια τρίτη μέθοδος (agglopdip) η οποία χρησιμοποιεί για τον έλεγχο της μονοτροπικότητας τη μέθοδο των προβολών που αναφέρθηκε παραπάνω. Οι παραπάνω μεθοδολογίες αξιολογήθηκαν πειραματικά σε συνθετικά αλλά και σε πραγματικά σύνολα δεδομένων και οι επιδόσεις τους συγκρίθηκαν τόσο με τον αλγόριθμο dip-means, όσο και με προγενέστερους αυξητικούς αλγορίθμους όπως οι x-means και g-means. Επιπλέον οι προτεινόμενες συσσωρευτικές μεθοδολογίες ομαδοποίησης εφαρμόστηκαν για το πρόβλημα της κατάτμησης εικόνων (image segmentation). Στη μέθοδο κατάτμησης που μελετήθηκε, αρχικά δημιουργείται μια υπερκατάτμηση της εικόνας με μεθόδους δημιουργίας superpixels και στη συνέχεια τα superpixels συνενώνονται σε μεγαλύτερα τμήματα, εάν από τη συνένωσή τους προκύπτουν μονοτροπικές ομάδες.	el
heal.abstract	Clustering is one of the most important fields of machine learning and data mining because of the vast number of applications in data analysis problems. A key issue of clustering a dataset is related with the estimation of the number of clusters, which is usually unknown beforehand. An already proposed technique is the dip-means algorithm, which proposes the use of a methodology (criterion) for the statistical test of the unimodality of a dataset and uses this methodology in an incremental manner: starting with all the data in the same cluster and in each step splitting the multimodal clusters according to the criterion. Initially, in this thesis we propose a new variant of the dip-means algorithm named pdip-means (projected dip-means) which modifies the unimodality criterion such that the one dimensional projections of the data within a cluster to various directions, that are deterministic defined (e.g. PCA) or random (e.g. Random Projections), are tested for unimodality instead of each line of the distance matrix of the data within a cluster. Next, an agglomerative clustering methodology is presented based on the unimodality test of the data within a cluster. This method (agglodip) starts off with many initial clusters. In each step two clusters are merged into one, only if a unimodal cluster is produced according to the statistical test. This procedure ends when there are no more clusters that their merge will produce a unimodal cluster. In addition to this concept, some other approaches are proposed as well. The first one, accelerates the unimodality tests while merging two clusters exploiting their centroids. The next approach transforms the iterative cluster merging problem of the cluster of a dataset into the problem of finding the connected components of a graph. Finally, a third method (agglopdip) is proposed which uses the unimodality test criterion in conjunction with the projection method which was referred above. These methodologies were evaluated experimentally in synthetic as well as real life datasets and we compared their performance with the dip-means algorithm and with prior incremental clustering algorithms such as x-means and g-means. Furthermore, the proposed agglomerative clustering methodologies were applied to the problem of image segmentation. In the segmentation method that was studied, an oversegmentation of the image is created initially using superpixels and then they get merged into bigger segments, if unimodal clusters are produced by their merge.	en
heal.advisorName	Λύκας, Αριστείδης	el
heal.committeeMemberName	Λύκας, Αριστείδης
heal.academicPublisher	Πανεπιστήμιο Ιωαννίνων. Σχολή Θετικών Επιστημών. Τμήμα Μηχανικών Η/Υ & Πληροφορικής	el
heal.academicPublisherID	uoi	-
heal.numberOfPages	159 σ.	-
heal.fullTextAvailability	true	-
Appears in Collections:	Διατριβές Μεταπτυχιακής Έρευνας (Masters) - ΜΥ

Show simple item record

Files in This Item:

File	Description	Size	Format
Μ.Ε. ΧΑΜΑΛΗΣ ΘΕΟΦΙΛΟΣ 2015.pdf		5.46 MB	Adobe PDF	View/Open

Show simple item record

This item is licensed under a Creative Commons License

Repository of UOI "Olympias"