Parallel and distributed query processing over spatio-textual data

Μπέστας, Δημήτριος

Please use this identifier to cite or link to this item: https://olympias.lib.uoi.gr/jspui/handle/123456789/27868

Full metadata record

DC Field	Value	Language
dc.contributor.author	Μπέστας, Δημήτριος	el
dc.date.accessioned	2017-03-13T11:28:00Z	-
dc.date.available	2017-03-13T11:28:00Z	-
dc.identifier.uri	https://olympias.lib.uoi.gr/jspui/handle/123456789/27868	-
dc.identifier.uri	http://dx.doi.org/10.26268/heal.uoi.1897	-
dc.rights	Default License	-
dc.subject	Parallel and distributed	el
dc.subject	Queries	el
dc.subject	Spatio-textual data	el
dc.subject	Parallel and distributed	en
dc.subject	Spatial	en
dc.subject	Queries	en
dc.title	Parallel and distributed query processing over spatio-textual data	el
dc.title	Parallel and distributed processing of spatial preference queries	en
heal.type	masterThesis	-
heal.type.en	Master thesis	en
heal.type.el	Master's thesis	el
heal.classification	Computer algorithms	en
heal.dateAvailable	2017-03-13T11:29:00Z	-
heal.language	en	-
heal.access	free	-
heal.recordProvider	University of Ioannina. School of Sciences. Department of Computer Science & Engineering	el
heal.publicationDate	2017	-
heal.bibliographicCitation	Bibliography: pp. 77-83	el
heal.abstract	The goal of this thesis is to develop algorithms for answering spatio-textual queries over large volumes of data (big data) in a distributed environment. This topic has been studied extensively by the academic community, and many efficient algorithms have been proposed to address it. Existing studies (and consequently implementations) focus on operation in a centralized computing environment, that is, processing the data on a single terminal machine (PC, mobile phone, etc.). This brings a significant drawback: the data must be small enough to be processed by the often limited computational capabilities of the terminal machine. With the passage of time and the wide spread of the internet and of mobile devices that can access it, the volume of data produced (of all types) has multiplied. We now speak of big data, that is, data on the order of many gigabytes. In most cases these data also concern the location of the users, so we are dealing with spatial data (location based data). In this thesis we address the problem of processing such data in a distributed environment and propose algorithms for that environment. By distributed environment we mean a computing system with two or more nodes. Although, as noted, spatio-textual queries have been studied extensively in the centralized setting, carrying them over to a distributed environment presents challenges. In this work we present these challenges and how they are addressed, and we propose a parallel algorithm that optimally solves spatio-textual queries in a distributed environment. 
A secondary goal of the work is the design of a simulator that simulates the operation of the parallel algorithm in a centralized environment.	el
heal.abstract	In this thesis we aim to implement algorithms for solving spatio-textual queries over big data in a distributed environment. This topic has been studied extensively by the academic community, and a plethora of algorithms that handle these types of queries have been proposed and implemented. The problem with the existing solutions is that they focus on a centralized environment, that is, processing the available data on a single terminal device (PC, mobile phone, etc.). This incurs a significant disadvantage: the data must be small enough to be processed by the sometimes limited capabilities of the terminal device. With the passage of time, the huge growth of the internet, and the widespread availability of devices that can access it, the amount of data produced has grown dramatically. We now talk about big data, that is, data on the order of hundreds of gigabytes. Depending on the application that generates the data (e.g. Facebook, Flickr, Twitter, Foursquare), the geographical location of the user is more often than not included as well. Thus we are faced with location-based data. In this thesis we tackle the problems that arise when processing such data in a distributed environment, and we propose algorithms that run in such environments. A distributed environment is a computational system comprising at least two computing nodes. As mentioned above, spatio-textual queries have been studied in a centralized environment, but transferring the existing knowledge to a distributed environment poses challenges. We study those challenges and propose parallel algorithms that solve spatio-textual queries in an optimal way. Motivated by this trend, we study the novel problem of parallel and distributed processing of spatial preference queries using keywords, where the input data is stored in a distributed way. 
Given a set of keywords, a set of spatial data objects, and a set of spatial feature objects that are additionally annotated with textual descriptions, the spatial preference query using keywords retrieves the top-k spatial data objects ranked according to the textual relevance of the feature objects in their vicinity. This query type is processing-intensive, especially for large datasets, since any data object may belong to the result set while the spatial range defines the score, and the k data objects with the highest scores need to be retrieved. Our solution has two notable features. First, a deliberate re-partitioning of the input data to servers allows parallelized processing, establishing the foundations for a scalable query processing algorithm. Second, an early termination mechanism boosts query processing performance in each partition by delivering the correct result after examining only a few data objects. Capitalizing on this, we implement parallel algorithms that solve the problem in the MapReduce framework. Our experimental study, using both real and synthetic data on a cluster of sixteen physical machines, demonstrates the efficiency of our solution.	en
heal.advisorName	Μαμουλής, Νικόλαος	el
heal.committeeMemberName	Μαμουλής, Νικόλαος	el
heal.committeeMemberName	Πιτουρά, Ευαγγελία	el
heal.committeeMemberName	Βασιλειάδης, Παναγιώτης	el
heal.academicPublisher	University of Ioannina. School of Sciences. Department of Computer Science & Engineering	el
heal.academicPublisherID	uoi	-
heal.numberOfPages	84 pp.	-
heal.fullTextAvailability	true	-
Appears in Collections:	Master's Research Theses (Masters) - ΜΥ
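
The spatial preference query using keywords described in the abstract can be illustrated with a small single-machine baseline. This is only a sketch of the query definition, not the thesis's MapReduce algorithm, and it includes neither the re-partitioning step nor early termination. Several details are assumptions made for illustration: textual relevance is taken to be Jaccard similarity between the query keywords and a feature object's keywords, the "vicinity" is a Euclidean radius, a data object's score is the best relevance among the feature objects within that radius, and the names `jaccard` and `top_k_spatial_preference` are hypothetical.

```python
import heapq
import math


def jaccard(query_keywords, feature_keywords):
    """Assumed textual relevance: Jaccard similarity of two keyword sets."""
    q, f = set(query_keywords), set(feature_keywords)
    union = q | f
    return len(q & f) / len(union) if union else 0.0


def top_k_spatial_preference(data_objects, feature_objects, query_keywords, radius, k):
    """Brute-force baseline for the query (hypothetical helper).

    data_objects: dict mapping object id -> (x, y)
    feature_objects: list of (x, y, keywords)
    Returns up to k (score, object id) pairs, highest score first.
    """
    scored = []
    for obj_id, (x, y) in data_objects.items():
        best = 0.0
        for fx, fy, keywords in feature_objects:
            # Only feature objects within the spatial range contribute.
            if math.hypot(x - fx, y - fy) <= radius:
                best = max(best, jaccard(query_keywords, keywords))
        scored.append((best, obj_id))
    # Every data object is a candidate, so all of them are scored
    # before the k best are selected.
    return heapq.nlargest(k, scored)


hotels = {"h1": (0, 0), "h2": (10, 10), "h3": (2, 2)}
restaurants = [
    (0, 1, ["pizza", "italian"]),
    (10, 11, ["sushi"]),
    (1, 2, ["italian", "pasta", "wine"]),
]
result = top_k_spatial_preference(hotels, restaurants, ["italian", "pizza"], 1.5, 2)
print(result)
```

The baseline compares every data object with every feature object, which is the cost that the abstract calls processing-intensive for large datasets.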

Files in This Item:

File	Description	Size	Format
Μ.Ε. ΜΠΕΣΤΑΣ ΔΗΜΗΤΡΙΟΣ 2017.pdf		2.63 MB	Adobe PDF

This item is licensed under a Creative Commons License

Repository of UOI "Olympias"