Parallel and distributed processing of spatial preference queries (Master thesis)
In this thesis we aim to implement algorithms for solving spatio-textual queries over big data and thus over a distributed environment. This topic has been extensively studied by the academic community, while a plethora of algorithms have been proposed and implemented that handle these types of queries. The problem with the existing solutions is that they focus on a centralized environment, or processing of the available data on a single terminal device (PC, mobile phone, etc). This fact incurs a significant disadvantage: The data to be processed should be sufficiently small in order to be able to be processed by the sometimes limited processing capabilities of the terminal device. With the advent of time and the huge growth of the internet and the widespread availability of devices that can access it, the amount of data produced has increased exponentially. We now talk about big-data, or data in the magnitude of hundreds of gigabytes. Depending on the application that generates the data (eg Facebook, Flickr, twitter, foursquare, etc.) more often than not, the geographical location of the user is also included. Thus we are faced with location based data. In this thesis we will tackle the problems that arise when trying to process such data on a distributed environment and we will propose algorithms that run on such environments. A distributed environment is a computational system that is comprised of at least two computing nodes. As mentioned before, spatio-textual queries have been studied in a centralized environment, but transferring the existing knowledge to a distributed environment poses challenges. We will study those challenges and propose parallel algorithms that solve spatio-textual queries in an optimum way. Motivated by this trend, in this thesis, we study the novel problem of parallel and distributed processing of spatial preference queries using keywords, where the input data is stored in a distributed way. Given a set of keywords, a set of spatial data objects and a set of spatial feature objects that are additionally annotated with textual descriptions, the spatial preference query using keywords retrieves the top-k spatial data objects ranked according to the textual relevance of feature objects in their vicinity. This query type is processing-intensive, especially for large datasets, since any data objects may belong to the result set while the spatial range defines the score, and the k data objects with the highest score need to be retrieved. We propose a solution that has two notable features: we propose a deliberate re-partitioning mechanism of input data to servers, which allows parallelized processing, thus establishing the foundations for a scalable query processing algorithm, and we boost the query processing performance in each partition by introducing an early termination mechanism that delivers the correct result by only examining few data objects. Capitalizing on this, we implement parallel algorithms that solve the problem in the MapReduce framework. Our experimental study using both real and synthetic data in a cluster of sixteen physical machines demonstrates the efficiency of our solution.
|Institution and School/Department of submitter:||Πανεπιστήμιο Ιωαννίνων. Σχολή Θετικών Επιστημών. Τμήμα Μηχανικών Η/Υ & Πληροφορικής|
|Subject classification:||Computer algorithms|
|Keywords:||Παράλληλη και κατανεμημένη,Ερωτημάτα,Χώρο-λεκτικά δεδομένα,Parallel and distributed,Spatial,Queries|
|Appears in Collections:||Διατριβές Μεταπτυχιακής Έρευνας (Masters)|
Files in This Item:
|Μ.Ε. ΜΠΕΣΤΑΣ ΔΗΜΗΤΡΙΟΣ 2017.pdf||2.63 MB||Adobe PDF||View/Open|
Please use this identifier to cite or link to this item:This item is a favorite for 0 people.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.