Materialized Views for Top-k Queries (Αποθηκευμένες όψεις για κατατακτήριες ερωτήσεις με άνω όριο αποτελεσμάτων) (Doctoral thesis)
The goal of this thesis is to investigate how top-k queries can be answered by exploiting materialized top-k views. In addition, we study the problem of identifying the distance function that best agrees with human perception when measuring the similarity between two collections of multidimensional points in the form of OLAP cubes. The top-k querying problem concerns the retrieval of the top-k results of a ranked query over a database. Specifically, given a relation R(tid, A1, A2, ..., Am) and a query Q over R, the goal is to retrieve the k tuples of R with the highest scores according to a scoring function f that accompanies Q. To speed up the retrieval of the top-k tuples from R, we take into consideration the results of previously posed queries that are cached as materialized views. We study the problem through a geometric representation and provide theoretical guarantees on whether a materialized view is able to answer a top-k query. We then propose the SafArI algorithm, which decides whether a materialized view is usable for a top-k query and, if it is, computes the answer to the query from the view. When the relation over which a set of views is defined is updated, we provide a method for keeping the top-k materialized views up to date without recomputing them, with results in two directions. First, we deal with the problem of maintaining top-k views under high deletion rates and provide a principled method that is independent of the statistical properties of the data and of the characteristics of the update streams. Second, we address the problem of efficiently maintaining multiple top-k views: we provide theoretical guarantees for the nucleation of one view with respect to another and show how this property carries over to the management of updates.
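The definitions above can be made concrete with a short sketch. It assumes, beyond what the abstract states, that the scoring functions are linear with non-negative weights and that every attribute value lies in [0, 1]. Under those assumptions, every tuple missing from a view whose lowest stored score is τ lies in the region {x ∈ [0,1]^m : f_V(x) ≤ τ}, so bounding f_Q over that region yields a simple sufficient test for view usability. This is an illustrative geometric check, not the SafArI algorithm itself, and the names `top_k`, `max_score_outside_view` and `answer_from_view` are hypothetical.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

# A tuple of R(tid, A1, ..., Am): (tid, (A1, ..., Am)); attributes assumed in [0, 1].
Row = Tuple[int, Tuple[float, ...]]


def score(weights: Sequence[float], attrs: Sequence[float]) -> float:
    """Assumed linear scoring function f(t) = sum_i w_i * A_i."""
    return sum(w * a for w, a in zip(weights, attrs))


def top_k(rows: Sequence[Row], weights: Sequence[float], k: int) -> List[Row]:
    """The k rows with the highest score, best first (ties broken arbitrarily)."""
    return heapq.nlargest(k, rows, key=lambda r: score(weights, r[1]))


def max_score_outside_view(q: Sequence[float], v: Sequence[float], tau: float) -> float:
    """Maximum of f_Q over {x in [0,1]^m : f_V(x) <= tau}.

    This is the linear program  max q.x  s.t.  v.x <= tau, 0 <= x_i <= 1,
    solved greedily as a fractional knapsack (valid for non-negative weights).
    """
    # Dimensions with zero weight in f_V cost nothing, so they can be set to 1.
    bound = sum(qi for qi, vi in zip(q, v) if vi == 0)
    # Spend the remaining "budget" tau in order of q_i / v_i, best ratio first.
    items = sorted(((qi, vi) for qi, vi in zip(q, v) if vi > 0),
                   key=lambda p: p[0] / p[1], reverse=True)
    budget = tau
    for qi, vi in items:
        if budget <= 0:
            break
        x = min(1.0, budget / vi)
        bound += qi * x
        budget -= vi * x
    return bound


def answer_from_view(view: Sequence[Row], v: Sequence[float],
                     q: Sequence[float], k: int) -> Optional[List[Row]]:
    """Top-k answer for f_Q if the view (the top-|view| rows under f_V)
    provably contains it; None if this sufficient test cannot decide."""
    if k < 1:
        return []
    if len(view) < k:
        return None
    tau = min(score(v, r[1]) for r in view)  # lowest f_V score kept in the view
    candidate = top_k(view, q, k)
    kth_score = score(q, candidate[-1][1])
    # Every tuple outside the view scores at most this bound under f_Q.
    if kth_score >= max_score_outside_view(q, v, tau):
        return candidate
    return None


rows = [(1, (0.9, 0.8)), (2, (0.7, 0.9)), (3, (0.4, 0.3)),
        (4, (0.2, 0.1)), (5, (0.6, 0.2))]
v = (0.5, 0.5)
view = top_k(rows, v, 3)                            # tids 1, 2, 5
answer = answer_from_view(view, v, (0.6, 0.4), 2)
print([r[0] for r in answer])                       # [1, 2]
print(answer_from_view(view, v, (1.0, 0.0), 3))     # None
```

The test is conservative: for weights (1.0, 0.0) and k = 3 the view happens to hold the true answer, but a tuple such as (0.8, 0.0) could exist outside the view without violating f_V ≤ τ, so the sketch declines to answer and the query would have to fall back to the base relation.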
Furthermore, we propose an algorithm that maintains a large number of views by appropriately structuring them in hierarchies of views. Apart from finding top-k answers over data in the form of multidimensional points, we also address the problem of measuring how similar two collections of data are according to human perception. More precisely: given two sets of points in a multidimensional hierarchical space, what is the distance between these two collections? In applications such as multimedia information retrieval and digital libraries, where contemporary data lead to huge repositories of heterogeneous data stored in data warehouses, similarity search is needed to complement traditional exact-match search. We address the problem by (a) organizing alternative distance functions in a taxonomy and (b) experimentally assessing the effectiveness of each distance function through a user study, in order to discover which distance function users prefer most.
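To give a feel for the kind of candidates such a taxonomy might contain, the sketch below implements three common set-to-set distances: Hausdorff distance, average linkage (mean pairwise distance) and centroid distance. It is an illustration under explicit assumptions: points are treated as plain Euclidean vectors, ignoring the dimension hierarchies of OLAP cubes, both collections are non-empty, and these three functions are not claimed to be the thesis's taxonomy or the function its user study found users preferred.

```python
import math
from typing import List, Sequence

Point = Sequence[float]
# All three functions assume non-empty collections of equal-dimension points.


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the largest nearest-neighbour gap."""
    def directed(xs: Sequence[Point], ys: Sequence[Point]) -> float:
        return max(min(math.dist(p, q) for q in ys) for p in xs)
    return max(directed(a, b), directed(b, a))


def average_linkage(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean Euclidean distance over all cross pairs (p from a, q from b)."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def centroid_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Euclidean distance between the centroids of the two collections."""
    def centroid(pts: Sequence[Point]) -> List[float]:
        return [sum(coord) / len(pts) for coord in zip(*pts)]
    return math.dist(centroid(a), centroid(b))


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (3.0, 0.0)]
print(hausdorff(A, B), average_linkage(A, B), centroid_distance(A, B))  # 2.0 1.5 1.0
```

The three functions disagree on the same input, which is why an empirical comparison is needed to choose between them: Hausdorff distance is driven by the single worst-matched point, while the other two average over the whole collections.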
|Alternative title / Subtitle:||Query processing, updating and similarity (Επεξεργασία ερωτήσεων, ενημέρωση και ομοιότητα)|
|Institution and School/Department of submitter:||University of Ioannina, School of Sciences, Department of Computer Science|
|Keywords:||Materialized views, Top-k queries, Query processing, Data similarity, Query updating|
|Appears in Collections:||Doctoral Theses|