Interval Data Management in Main Memory

Christodoulou, Georgios

Please use this identifier to cite or link to this item: https://olympias.lib.uoi.gr/jspui/handle/123456789/33052

Full metadata record

DC Field	Value	Language
dc.contributor.author	Christodoulou, Georgios	en
dc.date.accessioned	2023-09-12T09:51:45Z	-
dc.date.available	2023-09-12T09:51:45Z	-
dc.identifier.uri	https://olympias.lib.uoi.gr/jspui/handle/123456789/33052	-
dc.identifier.uri	http://dx.doi.org/10.26268/heal.uoi.12851	-
dc.rights	CC0 1.0 Universal	*
dc.rights.uri	http://creativecommons.org/publicdomain/zero/1.0/	*
dc.subject	Intervals	en
dc.subject	Indexing	en
dc.subject	Main memory	en
dc.title	Interval Data Management in Main Memory	en
dc.type	doctoralThesis	en
heal.type	doctoralThesis	el
heal.type.en	Doctoral thesis	en
heal.type.el	Διδακτορική διατριβή	el
heal.contributorName	Christodoulou, Georgios	en
heal.dateAvailable	2023-09-12T09:52:46Z	-
heal.language	en	el
heal.access	free	el
heal.recordProvider	Πανεπιστήμιο Ιωαννίνων. Πολυτεχνική Σχολή	el
heal.publicationDate	2023-08-30	-
heal.abstract	The management of intervals has been an active research area since databases were invented. A popular direction of research is the indexing and retrieval of intervals, finding a wide range of applications. Emerging and widely used systems are built dependent on temporal and uncertain data. Many algorithms and indices have been proposed, concentrated on a variety of queries. Most algorithms are either suboptimal in space consumption or perform well only for specific query types. We need novel and efficient in-memory indices for intervals, which can evaluate queries with high performance. In statistical and probabilistic databases [3], uncertain values are often approximated by confidence intervals. Real-world examples of uncertain values include temperature values obtained from IoT devices or time-series. For such cases, it would be more appropriate to record an observation using an interval range rather than a single value. In data anonymization [4] attributes can be generalized to intervals. Stored values can be replaced with semantically consistent but less precise alternatives in the form of intervals. In this way, information from a private table, like the identity of any individual to whom the released data refer cannot be recognized. In XML data indexing techniques [5], the scope of an XML element can be modeled as an interval defined by the positions of the starting and closing tag of the element. Intervals are representations of value ranges. Quite often, these ranges represent periods of time described as a tuple [start, end]. In a temporal database, an intervalbased data model can timestamp each tuple or attribute value with a validity time interval. Along with valid time, an interval-based model can timestamp transaction time, which captures when a tuple is inserted and deleted from the database. We index intervals in data structures so that we can efficiently evaluate different types of queries. There are several query types over intervals, with the differentiation lying on the specifications that shape the resulting set of intervals, or the context in which we model data with an interval representation. The topic of this dissertation is to study the problem of indexing and querying a large collection of records, based on an interval attribute that characterizes each object. We focus on the different aspects of temporal databases, as they form the most significant application of interval data. The collection can be known before indexing or evolve over time, which is common in temporal databases or streaming data. The challenge is to find solutions which, can take advantage of modern hardware such as large main memories, can handle traditional and on demand indexing of intervals, and provide high performance for a wide variety of query types and predicates. In this thesis, we study numerous problems and different scenarios which come down to indexing and querying interval data. In the first part, we propose HINT, a novel and efficient in-memory index for large known collections of intervals, with a focus on range queries, which are a basic component of many search and analysis tasks. Our index is suitable for valid-time indexing in the context of temporal databases. HINT applies a hierarchical partitioning approach, which assigns each interval to at most two partitions per level and has controlled space requirements. We reduce the information stored at each partition to the absolutely necessary by dividing the intervals in groups, based on whether they begin inside or before the partition boundaries. In addition, our index includes storage optimization techniques for the effective handling of data sparsity and skewness. The second problem we study, is a more general version of HINT, so that with the best trade-off in information storage, it will be able to handle queries with different predicates. Intervals may satisfy more sophisticated relations than intersections, which are based on Allen’s relationships [6] (e.g., find all intervals that are covered by the query interval). The principles of HINT are useful for the retrieval of data intervals based on Allen’s relationships, because the hierarchical partitioning applies independently of the query type. We show how HINT can be tuned depending on the data and can be efficiently used to process joins and queries based on Allen’s relationships. In the last part of this dissertation, we study the problem of transaction-time indexing in the context of temporal databases, i.e., indexing versions of data in an evolving database. Given the fact that the main memories of modern commodity are large and cheap, we can afford to keep track of all versions of an evolving table in memory. This raises the question of how to index such a table effectively. We depart from the classic indexing approach, where both current (i.e., live) and past (i.e., dead) data versions are indexed in the same data structure, and propose LIT, a hybrid index, which decouples the management of the current and past states of the indexed column. LIT includes optimized indexing modules for dead and live records, which support efficient queries and updates, and gracefully combines them. For the evaluation of our methods, we used multiple real and synthetic datasets with different characteristics so that we can safely conclude the robustness of our algorithms. The experiments showed that our algorithms are typically one order of magnitude faster than existing methods on static or evolving data collections and with multiple types of queries.	en
heal.advisorName	Mamoulis, Nikolaos	en
heal.committeeMemberName	Mamoulis, Nikolaos	en
heal.committeeMemberName	Vassiliadis, Panagiotis	en
heal.committeeMemberName	Tsaparas, Panayiotis	en
heal.committeeMemberName	Bouros, Panagiotis	en
heal.committeeMemberName	Koubarakis, Manolis	en
heal.committeeMemberName	Skiadopoulos, Spiros	en
heal.academicPublisher	Πανεπιστήμιο Ιωαννίνων. Πολυτεχνική Σχολή. Τμήμα Μηχανικών Ηλεκτρονικών Υπολογιστών και Πληροφορικής	el
heal.academicPublisherID	uoi	el
heal.fullTextAvailability	true	-
Appears in Collections:	Διδακτορικές Διατριβές - ΜΗΥΠ

Show simple item record

Files in This Item:

File	Description	Size	Format
Δ.Δ. Χριστοδούλου Γεώργιος (2023)		5.01 MB	Adobe PDF	View/Open

Show simple item record

This item is licensed under a Creative Commons License

Repository of UOI "Olympias"