Meshing streaming updates with persistent data in an active data warehouse

Polyzotis, N.; Skiadopoulos, S.; Vassiliadis, P.; Simitsis, A.; Frantzell, N. E.

Please use this identifier to cite or link to this item: https://olympias.lib.uoi.gr/jspui/handle/123456789/10990

Full metadata record

DC Field	Value	Language
dc.contributor.author	Polyzotis, N.	en
dc.contributor.author	Skiadopoulos, S.	en
dc.contributor.author	Vassiliadis, P.	en
dc.contributor.author	Simitsis, A.	en
dc.contributor.author	Frantzell, N. E.	en
dc.date.accessioned	2015-11-24T17:01:52Z	-
dc.date.available	2015-11-24T17:01:52Z	-
dc.identifier.issn	1041-4347	-
dc.identifier.uri	https://olympias.lib.uoi.gr/jspui/handle/123456789/10990	-
dc.rights	Default Licence	-
dc.subject	active data warehouse	en
dc.subject	join	en
dc.subject	meshjoin	en
dc.subject	streams	en
dc.subject	relations	en
dc.subject	view maintenance	en
dc.title	Meshing streaming updates with persistent data in an active data warehouse	en
heal.type	journalArticle	-
heal.type.en	Journal article	en
heal.type.el	Άρθρο Περιοδικού	el
heal.identifier.primary	DOI 10.1109/TKDE.2008.27	-
heal.language	en	-
heal.access	campus	-
heal.recordProvider	Πανεπιστήμιο Ιωαννίνων. Σχολή Θετικών Επιστημών. Τμήμα Μηχανικών Ηλεκτρονικών Υπολογιστών και Πληροφορικής (University of Ioannina. School of Sciences. Department of Computer Science and Engineering)	el
heal.publicationDate	2008	-
heal.abstract	Active Data Warehousing has emerged as an alternative to conventional warehousing practices in order to meet the high demand of applications for up-to-date information. In a nutshell, an active warehouse is refreshed online and thus achieves a higher consistency between the stored information and the latest data updates. The need for online warehouse refreshment introduces several challenges in the implementation of data warehouse transformations, with respect to their execution time and their overhead to the warehouse processes. In this paper, we focus on a frequently encountered operation in this context, namely, the join of a fast stream S of source updates with a disk-based relation R, under the constraint of limited memory. This operation lies at the core of several common transformations such as surrogate key assignment, duplicate detection, or identification of newly inserted tuples. We propose a specialized join algorithm, termed mesh join (MESHJOIN), which compensates for the difference in the access cost of the two join inputs by 1) relying entirely on fast sequential scans of R and 2) sharing the I/O cost of accessing R across multiple tuples of S. We detail the MESHJOIN algorithm and develop a systematic cost model that enables the tuning of MESHJOIN for two objectives: maximizing throughput under a specific memory budget or minimizing memory consumption for a specific throughput. We present an experimental study that validates the performance of MESHJOIN on synthetic and real-life data. Our results verify the scalability of MESHJOIN to fast streams and large relations and demonstrate its numerous advantages over existing join algorithms.	en
heal.journalName	IEEE Transactions on Knowledge and Data Engineering	en
heal.journalType	peer reviewed	-
heal.fullTextAvailability	TRUE	-
Appears in Collections:	Articles in scientific journals (Open)
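
The abstract names the two ideas behind MESHJOIN: R is accessed only through sequential scans, and the cost of each read of R is shared by many buffered tuples of S. The following is a minimal in-memory Python sketch of one way to combine those two ideas for an equi-join. It is not the paper's implementation: a Python list stands in for the disk-resident relation R, the input stream is assumed finite, and the names `mesh_join`, `chunk_size` (how much of R is read per step) and `batch_size` (how many stream tuples are admitted per step) are hypothetical. Page-level I/O and the paper's cost-model tuning of the buffer sizes are not modeled.

```python
from collections import defaultdict, deque
from itertools import islice
from typing import (Callable, Deque, Dict, Hashable, Iterable, Iterator,
                    List, Sequence, Tuple, TypeVar)

S = TypeVar("S")
R = TypeVar("R")


def mesh_join(stream: Iterable[S], relation: Sequence[R],
              key_s: Callable[[S], Hashable], key_r: Callable[[R], Hashable],
              chunk_size: int, batch_size: int) -> Iterator[Tuple[S, R]]:
    """Hypothetical sketch: equi-join a stream with a cyclically scanned relation.

    Every stream tuple stays in memory until it has been probed against each
    chunk of `relation` exactly once; then it is expired.
    """
    if chunk_size < 1 or batch_size < 1:
        raise ValueError("chunk_size and batch_size must be positive")
    # Assumption: the list stands in for disk-resident R. It is only read
    # sequentially, one chunk per step, wrapping around at the end.
    chunks = [relation[i:i + chunk_size] for i in range(0, len(relation), chunk_size)]
    if not chunks:
        return
    k = len(chunks)  # steps needed for one full pass over R

    it = iter(stream)
    exhausted = False
    queue: Deque[List[S]] = deque()  # admitted batches, oldest first
    table: Dict[Hashable, Deque[S]] = defaultdict(deque)  # join key -> live stream tuples
    next_chunk = 0

    while True:
        batch: List[S] = []
        if not exhausted:
            batch = list(islice(it, batch_size))
            # Simulation shortcut: a short read means the finite input has ended.
            exhausted = len(batch) < batch_size
        if exhausted and not batch and not any(queue):
            break  # no live stream tuples remain
        # Admit the new batch (empty once the input has ended, so that the
        # remaining tuples still complete their pass over R).
        queue.append(batch)
        for s in batch:
            table[key_s(s)].append(s)
        # Read the next chunk of R and probe it against all live stream tuples:
        # one sequential read serves every tuple currently in memory.
        chunk = chunks[next_chunk]
        next_chunk = (next_chunk + 1) % k
        for r in chunk:
            for s in table.get(key_r(r), ()):
                yield (s, r)
        # The oldest batch has now met all k chunks of R, so expire it.
        if len(queue) == k:
            for s in queue.popleft():
                bucket = table[key_s(s)]
                bucket.popleft()  # buckets are FIFO, so s is at the front
                if not bucket:
                    del table[key_s(s)]


# Usage: surrogate key assignment, one of the transformations the abstract lists.
relation = [(k, 100 + k) for k in range(10)]  # (natural key, surrogate key)
stream = [(3, "a"), (7, "b"), (3, "c"), (42, "d"), (0, "e")]
for s, r in mesh_join(stream, relation, lambda s: s[0], lambda r: r[0],
                      chunk_size=3, batch_size=2):
    print(s[1], "->", r[1])  # prints a -> 103, c -> 103, b -> 107, e -> 100
```

The design choice to expire a batch only after it has met all k chunks is what makes the result complete: each stream tuple is probed against every tuple of R exactly once, while R itself is never accessed randomly. Unmatched tuples such as `(42, "d")` simply expire without output.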

Files in This Item:

File	Description	Size	Format
Polyzotis-2008-Meshing streaming up.pdf		2.54 MB	Adobe PDF

This item is licensed under a Creative Commons License

Repository of UOI "Olympias"