Graph metrics as predictors of schema evolution for relational databases (Master's thesis)

Κολοζώφ, Μιχάλης–Ρωμανός

Databases evolve over time, and their evolution concerns not only their contents but also their internal structure, or schema. Schema evolution deeply affects both the database itself and the surrounding applications, which must adapt as well. Studying the mechanisms and patterns by which database schemata evolve is important because it allows design, maintenance, and resource allocation to be planned in advance. In this thesis, we focus on the evolution of foreign keys in the context of schema evolution. Foreign keys are mechanisms that constrain data entry in relational tables by requiring that the values of a table's attribute be a subset of the values of an attribute of another, lookup, table. Despite the importance of foreign keys as an integrity constraint that guarantees consistency among the values of different tables, their evolution is a topic that, to the best of our knowledge, has not been studied in the literature before. We have studied the schema histories of six free, open-source databases that contained foreign keys. To facilitate a quantitative study, we model each version of the schema as a graph, with tables as nodes and foreign keys as directed edges (pointing from the referencing table to the referenced one). Our findings concerning the growth of nodes confirm previous results that schemata grow slowly over time in terms of tables. Moreover, we report several surprising new findings on the schema edges (foreign keys). Foreign keys appear to be fairly scarce in the projects that we have studied, and they do not necessarily grow in sync with table growth. In fact, we have observed different "cultures" for the handling of foreign keys, ranging from full sync with the growth of nodes to the unexpected extreme of full removal of foreign keys from the schema of the database.
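As a rough illustration of the graph model described above, the following Python sketch builds one directed graph per schema version and tracks how the number of tables (nodes) and foreign keys (edges) changes across versions. It is not the tooling used in the thesis: the function names `build_schema_graph` and `graph_size`, the table names, and the three-version history are all hypothetical. As a simplifying assumption, several foreign keys between the same pair of tables collapse into a single edge.

```python
def build_schema_graph(tables, foreign_keys):
    """Model one schema version as a directed graph.

    tables: iterable of table names (the nodes).
    foreign_keys: iterable of (referencing_table, referenced_table) pairs;
        each pair becomes an edge from the referencing to the referenced table.
    Returns a dict mapping each table to the set of tables it references.
    """
    graph = {table: set() for table in tables}
    for referencing, referenced in foreign_keys:
        if referencing not in graph or referenced not in graph:
            raise ValueError(
                f"foreign key {referencing} -> {referenced} names an unknown table"
            )
        graph[referencing].add(referenced)
    return graph


def graph_size(graph):
    """Return (number of nodes, number of edges) for a schema graph."""
    return len(graph), sum(len(targets) for targets in graph.values())


# Hypothetical schema history: tables keep growing, while the single
# foreign key first stagnates and is finally removed altogether.
history = [
    (["customers", "orders"], [("orders", "customers")]),
    (["customers", "orders", "products"], [("orders", "customers")]),
    (["customers", "orders", "products", "audit_log"], []),
]

sizes = [graph_size(build_schema_graph(tables, fks)) for tables, fks in history]
for version, (n_tables, n_fks) in enumerate(sizes, start=1):
    print(f"version {version}: {n_tables} tables, {n_fks} foreign keys")
```

In this made-up history the node count rises from 2 to 4 while the edge count drops from 1 to 0, a toy version of the situation in which foreign keys do not grow in sync with tables.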
Node degrees and survival are related by an inverse gamma pattern: the few nodes with high degrees stand a higher chance of survival than average. Similarly, nodes whose incident edges have high edge betweenness centrality frequently (but not always) stand a higher chance of survival than nodes with one or zero incident edges, which have significantly higher chances of removal.
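The two metrics mentioned above, node degree and edge betweenness centrality, can be sketched on a small schema graph as follows. This is a minimal example under stated assumptions, not the thesis's implementation: the four-table schema and the function names `total_degrees` and `edge_betweenness` are hypothetical, degree is taken as in-degree plus out-degree, and edge betweenness is computed unnormalized over directed shortest paths with a Brandes-style accumulation.

```python
from collections import deque

# Hypothetical schema graph: each table maps to the set of tables it references.
graph = {
    "customers": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},
    "products": set(),
}


def total_degrees(graph):
    """Total degree of each node: incoming plus outgoing foreign keys."""
    degrees = {node: len(targets) for node, targets in graph.items()}
    for targets in graph.values():
        for target in targets:
            degrees[target] += 1
    return degrees


def edge_betweenness(graph):
    """Unnormalized edge betweenness on a directed, unweighted graph.

    For every source node, a breadth-first search counts shortest paths,
    then dependencies are accumulated backwards along predecessor links.
    """
    scores = {(u, v): 0.0 for u in graph for v in graph[u]}
    for source in graph:
        dist = {source: 0}
        n_paths = {node: 0 for node in graph}
        n_paths[source] = 1
        preds = {node: [] for node in graph}
        visit_order = []
        queue = deque([source])
        while queue:
            u = queue.popleft()
            visit_order.append(u)
            for v in graph[u]:
                if v not in dist:  # first time v is reached
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:  # edge u -> v lies on a shortest path
                    n_paths[v] += n_paths[u]
                    preds[v].append(u)
        dependency = {node: 0.0 for node in graph}
        for v in reversed(visit_order):
            for u in preds[v]:
                share = n_paths[u] / n_paths[v] * (1.0 + dependency[v])
                scores[(u, v)] += share
                dependency[u] += share
    return scores


degrees = total_degrees(graph)
betweenness = edge_betweenness(graph)
print(degrees)
print(betweenness)
```

Here the edges `order_items -> orders` and `orders -> customers` each lie on two shortest paths, so they score higher than `order_items -> products`, which lies on one. Relating such scores to whether a table survives in later versions would require the actual schema histories, which this sketch does not attempt to reproduce.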
Institution and School/Department of submitter: University of Ioannina. School of Sciences. Department of Computer Science & Engineering
Subject classification: Databases
Keywords: Databases, Database schema evolution, Foreign key evolution, Graph metrics
Appears in Collections: Master's Research Theses (Masters)

Files in This Item:
File: Μ.Ε. ΚΟΛΟΖΩΦ ΜΙΧΑΛΗΣ-ΡΩΜΑΝΟΣ 2017.pdf
Size: 5.27 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.