Extraction and classification of phases in schema evolution histories (Master thesis)
Software projects built on top of relational databases evolve over time, just like any other software project. Bugs occur and user requirements change, so to keep users satisfied and provide consistent services, software projects have to adapt to the new requirements. The information capacity of a software project must also stay aligned with these requirements, which creates the need to evolve the database schema along with the software. Schema evolution affects the surrounding applications both syntactically and semantically, which makes understanding it a significant topic.
Our fundamental research question is investigative in nature: are there phases in the lives of relational schemata? To answer this question, we used 6 free open-source software projects that include relational databases; we tracked their evolution and identified the changes that took place in each committed version. Based on these data, this Thesis is structured in two parts: the first part is exploratory and studies the collected data to manually extract phases and patterns in the tables' lives, whereas the second part proposes an automated method to extract these phases algorithmically.
The first part of this Thesis addresses the following question: when are tables, attributes and foreign keys born and evicted in the life of a schema? Based on the information on table and attribute births, deaths and updates, along with the timeline of schema size, we manually derived phases in the life of our 6 database schemata. Our characterizations are based on the demonstrated growth (an increase of information capacity) or maintenance (deletions and updates that improve the quality of the schema).
The most interesting finding of our study is that, with a single exception, the history of a database schema comes in two mega-phases: (a) a "hot" expansion mega-phase at the start of its life, demonstrating growth of information capacity along with the necessary maintenance, and (b) a "cooling" housekeeping mega-phase in its middle and later life, where either maintenance actions or stillness dominate the update activity. We call this phenomenon progressive cooling of the schema heartbeat. Several observations support this finding.
The second part of the Thesis addresses the following question: given the history and heartbeat of a schema, can we automatically extract phases in its evolution? Our algorithmic method includes four steps. The first step characterizes the releases in terms of the two aforementioned change families, growth and maintenance. Based on these characterizations, the second step splits the timeline of the schema's life into phases by applying hierarchical agglomerative clustering that groups together consecutive releases. In the third step, we use several measures of clustering quality, such as Silhouette, Cohesion and Separation, to characterize the discriminating quality of each of the derived clusterings. Finally, the fourth step classifies the clusters, i.e., the phases in the life of a schema, in terms of their nature, on the basis of a taxonomy of change profiles (e.g., Minor Activity, Restructuring, Intense Evolution, among others).
The phase extraction and classification method introduced in the second part of this Thesis was evaluated with respect to clustering-oriented measures and quality measures based on our gold standard. The findings of this evaluation show that our method performs reasonably well, with a small error rate, and that the solutions it produces are of significant quality.
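The second step of the method, clustering consecutive releases into phases, can be illustrated with a minimal sketch. The abstract does not specify the distance function or linkage criterion, so the Euclidean distance between cluster centroids used here is an assumption, and the names `Release`, `centroid`, `distance` and `extract_phases` are hypothetical. The only constraint taken from the text is that clusters may contain consecutive releases only, so merges are restricted to adjacent clusters on the timeline.

```python
from dataclasses import dataclass


@dataclass
class Release:
    growth: int       # amount of growth activity in this release
    maintenance: int  # amount of maintenance activity (deletions, updates)


def centroid(releases):
    """Mean (growth, maintenance) vector of a group of releases."""
    n = len(releases)
    return (
        sum(r.growth for r in releases) / n,
        sum(r.maintenance for r in releases) / n,
    )


def distance(a, b):
    """Euclidean distance between two cluster centroids (assumed linkage)."""
    (ga, ma), (gb, mb) = centroid(a), centroid(b)
    return ((ga - gb) ** 2 + (ma - mb) ** 2) ** 0.5


def extract_phases(releases, k):
    """Agglomerative clustering restricted to adjacent clusters.

    Starts with one cluster per release and repeatedly merges the closest
    pair of neighbouring clusters until k phases remain.
    """
    if not 1 <= k <= len(releases):
        raise ValueError("k must be between 1 and the number of releases")
    phases = [[r] for r in releases]
    while len(phases) > k:
        # Only neighbours on the timeline are candidates for merging.
        i = min(
            range(len(phases) - 1),
            key=lambda j: distance(phases[j], phases[j + 1]),
        )
        phases[i:i + 2] = [phases[i] + phases[i + 1]]
    return phases


# Illustrative (made-up) history: three active releases, then three quiet ones.
history = [
    Release(10, 2), Release(12, 3), Release(9, 1),
    Release(0, 1), Release(1, 0), Release(0, 0),
]
phases = extract_phases(history, 2)
print([len(p) for p in phases])  # [3, 3]
```

On this made-up input the timeline splits into an early active phase and a later quiet phase, which loosely mirrors the two mega-phases described above. Choosing the number of phases and assessing the result would fall to the quality measures of the third step, which this sketch does not cover.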
|Institution and School/Department of submitter:||University of Ioannina. School of Sciences. Department of Computer Science & Engineering|
|Keywords:||Databases, Database schema evolution, Phase extraction|
|Appears in Collections:||Postgraduate Research Theses (Masters)|