Detection of predictable temporal changes in multidimensional biological sequences (Master thesis)
This work investigates the predictability of interesting temporal changes between the states of a longitudinal microbiome dataset, where the changes occur at time-points subsequent to those analyzed. Predictability is defined as the generalization performance of an optimal classification system built using a given dataset and tested with a given measure. The dataset records the evolution of the relative abundances of the vaginal microbiome of a number of women. Initially, the analysis focused on predicting double changes in microbial composition (termed spikes), given the relative abundances of the populations at previous time instances. The constructed datasets were classified using several methods, with an accuracy of about 70% for the prediction of spikes. Next, we searched for subsets of the datasets that are more predictable than the complete dataset. A continuous measure describing the amount of temporal change between consecutive time-points, named spikeness, was estimated for all time-points. The dataset examples were ranked by spikeness, and data subsets were created containing top-ranked positive and bottom-ranked negative examples. The classification system used for measuring predictability (called the black-box classifier) consisted of a set of classification models together with external model parameters; the classification result for each data subset was obtained from the best-performing model. Based on these ideas, a new automatic way of detecting predictable temporal changes is proposed. An approach called rank-based predictability was applied to estimate the predictability of gradually increasing subsets of the dataset, selected according to the ranking of the examples.
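The subset construction and the best-of-several-models scoring described in this paragraph can be illustrated with a minimal sketch. It is not the thesis implementation: the names `Example`, `ranked_subsets`, and `black_box_predictability` are hypothetical, spikeness is taken as an already computed input, the even step schedule for growing the subsets is an assumption, and each scorer stands in for whatever procedure trains one model and returns its generalization score.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Example:
    features: Sequence[float]  # relative abundances at earlier time-points
    label: int                 # 1 = positive (spike), 0 = negative
    spikeness: float           # continuous change score, taken as given here


def ranked_subsets(examples: List[Example], steps: int) -> List[List[Example]]:
    """Build gradually increasing subsets from the spikeness ranking.

    Positives are ranked from highest to lowest spikeness, negatives from
    lowest to highest, so the earliest subsets hold the most clear-cut cases.
    The even step schedule is an assumption of this sketch.
    """
    positives = sorted((e for e in examples if e.label == 1),
                       key=lambda e: e.spikeness, reverse=True)
    negatives = sorted((e for e in examples if e.label == 0),
                       key=lambda e: e.spikeness)
    subsets = []
    for i in range(1, steps + 1):
        n_pos = max(1, len(positives) * i // steps)
        n_neg = max(1, len(negatives) * i // steps)
        subsets.append(positives[:n_pos] + negatives[:n_neg])
    return subsets


# A scorer is a hypothetical callable that trains one model configuration
# on a subset and returns its generalization score (e.g. accuracy).
Scorer = Callable[[List[Example]], float]


def black_box_predictability(subset: List[Example],
                             scorers: Sequence[Scorer]) -> float:
    """Return the score of the best-performing model for this subset."""
    return max(scorer(subset) for scorer in scorers)
```

Calling `black_box_predictability` on each element of `ranked_subsets(examples, steps)` would give one predictability value per subset size, which is the kind of curve the rank-based approach inspects.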
The methodology first transforms the time series into symbolic sequences using clustering techniques and then defines patterns of temporal change on this symbolic representation. A two-class dataset is then constructed for a given pattern of temporal change and its prediction features. As a second step, the rank-based predictability approach is applied to this dataset to estimate the predictability of the temporal pattern. Patterns of temporal change with predictability greater than a user-specified threshold are considered predictable. The experimental results using four temporal patterns indicated that all four were predictable for subsets having a high coverage of the positive examples. Moreover, the predictability of the rank-based subsets was always greater than the average predictability of randomly selected subsets.
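As an illustration of the symbolic step, the sketch below maps each time-point to the symbol of its nearest cluster centroid and then builds a two-class dataset for one pattern. Several things here are assumptions rather than details from the thesis: the centroids are presumed to come from some clustering run, a pattern is modeled as a fixed sequence of symbols, the prediction features are simply the `history` symbols preceding it, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def to_symbols(series: Sequence[Sequence[float]],
               centroids: Dict[str, Sequence[float]]) -> List[str]:
    """Replace each abundance vector with the symbol of its nearest centroid."""
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        # Squared Euclidean distance between two vectors of equal length.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(centroids, key=lambda s: dist2(v, centroids[s]))
            for v in series]


def pattern_dataset(symbols: Sequence[str],
                    pattern: Sequence[str],
                    history: int) -> List[Tuple[Tuple[str, ...], int]]:
    """Build (features, label) rows for one temporal pattern.

    Features are the `history` symbols before time t; the label is 1 when
    the pattern occurs starting at t, and 0 otherwise.
    """
    rows = []
    for t in range(history, len(symbols) - len(pattern) + 1):
        features = tuple(symbols[t - history:t])
        window = tuple(symbols[t:t + len(pattern)])
        rows.append((features, int(window == tuple(pattern))))
    return rows


# Toy usage: two states "A" and "B" in a two-taxon abundance space.
centroids = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
series = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1], [0.1, 0.9]]
symbols = to_symbols(series, centroids)
rows = pattern_dataset(symbols, ("A", "B"), history=1)
```

The resulting rows form the two-class dataset to which a rank-based predictability estimate could then be applied and compared against the user-specified threshold.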
|Institution and School/Department of submitter:||University of Ioannina. School of Sciences. Department of Computer Science & Engineering|
|Subject classification:||Machine learning|
|Keywords:||Machine learning, Longitudinal data, Biological sequences, Prediction of temporal changes|
|Appears in Collections:||Master's Research Theses (Masters)|
Files in This Item:
|Μ.Ε. ΤΙΜΟΝΙΔΗΣ ΝΕΣΤΩΡ 2017.pdf||4.62 MB||Adobe PDF|