Please use this identifier to cite or link to this item: https://olympias.lib.uoi.gr/jspui/handle/123456789/7275
Full metadata record
DC Field: Value (Language)
dc.contributor.author: Pilalidou, Alexandra (en)
dc.date.accessioned: 2015-11-23T13:29:49Z
dc.date.available: 2015-11-23T13:29:49Z
dc.identifier.uri: https://olympias.lib.uoi.gr/jspui/handle/123456789/7275
dc.identifier.uri: http://dx.doi.org/10.26268/heal.uoi.1199
dc.rights: Default License
dc.subject: -
dc.title: Online negotiation for privacy preserving data publishing (en)
heal.type: masterThesis
heal.type.en: Master thesis (en)
heal.type.el: Master's thesis (el)
heal.identifier.secondary: Μ.Ε. ΠΗΛ 2010
heal.language: en
heal.access: free
heal.recordProvider: University of Ioannina. School of Sciences. Department of Computer Science & Engineering (el)
heal.publicationDate: 2010
heal.bibliographicCitation: Bibliography: pp. 193-195
heal.abstract: The problem of privacy-preserving data publishing is defined as the problem of publicly releasing a data set containing structured records on the activities or transactions of a set of persons, so as to accommodate two antagonistic goals: (a) allow a set of well-intended knowledge workers to execute data mining algorithms over the published data set in order to extract useful statistical information from it, and (b) prevent a malicious attacker from combining these publicly available data with background knowledge (personal knowledge of the attacker, other publicly available data sets, etc.) in order to link a specific person in the real world (and, in particular, sensitive information about this person) with his or her corresponding record in the published data set. The main technique that data curators employ is data anonymization, which involves transforming the data (in one of the many ways that the research community has devised) before releasing them for public use. In our setting, we focus on the global recoding approach, a method for data anonymization with (a) high utility for the data mining tools of the well-intended users, (b) faster execution times than the alternative methods (although not fast enough for an online environment), and, at the same time, (c) the problem of having to delete (i.e., suppress) outlier groups to attain an acceptable level of generalization. In this thesis we attack the following goals, not previously explored by the research community. The first goal of this thesis is to study the interplay of suppression, generalization, and the privacy criterion, and to record how changes to one of these parameters affect the other two.
The main goal of this thesis, however, is to provide the means to negotiate the configuration of the anonymization of a data set, by allowing a target group of known well-meaning users and the data curator responsible for the anonymization to agree online on (a) the level of data generalization (and thus the information loss incurred by the well-meaning users), (b) the number of tuples that can be omitted from the published data set, and (c) the privacy criterion that the data curator imposes. Our first approach involves precomputing suitable histograms for all the different anonymization schemes that a global recoding method can follow. This allows computing exact answers extremely fast (on the order of a few milliseconds). We provide both exact answers, if they exist, and suggestions for approximate answers by exploiting these histograms. However, this approach requires a pre-processing time on the order of a few dozen minutes; whenever this is not feasible, alternative approaches must be explored. To this end, we propose a method that precomputes only a small subset of the histograms in order to speed up the pre-processing. Our experiments indicate a linear speedup along with very good or acceptable quality for the proposed solutions, depending on the type of answer. Finally, to alleviate the deviations from the optimal solution in two cases of approximate suggestions, we introduce a third variant, where the histogram of the top acceptable node (in terms of the height constraint) is also computed at runtime. This method pays a price of 0.1-0.3 seconds to gain excellent solution quality for all kinds of answers. This way, the data curator is equipped with alternative tools to use depending on the constraints on user time and solution quality. (en)
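The interplay of generalization, suppression, and the privacy criterion described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the quasi-identifier, the two-level hierarchy, and all names here are hypothetical, and k-anonymity stands in for the generic privacy criterion.

```python
from collections import Counter

def global_recode(records, hierarchy, level):
    """Global recoding: generalize every quasi-identifier value to the
    same level of its hierarchy (one rule applied to all records)."""
    return [hierarchy[value][level] for value in records]

def anonymize(records, hierarchy, level, k):
    """Apply global recoding, then suppress outlier groups smaller than k.
    Returns the published values and the number of suppressed tuples."""
    generalized = global_recode(records, hierarchy, level)
    counts = Counter(generalized)
    published = [v for v in generalized if counts[v] >= k]
    suppressed = len(generalized) - len(published)
    return published, suppressed

# Toy quasi-identifier (a ZIP-like code) with a two-level hierarchy:
# level 0 keeps the value, level 1 masks the last digit.
hierarchy = {z: (z, z[:-1] + "*") for z in ["45110", "45111", "45221"]}
records = ["45110", "45111", "45110", "45221"]

published, suppressed = anonymize(records, hierarchy, 1, k=2)
# At level 1, group "4511*" has 3 records (kept); "4522*" has 1 (suppressed).
```

Raising the generalization level merges groups and reduces suppression at the cost of utility; raising k does the opposite, which is exactly the three-way trade-off the thesis studies.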
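The histogram-based negotiation idea can likewise be sketched: per generalization level, precompute a histogram of group sizes, after which questions such as "how many tuples would be suppressed at this level for privacy parameter k?" are answerable without rescanning the data. Again, data and names are illustrative assumptions, not the thesis's actual structures.

```python
from collections import Counter

def precompute_histograms(records, hierarchy, num_levels):
    """For each generalization level of a global-recoding scheme,
    precompute a histogram mapping group size -> number of groups."""
    histograms = {}
    for level in range(num_levels):
        groups = Counter(hierarchy[v][level] for v in records)
        histograms[level] = Counter(groups.values())
    return histograms

def suppressed_tuples(histograms, level, k):
    """Answer, from the histogram alone, how many tuples fall in
    groups smaller than k and would therefore be suppressed."""
    return sum(size * count
               for size, count in histograms[level].items()
               if size < k)

# Toy quasi-identifier values with a two-level hierarchy (hypothetical):
hierarchy = {z: (z, z[:-1] + "*") for z in ["45110", "45111", "45221"]}
records = ["45110", "45111", "45110", "45221"]
hists = precompute_histograms(records, hierarchy, 2)
```

The expensive pass over the data happens once, offline; each negotiation query then touches only the small histograms, which is what makes millisecond-scale online answers feasible.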
heal.advisorName: -
heal.committeeMemberName: -
heal.academicPublisher: University of Ioannina. School of Sciences. Department of Computer Science & Engineering (el)
heal.academicPublisherID: uoi
heal.numberOfPages: 196 p.
heal.fullTextAvailability: true
Appears in Collections: Master's Research Theses (Masters) - ΜΥ

Files in This Item:
File: Μ. Ε. PILALIDOU ALEXANDRA.pdf
Size: 5.25 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.