Selective Known Sample Attacks to Relation Preserving Data Transformations

Term: 2018-2019 Summer
Faculty Department of Project Supervisor: Faculty of Engineering and Natural Sciences
Number of Students: 2

Many data mining applications, such as clustering and k-NN search, rely on distances and relations in the data. Thus, distance preserving transformations, which perturb the data but preserve the pairwise distances between records, have emerged as a prominent privacy protection method. However, a generalized form of distance preserving transformations, called relation preserving transformations, has been shown to be vulnerable to known sample attacks. That is, an attacker with only a few known samples (4 to 10) and direct access to the relations can retrieve unknown private data records with non-negligible confidence.
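As a minimal illustration of distance preservation, the sketch below applies a rigid motion (a random 2-D rotation followed by a translation) to a few synthetic records and checks that every pairwise Euclidean distance survives the perturbation. This is only one simple instance of a distance preserving transformation, chosen here as an assumption; the helper names `pairwise_distances` and `rotate_and_shift` are hypothetical, and the sketch models neither the more general relation preserving transformations nor the known sample attack itself.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length tuples."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def rotate_and_shift(points, theta, shift):
    """Hypothetical perturbation: rotate 2-D points by theta radians, then translate.

    A rotation plus a translation is a rigid motion, so the released values
    differ from the originals while all Euclidean distances are unchanged.
    """
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


random.seed(0)
# Synthetic "private" records (two numeric attributes each).
records = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(5)]
# The secret key of this toy transformation is the angle and the shift.
released = rotate_and_shift(records, theta=random.uniform(0, 2 * math.pi), shift=(17.0, -42.0))

original_d = pairwise_distances(records)
released_d = pairwise_distances(released)
# Largest absolute change in any pairwise distance (only floating-point noise).
max_err = max(abs(a - b) for ra, rb in zip(original_d, released_d) for a, b in zip(ra, rb))
print(f"max distance change: {max_err:.2e}")
```

The released coordinates look nothing like the originals, yet a clustering or k-NN algorithm run on them sees exactly the same distances, which is what makes such transformations attractive for privacy-preserving data mining.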
In this project, we aim to show that the performance of such an attack depends on the distribution of the known sample. We plan to improve and extend the attack algorithm for an adversary who can selectively corrupt a few users in the dataset to create a targeted sample set. With a careful selection of the known sample, we aim to create an attack that remains resilient even in worst-case scenarios (e.g., against victims who are outliers).
Within the scope of this project, students are expected to implement a selection algorithm that takes a dissimilarity matrix as input and outputs entries with relatively high pairwise distances. Students will also experimentally demonstrate the effectiveness of the new approach with respect to various parameters and datasets.
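The project description does not fix a particular selection algorithm. One plausible baseline, assumed here rather than taken from the project, is greedy max-min (farthest-point) selection: seed the sample with the two most distant records, then repeatedly add the record whose distance to its nearest already-selected record is largest. The function name `select_spread_sample` is hypothetical, and the input is assumed to be a square, symmetric dissimilarity matrix given as nested lists.

```python
from typing import List, Sequence


def select_spread_sample(dist: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedy max-min (farthest-point) selection of k record indices.

    Assumes `dist` is a square, symmetric dissimilarity matrix with zeros on
    the diagonal. Returns indices in the order they were selected.
    """
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of records")
    if k == 1:
        return [0]  # a single record has no pairwise distances to maximize

    # Seed the sample with the most distant pair of records.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    chosen = [i0, j0]
    selected = {i0, j0}

    # min_d[r] = distance from record r to its nearest selected record.
    min_d = [min(dist[r][i0], dist[r][j0]) for r in range(n)]

    while len(chosen) < k:
        # Pick the unselected record farthest from the current sample.
        nxt = max((r for r in range(n) if r not in selected), key=lambda r: min_d[r])
        chosen.append(nxt)
        selected.add(nxt)
        # Update nearest-selected distances with the new member.
        for r in range(n):
            min_d[r] = min(min_d[r], dist[r][nxt])
    return chosen


# Usage: six 1-D records; dissimilarity is the absolute difference.
values = [0, 1, 2, 10, 11, 20]
D = [[abs(a - b) for b in values] for a in values]
print(select_spread_sample(D, 3))  # → [0, 5, 3]
```

Seeding with the farthest pair costs O(n²), and each further pick is O(n), so the whole selection runs in O(n² + nk). A greedy heuristic is a reasonable starting point because choosing the k records that exactly maximize the minimum pairwise distance (the max-min dispersion problem) is NP-hard in general.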

Related Areas of Project: Computer Science and Engineering

About Project Supervisors

Mehmet Ercan Nergiz, ercann@sabanciuniv.edu
Öznur Taştan, otastan@sabanciuniv.edu