Genomic data are used in many domains, including biomedical research, clinical care, and customer services, and thousands of genomes are now shared online. DNA carries personal information about its owner, such as ethnicity, kinship, and predisposition to certain diseases. It is therefore critical to ensure that genomic data are shared without compromising the privacy of the participants and their families. We have developed a computational methodology to protect the kinship information of individuals who share their genomic data in a public database [1]. We formulated the trade-off between the utility of sharing data and preserving the privacy of the family as an optimization problem. The model maximizes the amount of genomic data that can be shared while minimizing kinship privacy risks. To achieve this, it systematically identifies and withholds a minimal portion of each newly arrived member’s genome and shares the remaining data without compromising kinship privacy. One limitation of this model is that it disregards the statistical dependencies between genomic positions. If a masked position is strongly statistically dependent on another locus that is shared, its value can be inferred from the correlated position, which cancels the protection gained by withholding it. This project aims to incorporate genomic correlation structures into the existing model.
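The limitation described above can be illustrated with a minimal sketch. It is not the method of [1]. It assumes genotypes coded as 0/1/2 minor-allele counts over a small reference panel, uses the squared Pearson correlation between two positions as a simple proxy for their statistical dependence, and flags shared positions whose correlation with a masked position reaches a threshold. The function names, the toy panel, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a position with no variation carries no correlation signal
    return cov * cov / (var_x * var_y)


def leaking_positions(genotypes, masked, threshold=0.8):
    """Map each masked position to the shared positions that could reveal it.

    genotypes: dict of position name -> genotype list across the reference panel
    masked:    set of position names that are withheld
    threshold: assumed r^2 cutoff above which a shared position is treated as a leak
    """
    leaks = {}
    for m in masked:
        for pos, column in genotypes.items():
            if pos in masked:
                continue  # only shared (unmasked) positions can leak information
            if r_squared(genotypes[m], column) >= threshold:
                leaks.setdefault(m, []).append(pos)
    return leaks


# Hypothetical toy panel: snp_b mirrors snp_a exactly, snp_c is only weakly related.
panel = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [0, 1, 2, 0, 1, 2],
    "snp_c": [2, 0, 1, 1, 2, 0],
}
leaks = leaking_positions(panel, masked={"snp_a"})
print(leaks)
```

In this toy panel, masking `snp_a` alone gives no protection because `snp_b` is shared and perfectly correlated with it. A correlation-aware model would have to account for such positions, for example by masking them as well, when deciding what to withhold.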
1. G. Kale, E. Ayday, and O. Tastan. A utility maximizing and privacy preserving approach for protecting kinship in genomic databases. Bioinformatics, in press.
About Project Supervisors
Oznur Tastan, in collaboration with Erman Ayday of Case Western Reserve University