Privacy for Free: How does Dataset Condensation Help Privacy?

Tian Dong

Bo Zhao*

Lingjuan Lyu

* External authors

ICML 2022

2022

Abstract

To prevent unintentional data leakage, the research community has resorted to data generators that can produce differentially private data for model training. However, for the sake of data privacy, existing solutions suffer from either expensive training costs or poor generalization performance. We therefore raise the question of whether training efficiency and privacy can be achieved simultaneously. In this work, we identify for the first time that dataset condensation (DC), which was originally designed to improve training efficiency, can be a better replacement for data generators in private data generation, thus providing privacy for free. To demonstrate the privacy benefit of DC, we build a connection between DC and differential privacy (DP), and theoretically prove for linear feature extractors (and then extend the result to non-linear feature extractors) that the presence of a single sample has a limited impact, O(m/n), on the parameter distribution of networks trained on m samples synthesized by DC from n (n >> m) raw samples. We also empirically validate the visual privacy and membership privacy of DC-synthesized data by launching both loss-based and state-of-the-art likelihood-based membership inference attacks. We envision this work as a milestone for data-efficient and privacy-preserving machine learning.
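The loss-based membership inference attack mentioned in the abstract can be illustrated as a simple threshold test: a sample is guessed to be a training member when the target model's loss on it is unusually low. The Python below is a minimal sketch under stated assumptions, not the authors' evaluation code. The function names (`cross_entropy`, `predict_member`, `advantage`, `best_threshold`) and the softmax outputs are hypothetical, and the threshold sweep uses the true membership labels, so it gives an optimistic (oracle) estimate of attack strength rather than a realistic attacker's.

```python
import math
from typing import List, Sequence, Tuple


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Per-sample cross-entropy loss, -log p(label)."""
    # Clamp so an overconfident zero probability does not raise a math domain error.
    return -math.log(max(probs[label], 1e-12))


def predict_member(losses: Sequence[float], threshold: float) -> List[bool]:
    """Loss-threshold attack: guess 'member' when the target model's loss is below the threshold."""
    return [loss < threshold for loss in losses]


def advantage(member_losses: Sequence[float],
              nonmember_losses: Sequence[float],
              threshold: float) -> float:
    """Membership advantage = true-positive rate minus false-positive rate."""
    tpr = sum(predict_member(member_losses, threshold)) / len(member_losses)
    fpr = sum(predict_member(nonmember_losses, threshold)) / len(nonmember_losses)
    return tpr - fpr


def best_threshold(member_losses: Sequence[float],
                   nonmember_losses: Sequence[float]) -> Tuple[float, float]:
    """Try every observed loss (plus +inf) as a threshold; return (threshold, advantage).

    Hypothetical helper: it uses the true membership labels, so the result is an
    optimistic upper estimate of what a real attacker could achieve.
    """
    candidates = sorted(set(member_losses) | set(nonmember_losses)) + [math.inf]
    adv, t = max((advantage(member_losses, nonmember_losses, c), c) for c in candidates)
    return t, adv


# Hypothetical softmax outputs of a target model: (probabilities, true label).
members = [([0.9, 0.1], 0), ([0.2, 0.8], 1), ([0.95, 0.05], 0)]
nonmembers = [([0.6, 0.4], 0), ([0.3, 0.7], 1), ([0.5, 0.5], 1)]

m_loss = [cross_entropy(p, y) for p, y in members]
n_loss = [cross_entropy(p, y) for p, y in nonmembers]
t, adv = best_threshold(m_loss, n_loss)
print(f"threshold={t:.3f} advantage={adv:.2f}")  # threshold=0.357 advantage=1.00
```

An advantage near 1 means member and non-member losses separate cleanly, so the model leaks membership; an advantage near 0 means the attack does no better than guessing. Under the abstract's claim, a model trained only on DC-synthesized data would be expected to push this advantage on the original raw samples toward 0. The numbers above are toy values and are not results from the paper.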

Related Publications

A Simple Background Augmentation Method for Object Detection with Diffusion Model

ECCV, 2024
Yuhang Li, Xin Dong, Chen Chen, Weiming Zhuang, Lingjuan Lyu

In computer vision, it is well-known that a lack of data diversity will impair model performance. In this study, we address the challenges of enhancing the dataset diversity problem in order to benefit various downstream tasks such as object detection and instance segmentati…

Finding a needle in a haystack: A Black-Box Approach to Invisible Watermark Detection

ECCV, 2024
Minzhou Pan*, Zhenting Wang, Xin Dong, Vikash Sehwag, Lingjuan Lyu, Xue Lin*

In this paper, we propose WaterMark Detection (WMD), the first invisible watermark detection method under a black-box and annotation-free setting. WMD is capable of detecting arbitrary watermarks within a given reference dataset using a clean non-watermarked dataset as a ref…

PerceptAnon: Exploring the Human Perception of Image Anonymization Beyond Pseudonymization for GDPR

ICML, 2024
Kartik Patwari, Chen-Nee Chuah*, Lingjuan Lyu, Vivek Sharma

Current image anonymization techniques largely focus on localized pseudonymization, typically modify identifiable features like faces or full bodies and evaluate anonymity through metrics such as detection and re-identification rates. However, this approach often overlooks …
