Lingjuan Lyu – Sony AI

Profile

Lingjuan is the Head of the Privacy-Preserving Machine Learning (PPML) team at Sony AI. A globally recognized expert in privacy and security, she leads a group of scientists and engineers working on privacy- and security-related initiatives across the company. Prior to joining Sony AI, she spent more than eight years working in academia and at industry organizations. Lingjuan received her Ph.D. from the University of Melbourne and was a recipient of the IBM PhD Fellowship Award Worldwide. Her current research interest is trustworthy AI, with a focus on federated learning, responsible foundation model development, data privacy, model robustness, IP protection, and on-device AI. She has published over 100 papers in top conferences and journals, including NeurIPS, ICML, ICLR, and Nature. She and her papers have won numerous awards from top venues, including the ICML Outstanding Paper Award, the ACL Area Chair Award, the CIKM Best Paper Runner-up Award (only 1), the IEEE Outstanding Leadership Award, and best paper awards from AAAI, IJCAI, WWW, and KDD, among others.

Publications

How to Evaluate and Mitigate IP Infringement in Visual Generative AI?

ICML, 2025 | Zhenting Wang, Chen Chen, Vikash Sehwag, Minzhou Pan*, Lingjuan Lyu

The popularity of visual generative AI models like DALL-E 3, Stable Diffusion XL, Stable Video Diffusion, and Sora has been increasing. Through extensive evaluation, we discovered that the state-of-the-art visual generative models can generate content that bears a striking r...

Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models

CVPR, 2025 | Jie Ren, Kangrui Chen, Yingqian Cui, Shenglai Zeng, Hui Liu, Yue Xing, Jiliang Tang, Lingjuan Lyu

Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. However, the advancement of T2I diffusion models presents significant risks, as the models could be exploited for malicious purposes, suc...

CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI

CVPR, 2025 | Siyuan Cheng, Lingjuan Lyu, Zhenting Wang, Xiangyu Zhang, Vikash Sehwag

With the rapid advancement of generative AI, it is now possible to synthesize high-quality images in a few seconds. Despite the power of these technologies, they raise significant concerns regarding misuse. Current efforts to distinguish between real and AI-generated image...

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

CVPR, 2025 | Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu

As scaling laws in generative AI push performance, they simultaneously concentrate the development of these models among actors with large computational resources. With a focus on text-to-image (T2I) generative models, we aim to unlock this bottleneck by demonstrating very l...

Argus: A Compact and Versatile Foundation Model for Vision

CVPR, 2025 | Weiming Zhuang, Chen Chen, Zhizhong Li, Sina Sajadmanesh, Jingtao Li, Jiabo Huang, Vikash Sehwag, Vivek Sharma, Hirotaka Shinozaki, Felan Carlo Garcia, Yihao Zhan, Naohiro Adachi, Ryoji Eki, Michael Spranger, Peter Stone, Lingjuan Lyu

While existing vision and multi-modal foundation models can handle multiple computer vision tasks, they often suffer from significant limitations, including huge demand for data and computational resources during training and inconsistent performance across vision tasks at d...

Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Model

WWW, 2025 | Jie Ren, Kangrui Chen, Chen Chen, Vikash Sehwag, Yue Xing, Jiliang Tang, Lingjuan Lyu

Large Language Models (LLMs) and Vision-Language Models (VLMs) have made significant advancements in a wide range of natural language processing and vision-language tasks. Access to large web-scale datasets has been a key factor in their success. However, concerns have been ...

Exploit Gradient Skewness to Circumvent Byzantine Defenses for Federated Learning

AAAI, 2025 | Yuchen Liu*, Chen Chen, Lingjuan Lyu, Yaochu Jin, Gang Chen*

Federated Learning (FL) is notorious for its vulnerability to Byzantine attacks. Most current Byzantine defenses share a common inductive bias: among all the gradients, the densely distributed ones are more likely to be honest. However, such a bias is a poison to Byzantine r...

FLoRA: Federated Fine-Tuning Large Language Models with Heterogeneous Low-Rank Adaptations

NEURIPS, 2024 | Lingjuan Lyu, Ziyao Wang, Zheyu Shen, Yexiao He, Guoheng Sun, Hongyi Wang, Ang Li

The rapid development of Large Language Models (LLMs) has been pivotal in advancing AI, with pre-trained LLMs being adaptable to diverse downstream tasks through fine-tuning. Federated learning (FL) further enhances fine-tuning in a privacy-aware manner by utilizing clients'...

pFedClub: Controllable Heterogeneous Model Aggregation for Personalized Federated Learning

NEURIPS, 2024 | Jiaqi Wang*, Lingjuan Lyu, Fenglong Ma*, Qi Li

Federated learning, a pioneering paradigm, enables collaborative model training without exposing users’ data to central servers. Most existing federated learning systems necessitate uniform model structures across all clients, restricting their practicality. Several methods ...

CURE4Rec: A Benchmark for Recommendation Unlearning with Deeper Influence

NEURIPS, 2024 | Chaochao Chen*, Yizhao Zhang*, Lingjuan Lyu, Yuyuan Li*, Jiaming Zhang, Li Zhang, Biao Gong, Chenggang Yan

With increasing privacy concerns in artificial intelligence, regulations have mandated the right to be forgotten, granting individuals the right to withdraw their data from models. Machine unlearning has emerged as a potential solution to enable selective forgetting in model...

FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection

NEURIPS, 2024 | Jiaqi Wang*, Lingjuan Lyu, Fenglong Ma*, Xiaochen Wang, Jinghui Chen

This study introduces the Federated Medical Knowledge Injection (FedMEKI) platform, a new benchmark designed to address the unique challenges of integrating medical knowledge into foundation models under privacy constraints. By leveraging a cross-silo federated learning appr...

DECO-Bench: Unified Benchmark for Decoupled Task-Agnostic Synthetic Data Release

NEURIPS, 2024 | Lingjuan Lyu, Vivek Sharma, Farzaneh Askari

In this work, we tackle the question of how to systematically benchmark task-agnostic decoupling methods for privacy-preserving machine learning (ML). Sharing datasets that include sensitive information often triggers privacy concerns, necessitating robust decoupling methods...

Masked Differential Privacy

ECCV, 2024 | Sina Sajadmanesh, Vikash Sehwag, Lingjuan Lyu, Vivek Sharma, David Schneider, Saquib Sarfraz, Rainer Stiefelhagen

Privacy-preserving computer vision is an important emerging problem in machine learning and artificial intelligence. The prevalent methods tackling this problem use differential privacy or anonymization and obfuscation techniques to protect the privacy of individuals. In b...

A Simple Background Augmentation Method for Object Detection with Diffusion Model

ECCV, 2024 | Yuhang Li, Xin Dong, Chen Chen, Weiming Zhuang, Lingjuan Lyu

In computer vision, it is well-known that a lack of data diversity will impair model performance. In this study, we address the challenges of enhancing the dataset diversity problem in order to benefit various downstream tasks such as object detection and instance segmentati...

Finding a needle in a haystack: A Black-Box Approach to Invisible Watermark Detection

ECCV, 2024 | Minzhou Pan*, Zhenting Wang, Xin Dong, Vikash Sehwag, Lingjuan Lyu, Xue Lin*

In this paper, we propose WaterMark Detection (WMD), the first invisible watermark detection method under a black-box and annotation-free setting. WMD is capable of detecting arbitrary watermarks within a given reference dataset using a clean non-watermarked dataset as a ref...

PerceptAnon: Exploring the Human Perception of Image Anonymization Beyond Pseudonymization for GDPR

ICML, 2024 | Kartik Patwari, Chen-Nee Chuah*, Lingjuan Lyu, Vivek Sharma

Current image anonymization techniques largely focus on localized pseudonymization, typically modifying identifiable features like faces or full bodies and evaluating anonymity through metrics such as detection and re-identification rates. However, this approach often overlooks ...

COALA: A Practical and Vision-Centric Federated Learning Platform

ICML, 2024 | Weiming Zhuang, Jian Xu, Chen Chen, Jingtao Li, Lingjuan Lyu

We present COALA, a vision-centric Federated Learning (FL) platform, and a suite of benchmarks for practical FL scenarios, which we categorize as task, data, and model levels. At the task level, COALA extends support from simple classification to 15 computer vision tasks, in...

How to Trace Latent Generative Model Generated Images without Artificial Watermark?

ICML, 2024 | Zhenting Wang, Vikash Sehwag, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas*, Shiqing Ma*

Latent generative models (e.g., Stable Diffusion) have become more and more popular, but concerns have arisen regarding potential misuse related to images generated by these models. It is, therefore, necessary to analyze the origin of images by inferring if a particular imag...

FedMef: Towards Memory-efficient Federated Dynamic Pruning

CVPR, 2024 | Hong Huang, Weiming Zhuang, Chen Chen, Lingjuan Lyu

Federated learning (FL) promotes decentralized training while prioritizing data confidentiality. However, its application on resource-constrained devices is challenging due to the high demand for computation and memory resources for training deep learning models. Neural netw...

DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models

ICLR, 2024 | Zhenting Wang, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas*, Shiqing Ma*

Recent text-to-image diffusion models have shown surprising performance in generating high-quality images. However, concerns have arisen regarding the unauthorized data usage during the training or fine-tuning process. One example is when a model trainer collects a set of im...

FedWon: Triumphing Multi-domain Federated Learning Without Normalization

ICLR, 2024 | Weiming Zhuang, Lingjuan Lyu

Federated learning (FL) enhances data privacy with collaborative in-situ training on decentralized clients. Nevertheless, FL encounters challenges due to non-independent and identically distributed (non-i.i.d) data, leading to potential performance degradation and hindered c...

Detecting, Explaining, and Mitigating Memorization in Diffusion Models

ICLR, 2024 | Yuxin Wen, Yuchen Liu*, Chen Chen, Lingjuan Lyu

Recent breakthroughs in diffusion models have exhibited exceptional image-generation capabilities. However, studies show that some outputs are merely replications of training data. Such replications present potential legal challenges for model owners, especially when the gen...

FedP3: Federated Personalized and Privacy-friendly Network Pruning under Model Heterogeneity

ICLR, 2024 | Kai Yi, Nidham Gazagnadou, Peter Richtárik*, Lingjuan Lyu

The interest in federated learning has surged in recent research due to its unique ability to train a global model using privacy-secured information held locally on each client. This paper pays particular attention to the issue of client-side model heterogeneity, a pervasive...

Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception?

NEURIPS, 2023 | Xiaoxiao Sun*, Nidham Gazagnadou, Vivek Sharma, Lingjuan Lyu, Hongdong Li*, Liang Zheng*

Hand-crafted image quality metrics, such as PSNR and SSIM, are commonly used to evaluate model privacy risk under reconstruction attacks. Under these metrics, reconstructed images that are determined to resemble the original one generally indicate more privacy leakage. Image...

UltraRE: Enhancing RecEraser for Recommendation Unlearning via Error Decomposition

NEURIPS, 2023 | Yuyuan Li*, Chaochao Chen*, Yizhao Zhang*, Weiming Liu*, Lingjuan Lyu, Xiaolin Zheng*, Dan Meng*, Jun Wang*

With growing concerns regarding privacy in machine learning models, regulations have committed to granting individuals the right to be forgotten while mandating companies to develop non-discriminatory machine learning systems, thereby fueling the study of the machine unlearn...

Towards Personalized Federated Learning via Heterogeneous Model Reassembly

NEURIPS, 2023 | Jiaqi Wang*, Xingyi Yang*, Suhan Cui*, Liwei Che*, Lingjuan Lyu, Dongkuan Xu*, Fenglong Ma*

This paper focuses on addressing the practical yet challenging problem of model heterogeneity in federated learning, where clients possess models with different network structures. To tackle this problem, we propose a novel framework called pFedHR, which leverages heterogeneo...

Is Heterogeneity Notorious? Taming Heterogeneity to Handle Test-Time Shift in Federated Learning

NEURIPS, 2023 | Yue Tan, Chen Chen, Weiming Zhuang, Xin Dong, Lingjuan Lyu, Guodong Long*

Federated learning (FL) is an effective machine learning paradigm where multiple clients can train models based on heterogeneous data in a decentralized manner without accessing their private data. However, existing FL systems undergo performance deterioration due to feature...

Where Did I Come From? Origin Attribution of AI-Generated Images

NEURIPS, 2023 | Zhenting Wang, Chen Chen, Yi Zeng, Lingjuan Lyu, Shiqing Ma*

Image generation techniques have been gaining increasing attention recently, but concerns have been raised about the potential misuse and intellectual property (IP) infringement associated with image generation models. It is, therefore, necessary to analyze the origin of ima...

MAS: Towards Resource-Efficient Federated Multiple-Task Learning

ICCV, 2023 | Weiming Zhuang, Yonggang Wen*, Shuai Zhang*, Lingjuan Lyu

Federated learning (FL) is an emerging distributed machine learning method that empowers in-situ model training on decentralized edge devices. However, multiple simultaneous FL tasks could overload resource-constrained devices. In this work, we propose the first FL system to...

The Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning

ICCV, 2023 | Virat Shejwalkar, Lingjuan Lyu, Amir Houmansadr*

Semi-supervised machine learning (SSL) is gaining popularity as it reduces the cost of training ML models. It does so by using very small amounts of (expensive, well-inspected) labeled data and large amounts of (cheap, non-inspected) unlabeled data. SSL has shown comparable ...

TARGET: Federated Class-Continual Learning via Exemplar-Free Distillation

ICCV, 2023 | Jie Zhang*, Chen Chen, Weiming Zhuang, Lingjuan Lyu

This paper focuses on an under-explored yet important problem: Federated Class-Continual Learning (FCCL), where new classes are dynamically added in federated learning. Existing FCCL works suffer from various limitations, such as requiring additional datasets or storing the ...

A Pathway Towards Responsible AI Generated Content

IJCAI, 2023 | Lingjuan Lyu

AI Generated Content (AIGC) has received tremendous attention within the past few years, with content ranging from image, text, to audio, video, etc. Meanwhile, AIGC has become a double-edged sword and recently received much criticism regarding its responsible usage. In this...

RAIN: RegulArization on Input and Network for Black-Box Domain Adaptation

IJCAI, 2023 | Qucheng Peng*, Zhengming Ding*, Lingjuan Lyu, Lichao Sun*, Chen Chen

Source-Free domain adaptation transits the source-trained model towards target domain without exposing the source data, trying to dispel these concerns about data privacy and security. However, this paradigm is still at risk of data leakage due to adversarial attacks on the ...

FedSampling: A Better Sampling Strategy for Federated Learning

IJCAI, 2023 | Tao Qi*, Fangzhao Wu*, Lingjuan Lyu, Yongfeng Huang*, Xing Xie*

Federated learning (FL) is an important technique for learning models from decentralized data in a privacy-preserving way. Existing FL methods usually uniformly sample clients for local model learning in each round. However, different clients may have significantly different...

Reducing Communication for Split Learning by Randomized Top-k Sparsification

IJCAI, 2023 | Fei Zheng*, Chaochao Chen*, Lingjuan Lyu, Binhui Yao*

The EU AI Act proposal addresses, among other applications, AI systems that enable facial classification and emotion recognition. As part of previous work, we have investigated how citizens deliberate about the validity of AI-based facial classifications in the advertisement...

Meta-Sift: How to Sift Out a Clean Subset in the Presence of Data Poisoning?

USENIX SECURITY, 2023 | Yi Zeng, Minzhou Pan*, Himanshu Jahagirdar*, Lingjuan Lyu, Ruoxi Jia*

External data sources are increasingly being used to train machine learning (ML) models as the data demand increases. However, the integration of external data into training poses data poisoning risks, where malicious providers manipulate their data to compromise the utility...

PrivateRec: Differentially Private Model Training and Online Serving for Federated News Recommendation

KDD, 2023 | Ruixuan Liu*, Yanlin Wang*, Yang Cao*, Lingjuan Lyu, Weike Pan*, Yun Chen*, Hong Chen*

Collecting and training over sensitive personal data raise severe privacy concerns in personalized recommendation systems, and federated learning can potentially alleviate the problem by training models over decentralized user data. However, a theoretically private solution i...

Byzantine-Robust Learning on Heterogeneous Data via Gradient Splitting

ICML, 2023 | Yuchen Liu*, Chen Chen, Lingjuan Lyu, Fangzhao Wu*, Sai Wu*, Gang Chen*

Federated learning has exhibited vulnerabilities to Byzantine attacks, where the Byzantine attackers can send arbitrary gradients to the central server to destroy the convergence and performance of the global model. A wealth of defenses have been proposed to defend against B...

Revisiting Data-Free Knowledge Distillation with Poisoned Teachers

ICML, 2023 | Junyuan Hong, Yi Zeng, Shuyang Yu*, Lingjuan Lyu, Ruoxi Jia*, Jiayu Zhou*

Data-free knowledge distillation (KD) helps realistically transfer knowledge from a pre-trained model (known as the teacher model) to a smaller model (known as the student model) without access to the original training data used for training the teacher model. However, the s...

Dimension-independent Certified Neural Network Watermarks via Mollifier Smoothing

ICML, 2023 | Jiaxiang Ren*, Yang Zhou*, Jiayin Jin*, Lingjuan Lyu, Da Yan*

Certified_Watermarks is the first to provide a watermark certificate against 𝑙2-norm watermark removal attacks, by leveraging the randomized smoothing techniques for certified robustness to adversarial attacks. However, the randomized smoothing techniques suffer from hardnes...

Fast Federated Machine Unlearning with Nonlinear Functional Theory

ICML, 2023 | Tianshi Che*, Yang Zhou*, Zijie Zhang*, Lingjuan Lyu, Ji Liu*, Da Yan*, Dejing Dou*, Jun Huan*

Federated machine unlearning (FMU) aims to remove the influence of a specified subset of training data upon request from a trained federated learning model. Despite achieving remarkable performance, existing FMU techniques suffer from inefficiency due to two sequential opera...

Reconstructive Neuron Pruning for Backdoor Defense

ICML, 2023 | Yige Li*, Xixiang Lyu*, Xingjun Ma*, Nodens Koren*, Lingjuan Lyu, Bo Li*, Yu-Gang Jiang*

Deep neural networks (DNNs) have been found to be vulnerable to backdoor attacks, raising security concerns about their deployment in mission-critical applications. While existing defense methods have demonstrated promising results, it is still not clear how to effectively r...

Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark

ACL, 2023 | Wenjun Peng*, Jingwei Yi*, Fangzhao Wu*, Shangxi Wu*, Bin Bin Zhu*, Lingjuan Lyu, Binxing Jiao*, Guangzhong Sun*, Xing Xie*

Large language models (LLMs) have demonstrated powerful capabilities in both text understanding and generation. Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit various natural language processing (NLP) tasks for customers. H...

Towards Adversarially Robust Continual Learning

ICASSP, 2023 | Tao Bai, Chen Chen, Lingjuan Lyu, Jun Zhao*, Bihan Wen*

Recent studies show that models trained by continual learning can achieve the comparable performances as the standard supervised learning and the learning flexibility of continual learning models enables their wide applications in the real world. Deep learning models, howeve...

MocoSFL: enabling cross-client collaborative self-supervised learning

ICLR, 2023 | Jingtao Li, Lingjuan Lyu, Daisuke Iso, Chaitali Chakrabarti*, Michael Spranger

Existing collaborative self-supervised learning (SSL) schemes are not suitable for cross-client applications because of their expensive computation and large local data requirements. To address these issues, we propose MocoSFL, a collaborative SSL framework based on Split Fe...

IDEAL: Query-Efficient Data-Free Learning from Black-Box Models

ICLR, 2023 | Jie Zhang*, Chen Chen, Lingjuan Lyu

Knowledge Distillation (KD) is a typical method for training a lightweight student model with the help of a well-trained teacher model. However, most KD methods require access to either the teacher's training data or model parameter, which is unrealistic. To tackle this prob...

Twofer: Tackling Continual Domain Shift with Simultaneous Domain Generalization and Adaptation

ICLR, 2023 | Chenxi Liu*, Lixu Wang, Lingjuan Lyu, Chen Sun*, Xiao Wang*, Qi Zhu*

In real-world applications, deep learning models often run in non-stationary environments where the target data distribution continually shifts over time. There have been numerous domain adaptation (DA) methods in both online and offline modes to improve cross-domain adaptat...

MECTA: Memory-Economic Continual Test-Time Model Adaptation

ICLR, 2023 | Junyuan Hong, Lingjuan Lyu, Jiayu Zhou*, Michael Spranger

Continual Test-time Adaptation (CTA) is a promising art to secure accuracy gains in continually-changing environments. The state-of-the-art adaptations improve out-of-distribution model accuracy via computation-efficient online test-time gradient descents but meanwhile cost ...

Towards Robustness Certification Against Universal Perturbations

ICLR, 2023 | Yi Zeng, Zhouxing Shi*, Ming Jin*, Feiyang Kang*, Lingjuan Lyu, Cho-Jui Hsieh*, Ruoxi Jia*

In this paper, we investigate the problem of certifying neural network robustness against universal perturbations (UPs), which have been widely used in universal adversarial attacks and backdoor attacks. Existing robustness certification methods aim to provide robustness gua...

Minimum Topology Attacks for Graph Neural Networks

WWW, 2023 | Mengmei Zhang*, Xiao Wang*, Chuan Shi*, Lingjuan Lyu, Tianchi Yang*, Junping Du*

With the great popularity of Graph Neural Networks (GNNs), their robustness to adversarial topology attacks has received increasing attention. Although many attack methods have been proposed, they mainly focus on fixed-budget attacks, aiming at finding the most adversarial p...

Defending Against Backdoor Attacks in Natural Language Generation

AAAI, 2023 | Xiaofei Sun*, Xiaoya Li*, Yuxian Meng*, Xiang Ao*, Lingjuan Lyu, Jiwei Li*, Tianwei Zhang*

The frustratingly fragile nature of neural network models makes current natural language generation (NLG) systems prone to backdoor attacks and to generating malicious sequences that could be sexist or offensive. Unfortunately, little effort has been invested in how backdoor attac...

Delving into the Adversarial Robustness of Federated Learning

AAAI, 2023 | Zijie Zhang*, Bo Li*, Chen Chen, Lingjuan Lyu, Shuang Wu*, Shouhong Ding*, Chao Wu*

In Federated Learning (FL), models are as fragile as centrally trained models against adversarial examples. However, the adversarial robustness of federated learning remains largely unexplored. This paper casts light on the challenge of adversarial robustness of federated le...

Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling

NEURIPS, 2022 | Junyuan Hong, Lingjuan Lyu, Jiayu Zhou*, Michael Spranger

As deep learning blooms with growing demand for computation and data resources, outsourcing model training to a powerful cloud server becomes an attractive alternative to training at a low-power and cost-effective end device. Traditional outsourcing requires uploading device...

Calibrated Federated Adversarial Training with Label Skewness

NEURIPS, 2022 | Chen Chen, Yuchen Liu*, Xingjun Ma*, Lingjuan Lyu

Recent studies have shown that, like traditional machine learning, federated learning (FL) is also vulnerable to adversarial attacks. To improve the adversarial robustness of FL, few federated adversarial training (FAT) methods have been proposed to apply adversarial training...

DENSE: Data-Free One-Shot Federated Learning

NEURIPS, 2022 | Jie Zhang*, Chen Chen, Bo Li*, Lingjuan Lyu, Shuang Wu*, Shouhong Ding*, Chunhua Shen*, Chao Wu*

One-shot Federated Learning (FL) has recently emerged as a promising approach, which allows the central server to learn a model in a single communication round. Despite the low communication cost, existing one-shot FL methods are mostly impractical or face inherent limitatio...

CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks

NEURIPS, 2022 | Xuanli He*, Qiongkai Xu*, Yi Zeng, Lingjuan Lyu, Fangzhao Wu*, Jiwei Li*, Ruoxi Jia*

Previous works have validated that text generation APIs can be stolen through imitation attacks, causing IP violations. In order to protect the IP of text generation APIs, a recent work has introduced a watermarking algorithm and utilized the null-hypothesis test as a post-h...

Prompt Certified Machine Unlearning with Randomized Gradient Smoothing and Quantization

NEURIPS, 2022 | Zijie Zhang*, Xin Zhao*, Tianshi Che*, Yang Zhou*, Lingjuan Lyu

The right to be forgotten calls for efficient machine unlearning techniques that make trained machine learning models forget a cohort of data. The combination of training and unlearning operations in traditional machine unlearning methods often leads to the expensive computa...

FairVFL: A Fair Vertical Federated Learning Framework with Contrastive Adversarial Learning

NEURIPS, 2022 | Tao Qi*, Fangzhao Wu*, Chuhan Wu*, Lingjuan Lyu, Tong Xu*, Hao Liao*, Zhongliang Yang*, Yongfeng Huang*, Xing Xie*

Vertical federated learning (VFL) is a privacy-preserving machine learning paradigm that can learn models from features distributed on different platforms in a privacy-preserving way. Since in real-world applications the data may contain bias on fairness-sensitive features (...

FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation

ICDM, 2022 | Ziqing Fan*, Yanfeng Wang*, Jiangchao Yao*, Lingjuan Lyu, Ya Zhang*, Qi Tian*

The statistical heterogeneity of the non-independent and identically distributed (non-IID) data in local clients significantly limits the performance of federated learning. Previous attempts like FedProx, SCAFFOLD, MOON, FedNova and FedDyn resort to an optimization perspecti...

Privacy and Robustness in Federated Learning: Attacks and Defenses

TNNLS, 2022 | Lingjuan Lyu, Han Yu*, Xingjun Ma*, Chen Chen, Lichao Sun*, Jun Zhao*, Qiang Yang*, Philip S. Yu*

As data are increasingly being stored in different silos and societies become more aware of data privacy issues, the traditional centralized training of artificial intelligence (AI) models is facing efficiency and privacy challenges. Recently, federated learning (FL) has ...

Beyond Model Extraction: Imitation Attack for Black-Box NLP APIs

COLING, 2022 | Qiongkai Xu*, Xuanli He*, Lingjuan Lyu, Lizhen Qu*, Gholamreza Haffari*

Machine-learning-as-a-service (MLaaS) has attracted millions of users to their splendid large-scale models. Although published as black-box APIs, the valuable models behind these services are still vulnerable to imitation attacks. Recently, a series of works have demonstrate...

Cross-Network Social User Embedding with Hybrid Differential Privacy Guarantees

CIKM, 2022 | Jiaqian Ren*, Lei Jiang*, Hao Peng*, Lingjuan Lyu, Zhiwei Liu*, Chaochao Chen*, Jia Wu*, Xu Bai*, Philip S. Yu*

Integrating multiple online social networks (OSNs) has important implications for many downstream social mining tasks, such as user preference modelling, recommendation, and link prediction. However, it is unfortunately accompanied by growing privacy concerns about leaking s...

Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models

EMNLP, 2022 | Zhiyuan Zhang*, Lingjuan Lyu, Xingjun Ma*, Chenguang Wang*, Xu Sun*

Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks. In Natural Language Processing (NLP), DNNs are often backdoored during the fine-tuning process of a large-scale Pre-trained Language Model (PLM) with poisoned samples. Although the clean weights of P...

Extracted BERT Model Leaks More Information than You Think!

EMNLP, 2022 | Xuanli He*, Chen Chen, Lingjuan Lyu, Qiongkai Xu*

The collection and availability of big data, combined with advances in pre-trained models (e.g. BERT), have revolutionized the predictive performance of natural language processing tasks. This allows corporations to provide machine learning as a service (MLaaS) by encapsulat...

Privacy for Free: How does Dataset Condensation Help Privacy?

ICML, 2022 | Tian Dong, Bo Zhao*, Lingjuan Lyu

To prevent unintentional data leakage, the research community has resorted to data generators that can produce differentially private data for model training. However, for the sake of data privacy, existing solutions suffer from either expensive training cost or poor general...

Accelerated Federated Learning with Decoupled Adaptive Optimization

ICML, 2022 | Jiayin Jin*, Jiaxiang Ren*, Yang Zhou*, Lingjuan Lyu, Ji Liu*, Dejing Dou*

The federated learning (FL) framework enables edge clients to collaboratively learn a shared inference model while keeping the training data private on the clients. Recently, many heuristic efforts have been made to generalize centralized adaptive optimization methods, such as S...

A Federated Graph Neural Network Framework for Privacy-Preserving Personalization

NATURE COMMUNICATIONS, 2022 | Yongfeng Huang*, Chuhan Wu*, Fangzhao Wu*, Lingjuan Lyu, Tao Qi*, Xing Xie*

Graph neural network (GNN) is effective in modeling high-order interactions and has been widely used in various personalized applications such as recommendation. However, mainstream personalization methods rely on centralized GNN learning on global graphs, which have conside...

Heterogeneous Graph Node Classification with Multi-Hops Relation Features

ICASSP, 2022 | Xiaolong Xu*, Lingjuan Lyu, Hong Jin*, Weiqiang Wang*, Shuo Jia*

In recent years, knowledge graph (KG) has obtained many achievements in both research and industrial fields. However, most KG algorithms consider node embedding with only structure and node features, but not relation features. In this paper, we propose a novel Heterogeneous ...

How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data

ICLR, 2022 | Zhiyuan Zhang*, Lingjuan Lyu, Weiqiang Wang*, Lichao Sun*, Xu Sun*

Since training a large-scale backdoored model from scratch requires a large training dataset, several recent attacks have considered to inject backdoors into a trained clean model without altering model behaviors on the clean data. Previous work finds that backdoors can be i...

Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification

IJCAI, 2022 | Chaochao Chen*, Longfei Zheng*, Huiwen Wu*, Lingjuan Lyu, Jun Zhou*, Jia Wu*, Bingzhe Wu*, Ziqi Liu*, Li Wang*, Xiaolin Zheng*

Graph Neural Network (GNN) has achieved remarkable progresses in various real-world tasks on graph data. High-performance GNN models always depend on both rich features and complete edge information in graph. However, such information could possibly be isolated by different ...

Data-Free Adversarial Knowledge Distillation for Graph Neural Networks

IJCAI, 2022 | Yuanxin Zhuang*, Lingjuan Lyu, Chuan Shi*, Carl Yang*, Lichao Sun*

Graph neural networks (GNNs) have been widely used in modeling graph structured data, owing to its impressive performance in a wide range of practical applications. Recently, knowledge distillation (KD) for GNNs has enabled remarkable progress in graph model compression and ...

Decision Boundary-aware Data Augmentation for Adversarial Training

TDSC, 2022 | Chen Chen, Jingfeng Zhang*, Xilie Xu*, Lingjuan Lyu, Chaochao Chen*, Tianlei Hu*, Gang Chen*

Adversarial training (AT) is a typical method to learn adversarially robust deep neural networks via training on the adversarial variants generated by their natural examples. However, as training progresses, the training data becomes less attackable, which may undermine the ...

Communication-Efficient Federated Learning via Knowledge Distillation

NATURE COMMUNICATIONS, 2022 | Yongfeng Huang*, Chuhan Wu*, Fangzhao Wu*, Lingjuan Lyu, Xing Xie*

Federated learning is a privacy-preserving machine learning technique to train intelligent models from decentralized data, which enables exploiting private data by communicating local model updates in each iteration of model learning rather than the raw data. However, model ...

Practical Attribute Reconstruction Attack Against Federated Learning

IEEE TRANSACTIONS ON BIG DATA, 2022 | Chen Chen, Lingjuan Lyu, Han Yu*, Gang Chen*

Existing federated learning (FL) designs have been shown to exhibit vulnerabilities which can be exploited by adversaries to compromise data privacy. However, most current works conduct attacks by leveraging gradients calculated on a small batch of data. This setting is not ...

Traffic Anomaly Prediction Based on Joint Static-Dynamic Spatio-Temporal Evolutionary Learning

TKDE, 2022 | Xiaoming Liu*, Zhanwei Zhang*, Lingjuan Lyu, Zhaohan Zhang*, Shuai Xiao*, Chao Shen*, Philip Yu*

Accurate traffic anomaly prediction offers an opportunity to save the wounded at the right location in time. However, the complex process of traffic anomaly is affected by both various static factors and dynamic interactions. The recent evolving representation learning provi...

Differential Private Knowledge Transfer for Privacy-Preserving Cross-Domain Recommendation

WWW, 2022 | Chaochao Chen*, Huiwen Wu*, Jiajie Su*, Lingjuan Lyu, Xiaolin Zheng*, Li Wang*

Cross Domain Recommendation (CDR) has been popularly studied to alleviate the cold-start and data sparsity problem commonly existed in recommender systems. CDR models can improve the recommendation performance of a target domain by leveraging the data of other source domains...

GEAR: A Margin-based Federated Adversarial Training Approach

AAAI, 2022 | Chen Chen, Jie Zhang*, Lingjuan Lyu

Previous studies have shown that federated learning (FL) is vulnerable to well-crafted adversarial examples. Some recent efforts tried to combine adversarial training with FL, i.e., federated adversarial training (FAT), in order to achieve adversarial robustness in FL. Howev...

Byzantine-resilient Federated Learning via Gradient Memorization

AAAI, 2022 | Chen Chen, Lingjuan Lyu, Yuchen Liu*, Fangzhao Wu*, Chaochao Chen*, Gang Chen*

Federated learning (FL) provides a privacy-aware learning framework by enabling a multitude of participants to jointly construct models without collecting their private training data. However, federated learning has exhibited vulnerabilities to Byzantine attacks. Many existi...

FedBERT: When Federated Learning Meets Pre-Training

ACM TIST, 2022 | Yuanyishu Tian*, Yao Wan*, Lingjuan Lyu, Dezhong Yao*, Hai Jin*, Lichao Sun*

The fast growth of pre-trained models (PTMs) has brought natural language processing to a new era, which becomes a dominant technique for various natural language processing (NLP) applications. Every user can download weights of PTMs, then fine-tune the weights on a task on ...

FedCTR: Federated Native Ad CTR Prediction with Cross Platform User Behavior Data

ACM TIST, 2022 | Chuhan Wu*, Fangzhao Wu*, Lingjuan Lyu, Yongfeng Huang*, Xing Xie*

Native ad is a popular type of online advertisement which has similar forms with the native content displayed on websites. Native ad CTR prediction is useful for improving user experience and platform revenue. However, it is challenging due to the lack of explicit user inten...

DADFNet: Dual Attention and Dual Frequency-Guided Dehazing Network for Video-Empowered Intelligent Transportation

AAAI, 2022 | Yu Guo*, Wen Liu*, Jiangtian Nie*, Lingjuan Lyu, Zehui Xiong*, Jiawen Kang*, Han Yu*, Dusit Niyato*

Visual surveillance technology is an indispensable functional component of advanced traffic management systems. It has been applied to perform traffic supervision tasks, such as object detection, tracking and recognition. However, adverse weather conditions, e.g., fog, haze ...

Protecting Intellectual Property of Language Generation APIs with Lexical Watermark

AAAI, 2022 | Xuanli He*, Qiongkai Xu*, Lingjuan Lyu, Fangzhao Wu*, Chenguang Wang*

Nowadays, due to the breakthrough in natural language generation (NLG), including machine translation, document summarization, image captioning, etc., NLG models have been encapsulated in cloud APIs to serve over half a billion people worldwide and process over one hundred bil...

Exploiting Data Sparsity in Secure Cross-Platform Social Recommendation

NEURIPS, 2021 | Jamie Cui*, Chaochao Chen*, Lingjuan Lyu, Carl Yang*, Li Wang*

Social recommendation has shown promising improvements over traditional systems since it leverages social correlation data as an additional input. Most existing works assume that all data are available to the recommendation platform. However, in practice, user-item interacti...

Anti-Backdoor Learning: Training Clean Models on Poisoned Data

NEURIPS, 2021 | Yige Li*, Xixiang Lyu*, Nodens Koren*, Lingjuan Lyu, Bo Li*, Xingjun Ma*

Backdoor attack has emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting and erasing backdoor triggers, it is still not clear if measures can be taken to avoid the triggers from bein...

Gradient Driven Rewards to Guarantee Fairness in Collaborative Machine Learning

NEURIPS, 2021 | Xu Xinyi*, Lingjuan Lyu, Xingjun Ma*, Chenglin Miao*, Chuan-Sheng Foo*, Bryan Kian Hsiang Low*

Collaborative machine learning provides a promising framework for different agents to pool their resources (e.g., data) for a common learning task. In realistic settings where agents are self-interested and not altruistic, they may be unwilling to share data or model without...

Data Poisoning Attacks on Federated Machine Learning

IEEE IOT-J, 2021 | Gan Sun*, Yang Cong*, Jiahua Dong*, Qiang Wang*, Lingjuan Lyu, Ji Liu*

Federated machine learning which enables resource-constrained node devices (e.g., Internet of Things (IoT) devices, smartphones) to establish a knowledge-shared model while keeping the raw data local, could provide privacy preservation and economic benefit by designing an ef...

Joint Stance and Rumor Detection in Hierarchical Heterogeneous Graph

IEEE TNNLS, 2021 | Chen Li*, Hao Peng*, Jianxin Li*, Lichao Sun*, Lingjuan Lyu, Lihong Wang*, Philip Yu*, Lifang He*

Recently, large volumes of false or unverified information (e.g., fake news and rumors) appear frequently in emerging social media, which are often discussed on a large scale and widely disseminated, causing bad consequences. Many studies on rumor detection indicate that the...

FLEAM: A Federated Learning Empowered Architecture to Mitigate DDoS in Industrial IoT

IEEE TII, 2021 | Jianhua Li*, Lingjuan Lyu, Ximeng Liu*, Xuyun Zhang*, Xixiang Lyu*

A Novel Attribute Reconstruction Attack in Federated Learning

IJCAI, 2021 | Lingjuan Lyu, Chen Chen

Federated learning (FL) emerged as a promising learning paradigm to enable a multitude of participants to construct a joint ML model without exposing their private training data. Existing FL designs have been shown to exhibit vulnerabilities which can be exploited by adv...

Blog Posts

Celebrating the Women of Sony AI: Sharing Insights, Inspiration, and Advice

March 29, 2024 | Alice Xiang, Life at Sony AI, Yunshu Du, Lingjuan Lyu, Lison Abecassis, Andreanne Lemay, Kana Maruyama

In March, the world commemorates the accomplishments of women throughout history and celebrates those of today. The United States observes March as ...

Sony AI Reveals New Research Contributions at NeurIPS 2023

December 13, 2023 | Peter Stone, Alice Xiang, Jerone Andrews, Events, Kazuki Shimada, Apostolos Modas, Tarek Besold, William Thong, Dora Zhao*, Lingjuan Lyu, Orestis Papakyriakopoulos*, Xin Dong, Nidham Gazagnadou, Weiming Zhuang, Vivek Sharma, Yuki Mitsufuji, Chen Chen

Sony Group Corporation and Sony AI have been active participants in the annual NeurIPS Conference for years, contributing pivotal research that has ...

Advancements in Federating Learning Highlighted in Papers Presented at ICCV 2023

October 6, 2023 | Lingjuan Lyu, PPML, Weiming Zhuang

As the field of machine learning continues to evolve, Sony AI researchers are constantly exploring innovative solutions to address the pressing ...

Privacy-Preserving Machine Learning Blog Series: Practicing Privacy by Design

August 7, 2023 | Machine Learning, Lingjuan Lyu, Nidham Gazagnadou

Privacy-Preserving Machine Learning Blog Series

Recent Breakthroughs Tackle Challenges in Federated Learning

June 8, 2023 | Machine Learning, Lingjuan Lyu, Weiming Zhuang

Privacy-Preserving Machine Learning Blog Series At Sony AI, the Privacy-Preserving Machine Learning (PPML) team focuses on fundamental and applied ...

Meet the Team #2: Lingjuan, Jerone and Roberto

November 29, 2021 | Life at Sony AI, Lingjuan Lyu, Jerone T. A. Andrews, Roberto Capobianco

What do privacy, pattern recognition, and percussion all have in common? They are concepts and creative endeavors that have inspired Sony AI team ...