Authors

* External authors

Venue

Date

Share

FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation

Chieh-Hsin Lai

Yuhta Takida

Naoki Murata

Toshimitsu Uesaka

Yuki Mitsufuji

Stefano Ermon*

* External authors

ICML 2023

2023

Abstract

Score-based generative models learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are tied together by the Fokker-Planck equation (FPE), a partial differential equation (PDE) governing the spatial-temporal evolution of a density undergoing a diffusion process. In this work, we derive a corresponding equation, called the score FPE that characterizes the noise-conditional scores of the perturbed data densities (i.e., their gradients). Surprisingly, despite impressive empirical performance, we observe that scores learned via denoising score matching (DSM) do not satisfy the underlying score FPE. We prove that satisfying the FPE is desirable as it improves the likelihood and the degree of conservativity. Hence, we propose to regularize the DSM objective to enforce satisfaction of the score FPE, and we show the effectiveness of this approach across various datasets.

Related Publications

Can Large Language Models Predict Audio Effects Parameters from Natural Language?

WASPAA, 2025
Seungheon Doh, Junghyun Koo*, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Juhan Nam, Yuki Mitsufuji

In music production, manipulating audio effects (Fx) parameters through natural language has the potential to reduce technical barriers for non-experts. We present LLM2Fx, a framework leveraging Large Language Models (LLMs) to predict Fx parameters directly from textual desc…

Large-Scale Training Data Attribution for Music Generative Models via Unlearning

ICML, 2025
Woosung Choi, Junghyun Koo*, Kin Wai Cheuk, Joan Serrà, Marco A. Martínez-Ramírez, Yukara Ikemiya, Naoki Murata, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji

This paper explores the use of unlearning methods for training data attribution (TDA) in music generative models trained on large-scale datasets. TDA aims to identify which specific training data points contributed to the generation of a particular output from a specific mod…

Fx-Encoder++: Extracting Instrument-Wise Audio Effects Representations from Mixtures

ISMIR, 2025
Yen-Tung Yeh, Junghyun Koo*, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yi-Hsuan Yang, Yuki Mitsufuji

General-purpose audio representations have proven effective across diverse music information retrieval applications, yet their utility in intelligent music production remains limited by insufficient understanding of audio effects (Fx). Although previous approaches have empha…

  • HOME
  • Publications
  • FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation

JOIN US

Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.