Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

Junghyun Koo*

Marco A. Martínez-Ramírez

Wei-Hsiang Liao

Stefan Uhlich*

Kyogu Lee*

Yuki Mitsufuji

* External authors

ICASSP 2023

Abstract

We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio-effects-related information from a reference music recording. All our models are trained in a self-supervised manner on an already-processed (wet) multitrack dataset, using an effective data preprocessing method that alleviates the scarcity of unprocessed (dry) data. We analyze the proposed encoder's ability to disentangle audio effects and validate its performance on mixing style transfer through both objective and subjective evaluations. The results show that the proposed system not only converts the mixing style of a multitrack close to that of a reference, but is also robust for mixture-wise style transfer when combined with a music source separation model.
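The core idea — pre-training the reference encoder with a contrastive objective so that segments processed with the same audio-effects chain embed close together — can be sketched with a standard InfoNCE-style loss. This is a minimal illustration, not the paper's implementation: the batching scheme, encoder architecture, and augmentation pipeline are assumptions, and the real system operates on effect-augmented audio pairs rather than the random vectors used here.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE contrastive loss over a batch of positive pairs.

    z_a, z_b: (batch, dim) embeddings of two audio segments processed
    with the SAME effects chain (positive pairs). All other entries in
    the batch serve as negatives, pushing apart segments whose effects
    chains differ.
    """
    # L2-normalize so similarities are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # (batch, batch) similarity matrix; positives lie on the diagonal
    logits = (z_a @ z_b.T) / temperature

    # numerically stable cross-entropy with diagonal targets
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss encourages the encoder to keep what two renditions of different content share — the effects processing — while discarding content-specific information, which is the disentanglement property the abstract describes.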

Related Publications

SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning

ECCV, 2024
Bac Nguyen, Stefan Uhlich*, Fabien Cardinaux*, Lukas Mauch*, Marzieh Edraki*, Aaron Courville*

Handling distribution shifts from training data, known as out-of-distribution (OOD) generalization, poses a significant challenge in the field of machine learning. While a pre-trained vision-language model like CLIP has demonstrated remarkable zero-shot performance, further …

On the Language Encoder of Contrastive Cross-modal Models

ACL, 2024
Mengjie Zhao*, Junya Ono*, Zhi Zhong*, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Takashi Shibuya, Hiromi Wakaki*, Yuki Mitsufuji, Wei-Hsiang Liao

Contrastive cross-modal models such as CLIP and CLAP aid various vision-language (VL) and audio-language (AL) tasks. However, there has been limited investigation of and improvement in their language encoder, which is the central component of encoding natural language descri…

DiffuCOMET: Contextual Commonsense Knowledge Diffusion

ACL, 2024
Silin Gao*, Mete Ismayilzada*, Mengjie Zhao*, Hiromi Wakaki*, Yuki Mitsufuji, Antoine Bosselut*

Inferring contextually-relevant and diverse commonsense to understand narratives remains challenging for knowledge models. In this work, we develop a series of knowledge models, DiffuCOMET, that leverage diffusion to learn to reconstruct the implicit semantic connections bet…

