Authors
- Eloi Moliner
- Marco A. Martínez-Ramírez
- Junghyun Koo*
- Wei-Hsiang Liao
- Kin Wai Cheuk
- Joan Serrà
- Vesa Välimäki
- Yuki Mitsufuji
* External authors
Venue
- ICASSP-26
Date
- 2026
Automatic Music Mixing Using a Generative Model of Effect Embeddings
Eloi Moliner
Marco A. Martínez-Ramírez
Kin Wai Cheuk
Vesa Välimäki
* External authors
ICASSP-26
2026
Abstract
Music mixing involves combining individual tracks into a cohesive mixture, a task characterized by subjectivity where multiple valid solutions exist for the same input. Existing automatic mixing systems treat this task as a deterministic regression problem, thus ignoring this multiplicity of solutions. Here we introduce MEGAMI (Multitrack Embedding Generative Auto MIxing), a generative framework that models the conditional distribution of professional mixes given unprocessed tracks. MEGAMI uses a track-agnostic effects processor conditioned on per-track generated embeddings, handles arbitrary unlabeled tracks through a permutation-equivariant architecture, and enables training on both dry and wet recordings via domain adaptation. Our objective evaluation using distributional metrics shows consistent improvements over existing methods, while listening tests indicate performances approaching human-level quality across diverse musical genres.
Related Publications
Classifier-Free Guidance (CFG) is a widely used technique for conditional generation and improving sample quality in continuous diffusion models, and recent works have extended it to discrete diffusion. This paper theoretically analyzes CFG in the context of masked discrete …
We present 3DScenePrompt, a framework that generates the next video chunk from arbitrary-length input while enabling precise camera control and preserving scene consistency. Unlike methods conditioned on a single image or a short clip, we employ dual spatio-temporal conditio…
This paper introduces LLM2Fx-Tools, a multimodal tool-calling framework that generates executable sequences of audio effects (Fx-chain) for music post-production. LLM2Fx-Tools uses a large language model (LLM) to understand audio inputs, select audio effects types, determine…
JOIN US
Shape the Future of AI with Sony AI
We want to hear from those of you who have a strong desire
to shape the future of AI.



