SongEcho: Towards Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation
This is the official model repository for SongEcho, introduced in the ICLR 2026 paper SongEcho: Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation.
SongEcho is a conditional music generation model designed for Cover Song Generation. It simultaneously generates new vocals and accompaniment conditioned on an original vocal melody and text prompts. It leverages a novel framework called Instance-Adaptive Element-wise Linear Modulation (IA-EiLM) to facilitate precise temporal alignment and controllable generation.
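As a rough illustration of what "element-wise linear modulation" can mean, the sketch below applies a separate scale and shift to every element of a hidden feature map, so each time frame and channel can be steered independently by the melody condition. This is a generic, assumed form (scale * hidden + shift) in plain Python, not the paper's exact IA-EiLM formulation. The function name, the toy shapes, and the hand-written scale and shift values are all hypothetical; in a real model they would be predicted from the encoded melody.

```python
def elementwise_linear_modulation(hidden, scale, shift):
    """Apply out[t][c] = scale[t][c] * hidden[t][c] + shift[t][c].

    All three arguments are lists of rows (frames x channels) with
    identical shapes. Hypothetical helper, for illustration only.
    """
    if not (len(hidden) == len(scale) == len(shift)):
        raise ValueError("hidden, scale and shift must have the same number of frames")
    modulated = []
    for h_row, s_row, b_row in zip(hidden, scale, shift):
        if not (len(h_row) == len(s_row) == len(b_row)):
            raise ValueError("each frame must have the same number of channels")
        # Every element gets its own scale and shift, unlike per-channel
        # modulation where one pair is shared across all time frames.
        modulated.append([s * h + b for h, s, b in zip(h_row, s_row, b_row)])
    return modulated


# Toy example: 3 time frames x 2 channels.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Assumed stand-ins for values a melody-conditioned network would predict.
scale = [[1.0, 1.0], [2.0, 0.5], [0.0, 1.0]]
shift = [[0.0, 0.0], [1.0, 0.0], [0.5, -1.0]]

modulated = elementwise_linear_modulation(hidden, scale, shift)
print(modulated)
```

Because the scale and shift vary per frame, a condition such as a melody contour can influence each time step separately, which is one plausible reading of how this kind of modulation supports temporal alignment.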
This repository contains the following three essential weight files:
• rmvpe_model.pt: Melody extraction model, used to extract the vocal melody from the source audio.
• melody_encoder.pt: Melody encoder model, used to encode the extracted melody into hidden representations.
• melody.pt: IA-EiLM parameters, which control the conditioning injection and refinement for the generative process.
The model is intended to be used for:
If you find this model or the associated paper useful, please cite it:
@inproceedings{li2026songecho,
title={SongEcho: Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation},
author={Li, Sifei and others},
booktitle={The Fourteenth International Conference on Learning Representations (ICLR)},
year={2026},
url={https://arxiv.org/abs/2602.19976}
}