SongEcho Model Card

This is the official model repository for SongEcho, introduced in the ICLR 2026 paper SongEcho: Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation.

Model Details

Model Description

SongEcho is a conditional music generation model designed for Cover Song Generation. It simultaneously generates new vocals and accompaniment conditioned on an original vocal melody and text prompts. It leverages a novel framework called Instance-Adaptive Element-wise Linear Modulation (IA-EiLM) to facilitate precise temporal alignment and controllable generation.

Model Checkpoints

This repository contains the following three essential weight files:

  • rmvpe_model.pt: Melody extraction model, used to extract the vocal melody from the source audio.
  • melody_encoder.pt: Melody encoder model, used to encode the extracted melody into hidden representations.
  • melody.pt: IA-EiLM parameters, which control the conditioning injection and refinement for the generative process.
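The role of the IA-EiLM parameters can be illustrated with a minimal sketch of element-wise linear modulation (FiLM-style), where per-instance scale and shift vectors predicted from the melody condition modulate the generator's hidden features. The function name, shapes, and the random projection below are illustrative assumptions, not the actual SongEcho implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 8, 16  # time steps, hidden dimension (hypothetical shapes)

def iaeilm_modulate(hidden, cond):
    """Element-wise linear modulation, FiLM-style sketch.

    `cond` stands in for the melody-encoder output; a small linear
    head (here a fixed random projection) predicts a per-instance,
    per-element scale (gamma) and shift (beta).
    """
    W = np.random.default_rng(1).standard_normal((cond.shape[-1], 2 * D)) * 0.01
    gamma, beta = np.split(cond @ W, 2, axis=-1)
    # Residual-style modulation: reduces to identity when gamma = beta = 0.
    return hidden * (1.0 + gamma) + beta

hidden = rng.standard_normal((T, D))
cond = rng.standard_normal((T, 32))  # melody hidden states
out = iaeilm_modulate(hidden, cond)
print(out.shape)  # (8, 16)
```

Note the residual form `hidden * (1 + gamma) + beta`: when the conditioning signal is zero, the modulation is the identity, which is a common choice for stable conditioning injection.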

Uses

Direct Use

The model is intended to be used for:

  • AI-assisted cover song generation.
  • Melody-conditioned music and vocal synthesis.
  • Modifying the style, singer, or accompaniment of an existing song using text prompts while preserving the original melody.
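Putting the three checkpoints together, a hypothetical end-to-end flow might look like the sketch below. Every function here is a placeholder stub with dummy math; the names, signatures, and shapes are assumptions and the real repository's API may differ:

```python
import numpy as np

def extract_melody(audio):
    """Stub for rmvpe_model.pt: per-frame melody contour from audio."""
    return np.abs(audio[::160])  # dummy downsample to frame rate

def encode_melody(melody):
    """Stub for melody_encoder.pt: melody contour -> hidden states."""
    return np.outer(melody, np.ones(16))  # dummy (frames, dim) features

def generate_cover(melody_hidden, prompt):
    """Stub for the generator conditioned via melody.pt (IA-EiLM)."""
    return np.tanh(melody_hidden.mean(axis=1))  # dummy output signal

audio = np.sin(np.linspace(0, 100, 16000))  # 1 s of fake source audio
melody = extract_melody(audio)
hidden = encode_melody(melody)
cover = generate_cover(hidden, "a jazz ballad with female vocals")
print(cover.shape)  # (100,)
```

The ordering is the point of the sketch: melody extraction from the source audio, encoding into hidden representations, then text-and-melody-conditioned generation.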

Citation

If you find this model or the associated paper useful, please cite:

@inproceedings{li2026songecho,
  title={SongEcho: Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation},
  author={Li, Sifei and others},
  booktitle={The Fourteenth International Conference on Learning Representations (ICLR)},
  year={2026},
  url={https://arxiv.org/abs/2602.19976}
}