crypto-code committed
Commit a8187ee
1 Parent(s): 2b507fa

Update README.md

Files changed (1)
  1. README.md +26 -0
README.md CHANGED
@@ -1,3 +1,29 @@
  ---
  license: mit
  ---
+ # M<sup>2</sup>UGen Model with MusicGen-small
+
+ The M<sup>2</sup>UGen model is a Music Understanding and Generation model capable of Music Question Answering, Music Generation
+ from text, images, videos and audio, and Music Editing. The model uses encoders such as MERT for music understanding, ViT for image understanding
+ and ViViT for video understanding, and the MusicGen/AudioLDM2 model as the music generation model (music decoder); these components are coupled through adapters
+ with the LLaMA 2 model to give the model its multiple abilities (a minimal sketch of loading these pretrained components appears after the diff).
+
+ M<sup>2</sup>UGen was published in [M<sup>2</sup>UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models](https://arxiv.org/abs/2311.11255) by *Atin Sakkeer Hussain, Shansong Liu, Chenshuo Sun and Ying Shan*.
+
+ The code repository for the model is published at [crypto-code/M2UGen](https://github.com/crypto-code/M2UGen). Clone the repository, download the checkpoint, and run the following for a model demo:
+ ```bash
+ python gradio_app.py --model ./ckpts/M2UGen-AudioLDM2/checkpoint.pth --llama_dir ./ckpts/LLaMA-2 --music_decoder musicgen --music_decoder_path facebook/musicgen-small
+ ```
+
+ ## Citation
+
+ If you find this model useful, please consider citing:
+
+ ```bibtex
+ @article{hussain2023m,
+   title={{M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models}},
+   author={Hussain, Atin Sakkeer and Liu, Shansong and Sun, Chenshuo and Shan, Ying},
+   journal={arXiv preprint arXiv:2311.11255},
+   year={2023}
+ }
+ ```
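
The model card above names the pretrained components that M<sup>2</sup>UGen builds on (MERT, ViT, ViViT and MusicGen-small). As a rough orientation only, the sketch below loads those public checkpoints with Hugging Face `transformers` and exercises the MusicGen-small decoder on its own; the MERT/ViT/ViViT checkpoint ids are assumptions chosen for illustration, and the M<sup>2</sup>UGen adapters and the LLaMA 2 bridge come from the crypto-code/M2UGen repository itself, not from this snippet.

```python
# Hedged sketch: load the public checkpoints the model card names as M2UGen's
# encoders and music decoder. The MERT/ViT/ViViT checkpoint ids are assumptions;
# the M2UGen adapters and the LLaMA 2 bridge live in crypto-code/M2UGen.
from transformers import (
    AutoModel,
    AutoProcessor,
    MusicgenForConditionalGeneration,
    ViTModel,
    VivitModel,
)

# Music encoder (MERT); needs trust_remote_code for its custom model class.
mert = AutoModel.from_pretrained("m-a-p/MERT-v1-330M", trust_remote_code=True)

# Image and video encoders (checkpoint choices are illustrative).
vit = ViTModel.from_pretrained("google/vit-base-patch16-224")
vivit = VivitModel.from_pretrained("google/vivit-b-16x2-kinetics400")

# Music decoder: MusicGen-small, exercised here on its own as a sanity check.
processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
musicgen = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(
    text=["gentle lo-fi piano with soft vinyl crackle"],
    padding=True,
    return_tensors="pt",
)
audio = musicgen.generate(**inputs, max_new_tokens=256)  # roughly 5 s of audio
print(audio.shape, musicgen.config.audio_encoder.sampling_rate)
```

The full multi-modal pipeline (music question answering, generation from text/image/video/audio, and editing) is what `gradio_app.py` in the repository assembles on top of these pieces.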