---
license: cc0-1.0
---

🚀 **AEROMamba: Efficient Audio Super-Resolution**

*AI-generated README. Original: [GitHub](https://github.com/aeromamba-super-resolution/aeromamba) | [Demo](https://aeromamba-super-resolution.github.io/)*

---

## Model Overview

**Architecture**: Hybrid GAN + Mamba state space model (SSM)

**Task**: Audio super-resolution, upsampling 11.025 kHz input to 44.1 kHz (4x)

**Key Improvements** (compared with AERO):

- 14x faster inference
- 5x less GPU memory usage
- Subjective listening-test score of 66.47 vs. 60.03 for AERO (MUSDB18)

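One plausible source of the speed and memory savings is that an SSM can run as a recurrence: each step updates a fixed-size state instead of attending over the whole sequence. The NumPy sketch below illustrates that idea with a generic diagonal linear SSM; it is not Mamba's selective (input-dependent) layer and not AEROMamba's code, and the names `ssm_scan` and `ssm_conv` and all parameter values are hypothetical. It also checks that the recurrent scan equals the equivalent convolution form, which holds only because the parameters here are fixed; Mamba makes them input-dependent and therefore relies on a scan.

```python
import numpy as np


def ssm_scan(x, a, b, c):
    """Recurrent form: h_t = a * h_{t-1} + b * x_t, y_t = <c, h_t>.

    a, b, c are (N,) arrays of already-discretized, fixed parameters (hypothetical
    values, not learned AEROMamba weights). Only the N-dim state h is carried
    between steps, so memory does not grow with the sequence length.
    """
    h = np.zeros_like(a)
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = a * h + b * x_t
        y[t] = np.dot(c, h)
    return y


def ssm_conv(x, a, b, c):
    """Convolutional form of the same SSM: causal convolution with K_k = sum(c * a**k * b)."""
    k = np.arange(len(x))
    kernel = (c * b * a ** k[:, None]).sum(axis=1)
    return np.convolve(x, kernel)[: len(x)]


rng = np.random.default_rng(0)
state_size, length = 8, 64
a = np.exp(-rng.uniform(0.1, 1.0, state_size))  # |a| < 1 keeps the recurrence stable
b = rng.standard_normal(state_size)
c = rng.standard_normal(state_size)
x = rng.standard_normal(length)

y_scan = ssm_scan(x, a, b, c)
y_conv = ssm_conv(x, a, b, c)
print(np.allclose(y_scan, y_conv))  # True
```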
**Checkpoint**: [MUSDB18-HQ Model](https://huggingface.co/KingNish/AEROMamba/blob/main/checkpoint.th)

---

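## Quick Start

```python
# Installation (run in a shell, not in Python; mamba-ssm needs an NVIDIA GPU):
#   pip install torch==1.12.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
#   pip install causal-conv1d==1.1.2 mamba-ssm==1.1.3

# Inference. The import path is taken from this README and the loader call is
# illustrative, not verified; see the GitHub repo for the supported entry point.
import torch
import torchaudio
from src.models.aeromamba import AEROMamba

model = AEROMamba.load_from_checkpoint("checkpoint.th")  # assumed loader
model = model.cuda().eval()

lr_audio, sr = torchaudio.load("low_res.wav")  # (channels, time)
assert sr == 11025, "model expects 11.025 kHz input"
with torch.no_grad():
    hr_audio = model(lr_audio.unsqueeze(0).cuda())  # add batch dim; 44.1 kHz output
torchaudio.save("high_res.wav", hr_audio.squeeze(0).cpu(), 44100)  # assumes (batch, channels, time) output
```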
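The snippet above passes the whole file to the model at once. Long recordings are presumably handled by the repo's overlap-add (OLA) scripts, which process audio in chunks. The following is a minimal NumPy sketch of the general overlap-add idea, not the repo's script: `ola_upsample`, the chunk length, the overlap, and the triangular crossfade are illustrative assumptions, and `upsample_fn` stands in for a real model call that maps a mono low-res chunk of length L to length 4L. A trivial sample-repeat function (`repeat_upsample`, hypothetical) is used as a placeholder so the sketch runs without the model; normalizing by the accumulated window weights is what lets overlapping chunks crossfade without changing the overall gain.

```python
import numpy as np


def ola_upsample(x, upsample_fn, factor=4, chunk=11025, overlap=1024):
    """Upsample a long mono signal chunk by chunk, blending chunks by overlap-add.

    Consecutive low-res chunks overlap by `overlap` samples. Each high-res output
    chunk is weighted by a strictly positive triangular window, and the sum is
    divided by the accumulated weights so overlapping regions crossfade smoothly.
    """
    hop = chunk - overlap
    if hop <= 0:
        raise ValueError("chunk must be longer than overlap")
    out = np.zeros(len(x) * factor)
    weight = np.zeros(len(x) * factor)
    for start in range(0, max(len(x) - overlap, 1), hop):
        y = upsample_fn(x[start : start + chunk])
        n = len(y)
        w = np.bartlett(n + 2)[1:-1]  # triangular window without its zero endpoints
        s = start * factor
        out[s : s + n] += w * y
        weight[s : s + n] += w
    return out / weight


# Placeholder "model": repeat each sample 4 times (hypothetical stand-in, not AEROMamba).
def repeat_upsample(seg):
    return np.repeat(seg, 4)


rng = np.random.default_rng(0)
lr = rng.standard_normal(50_000)  # about 4.5 s at 11.025 kHz
hr = ola_upsample(lr, repeat_upsample)
print(hr.shape)  # (200000,)
```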

---

## Performance (MUSDB18)

| Metric             | Low-res input | AERO  | AEROMamba |
|--------------------|---------------|-------|-----------|
| ViSQOL ↑           | 1.82          | 2.90  | **2.93**  |
| LSD ↓              | 3.98          | 1.34  | **1.23**  |
| Subjective score ↑ | 38.22         | 60.03 | **66.47** |

↑ higher is better, ↓ lower is better.

**Inference speed**: about 14x faster than AERO on an RTX 3090 (0.087 s vs. 1.246 s)

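LSD (log-spectral distance) compares the log power spectra of the reference and the reconstruction, so lower is better. A common definition averages, over STFT frames, the root-mean-square difference of log10 power across frequency bins. The NumPy sketch below implements that definition; the STFT size, hop, Hann window, and `eps` are assumptions, since the exact settings behind the table above are not given here, so its values are not directly comparable to the table. The low-passed signal (content above about 5.5 kHz removed, roughly what an 11.025 kHz input lacks) is a crude stand-in for a low-res input.

```python
import numpy as np


def power_spectrogram(x, n_fft=2048, hop=512):
    """|STFT|^2 with a Hann window; returns shape (frames, n_fft // 2 + 1)."""
    if len(x) < n_fft:
        raise ValueError("signal shorter than one STFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def lsd(reference, estimate, n_fft=2048, hop=512, eps=1e-10):
    """Mean over frames of the RMS difference (over bins) of log10 power spectra."""
    log_ref = np.log10(power_spectrogram(reference, n_fft, hop) + eps)
    log_est = np.log10(power_spectrogram(estimate, n_fft, hop) + eps)
    per_frame = np.sqrt(np.mean((log_ref - log_est) ** 2, axis=1))
    return float(np.mean(per_frame))


rng = np.random.default_rng(0)
hr = rng.standard_normal(44100)  # 1 s of broadband "audio" at 44.1 kHz
spectrum = np.fft.rfft(hr)
spectrum[len(spectrum) // 4 :] = 0  # drop content above ~5.5 kHz
lowpassed = np.fft.irfft(spectrum, len(hr))

print(lsd(hr, hr))  # 0.0
print(lsd(hr, lowpassed) > lsd(hr, hr))  # True
```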

---

## Training Data

**MUSDB18-HQ**:

- 150 full-length music tracks
- Training pairs: 44.1 kHz originals and their 11.025 kHz downsampled versions
- 87.5% / 12.5% train/validation split

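The low-res/high-res pairs come from downsampling each 44.1 kHz track by a factor of 4. The project's actual resampler is not specified here; the sketch below assumes a simple band-limited FFT resampler in NumPy (similar in spirit to `scipy.signal.resample`), and `fft_downsample` and `make_pair` are hypothetical names. It expects mono input whose length is divisible by the factor.

```python
import numpy as np


def fft_downsample(x, factor=4):
    """Band-limited downsampling by an integer factor via FFT truncation."""
    n = len(x)
    if n % factor:
        raise ValueError("length must be divisible by factor")
    m = n // factor
    kept = np.fft.rfft(x)[: m // 2 + 1].copy()  # keep bins below the new Nyquist frequency
    if m % 2 == 0:
        kept[-1] = 0  # drop the ambiguous bin that lands exactly on the new Nyquist
    return np.fft.irfft(kept, m) * (m / n)  # rescale for the shorter inverse FFT


def make_pair(hr, factor=4):
    """Return a (low-res, high-res) training pair from a mono high-res signal."""
    return fft_downsample(hr, factor), hr


sr_hr = 44_100
t = np.arange(sr_hr)  # 1 s at 44.1 kHz
hr = np.sin(2 * np.pi * 1000 * t / sr_hr)  # 1 kHz tone, below the 5.5125 kHz cutoff
lr, _ = make_pair(hr)
print(len(lr))  # 11025
```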

---

## Citation

```bibtex
@inproceedings{Abreu2024lamir,
  author    = {Wallace Abreu and Luiz Wagner Pereira Biscainho},
  title     = {{AEROMamba}: Efficient Audio {SR} with {GANs} and {SSMs}},
  booktitle = {Proc. Latin American Music Information Retrieval Workshop (LAMIR)},
  year      = {2024}
}
```

*This README was AI-generated from the original project materials. For training code and overlap-add (OLA) inference scripts, see the [GitHub repo](https://github.com/aeromamba-super-resolution/aeromamba).*