|
--- |
|
license: other |
|
language: |
|
- en |
|
tags: |
|
- audio understanding |
|
- audio captioning |
|
- audio question answering |
|
- audio classification |
|
- audio dialogues |
|
- retrieval augmented generation |
|
- in-context learning |
|
--- |
|
# Audio Flamingo |
|
|
|
**Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro** |
|
|
|
This repo contains the model checkpoints for [Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities](https://arxiv.org/abs/2402.01831) (ICML 2024). Audio Flamingo is a novel audio language model with
|
- strong audio understanding abilities, |
|
- the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and |
|
- strong multi-turn dialogue abilities. |
|
|
|
We introduce a series of training techniques, architecture designs, and data strategies to equip our model with these abilities. Extensive evaluations across various audio understanding tasks confirm the efficacy of our method, setting new state-of-the-art results. Sound demos can be found on the [demo website](https://audioflamingo.github.io/).
|
|
|
![](audio_flamingo_arch.png) |
|
|
|
## Code |
|
|
|
Our code is available at [https://github.com/NVIDIA/audio-flamingo](https://github.com/NVIDIA/audio-flamingo).
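
For reference, below is a minimal sketch of fetching the checkpoints from the Hugging Face Hub with `huggingface_hub`; the `repo_id` shown is an assumption, so substitute the id of this model card. Inference and training entry points are provided in the GitHub repository linked above.

```python
# Minimal sketch: download the Audio Flamingo checkpoints from the Hugging Face Hub.
# NOTE: the repo_id below is an assumed placeholder -- replace it with this model card's id.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="nvidia/audio-flamingo",        # assumed repo id; adjust as needed
    local_dir="audio_flamingo_checkpoints", # where to place the downloaded files
)
print(f"Checkpoints downloaded to: {local_path}")
```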
|
|
|
## License |
|
|
|
- The checkpoints are for non-commercial use only. They are subject to the [OPT-IML](https://huggingface.co/facebook/opt-iml-1.3b/blob/main/LICENSE.md) license, the [Terms of Use](https://openai.com/policies/terms-of-use) of the data generated by OpenAI, and the original licenses accompanying each training dataset. |
|
|
|
|
|
## Citation |
|
```
@article{kong2024audio,
  title={Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities},
  author={Kong, Zhifeng and Goel, Arushi and Badlani, Rohan and Ping, Wei and Valle, Rafael and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2402.01831},
  year={2024}
}
```