---
license: llama3
language:
- en
- hi
---
`Shuka v1` is a language model which natively understands audio in Indic languages. It is an encoder-decoder model built by combining two models:
- Our state-of-the-art in-house audio encoder: Saaras v1
- Meta’s Llama3-8B-Instruct as the decoder

The encoder and decoder are connected by a small projector with ~60M parameters. During training, only the projector weights are finetuned while the rest of the network is frozen. Following our tradition of training models frugally, we train `Shuka v1` on less than 100 hours of audio.
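
The freezing scheme described above can be sketched in PyTorch. This is a toy illustration under stated assumptions, not our actual training code: the Saaras v1 encoder and the Llama3-8B-Instruct decoder are replaced by tiny hypothetical `nn.Linear` stand-ins, and the two-layer MLP projector (`ToyAudioLM`, `freeze_all_but_projector`, and `train_step` are all illustrative names) is an assumed shape. What it does show is the pattern itself: only projector parameters are marked trainable and handed to the optimizer, so gradients still flow through the frozen decoder while its weights stay fixed.

```python
import torch
from torch import nn
import torch.nn.functional as F


class ToyAudioLM(nn.Module):
    """Hypothetical stand-in for an audio encoder -> projector -> LLM decoder stack."""

    def __init__(self, feat_dim: int = 80, enc_dim: int = 32, dec_dim: int = 64, vocab: int = 100):
        super().__init__()
        # Stand-in for the audio encoder (Saaras v1 in Shuka v1); kept frozen.
        self.encoder = nn.Linear(feat_dim, enc_dim)
        # Assumed projector shape: a small MLP mapping encoder states into the
        # decoder's embedding space. This is the only trainable component.
        self.projector = nn.Sequential(
            nn.Linear(enc_dim, dec_dim),
            nn.GELU(),
            nn.Linear(dec_dim, dec_dim),
        )
        # Stand-in for the LLM decoder (Llama3-8B-Instruct in Shuka v1); kept frozen.
        self.decoder = nn.Linear(dec_dim, vocab)

    def forward(self, audio_features: torch.Tensor) -> torch.Tensor:
        audio_embeds = self.projector(self.encoder(audio_features))
        return self.decoder(audio_embeds)  # (batch, time, vocab) logits


def freeze_all_but_projector(model: nn.Module) -> list:
    """Disable gradients everywhere except the projector; return the trainable params."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("projector.")
    return [p for p in model.parameters() if p.requires_grad]


def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               features: torch.Tensor, targets: torch.Tensor) -> float:
    """One optimization step; only the projector's weights can change."""
    logits = model(features)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()  # gradients pass through the frozen decoder into the projector
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyAudioLM()
    trainable = freeze_all_but_projector(model)
    optimizer = torch.optim.AdamW(trainable, lr=1e-3)
    features = torch.randn(2, 10, 80)          # random toy input: (batch, time, features)
    targets = torch.randint(0, 100, (2, 10))   # random toy token ids
    print(train_step(model, optimizer, features, targets))
```

Because only parameters with `requires_grad=True` are passed to `AdamW`, the encoder and decoder weights are never updated, even though backpropagation still runs through the decoder to reach the projector.
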

Though we only finetune the projector on English and Hindi data, the multilingual nature of our encoder makes `Shuka v1` perform well on zero-shot QA in other Indic languages as well. We have tested the model on Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu.

See what `Shuka v1` can do in this [demo video](https://www.youtube.com/watch?v=VgJhjCPbORs), and get started with the Hugging Face pipeline as follows:
```python
# install libraries
# pip install transformers==4.41.2 peft==0.11.1 librosa==0.10.2
import transformers
import librosa

# load the model pipeline on gpu:0
pipe = transformers.pipeline(model='sarvamai/shuka_v1', trust_remote_code=True, device=0, torch_dtype='bfloat16')

# get a sample audio
# wget https://huggingface.co/sarvamai/shuka_v1/resolve/main/hi-question.webm
audio, sr = librosa.load("./hi-question.webm", sr=16000)  # resample to 16 kHz

# the '<|audio|>' placeholder marks where the audio goes in the user turn
turns = [
    {'role': 'system', 'content': 'Respond naturally and informatively.'},
    {'role': 'user', 'content': '<|audio|>'}
]

output = pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=512)
print(output)
```

For more details, please see our [blog](https://www.sarvam.ai/blogs/shuka-v1).