Anurag

edwixx

https://anuragkanade.com/

AI & ML interests

Machine Learning and Speech

Recent Activity

liked a model 1 day ago

LiquidAI/LFM2.5-Audio-1.5B

Audio-to-Audio • 1B • Updated 16 days ago • 1.92k • 288

New activity in google/translategemma-27b-it 1 day ago

list of all 55 languages which are supported?

#1 opened 5 days ago by

edwixx

New activity in huggingface/InferenceSupport 2 days ago

edwixx/whisper-large-hebrew-finetune

#7518 opened 2 days ago by

edwixx

reacted to sagar007's post with 🤝🔥 2 days ago

Post

4088

🚀 I built a Multimodal Vision-Language Model using Gemma-270M + CLIP!

Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

🔧 What I Built:
A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency

📊 Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)
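
The trainable-parameter share quoted above can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the helper name `trainable_fraction` is hypothetical, and the inputs are the rounded figures from the post (18.6M trainable, 539M total), so the result can differ slightly from the 3.4% the author reports.

```python
def trainable_fraction(trainable: float, total: float) -> float:
    """Return trainable parameters as a percentage of all parameters."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * trainable / total


# Rounded figures quoted in the post, in millions of parameters.
pct = trainable_fraction(18.6, 539.0)
print(f"{pct:.2f}% of parameters are trainable")
```

With these rounded inputs the share comes out at about 3.45%, which is in line with the post's figure given rounding of the underlying counts.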

📈 Benchmark Results:
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding

🔗 Try it yourself:
- 🤗 Model: sagar007/multigemma
- 🎮 Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- 💻 GitHub: https://github.com/sagar431/multimodal-gemma-270m

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD!

Would love to hear your feedback! 🙏

#multimodal #gemma #clip #llava #vision-language #pytorch

9 replies

liked 2 models 4 days ago

nvidia/personaplex-7b-v1

Updated about 15 hours ago • 2.43k • 197

pipecat-ai/smart-turn-v3

Voice Activity Detection • Updated 14 days ago • 115

New activity in LiquidAI/LFM2.5-1.2B-Instruct 5 days ago

Liquid AI, You NEED to Make a 16B MoE Next!

❤️ 6

#5 opened 5 days ago by

tanyiades

liked a model 5 days ago

LiquidAI/LFM2.5-1.2B-Instruct

Text Generation • 1B • Updated about 22 hours ago • 55.2k • 392

upvoted a collection 5 days ago

TranslateGemma

Collection

3 items • Updated 6 days ago • 168

updated a model 7 days ago

edwixx/whisper-large-hebrew-finetune

Updated 7 days ago • 23 • 1

published a model 7 days ago

edwixx/whisper-large-hebrew-finetune

Updated 7 days ago • 23 • 1

liked a model 7 days ago

Shiry/whisper-large-v2-he

Automatic Speech Recognition • Updated Jan 26, 2023 • 9 • 6

liked a dataset 7 days ago

imvladikon/hebrew_speech_kan

Viewer • Updated May 5, 2023 • 10k • 137 • 13

liked a dataset 8 days ago

bolshyC/Muse

Preview • Updated 2 days ago • 1.04k • 8

updated a model 9 days ago

edwixx/hf-loras-113

Updated 9 days ago

published a model 9 days ago

edwixx/hf-loras-113

Updated 9 days ago

updated a model 9 days ago

edwixx/my-test-lora-123

Updated 9 days ago

published a model 9 days ago

edwixx/my-test-lora-123

Updated 9 days ago

reacted to hypothetical's post with 😎 11 days ago

Post

2010

We have updated our transcription model: TheStageAI/thewhisper-large-v3-turbo

– 6.00 WER on the English Open ASR Leaderboard
– 4.74 WER on the Multilingual Open ASR Leaderboard
– Beats NVIDIA Parakeet (6.34 WER) and Whisper-large-v3-turbo (7.8 WER)
– Strong improvements in Arabic, Hindi, Chinese
– Maintains quality with background and environmental noise
– Optimized inference engines for NVIDIA and Apple
– Hugging Face Transformers interface for easy use
– Best-in-class speed on NVIDIA GPUs and power efficiency on Apple devices
– NVIDIA Jetson Thor support
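
For readers unfamiliar with the WER figures above: word error rate is conventionally the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration of that definition only. The function name `word_error_rate` is hypothetical, and it is an assumption that no text normalization is applied here; leaderboard scores typically normalize text and aggregate over several datasets, so this will not reproduce the numbers in the post.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length, as a percentage."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER: {wer:.2f}")
```

Lower is better: a WER of 6.00 corresponds to roughly six word errors per hundred reference words.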

2 replies
