Abid Ali Awan

kingabzpro

AI & ML interests

LLMs, MLOps, ASR, & RL

Organizations

Spaces-explorers, Speech Recognition Community Event Version 2, eXfinite, HugGAN Community, Gradio-Blocks-Party, Blog-explorers, Social Post Explorers, Hugging Face Discord Community

kingabzpro's activity

replied to their post 4 months ago

@victor I think the community is eagerly awaiting the next big month-long event, where the community can come together to build something, like we used to do in the past.

reacted to their post with 😔 4 months ago
Post
1090
I believe Hugging Face should have something similar to Hacktoberfest. I miss the days when there were events like this every 3 months for audio, deep reinforcement learning, and Gradio themes, but everything seems to have slowed down. There are no more Hugging Face events.
@victor
  • 3 replies
reacted to their post with 👀 4 months ago
Post
1286
I never imagined that Jenkins could be as powerful and easy to implement as GitHub Actions. Loving it. 🥰
replied to their post 4 months ago

I'm having some issues with the RAG pipeline. It generally takes 0.2-2 seconds for it to respond, and most of the time the embedding model takes even longer. I can implement prompt caching, but I was considering a more hardware-related solution. What do you think about using Ray for distributed serving? Also, what do you think about GraphQL?
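
One way to picture the caching idea mentioned above is a small in-memory cache in front of the embedding step. This is only a minimal sketch, not the actual Real-Time-RAG code: `slow_embed` is a hypothetical stand-in for a real embedding model call, its 0.01-second delay is simulated, and the names `normalize`, `cached_embed`, and `timed_embed` are made up for illustration.

```python
import time
from functools import lru_cache
from typing import Tuple


def slow_embed(text: str) -> Tuple[float, float]:
    """Hypothetical stand-in for an embedding model call."""
    time.sleep(0.01)  # simulated model latency, not a measured value
    codes = [ord(ch) for ch in text]
    # Toy deterministic "embedding": text length and mean character code.
    return (float(len(codes)), sum(codes) / max(len(codes), 1))


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so near-identical queries share a cache key."""
    return " ".join(query.lower().split())


@lru_cache(maxsize=1024)
def cached_embed(normalized_query: str) -> Tuple[float, float]:
    """Embed a normalized query, reusing the stored result on repeat calls."""
    return slow_embed(normalized_query)


def timed_embed(query: str) -> Tuple[Tuple[float, float], float]:
    """Return the embedding and the wall-clock seconds spent obtaining it."""
    start = time.perf_counter()
    vector = cached_embed(normalize(query))
    return vector, time.perf_counter() - start
```

A cache like this only helps when the same (normalized) query is seen again; the first call for each new query still pays the full embedding cost, so it complements rather than replaces faster or distributed serving.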

reacted to their post with 👀 4 months ago
Post
1838
How can I make my RAG application generate real-time responses? Up until now, I have been using Groq for fast LLM generation and the Gradio Live function. I am looking for a better solution that can help me build a real-time application without any delay. @abidlabs

kingabzpro/Real-Time-RAG
  • 2 replies
reacted to merve's post with 🔥🤗 7 months ago
Post
4217
I love Depth Anything V2 😍
It's Depth Anything, but scaled with both a larger teacher model and a gigantic dataset!

Here's a small TLDR of the paper, which has a lot of findings, experiments and more.
I have also created a collection that has the models, the dataset, the demo and CoreML converted model 😚 merve/depth-anything-v2-release-6671902e798cd404513ffbf5

The authors analyzed Marigold, a diffusion-based model, against Depth Anything and found out what's up with using synthetic vs. real images for monocular depth estimation (MDE):

🔖 Real data has a lot of label noise and inaccurate depth maps (caused by depth sensors missing transparent objects, etc.), and many details are overlooked

🔖 Synthetic data has more precise and detailed depth labels that are truly ground truth, but there's a distribution shift between real and synthetic images, and scene coverage is restricted

The authors train different image encoders only on synthetic images and find that unless the encoder is very large, the model can't generalize well (but large models generalize inherently anyway) 🧐
But they still fail when encountering real images that have a wide distribution in labels (e.g. diverse instances of objects) 🥲

The Depth Anything V2 framework is to:

🦖 Train a DINOv2-G-based teacher model on 595K synthetic images
🏷️ Label 62M real images using the teacher model
🦕 Train a student model on the real images labelled by the teacher
Result: 10x faster and more accurate than Marigold!

The authors also construct a new benchmark called DA-2K that is less noisy, highly detailed and more diverse!
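
The three-step teacher-student recipe listed above can be sketched as a toy pseudo-labeling pipeline. This is not the Depth Anything V2 training code: it swaps in a 1-D least-squares line for both models, and `fit_line`, `predict`, and `pseudo_label_pipeline` are hypothetical names used only to show the data flow (precisely labelled synthetic data, then teacher-labelled real data, then student).

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def predict(model, xs):
    """Apply a fitted (a, b) line to a list of inputs."""
    a, b = model
    return [a * x + b for x in xs]


def pseudo_label_pipeline(synthetic_x, synthetic_y, real_x):
    # Step 1: train the teacher on synthetic data with precise labels.
    teacher = fit_line(synthetic_x, synthetic_y)
    # Step 2: let the teacher label the (unlabelled) real inputs.
    pseudo_y = predict(teacher, real_x)
    # Step 3: train the student only on the teacher-labelled real data.
    student = fit_line(real_x, pseudo_y)
    return teacher, student


rng = random.Random(0)
synthetic_x = [rng.uniform(0.0, 1.0) for _ in range(50)]
synthetic_y = [2.0 * x + 1.0 for x in synthetic_x]  # exact toy labels
real_x = [rng.uniform(0.0, 5.0) for _ in range(500)]  # wider input range, no labels

teacher, student = pseudo_label_pipeline(synthetic_x, synthetic_y, real_x)
```

Because teacher and student share the same model class here, the student simply reproduces the teacher's line; the sketch illustrates only the order of the steps, not the accuracy or speed results reported in the post.
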
reacted to DmitryRyumin's post with 🔥 10 months ago
Post
🚀💃🏻🌟 New Research Alert - CVPR 2024! 🌟🕺 🚀
📄 Title: Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling 🌟🚀

📝 Description: Animatable Gaussians - a novel method for creating lifelike human avatars from RGB videos, utilizing 2D CNNs and 3D Gaussian splatting to capture pose-dependent garment details and dynamic appearances with high fidelity.

👥 Authors: Zhe Li, Zerong Zheng, Lizhen Wang, and Yebin Liu

📅 Conference: CVPR, Jun 17-21, 2024 | Seattle WA, USA 🇺🇸

🔗 Paper: Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling (2311.16096)

🌐 Github Page: https://animatable-gaussians.github.io
📁 Repository: https://github.com/lizhe00/AnimatableGaussians

📺 Video: https://www.youtube.com/watch?v=kOmZxD0HxZI

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: #AnimatableGaussians #HumanAvatars #3DGaussianSplatting #CVPR2024 #DeepLearning #Animation #Innovation
reacted to merve's post with 🤗 12 months ago
Post
Posting about a very underrated model that tops paperswithcode across different segmentation benchmarks: OneFormer 👑

OneFormer is a "truly universal" model for semantic, instance and panoptic segmentation tasks ⚔️
What makes it truly universal is that it's a single model that is trained only once and can be used across all tasks.
The enabler here is the text conditioning, i.e. the model is given a text query that states the task type along with the appropriate input, and using contrastive loss, the model learns the difference between different task types 👇 (see the image below)

It's also super easy to use with transformers.
from PIL import Image
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

# placeholder path (assumption): point this at any RGB image on disk
image = Image.open("image.jpg").convert("RGB")

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_large")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_ade20k_swin_large")

# swap the task_inputs and the post-processing method for different types of segmentation
semantic_inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
semantic_outputs = model(**semantic_inputs)
# target_sizes expects (height, width); PIL's image.size is (width, height), hence the reversal
predicted_semantic_map = processor.post_process_semantic_segmentation(semantic_outputs, target_sizes=[image.size[::-1]])[0]

I have drafted a notebook for you to try right away ✨ https://colab.research.google.com/drive/1wfJhoTFqUqcTAYAOUc6TXUubBTmOYaVa?usp=sharing
You can also check out the Space without checking out the code itself 👉 shi-labs/OneFormer