Spaces: vigneshwar472/Conversational-image-recognition-chatbot

Update README.md

vigneshwar472 committed on Sep 25, 2024

Commit 28065b0 (verified) · 1 Parent(s): 05d0420

README.md +79 -1

@@ -6,8 +6,86 @@ colorTo: green
 sdk: gradio
 sdk_version: 4.44.0
 app_file: app.py
-pinned: false
+pinned: true
 license: mit
+short_description: Conversational Image Recognition chatbot
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Conversational Image Recognition Chatbot
+## Introduction
+This application combines natural language processing with image recognition: it extracts visual information from an image and uses it to converse with users. Users upload an image and then ask questions about its content in a chat session. The tool is designed to provide an interactive way to understand and analyze visual content through natural-language dialogue.
+## Solution we offer
+- A chatbot that combines a vision language model (VLM) with a zero-shot object detection model.
+- The user uploads an image, detects the objects in it, and starts a chat session.
+- Enhanced spatial understanding of objects in the image, which comes from communication between the two models.
+- Image question answering with built-in object detection.
+- The system keeps the chat history and the detection output as part of the interaction.
+- Model selection drew on results for 9 representative VLMs across 10 benchmarks on the OpenCompass multimodal leaderboard.
+- Uses high-performing, widely used object detection models such as OWLv2 and Grounding DINO.
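The README does not show how the two models exchange information. One plausible way, sketched below, is to turn the detector output into a short text hint that is prepended to the user's question before it reaches the VLM. The function names `describe_detections` and `build_prompt`, the `min_score` threshold, and the prompt wording are assumptions made for illustration; the detection dict layout (`score`, `label`, `box` with `xmin`/`ymin`/`xmax`/`ymax`) follows what the transformers zero-shot object detection pipeline returns.

```python
from typing import Dict, List


def describe_detections(detections: List[Dict], min_score: float = 0.3) -> str:
    """Turn detector output into a text hint for the VLM prompt.

    Hypothetical helper: the threshold and wording are illustrative only.
    """
    lines = []
    for det in detections:
        # Skip low-confidence detections so the prompt stays short.
        if det["score"] < min_score:
            continue
        box = det["box"]
        lines.append(
            f"- {det['label']} (score {det['score']:.2f}) at "
            f"x={box['xmin']}..{box['xmax']}, y={box['ymin']}..{box['ymax']}"
        )
    if not lines:
        return "No objects were detected with sufficient confidence."
    return "Detected objects:\n" + "\n".join(lines)


def build_prompt(question: str, detections: List[Dict]) -> str:
    """Combine the detection summary and the user's question into one prompt."""
    return f"{describe_detections(detections)}\n\nQuestion: {question}"
```

Passing box coordinates as text is one simple way to give the VLM explicit spatial cues; the actual application may pass this information differently.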
+## How it addresses the problem
+- Both models suit the problem because they were pretrained on household datasets.
+- Delivers strong results and fast inference when run on a GPU.
+- The approach is simple and can be implemented with minimal effort.
+## Unique value propositions
+- Responses that are lexically and grammatically correct.
+- Uses state-of-the-art and novel research in image understanding. Everything we use is open source.
+- Flexibility to swap in different models, with reference documentation.
+- A working, hosted application.
+## Technologies used
+- **Programming languages**: Python
+- **Libraries**: Transformers, PyTorch, image libraries such as PIL
+- **Hardware**: Nvidia T4 medium (8 vCPU, 30 GB RAM)
+- **Platforms**: Hugging Face, arXiv, other research articles
+- **Deployment tools**: Gradio, Streamlit, Hugging Face Spaces hardware
+## Application demo
+- Uses google/owlv2-base-patch16-ensemble as the zero-shot object detection model and Qwen/Qwen2-VL-2B-Instruct as the VLM.
+- Built with the Gradio framework and the transformers library.
+
+Demo links: Hugging Face Space and Google Colab demo (URLs are listed under Live Links).
+
+The free-tier HF Space runs on 2 vCPU and 16 GB RAM with no GPU, so the chatbot's inference is very slow there.
+To see the demo, we strongly recommend the Google Colab notebook. Getting started takes minimal effort, and inference is much faster on Colab with the T4 GPU runtime.
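As a rough illustration of the setup described above, the sketch below loads the named OWLv2 checkpoint through the transformers `pipeline` API and runs it on the GPU when one is available, falling back to CPU otherwise. The helper names (`cuda_is_available`, `pick_device`, `load_detector`, `detect_objects`) are hypothetical and are not taken from the project's `app.py`; heavy imports are done lazily so the helpers can be defined without transformers or PyTorch installed.

```python
from typing import Dict, List

# Checkpoint named in this README.
DETECTOR_CHECKPOINT = "google/owlv2-base-patch16-ensemble"


def cuda_is_available() -> bool:
    """Return True only if PyTorch is installed and sees a CUDA GPU."""
    try:
        import torch  # imported lazily; may be absent
    except ImportError:
        return False
    return torch.cuda.is_available()


def pick_device(cuda_available: bool) -> str:
    """Choose the device string passed to the pipeline."""
    return "cuda" if cuda_available else "cpu"


def load_detector(device: str):
    """Build a zero-shot object detection pipeline (downloads the model)."""
    from transformers import pipeline  # imported lazily

    return pipeline(
        task="zero-shot-object-detection",
        model=DETECTOR_CHECKPOINT,
        device=device,
    )


def detect_objects(detector, image, labels: List[str]) -> List[Dict]:
    """Run detection; `image` can be a PIL image or a path/URL string."""
    return detector(image, candidate_labels=labels)
```

A typical call would be `detector = load_detector(pick_device(cuda_is_available()))` followed by `detect_objects(detector, "kitchen.jpg", ["cup", "spoon"])`, where the image path and labels are placeholders. On the free-tier Space this resolves to `"cpu"`, which matches the slow inference noted above.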
+### Image Demos
+- [Demo Image 1](https://drive.google.com/file/d/1AZNrdTZMSDdGAPtgacQ4V1b8UWKBqaAY/view?usp=sharing)
+- [Demo Image 2](https://drive.google.com/file/d/1aUi75v0I3qwcHA2HBx6zls1JX0s0ZLyg/view?usp=sharing)
+### Live Links
+- **Hugging Face Spaces**: [Conversational Image Recognition Chatbot](https://huggingface.co/spaces/vigneshwar472/Conversational-image-recognition-chatbot)
+  ***Note**: Due to GPU constraints on Hugging Face Spaces, performance may be slower.*
+- **Google Colab Notebook**: [Google Colab Demo](https://colab.research.google.com/drive/1UcY1X5AV5yy9jTuxBnmDWdAjETCi-ni2?usp=sharing)
+  ***Recommended**: Use the Google Colab notebook for the demo. The Hugging Face Space has no GPU, so the application may run slowly there.*