vigneshwar472 commited on
Commit
28065b0
Β·
verified Β·
1 Parent(s): 05d0420

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +79 -1
README.md CHANGED
@@ -6,8 +6,86 @@ colorTo: green
6
  sdk: gradio
7
  sdk_version: 4.44.0
8
  app_file: app.py
9
- pinned: false
10
  license: mit
 
11
  ---
12
 
13
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6
  sdk: gradio
7
  sdk_version: 4.44.0
8
  app_file: app.py
9
+ pinned: true
10
  license: mit
11
+ short_description: Conversational Image Recognition chatbot
12
  ---
13
 
14
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
15
+
16
+ # Conversational Image Recognition Chatbot
17
+
18
+ ## Introduction
19
+
20
+ This application combines natural language processing with image recognition technology that involves the image recognition downstream task and extraction of visual information to interact with users. It allows users to upload images and then engage in a conversation to answer questions about the content of those images. This tool is designed to provide a seamless and interactive way to understand and analyze visual content through natural language dialogue.
21
+
22
+ # Solution we offer
23
+ ● Chatbot harnessing the power of Vision Language Model (VLM) & Zero-shot object detection Model
24
+ ● The user can upload an image, detect the objects in it and start the chat session.
25
+ ● Enhanced spatial understanding of objects in the images.It happens due to inter-communication between both the
26
+ models.
27
+ ● Image question answering chatbot with feature of object detection.
28
+ ● Chat bot History and detection output interaction of the system.
29
+ ● Used 9 representative VLMs on 10 Benchmarks in Open compass multimodal leaderboard.
30
+ ● Best performing and most used object detection models like OWLv2 model, Grounding DINO model.
31
+
32
+ # How it addresses the problem
33
+ ● Both the models help in addressing the problem since they were pretrained on household datasets.
34
+ ● Provide unparalleled results and inference speed.
35
+ ● Our approach is easy and can be implemented with minimal efforts.
36
+
37
+ # Unique value propositions
38
+ ● Correct responses lexically and grammatically.
39
+ ● Usage of state of the art and novel research work in field of image understanding.
40
+ Everything we have used is open sourcework.
41
+ ● Flexibility to use different models with reference documentation.
42
+ ● Working and hosted application
43
+
44
+ # Technologies used
45
+ Programming languages : Python
46
+ Libraries : Transformers, Pytorch, Image libraries like PIL
47
+ Hardware : Nvidia T4 medium 8vCPU 30GB RAM
48
+ Platforms : Hugging face, arXiv, other research articles
49
+ Deployment tools : Gradio, Streamlit, Hugging face spaces
50
+ hardware
51
+
52
+
53
+ ## Aplication demo
54
+
55
+ ● Used google/owlv2-base-patch16-ensemble as zero shot object detection model and Qwen/Qwen2-VL-2B-Instruct as VLM
56
+ ● Gradio framework and transformers library for development
57
+ Demo Link - Hugging Face Space Google Colab demo
58
+ HF space has 2vCPU-16GB RAM and no GPU deployed in the free tier. So the
59
+ inference speed of our chatbot is very slow.
60
+ To see the demo it is highly recommended to use the Google colab demo we have
61
+ provided. Get started with the demo with minimal efforts. The inference speed
62
+ increases drastically on google colab with T4 GPU runtime.
63
+
64
+ ### Image Demos
65
+ - [Demo Image 1](https://drive.google.com/file/d/1AZNrdTZMSDdGAPtgacQ4V1b8UWKBqaAY/view?usp=sharing)
66
+ - [Demo Image 2](https://drive.google.com/file/d/1aUi75v0I3qwcHA2HBx6zls1JX0s0ZLyg/view?usp=sharing)
67
+
68
+
69
+ ### Live Links
70
+
71
+
72
+ - **Hugging Face Spaces**: [Conversational Image Recognition Chatbot](https://huggingface.co/spaces/vigneshwar472/Conversational-image-recognition-chatbot)
73
+ ***Note**: Due to GPU constraints on Hugging Face Spaces, performance may be slower.*
74
+
75
+
76
+
77
+
78
+ - **Google Colab Notebook**: [Google Colab Demo](https://colab.research.google.com/drive/1UcY1X5AV5yy9jTuxBnmDWdAjETCi-ni2?usp=sharing)
79
+ ***Recommended**: It is recommended to use the Google Colab Notebook for the demo because Hugging Face Spaces have GPU constraints so the application may run slow*
80
+
81
+
82
+
83
+
84
+
85
+
86
+ ## Technologies Used
87
+ - **Programming languages** : Python
88
+ - **Libraries** : Transformers, Pytorch, Image libraries like PIL
89
+ - **Hardware** : Nvidia T4 medium 8vCPU 30GB RAM
90
+ - **Platforms** : Hugging face, arXiv, other research articles
91
+ - **Deployment tools** : Gradio, Streamlit, Hugging face spaces hardware