Overview
This project aims to support visually impaired individuals in their daily navigation.
It combines a YOLO model with LLaMA 2 7B for navigation.
YOLO is trained on bounding-box data from AI Hub.
The YOLO output (bounding boxes) is converted into lists of the form [[class_of_obj_1, xmin, xmax, ymin, ymax, size], [class_of...] ...]
and appended to the input question (see the sketch below).
The LLM is trained to navigate using the LearnItAnyway/Visual-Navigation-21k multi-turn dataset.
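The snippet below is an illustrative sketch (not the repository's actual code) of the conversion step described above: YOLO detections are flattened into the nested-list bbox format and prepended to the user's question. The names `detections_to_list`, `build_prompt`, and the detection dictionary keys are hypothetical, and using the box area as `size` is an assumption.

```python
# Sketch: convert YOLO-style detections into
# [[class, xmin, xmax, ymin, ymax, size], ...] and build the LLM input.
# All names here are hypothetical; see the notebook for the real pipeline.

def detections_to_list(detections):
    """Flatten detections into the nested-list bbox format."""
    bbox_list = []
    for det in detections:
        xmin, ymin, xmax, ymax = det["xmin"], det["ymin"], det["xmax"], det["ymax"]
        size = (xmax - xmin) * (ymax - ymin)  # box area as a rough size measure (assumption)
        bbox_list.append([det["class"], xmin, xmax, ymin, ymax, size])
    return bbox_list

def build_prompt(question, detections):
    """Prepend the bbox list to the question, as described in the Overview."""
    return f"{detections_to_list(detections)} {question}"

# Example
dets = [{"class": "person", "xmin": 120, "ymin": 80, "xmax": 260, "ymax": 400}]
print(build_prompt("Is the path ahead clear?", dets))
```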
Usage
We show how to use the model in `yolo_llama_visnav_test.ipynb`.
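For a quick start outside the notebook, a minimal sketch with the Hugging Face `transformers` library is shown below, assuming the fine-tuned LLaMA 2 7B weights are hosted on the Hub; `<model-repo-id>` is a placeholder for this repository's id, and the example prompt follows the bbox-list format above.

```python
# Minimal usage sketch (assumption: weights are loadable via transformers).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<model-repo-id>"  # placeholder, replace with this repository's id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Bbox list (from YOLO) prepended to the question, as described in the Overview.
prompt = "[['person', 120, 260, 80, 400, 44800]] Is the path ahead clear?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```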