SketchAgent: Language-Driven Sequential Sketch Generation
Abstract
Sketching serves as a versatile tool for externalizing ideas, enabling rapid exploration and visual communication that spans various disciplines. While artificial systems have driven substantial advances in content creation and human-computer interaction, capturing the dynamic and abstract nature of human sketching remains challenging. In this work, we introduce SketchAgent, a language-driven, sequential sketch generation method that enables users to create, modify, and refine sketches through dynamic, conversational interactions. Our approach requires no training or fine-tuning. Instead, we leverage the sequential nature and rich prior knowledge of off-the-shelf multimodal large language models (LLMs). We present an intuitive sketching language, introduced to the model through in-context examples, enabling it to "draw" using string-based actions. These are processed into vector graphics and then rendered to create a sketch on a pixel canvas, which can be accessed again for further tasks. By drawing stroke by stroke, our agent captures the evolving, dynamic qualities intrinsic to sketching. We demonstrate that SketchAgent can generate sketches from diverse prompts, engage in dialogue-driven drawing, and collaborate meaningfully with human users.
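To make the pipeline described in the abstract more concrete, here is a minimal sketch of the "string-based actions to vector graphics" step. The action format (`stroke <name>: x,y x,y ...`), the helper names `parse_actions` and `strokes_to_svg`, and the 100-unit canvas are all assumptions made for illustration. The abstract does not specify the actual sketching language, so this should not be read as SketchAgent's syntax or implementation.

```python
import re

# Hypothetical action format (an assumption, not the paper's syntax):
# one stroke per line, written as "stroke <name>: x1,y1 x2,y2 ..."
STROKE_RE = re.compile(r"^stroke\s+(\w+):\s*(.+)$")


def parse_actions(text):
    """Parse action lines into an ordered list of (name, [(x, y), ...])."""
    strokes = []
    for line in text.strip().splitlines():
        match = STROKE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a stroke action
        name, coords = match.groups()
        points = [tuple(float(v) for v in pair.split(",")) for pair in coords.split()]
        strokes.append((name, points))
    return strokes


def strokes_to_svg(strokes, size=100):
    """Serialize strokes as SVG polylines, preserving the drawing order."""
    body = []
    for name, points in strokes:
        pts = " ".join(f"{x:g},{y:g}" for x, y in points)
        body.append(f'  <polyline id="{name}" points="{pts}" fill="none" stroke="black"/>')
    header = f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
    return header + "\n" + "\n".join(body) + "\n</svg>"


# Example: a model response containing two strokes of a simple house.
actions = """
stroke roof: 20,50 50,20 80,50
stroke walls: 20,50 20,90 80,90 80,50
"""
strokes = parse_actions(actions)
svg = strokes_to_svg(strokes)
print(svg)
```

The example stops at the SVG string. Rasterizing it onto a pixel canvas, and feeding that canvas back to the model for further turns, would require an external renderer and is left out here.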
Community
An interesting use might be to start with a sketch, then iteratively increase its quality and resolution.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- VQ-SGen: A Vector Quantized Stroke Representation for Sketch Generation (2024)
- FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations (2024)
- Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping (2024)
- LLaVA-CoT: Let Vision Language Models Reason Step-by-Step (2024)
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models (2024)
- SRSA: A Cost-Efficient Strategy-Router Search Agent for Real-world Human-Machine Interactions (2024)
- Large Language Models as User-Agents for Evaluating Task-Oriented-Dialogue Systems (2024)
Project page: https://yael-vinker.github.io/sketch-agent/
arXiv paper: https://arxiv.org/pdf/2411.17673
GitHub repo: https://github.com/yael-vinker/SketchAgent