stereoplegic's Collections Layout
UI Layout Generation with LLMs Guided by UI Grammar
Paper
• 2310.15455
• Published
• 3
You Only Look at Screens: Multimodal Chain-of-Action Agents
Paper
• 2309.11436
• Published
• 1
Never-ending Learning of User Interfaces
Paper
• 2308.08726
• Published
• 2
LMDX: Language Model-based Document Information Extraction and Localization
Paper
• 2309.10952
• Published
• 67
LASER: LLM Agent with State-Space Exploration for Web Navigation
Paper
• 2309.08172
• Published
• 14
LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models
Paper
• 2309.09506
• Published
• 15
DSG: An End-to-End Document Structure Generator
Paper
• 2310.09118
• Published
• 2
On Web-based Visual Corpus Construction for Visual Document Understanding
Paper
• 2211.03256
• Published
• 1
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper
• 2309.01131
• Published
• 1
DocFormerv2: Local Features for Document Understanding
Paper
• 2306.01733
• Published
• 1
OCR-free Document Understanding Transformer
Paper
• 2111.15664
• Published
• 6
DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents
Paper
• 2304.12484
• Published
• 1
Understanding HTML with Large Language Models
Paper
• 2210.03945
• Published
• 1
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Paper
• 2306.06094
• Published
• 1
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper
• 2401.00908
• Published
• 189
DocGraphLM: Documental Graph Language Model for Information Extraction
Paper
• 2401.02823
• Published
• 36
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Paper
• 2311.07575
• Published
• 15
LayoutPrompter: Awaken the Design Ability of Large Language Models
Paper
• 2311.06495
• Published
• 12
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
Paper
• 2401.13311
• Published
• 12
Empowering LLM to use Smartphone for Intelligent Task Automation
Paper
• 2308.15272
• Published
• 1
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
Paper
• 2404.12753
• Published
• 43