taesiri PRO

https://taesiri.ai/

AI & ML interests

AGI

Recent Activity

updated a dataset about 17 hours ago

taesiri/PhotoshopRequest-DailyDump-November-2024

taesiri's activity

upvoted 4 papers 1 day ago

Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published 2 days ago • 28

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published 8 days ago • 44

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

Paper • 2411.14199 • Published 2 days ago • 19

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published 2 days ago • 13

upvoted a collection 2 days ago

Vision/multimodal Models

Collection of the most popular vision models, including Llama 3.2, LLaVA, Qwen2-VL, Pixtral, PaliGemma and more! • 22 items • Updated 2 days ago • 4

upvoted 2 papers 2 days ago

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Paper • 2411.13281 • Published 3 days ago • 15

SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration

Paper • 2411.10958 • Published 6 days ago • 41

upvoted 2 papers 3 days ago

SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning

Paper • 2411.10161 • Published 8 days ago • 6

RedPajama: an Open Dataset for Training Large Language Models

Paper • 2411.12372 • Published 4 days ago • 41

upvoted 3 papers 4 days ago

Generative World Explorer

Paper • 2411.11844 • Published 5 days ago • 55

AnimateAnything: Consistent and Controllable Animation for Video Generation

Paper • 2411.10836 • Published 7 days ago • 18

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Paper • 2411.10640 • Published 8 days ago • 39

upvoted 4 papers 5 days ago

Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement

Paper • 2411.06558 • Published 13 days ago • 29

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Paper • 2411.10323 • Published 8 days ago • 26

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published 8 days ago • 93

GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

Paper • 2411.08033 • Published 11 days ago • 21

upvoted a paper 6 days ago

Thinking LLMs: General Instruction Following with Thought Generation

Paper • 2410.10630 • Published Oct 14 • 16

upvoted 2 papers 8 days ago

LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

Paper • 2411.09595 • Published 9 days ago • 66

MagicQuill: An Intelligent Interactive Image Editing System

Paper • 2411.09703 • Published 9 days ago • 52

upvoted a paper 9 days ago

Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination

Paper • 2411.03823 • Published 17 days ago • 43