Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 85
Real-time DEtection Transformer (RT-DETR) landed in transformers with an Apache 2.0 license.
models: https://huggingface.co/PekingU
demo: merve/RT-DETR-tracking-coco
paper: DETRs Beat YOLOs on Real-time Object Detection (2304.08069)
notebook: https://github.com/merveenoyan/example_notebooks/blob/main/RT_DETR_Notebook.ipynb

YOLO models are known to be super fast for real-time computer vision, but they have a downside: their speed and accuracy are sensitive to NMS (non-maximum suppression) post-processing. Transformer-based models, on the other hand, are not as computationally efficient. Isn't there something in between? Enter RT-DETR!

The authors combined a CNN backbone and a multi-stage hybrid encoder (combining convs and attn) with a transformer decoder. In the paper, the authors also claim one can adjust speed by changing the number of decoder layers, without retraining. The authors find that the model outperforms the previous state of the art in both speed and accuracy.
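To make the NMS point in the post concrete, here is a minimal pure-Python sketch of greedy non-maximum suppression, the kind of post-processing YOLO-style detectors typically rely on. It is not code from the RT-DETR paper or from transformers; the function names `iou` and `greedy_nms` and the toy boxes are made up for illustration. It only shows how the set of surviving boxes changes with the IoU threshold, which is the sensitivity that an NMS-free detector like RT-DETR avoids.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept box
    by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

# The first two boxes have IoU = 80 / 120, about 0.667.
kept_strict = greedy_nms(boxes, scores, iou_threshold=0.5)  # second box suppressed
kept_loose = greedy_nms(boxes, scores, iou_threshold=0.7)   # second box survives
print(kept_strict, kept_loose)
```

With these toy inputs, moving the threshold from 0.5 to 0.7 changes the number of reported detections from two to three, even though the underlying predictions are identical.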
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 30, 2024 • 72
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data Paper • 2406.18790 • Published Jun 26, 2024 • 34
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance Paper • 2405.17532 • Published May 27, 2024