CompVis Community

university

Recent Activity

ynie submitted a paper 4 days ago

CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

omriav authored a paper 7 months ago

Story2Board: A Training-Free Approach for Expressive Storyboard Generation

dpaleka authored a paper 9 months ago

Pitfalls in Evaluating Language Model Forecasters

ynie

submitted a paper to Daily Papers 4 days ago

CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

Paper • 2603.01973 • Published 5 days ago • 6

liorwolf

authored a paper about 1 month ago

TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors

Paper • 2601.17958 • Published Jan 25 • 3

toshas

posted an update 3 months ago

Introducing StereoSpace -- our new end-to-end method for turning photos into stereo images without explicit geometry or depth maps. This makes it especially robust with thin structures and transparencies. Try the demo below:

🌐 Project: prs-eth/stereospace_web
📕 Paper: StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space (2512.10959)
🐙 Code: https://github.com/prs-eth/stereospace
🤗 Demo: toshas/stereospace
🤗 Weights: prs-eth/stereospace-v1-0

By ETH Zürich (@behretj, @Bingxin, @konradschindler), University of Bologna (@fabiotosi92, @mpoggi), HUAWEI Bayer Lab (@toshas).

toshas

authored a paper 3 months ago

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

Paper • 2512.10959 • Published Dec 11, 2025 • 12

toshas

submitted a paper to Daily Papers 3 months ago

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

Paper • 2512.10959 • Published Dec 11, 2025 • 12

toshas

posted an update 3 months ago

Introducing 🇨🇭WindowSeat🇨🇭, our new method for removing reflections from photos taken through windows, on planes, in malls, offices, and other glass-filled environments.

Fine-tuning a foundation diffusion transformer for reflection removal quickly runs up against the limits of what existing datasets and techniques can offer. To fill that gap, we generate physically accurate examples in Blender that simulate realistic glass and reflection effects. This data enables strong performance on both established benchmarks and previously unseen images.

To make this practical, the open-source, Apache-2.0-licensed model builds on Qwen-Image-Edit-2509, a 20B image-editing diffusion transformer that runs on a single GPU and can be fine-tuned in about a day. WindowSeat keeps its use of the underlying DiT cleanly separated from the data and training recipe, allowing future advances in base models to be incorporated with minimal friction.

Try it out with your own photos in this interactive demo:
🤗 toshas/windowseat-reflection-removal

Other resources:
🌎 Website: huawei-bayerlab/windowseat-reflection-removal-web
🎓 Paper: Reflection Removal through Efficient Adaptation of Diffusion Transformers (2512.05000)
🤗 Model: huawei-bayerlab/windowseat-reflection-removal-v1-0
🐙 Code: https://github.com/huawei-bayerlab/windowseat-reflection-removal

Team: Daniyar Zakarin (@daniyarzt)*, Thiemo Wandel (@thiemo-wandel)*, Anton Obukhov (@toshas), Dengxin Dai.
*Work done during internships at HUAWEI Bayer Lab

toshas

authored 2 papers 3 months ago

The Fourth Monocular Depth Estimation Challenge

Paper • 2504.17787 • Published Apr 24, 2025

Reflection Removal through Efficient Adaptation of Diffusion Transformers

Paper • 2512.05000 • Published Dec 4, 2025 • 16

multimodalart

posted an update 5 months ago

Want to iterate on a Hugging Face Space with an LLM?

Now you can easily convert an entire HF repo (Model, Dataset, or Space) to a text file and feed it to a language model!

multimodalart/repo2txt

mbrack

authored a paper 5 months ago

UniFusion: Vision-Language Model as Unified Encoder in Image Generation

Paper • 2510.12789 • Published Oct 14, 2025 • 19

ermonste

authored a paper 5 months ago

DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Paper • 2509.16117 • Published Sep 19, 2025 • 22

ryanramos

authored a paper 7 months ago

Processing and acquisition traces in visual encoders: What does CLIP know about your camera?

Paper • 2508.10637 • Published Aug 14, 2025 • 8

mbrack

authored a paper 9 months ago

How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions

Paper • 2506.16679 • Published Jun 20, 2025 • 1

multimodalart

posted an update 9 months ago

Self-Forcing, a real-time video model by @adobe distilled from Wan 2.1, is out, and they open-sourced it 🐐

I've built a live, real-time demo on Spaces 📹💨

multimodalart/self-forcing