AI & ML interests

None defined yet.

Recent Activity

DmitryRyumin
posted an update 3 days ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Poster)! 🌟👁️🚀
📄 Title: Is Less More? Exploring Token Condensation as Training-Free Test-Time Adaptation 🔍

📝 Description: Token Condensation as Adaptation (TCA) improves the performance and efficiency of Vision Language Models in zero-shot inference by introducing domain anchor tokens.
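To make "token condensation" concrete, here is a minimal, hypothetical sketch that scores each visual token by its best cosine similarity to a set of anchor tokens and keeps only the top few. The function names, the cosine scoring, and the keep budget are assumptions for illustration; this is not TCA's actual algorithm, which is described in the paper and repository.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def condense_tokens(tokens, anchors, keep):
    """Hypothetical condensation: score each token by its best similarity to any
    anchor, keep the `keep` highest-scoring tokens, and preserve their order."""
    scores = [max(cosine(t, a) for a in anchors) for t in tokens]
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]


tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
anchors = [[1.0, 0.0]]
condensed = condense_tokens(tokens, anchors, keep=2)
```

Fewer tokens means less attention compute downstream, which is one way a training-free method can cut test-time cost.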

👥 Authors: Zixin Wang, Dong Gong, Sen Wang, Zi Huang, and Yadan Luo

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation (2410.14729)

📁 Repository: https://github.com/Jo-wang/TCA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to Session 1: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/session-1.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection curated by @DmitryRyumin

🔍 Keywords: #TestTimeAdaptation #TokenCondensation #VisionLanguageModels #TrainingFreeAdaptation #ZeroShotLearning #EfficientAI #AI #ICCV2025 #ResearchHighlight
DavidVivancos
posted an update 3 days ago
AdinaY
posted an update 4 days ago
Kimi K2 Thinking is now live on the Hub 🔥

moonshotai/Kimi-K2-Thinking

✨ 1T MoE for deep reasoning & tool use
✨ Native INT4 quantization = 2× faster inference
✨ 256K context window
✨ Modified MIT license
AdinaY
posted an update 5 days ago
Chinese open-source AI in October wasn’t about bigger models; it was about real-world impact 🔥

https://huggingface.co/collections/zh-ai-community/october-2025-china-open-source-highlights

✨ Vision-Language & OCR wave 🌊
- DeepSeek-OCR: 3B
- PaddleOCR-VL: 0.9B
- Qwen3-VL: 2B / 4B / 8B / 32B / 30B-A3B
- Open-Bee: Bee-8B-RL
- Z.ai Glyph: 10B

OCR is industrializing; the real game now is understanding the (long-context) document, not just reading it.

✨ Text generation: scale or innovation?
- MiniMax-M2: 229B
- Antgroup Ling-1T & Ring-1T
- Moonshot Kimi-Linear: linear-attention challenger
- Kwaipilot KAT-Dev

Efficiency is key.

✨ Any-to-Any & World Models: one step closer to the real world
- BAAI Emu 3.5
- Antgroup Ming-flash-omni
- HunyuanWorld-Mirror: 3D

Aligning with the global “world model” trend

✨ Audio & Speech + Video & Visual: releases from entertainment labs to delivery platforms
- SoulX-Podcast TTS
- LongCat-Audio-Codec & LongCat-Video by Meituan, the delivery platform
- xiabs DreamOmni 2

Looking forward to what's next πŸš€
DmitryRyumin
posted an update 6 days ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching 🔍

📝 Description: The proposed method enhances stereo matching by efficiently fusing unbiased monocular priors from vision foundation models. It addresses misalignment and local-optima issues using a binary local ordering map and pixel-wise linear regression.
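The post names two ingredients without detail. As a hedged sketch of generic versions of each (not the authors' implementation): a closed-form least-squares fit of disparity ≈ a · mono + b, which is a global simplification of the pixel-wise regression the paper describes, and a binary local ordering map that marks whether each pixel exceeds its right-hand neighbor. The neighborhood choice and the function names are assumptions.

```python
def fit_scale_shift(mono, disparity):
    """Closed-form least squares for disparity ≈ a * mono + b over all pixels."""
    n = len(mono)
    mean_x, mean_y = sum(mono) / n, sum(disparity) / n
    var_x = sum((x - mean_x) ** 2 for x in mono)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(mono, disparity))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def local_ordering_map(depth):
    """Binary map per row: 1 where a pixel is larger than its right-hand neighbor."""
    return [[1 if row[j] > row[j + 1] else 0 for j in range(len(row) - 1)]
            for row in depth]


a, b = fit_scale_shift([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
ordering = local_ordering_map([[3.0, 1.0, 2.0], [0.0, 5.0, 5.0]])
```

An ordering map depends only on relative order, so it is unchanged by any positive scale-and-shift of the monocular depth; that is one plausible reason such a representation helps when monocular priors are misaligned in scale.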

👥 Authors: Chengtang Yao, Lidong Yu, Zhidan Liu, Jiaxi Zeng, Yuwei Wu, and Yunde Jia

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching (2505.14414)

📁 Repository: https://github.com/YaoChengTang/Diving-into-the-Fusion-of-Monocular-Priors-for-Generalized-Stereo-Matching

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the 3D Pose Understanding Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/3d-pose-understanding.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection curated by @DmitryRyumin

🔍 Keywords: #StereoMatching #MonocularDepth #VisionFoundationModels #3DReconstruction #Generalization #AI #ICCV2025 #ResearchHighlight
Nymbo
posted an update 8 days ago
I've added an 11th tool to the Nymbo/Tools MCP server: Obsidian_Vault, for your Obsidian vault. I'd argue it's far more context-efficient than any other Obsidian MCP I've seen, and it doesn't require any plugins. There are also some big improvements to the Web_Search and Web_Fetch tools.

# Obsidian_Vault Tool

It's basically a read-only version of the File_System tool, but it works very well for navigating Obsidian without pulling in unnecessary context. It supports recursive (full-text) search across the entire vault, and it supports an offset so the agent can "scroll" through a document without re-consuming tokens.

Run the server locally and set the OBSIDIAN_VAULT_ROOT environment variable to your vault's root path. If you don't use Obsidian, it works perfectly well as a plain read-only filesystem.
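A rough sketch of what such a tool does, assuming Markdown notes under OBSIDIAN_VAULT_ROOT: a recursive full-text search plus a read function with an offset so the agent can "scroll". The function names, the .md filter, and the max_chars window are hypothetical, not taken from the Nymbo/Tools code.

```python
import os
from pathlib import Path


def vault_root():
    """Resolve the vault directory from OBSIDIAN_VAULT_ROOT (the variable named in the post)."""
    return Path(os.environ["OBSIDIAN_VAULT_ROOT"]).resolve()


def search_vault(root, query):
    """Recursive, case-insensitive full-text search over Markdown notes.

    Returns (relative path, line number, line text) tuples.
    """
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if query.lower() in line.lower():
                hits.append((path.relative_to(root).as_posix(), lineno, line))
    return hits


def read_note(root, rel_path, offset=0, max_chars=2000):
    """Read-only slice of a note; advancing `offset` lets an agent 'scroll'."""
    root = Path(root).resolve()
    path = (root / rel_path).resolve()
    if root not in path.parents:  # refuse paths that escape the vault
        raise ValueError(f"{rel_path!r} is outside the vault")
    return path.read_text(encoding="utf-8", errors="replace")[offset:offset + max_chars]
```

Because each read returns only a window starting at offset, the agent pays tokens only for the slice it asks for rather than re-reading the whole note.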

# Web_Search Improvements

The Web_Search tool previously used only DuckDuckGo as its backend search engine, but it now also supports Bing, Brave, Yahoo, and Wikipedia. The default engine is auto, which returns results from all backends in a recommended order. Web_Search still doesn't require any API key or authentication.

There's also a new date filter to limit results to those created in the past day, week, month, or year. Oh, and uhh, SafeSearch is now off by default :)
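A hypothetical sketch of how an "auto" engine with a date filter could merge results: query the backends in a fixed order, drop duplicate URLs, and discard results older than the chosen window. The backend callables, the result dictionary shape, and treating a month as 30 days are assumptions; the actual Web_Search implementation isn't shown in the post.

```python
from datetime import datetime, timedelta

# Assumed window lengths; a "month" is approximated as 30 days.
WINDOWS = {"day": timedelta(days=1), "week": timedelta(weeks=1),
           "month": timedelta(days=30), "year": timedelta(days=365)}


def auto_search(query, backends, date_filter=None, now=None):
    """Query each backend in order, keep the first hit per URL, and drop dated
    results older than the window. Undated results are kept."""
    now = now or datetime.now()
    cutoff = now - WINDOWS[date_filter] if date_filter else None
    seen, merged = set(), []
    for backend in backends:  # list order = priority order
        for result in backend(query):
            if result["url"] in seen:
                continue
            if cutoff and result.get("date") and result["date"] < cutoff:
                continue
            seen.add(result["url"])
            merged.append(result)
    return merged
```

Keeping undated results is a choice made for this sketch; a real tool might drop them when a date filter is active.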

# Web_Fetch Improvements

As context-efficient as the Markdown mode is for web browsing, it sometimes loses important context in the HTML-to-Markdown conversion. So I've added a new HTML mode to the Web_Fetch tool that essentially runs a cURL request on the URL and returns the full HTML page when you need it.
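To illustrate why a raw HTML mode can matter, here is a sketch with a deliberately crude HTML-to-text pass standing in for the Markdown conversion (Python's html.parser rather than whatever Web_Fetch actually uses, and urllib instead of cURL). In this crude text mode a link's href is lost, the kind of context a conversion can drop, while "html" mode returns the page untouched. The render and fetch names are hypothetical.

```python
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Crude HTML-to-text pass standing in for a real Markdown converter."""

    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1  # ignore text inside script/style

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def render(html, mode="markdown"):
    """mode="html" returns the page untouched; any other mode returns lossy text."""
    if mode == "html":
        return html
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch(url, mode="markdown"):
    """Download a page (urllib here, standing in for cURL) and render it."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return render(resp.read().decode("utf-8", errors="replace"), mode)
```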

# A Note on Claude Skills

I've been having fun with the new File_System and Shell_Command tools. Claude Skills don't currently work in the public HF Space because of environment restrictions, but they work perfectly well when you run the server locally.

Happy building ~
DmitryRyumin
posted an update 10 days ago
🚀👌🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤌🚀
📄 Title: Understanding Co-speech Gestures in-the-wild 🔍

📝 Description: JEGAL is a tri-modal model that learns from gestures, speech, and text simultaneously, enabling devices to interpret co-speech gestures in the wild.
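For intuition only, one common way to train a tri-modal embedding is to apply a symmetric contrastive (InfoNCE) loss to each pair of modalities and sum the three terms. This is a generic sketch, not JEGAL's published objective; the temperature of 0.1 and the function names are assumptions, and inputs are expected to be L2-normalized embeddings where row i of each batch comes from the same clip.

```python
import math


def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE: row i of a and row i of b form the positive pair."""
    n = len(a)
    sims = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
            for i in range(n)]

    def cross_entropy(rows):
        # Mean of -log softmax(row)[i] over rows, with the diagonal as the target.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[i]
        return total / n

    columns = [list(col) for col in zip(*sims)]  # b-to-a direction
    return 0.5 * (cross_entropy(sims) + cross_entropy(columns))


def tri_modal_loss(gesture, speech, text):
    """Hypothetical tri-modal objective: sum over the three modality pairs."""
    return info_nce(gesture, speech) + info_nce(gesture, text) + info_nce(speech, text)
```

Aligned batches give a loss near zero, while shuffling one modality's rows drives the corresponding pair terms up.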

👥 Authors: @sindhuhegde, K R Prajwal, Taein Kwon, and Andrew Zisserman

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Understanding Co-speech Gestures in-the-wild (2503.22668)

🌐 Web Page: https://www.robots.ox.ac.uk/~vgg/research/jegal
📁 Repository: https://github.com/Sindhu-Hegde/jegal
📺 Video: https://www.youtube.com/watch?v=TYFOLKfM-rM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Human Modeling Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/human-modeling.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection curated by @DmitryRyumin

🔍 Keywords: #CoSpeechGestures #GestureUnderstanding #TriModalRepresentation #MultimodalLearning #AI #ICCV2025 #ResearchHighlight
AdinaY
posted an update 10 days ago
Kimi Linear 🚀: a hybrid linear-attention model from Moonshot AI

https://huggingface.co/collections/moonshotai/kimi-linear-a3b

✨ 48B total / 3B active, MIT license
✨ Up to 1M context
✨ 84.3 on RULER (128K) with a 3.98× speedup
✨ Hybrid KDA + MLA architecture for peak throughput & quality
DmitryRyumin
posted an update 13 days ago
🚀💡🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🪄🚀
📄 Title: LoftUp: Learning a Coordinate-based Feature Upsampler for Vision Foundation Models 🔍

📝 Description: LoftUp is a coordinate-based transformer that upsamples the low-resolution features of vision foundation models (VFMs, e.g., DINOv2 and CLIP) using cross-attention and self-distilled pseudo-ground truth (pseudo-GT) from SAM.
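As a loose illustration of a coordinate-based upsampler, the sketch below lets each high-resolution pixel's coordinate encoding act as a query that cross-attends over the low-resolution feature tokens. Everything here is an assumption for illustration: there are no learned weights (a trained model like LoftUp would use learned projections), the keys use only coordinates, and the sinusoidal encoding and temperature are arbitrary choices.

```python
import math


def coord_encoding(x, y, num_freqs=2):
    """Sinusoidal encoding of a normalized (x, y) coordinate in [0, 1]."""
    enc = []
    for f in range(num_freqs):
        for c in (x, y):
            enc += [math.sin((2 ** f) * math.pi * c), math.cos((2 ** f) * math.pi * c)]
    return enc


def upsample(features, low_hw, high_hw, temperature=0.1):
    """Cross-attention upsampling: each high-res pixel's coordinate encoding is a
    query; low-res tokens (row-major list) are keyed by their own coordinates."""
    (lh, lw), (hh, hw) = low_hw, high_hw
    keys = [coord_encoding((j + 0.5) / lw, (i + 0.5) / lh)
            for i in range(lh) for j in range(lw)]
    out = []
    for i in range(hh):
        for j in range(hw):
            q = coord_encoding((j + 0.5) / hw, (i + 0.5) / hh)
            scores = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
            m = max(scores)
            w = [math.exp(s - m) for s in scores]  # numerically stable softmax
            z = sum(w)
            out.append([sum(wi * f[c] for wi, f in zip(w, features)) / z
                        for c in range(len(features[0]))])
    return out


low = [[0.0], [1.0], [2.0], [3.0]]  # 2 x 2 grid of 1-dim features, row-major
high = upsample(low, (2, 2), (4, 4))
```

With sinusoidal encodings the query-key dot product reduces to a sum of cosines of the coordinate offsets, so this untrained version behaves like smooth, distance-weighted interpolation; learned projections are what would let a real model do better than that.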

👥 Authors: Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models (2504.14032)

🌐 GitHub Page: https://andrehuang.github.io/loftup-site
📁 Repository: https://github.com/andrehuang/loftup

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Foundation Models and Representation Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/foundation-models-and-representation-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection curated by @DmitryRyumin

🔍 Keywords: #LoftUp #VisionFoundationModels #FeatureUpsampling #CrossAttentionTransformer #CoordinateBasedLearning #SelfDistillation #PseudoGroundTruth #RepresentationLearning #AI #ICCV2025 #ResearchHighlight
DmitryRyumin
posted an update 14 days ago
🚀🏷️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🧩🚀
📄 Title: Heavy Labels Out! Dataset Distillation with Label Space Lightening 🔍

📝 Description: HeLlO is a new dataset distillation framework that removes the need for large soft labels. It uses a lightweight, online image-to-label projector built on CLIP, adapted with LoRA-style parameter-efficient tuning and initialized with text embeddings.
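A minimal sketch of the projector idea as described: class text embeddings T initialize the weight, and a rank-r LoRA update B·A is the only trainable part. With B initialized to zero, the first logits equal plain text-embedding similarity scores, and a softmax over the logits yields soft labels on the fly instead of storing them. The class name, rank, and initialization scale are assumptions, not HeLlO's code.

```python
import math
import random


class LoRAProjector:
    """Image-to-label projector with weight T + B @ A (C classes, D dims, rank r).

    T (C x D) holds class text embeddings and stays frozen; only A (r x D) and
    B (C x r) would be trained. B starts at zero, so the initial logits are
    exactly the text-embedding scores.
    """

    def __init__(self, text_emb, rank=2, seed=0):
        rng = random.Random(seed)
        self.T = [row[:] for row in text_emb]
        n_cls, dim = len(text_emb), len(text_emb[0])
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(n_cls)]

    def weight(self):
        rank = len(self.A)
        return [[self.T[c][d] + sum(self.B[c][r] * self.A[r][d] for r in range(rank))
                 for d in range(len(self.T[0]))] for c in range(len(self.T))]

    def logits(self, image_feat):
        return [sum(w * x for w, x in zip(row, image_feat)) for row in self.weight()]

    def soft_labels(self, image_feat, temperature=1.0):
        """Softmax over logits: soft labels generated on demand, not stored."""
        z = [s / temperature for s in self.logits(image_feat)]
        m = max(z)
        e = [math.exp(v - m) for v in z]
        total = sum(e)
        return [v / total for v in e]
```

The trainable parameter count is r·(C + D) instead of C·D for a full C×D projector, which is the usual LoRA saving.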

👥 Authors: @roseannelexie, @Huage001, Zigeng Chen, Jingwen Ye, and Xinchao Wang

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Heavy Labels Out! Dataset Distillation with Label Space Lightening (2408.08201)

📺 Video: https://www.youtube.com/watch?v=kAyK_3wskgA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection curated by @DmitryRyumin

🔍 Keywords: #DatasetDistillation #LabelCompression #CLIP #LoRA #EfficientAI #FoundationModels #AI #ICCV2025 #ResearchHighlight