
Test Dummy

testdummyvt

AI & ML interests

None yet

Recent Activity

Organizations

None yet

testdummyvt's activity

reacted to MonsterMMORPG's post with 🔥 3 days ago
I just pushed another amazing update to our Wan 2.1 app. LoRA loading for the 14B Wan 2.1 models was taking over 15 minutes; it is now optimized to take only a few seconds. The app fully supports the RTX 5000 series and is fully optimized for both VRAM and RAM.

Our APP here : https://www.patreon.com/posts/wan-2-1-ultra-as-123105403

Tutorial 1 : https://youtu.be/hnAhveNy-8s

Tutorial 2 : https://youtu.be/ueMrzmbdWBg

It has also been pushed to the original repo; you can see the pull request here: https://github.com/modelscope/DiffSynth-Studio/pull/442

reacted to singhsidhukuldeep's post with 👀 3 months ago
Exciting breakthrough in multimodal search technology! @nvidia researchers have developed MM-Embed, a groundbreaking universal multimodal retrieval system that's changing how we think about search.

Key innovations:
• First-ever universal multimodal retriever that excels at both text and image searches across diverse tasks
• Leverages advanced multimodal LLMs to understand complex queries combining text and images
• Implements novel modality-aware hard negative mining to overcome modality bias issues
• Achieves state-of-the-art performance on the M-BEIR benchmark while maintaining superior text retrieval capabilities

Under the hood:
The system uses a bi-encoder architecture with LLaVA-NeXT (based on Mistral 7B) as its backbone. It is trained in two stages: first with random negatives, then with carefully mined hard negatives to improve cross-modal understanding.
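To make the two-stage idea concrete, here is a minimal sketch of contrastive training data selection for a bi-encoder. It is not the MM-Embed implementation: the embeddings are hand-written stand-ins for encoder outputs, and the InfoNCE-style loss, the `temperature` value, and the helper names (`info_nce`, `random_negatives`, `hard_negatives`) are all assumptions made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.05):
    """InfoNCE-style loss (assumed here): -log softmax of the positive."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


def random_negatives(pool, positive_idx, k, rng):
    """Stage 1 (sketch): sample k negatives uniformly from the pool."""
    candidates = [i for i in range(len(pool)) if i != positive_idx]
    return rng.sample(candidates, k)


def hard_negatives(query, pool, positive_idx, k):
    """Stage 2 (sketch): take the k non-positive items most similar to the query."""
    candidates = [i for i in range(len(pool)) if i != positive_idx]
    candidates.sort(key=lambda i: cosine(query, pool[i]), reverse=True)
    return candidates[:k]


# Toy embeddings standing in for encoder outputs (hypothetical values).
query = [1.0, 0.0]
pool = [[1.0, 0.1], [0.9, 0.3], [0.0, 1.0], [-1.0, 0.0]]  # index 0 is the positive

stage1 = random_negatives(pool, positive_idx=0, k=2, rng=random.Random(0))
stage2 = hard_negatives(query, pool, positive_idx=0, k=2)  # -> [1, 2]

easy_loss = info_nce(query, pool[0], [pool[3]])
hard_loss = info_nce(query, pool[0], [pool[1]])
```

In this toy setup the hard negative produces a larger loss than the easy one, which is the intuition behind adding a second stage: negatives that look similar to the query give the model a stronger training signal than random ones.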

The real magic happens in the modality-aware negative mining, where the system learns to distinguish between incorrect modality matches and unsatisfactory information matches, ensuring retrieved results match both content and format requirements.
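The distinction between the two kinds of negatives can be sketched as follows. This is only an illustration of the idea described in the post, not the procedure used by MM-Embed: the `Candidate` type, the `mine_modality_aware_negatives` function, the modality labels, and the cutoff `k` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    modality: str  # e.g. "text" or "image"
    score: float   # retriever similarity to the query


def mine_modality_aware_negatives(candidates, positive_ids, target_modality, k=2):
    """Split top-ranked non-positive candidates into two negative buckets.

    wrong_modality: results whose format differs from what the query asks for.
    wrong_content:  results in the requested format that are not labeled relevant.
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    wrong_modality, wrong_content = [], []
    for cand in ranked:
        if cand.doc_id in positive_ids:
            continue  # never use a labeled positive as a negative
        if cand.modality != target_modality:
            wrong_modality.append(cand.doc_id)
        else:
            wrong_content.append(cand.doc_id)
    return wrong_modality[:k], wrong_content[:k]


# Hypothetical retrieval results for a query that expects a text answer.
results = [
    Candidate("img1", "image", 0.95),
    Candidate("txt1", "text", 0.90),  # the labeled positive
    Candidate("txt2", "text", 0.80),
    Candidate("img2", "image", 0.40),
    Candidate("txt3", "text", 0.30),
]
modality_negs, content_negs = mine_modality_aware_negatives(
    results, positive_ids={"txt1"}, target_modality="text"
)
```

Keeping the two buckets separate lets a training loop draw from both, so the model is penalized for returning the wrong format as well as for returning the right format with the wrong information.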

What sets it apart is its ability to handle diverse search scenarios, from simple text queries to complex combinations of images and text, while maintaining high accuracy across different domains.