AI & ML interests

Building interactive demos for scikit-learn examples 🧡

sklearn-docs's activity

prithivMLmods
posted an update about 13 hours ago
Deepswipe by
.
.
.
. Deepseek 🐬🗿

Everything is now in recovery. 📉📈
merve
posted an update 5 days ago
Oof, what a week! 🥵 So many things have happened, let's recap! merve/jan-24-releases-6793d610774073328eac67a9

Multimodal 💬
- We have released SmolVLM -- the tiniest VLMs, which come in 256M and 500M, with its retrieval models ColSmol for multimodal RAG 💗
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging multimodal benchmark

LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 and with an MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, a new family of models and their datasets (SFT and reward ones too!)

Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- Tencent released Hunyuan3D-2, new 3D asset generation from images
merve
posted an update 5 days ago
smolagents can see 🔥
we just shipped vision support to smolagents 🤗 agentic computers FTW

you can now:
💻 let the agent fetch images dynamically (e.g. an agentic web browser)
📑 pass images at the init of the agent (e.g. chatting with documents, filling forms automatically, etc.)
with only a few LoC changed! 🤯
you can use transformers models locally (like Qwen2VL) OR plug in your favorite multimodal inference provider (gpt-4o, Anthropic & co) 🤠

read our blog http://hf.co/blog/smolagents-can-see
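As a rough sketch of what this looks like (the model wrapper and the images argument are based on the blog's description; treat exact names as assumptions and check the blog for the canonical snippet):

```python
# Hedged sketch: handing an image to a smolagents agent at run time.
from PIL import Image
from smolagents import CodeAgent, OpenAIServerModel

# Any multimodal backend should work; a local Qwen2-VL via TransformersModel is another option.
model = OpenAIServerModel(model_id="gpt-4o")
agent = CodeAgent(tools=[], model=model)

document = Image.open("contract_page_1.png")  # hypothetical local file
# Pass images with the task; the agent can then reason over them in its code steps.
agent.run("Summarize the key obligations in this document.", images=[document])
```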
suayptalha
posted an update 8 days ago
prithivMLmods
posted an update 9 days ago
Q'n' Sketches ❤️‍🔥

🖼️ Adapters:
- Qs : strangerzonehf/Qs-Sketch
- Qd : strangerzonehf/Qd-Sketch
- Qx : strangerzonehf/Qx-Art
- Qc : strangerzonehf/Qc-Sketch
- Bb : strangerzonehf/Bg-Bag

Collection : strangerzonehf/q-series-sketch-678e3503bf3a661758429717

🔗 Page : https://huggingface.co/strangerzonehf

.
.
.
@prithivMLmods 🤗
merve
posted an update 12 days ago
Everything that happened this week in open AI, a recap 🤠 merve/jan-17-releases-678a673a9de4a4675f215bf5

👀 Multimodal
- MiniCPM-o 2.6 is a new sota any-to-any model by OpenBMB (vision, speech and text!)
- VideoChat-Flash-Qwen2.5-2B is a new family of video multimodal models by OpenGVLab that come in sizes 2B & 7B and resolutions 224 & 448
- ByteDance released a larger SA2VA that comes in at 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance

💬 LLMs
- MiniMax-Text-01 is a new huge language model (456B total, 45.9B active params) by MiniMaxAI with a context length of 4M tokens 🤯
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B, a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D 🧙🏻‍♂️
- ReaderLM-v2 is a new HTML parsing model by Jina AI
- Dria released Dria-Agent-a-3B, a new agentic coding model (Pythonic function calling) based on Qwen2.5 Coder
- Unsloth released faster, more memory-efficient versions of Phi-4 and Llama 3.3

🖼️ Vision
- MatchAnything is a new foundation model for image matching
- FitDiT is a high-fidelity virtual try-on (VTON) model based on the DiT architecture

🗣️ Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities

📖 Retrieval
- lightblue released a new reranker based on Qwen2.5, LB-reranker-0.5B-v1.0, that can handle 95+ languages
- cde-small-v2 is a new sota small retrieval model by @jxm
not-lain
posted an update 12 days ago
we now have more than 2000 public AI models using ModelHubMixin 🤗
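For context, a minimal sketch of what the mixin provides (the class and repo name below are made up for illustration):

```python
# Minimal sketch: ModelHubMixin adds save_pretrained / from_pretrained / push_to_hub
# to a plain PyTorch module.
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class TinyClassifier(nn.Module, PyTorchModelHubMixin):
    def __init__(self, hidden_size: int = 32, num_classes: int = 2):
        super().__init__()
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.fc(x)

model = TinyClassifier()
model.save_pretrained("tiny-classifier")               # writes config.json + weights locally
# model.push_to_hub("your-username/tiny-classifier")   # hypothetical repo id
reloaded = TinyClassifier.from_pretrained("tiny-classifier")
```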
prithivMLmods
posted an update 13 days ago
ChemQwen-vL [ Qwen for Chem Vision ] 🧑🏻‍🔬

🧪 Model : prithivMLmods/ChemQwen-vL

📝 ChemQwen-vL is a vision-language model fine-tuned from the Qwen2VL-2B Instruct model. It has been trained using the International Chemical Identifier (InChI) format for chemical compounds and is optimized for chemical compound identification. The model excels at generating the InChI and providing descriptions of chemical compounds based on their images. Its architecture operates within a multimodal framework, combining image-text-to-text capabilities. It has been fine-tuned using datasets from: https://iupac.org/projects/

📒 Colab Demo: https://tinyurl.com/2pn8x6u7, Collection: https://tinyurl.com/2mt5bjju

Inference outputs can also be exported as documents (PDF reports) with the help of the ReportLab library: https://pypi.org/project/reportlab/
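A rough inference sketch (assumed usage, not taken from the model card), assuming the checkpoint loads with the standard Qwen2-VL transformers API:

```python
# Hedged sketch: query ChemQwen-vL with an image of a chemical structure.
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "prithivMLmods/ChemQwen-vL"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

image = Image.open("compound.png")  # hypothetical image of a chemical structure
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Give the InChI for this compound and describe it briefly."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=200)
print(processor.batch_decode(output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```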

🤗: @prithivMLmods
merve
posted an update 13 days ago
merve
posted an update 16 days ago
there's a new multimodal retrieval model in town 🤠
LlamaIndex released vdr-2b-multi-v1
> uses 70% fewer image tokens, yet outperforms other dse-qwen2-based models
> 3x faster inference with less VRAM 💨
> shrinkable with matryoshka 🪆
> can do cross-lingual retrieval!
Collection: llamaindex/visual-document-retrieval-678151d19d2758f78ce910e1 (with models and datasets)
Demo: llamaindex/multimodal_vdr_demo
Learn more from their blog post here: https://huggingface.co/blog/vdr-2b-multilingual 📖
not-lain
posted an update 17 days ago
Published a new blog post 📖
In this blog post I go through the transformer architecture, emphasizing how tensor shapes propagate through each layer.
🔗 https://huggingface.co/blog/not-lain/tensor-dims
Some interesting takeaways:
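As a quick illustration of the idea (a minimal sketch, not from the blog post), here is one way to print how hidden-state shapes flow through a small encoder; the checkpoint choice is just an example:

```python
# Minimal sketch: print (batch, seq_len, hidden_dim) for every layer's output.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "prajjwal1/bert-tiny"  # any small encoder works here
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("shapes propagate layer by layer", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One hidden-state tensor per layer, plus the embedding output at index 0.
for i, h in enumerate(outputs.hidden_states):
    print(f"layer {i}: {tuple(h.shape)}")
```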
merve
posted an update 19 days ago
What a beginning to this year in open ML 🤠
Let's unwrap! merve/jan-10-releases-677fe34177759de0edfc9714

Multimodal 🖼️
> ByteDance released SA2VA: a family of vision LMs that can take image, video, text and visual prompts
> moondream2 is out with new capabilities like outputting structured data and gaze detection!
> Dataset: Alibaba DAMO lab released a multimodal textbook: 22k hours worth of samples from instruction videos 🤯
> Dataset: SciCap, a benchmark dataset for captioning scientific documents, is released along with its challenge!

LLMs 💬
> Microsoft released Phi-4, a sota open-source 14B language model 🔥
> Dolphin is back with Dolphin 3.0 Llama 3.1 8B 🐬🐬
> Prime-RL released Eurus-2-7B-PRIME, a new language model trained using PRIME alignment
> SmallThinker-3B is a new small reasoning LM based on Qwen2.5-3B-Instruct 💭
> Dataset: QWQ-LONGCOT-500K is the dataset used to train SmallThinker, generated using QwQ-32B-preview 📕
> Dataset: @cfahlgren1 released React Code Instructions: a dataset of code instruction-code pairs 📕
> Dataset: the Qwen team is on a roll, they just released CodeElo, a dataset of code preferences 👩🏻‍💻

Embeddings 🔖
> @MoritzLaurer released a zero-shot version of ModernBERT large
> KaLM is a new family of performant multilingual embedding models with MIT license built using Qwen2-0.5B

Image/Video Generation ⏯️
> NVIDIA released Cosmos, a new family of diffusion/autoregressive World Foundation Models generating worlds from images, videos and texts 🔥
> Adobe released TransPixar: a new text-to-video model that can generate assets with transparent backgrounds (a first!)
> Dataset: fal released cosmos-openvid-1m, Cosmos-tokenized OpenVid-1M with samples from OpenVid-1M

Others
> Prior Labs released TabPFNv2, the best tabular transformer, for classification and regression
> Metagene-1 is a new RNA language model that can be used for pathogen detection, zero-shot embedding and genome understanding
prithivMLmods
posted an update 20 days ago
200+ followers 🤗 on Stranger Zone! [ https://huggingface.co/strangerzonehf ]

❤️‍🔥 Stranger Zone's MidJourney Mix model adapter is trending on the models page with over 45,000 downloads. Additionally, the Super Realism adapter has over 52,000 downloads; together they remain the top two adapters on Stranger Zone!
strangerzonehf/Flux-Midjourney-Mix2-LoRA, strangerzonehf/Flux-Super-Realism-LoRA

👽 Try Demo: prithivMLmods/FLUX-LoRA-DLC

📦 Most Recent Adapters to Check Out:
+ Ctoon : strangerzonehf/Ctoon-Plus-Plus
+ Cardboard : strangerzonehf/Flux-Cardboard-Art-LoRA
+ Claude Art : strangerzonehf/Flux-Claude-Art
+ Flat Lay : strangerzonehf/Flux-FlatLay-LoRA
+ Smiley Portrait : strangerzonehf/Flux-Smiley-Portrait-LoRA

🤗 Thanks to the community & OPEN SOURCEEE !!
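For anyone who wants to try one of these adapters locally, a minimal diffusers sketch (assuming access to the gated FLUX.1-dev base model and a large-VRAM GPU; the prompt is only an example):

```python
# Hedged sketch: load a Stranger Zone LoRA on top of FLUX.1-dev with diffusers.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("strangerzonehf/Flux-Super-Realism-LoRA")

# Trigger words vary per adapter; check the adapter's model card.
image = pipe(
    "Super Realism, portrait photo of an elderly fisherman at dawn",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("super_realism.png")
```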
merve
posted an update 20 days ago
ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2, with MIT license 💗 ByteDance/sa2va-model-zoo-677e3084d71b5f108d00e093

> The models are capable of tasks involving vision-language understanding and visual referrals (referring segmentation), both for images and videos ⏯️

> The models come in 1B, 4B and 8B and are based on InternVL2.5 for the base architecture and Qwen2, Qwen2.5 and InternLM2 for the language model part (depending on the checkpoint)

> The model is very interesting: it has a different encoder for each modality (visual prompt, text prompt, image and video), then it concatenates these to feed into the LLM 💬

> The output segmentation tokens are passed to SAM2, to sort of match text (captions or semantic classes) to masks ⤵️

> Their annotation pipeline is also interesting: they seem to use two open large vision LMs to refine the annotations, and have different levels of descriptions to provide consistency.
prithivMLmods
posted an update 24 days ago
Reasoning SmolLM2 🚀

🎯 Fine-tuning SmolLM2 on a lightweight synthetic reasoning dataset for reasoning-specific tasks. Future updates will focus on lightweight, blazing-fast reasoning models. Until then, check out the blog for fine-tuning details.

🔥 Blog : https://huggingface.co/blog/prithivMLmods/smollm2-ft

🔼 Models :
+ SmolLM2-CoT-360M : prithivMLmods/SmolLM2-CoT-360M
+ Reasoning-SmolLM2-135M : prithivMLmods/Reasoning-SmolLM2-135M
+ SmolLM2-CoT-360M-GGUF : prithivMLmods/SmolLM2-CoT-360M-GGUF

🤠 Other Details :
+ Demo : prithivMLmods/SmolLM2-CoT-360M
+ Fine-tune notebook : prithivMLmods/SmolLM2-CoT-360M
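A rough inference sketch for the CoT fine-tune (assumed usage, not from the blog; it assumes the checkpoint ships a chat template like the SmolLM2 instruct models do):

```python
# Hedged sketch: generate a chain-of-thought answer with SmolLM2-CoT-360M.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/SmolLM2-CoT-360M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```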




prithivMLmods
posted an update 29 days ago
Triangulum Catalogued 🔥💫

🎯 Triangulum is a collection of pretrained and instruction-tuned generative models, designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.

+ Triangulum-10B : prithivMLmods/Triangulum-10B
+ Quants : prithivMLmods/Triangulum-10B-GGUF

+ Triangulum-5B : prithivMLmods/Triangulum-5B
+ Quants : prithivMLmods/Triangulum-5B-GGUF

+ Triangulum-1B : prithivMLmods/Triangulum-1B
+ Quants : prithivMLmods/Triangulum-1B-GGUF
merve
posted an update 29 days ago
supercharge your LLM apps with smolagents 🔥

however cool your LLM is, without being agentic it can only go so far

enter smolagents: a new agent library by Hugging Face to make the LLM write code, do analysis and automate boring stuff!

Here's our blog for you to get started https://huggingface.co/blog/smolagents
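A minimal sketch along the lines of the library's quickstart (the search tool and default model choice are assumptions here; the blog has the canonical example):

```python
# Hedged sketch: a code-writing agent with a web-search tool.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())

# The agent writes and executes Python code step by step to answer the question.
agent.run("How many seconds would it take a leopard at full speed to run the length of Pont des Arts?")
```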
suayptalha
posted an update about 1 month ago
🚀 Introducing the First Hugging Face Integration of minGRU Models, from the paper "Were RNNs All We Needed?"

🖥 I have integrated next-generation RNNs, specifically minGRU, which offer faster performance compared to Transformer architectures, into Hugging Face. This allows users to leverage the lighter and more efficient minGRU models with the "transformers" library for both usage and training.

💻 I integrated two main tasks: MinGRUForSequenceClassification and MinGRUForCausalLM.

MinGRUForSequenceClassification:
You can use this class for Sequence Classification tasks. I also trained a Sentiment Analysis model with the stanfordnlp/imdb dataset.

MinGRUForCausalLM:
You can use this class for Causal Language Model tasks such as GPT or Llama. I also trained an example model with the roneneldan/TinyStories dataset. You can fine-tune and use it!

🔗 Links:
Models: suayptalha/mingru-676fe8d90760d01b7955d7ab
GitHub: https://github.com/suayptalha/minGRU-hf
LinkedIn Post: https://www.linkedin.com/posts/suayp-talha-kocabay_mingru-a-suayptalha-collection-activity-7278755484172439552-wNY1

📰 Credits:
Paper Link: https://arxiv.org/abs/2410.01201

I am thankful to Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio and Hossein Hajimirsadeghi for their paper.
suayptalha
posted an update about 1 month ago
🚀 Introducing Substitution Cipher Solvers!

As @suayptalha and @Synd209 we are thrilled to share an update!

🔑 This project contains a text-to-text model designed to decrypt English and Turkish text encoded using a substitution cipher. In a substitution cipher, each letter in the plaintext is replaced by a corresponding, unique letter to form the ciphertext. The model leverages statistical and linguistic properties of the language to make educated guesses about the letter substitutions, aiming to recover the original plaintext message.

These models were fine-tuned from T5-base. They handle monoalphabetic English and Turkish substitution ciphers and output the decoded text and the recovered alphabet with an accuracy that has never been achieved before!

Example:

Encoded text: Z hztwgx tstcsf qf z ulooqfe osfuqb tzx uezx awej z ozewsbe vlfwby fsmqisfx.

Decoded text: A family member or a support person may stay with a patient during recovery.
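To make the setup concrete, here is a toy illustration of a monoalphabetic substitution cipher in plain Python (not the model itself); the model's job is to recover the mapping from the ciphertext alone:

```python
# Toy example: every plaintext letter maps to a fixed, unique ciphertext letter.
import random
import string

alphabet = string.ascii_lowercase
key = list(alphabet)
random.shuffle(key)                      # a random one-to-one letter mapping
encode_map = dict(zip(alphabet, key))
decode_map = {v: k for k, v in encode_map.items()}

def substitute(text: str, mapping: dict) -> str:
    return "".join(mapping.get(c, c) for c in text.lower())

plain = "a family member or a support person may stay with a patient during recovery."
cipher = substitute(plain, encode_map)
print(cipher)
print(substitute(cipher, decode_map))    # round-trips back to the plaintext
```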

Model Collection Link: Cipher-AI/substitution-cipher-solvers-6731ebd22f0f0d8e0e2e2e00

Organization Link: https://huggingface.co/Cipher-AI
merve
posted an update about 1 month ago