scikit-learn
community
AI & ML interests
Building interactive demos for scikit-learn examples 🧡
sklearn-docs's activity
prithivMLmods posted an update about 13 hours ago
Post
Oof, what a week! 🥵 So many things have happened, let's recap!
merve/jan-24-releases-6793d610774073328eac67a9
Multimodal 💬
- We have released SmolVLM, the tiniest VLMs yet, which come in 256M and 500M, with their retrieval models ColSmol for multimodal RAG
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released MiniMax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging multimodal benchmark
LLMs
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, plus six distilled dense models, on par with o1 and MIT-licensed! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, a new family of models along with their datasets (SFT and reward ones too!)
Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris, similar to Flux
- Tencent released Hunyuan3D-2 for 3D asset generation from images
Post
smolagents can see 🔥
we just shipped vision support to smolagents 🤗 agentic computers FTW
you can now:
💻 let the agent fetch images dynamically (e.g. an agentic web browser)
pass images at agent init (e.g. chatting with documents, filling forms automatically, etc.)
with just a few LoC changed! 🤯
you can use transformers models locally (like Qwen2VL) OR plug in your favorite multimodal inference provider (gpt-4o, Anthropic & co) 🤗
read our blog: http://hf.co/blog/smolagents-can-see
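As an illustration of the second pattern, a minimal sketch assuming the run(..., images=[...]) interface the blog post describes; the model choice, image path and task below are placeholders:

```python
# a minimal sketch, assuming the run(..., images=[...]) interface from the
# blog post; model choice, image path and task are placeholders
from PIL import Image
from smolagents import CodeAgent, HfApiModel

document = Image.open("contract.png")  # hypothetical local document image

agent = CodeAgent(
    tools=[],             # plain document Q&A needs no extra tools
    model=HfApiModel(),   # or a local transformers model such as Qwen2VL
)

# the image is passed alongside the text task at run time
answer = agent.run(
    "Summarize the key obligations listed in the attached document.",
    images=[document],
)
print(answer)
```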
suayptalha posted an update 8 days ago
Post
My latest Falcon3-7B merge model, suayptalha/Falcon3-Jessi-v0.4-7B-Slerp, is currently ranked #1 on the open-llm-leaderboard/open_llm_leaderboard among all models with up to 14B parameters.
My Qwen2.5-7B merge model, suayptalha/HomerCreativeAnvita-Mix-Qw7B, is ranked #7, placing two of my models in the top 10!
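For anyone curious what the "Slerp" in the model name does under the hood, a toy sketch of spherical linear interpolation between two weight tensors; real merges (e.g. with mergekit) apply this per tensor across two checkpoints:

```python
# toy sketch of SLERP (spherical linear interpolation), the merge method
# named in the model id; sizes here are arbitrary for illustration
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    # angle between the two flattened weight tensors
    a_n = a / (a.norm() + eps)
    b_n = b / (b.norm() + eps)
    omega = torch.acos((a_n * b_n).sum().clamp(-1 + 1e-7, 1 - 1e-7))
    so = torch.sin(omega)
    # interpolate along the great circle instead of the straight line
    return (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b

w1, w2 = torch.randn(4, 4), torch.randn(4, 4)  # stand-ins for two checkpoints' weights
print(slerp(w1, w2, 0.5))                       # halfway between the two
```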
prithivMLmods posted an update 9 days ago
Post
Q'n' Sketches ❤️‍🔥
🖼️ Adapters:
- Qs : strangerzonehf/Qs-Sketch
- Qd : strangerzonehf/Qd-Sketch
- Qx : strangerzonehf/Qx-Art
- Qc : strangerzonehf/Qc-Sketch
- Bb : strangerzonehf/Bg-Bag
Collection : strangerzonehf/q-series-sketch-678e3503bf3a661758429717
Page : https://huggingface.co/strangerzonehf
@prithivMLmods 🤗
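As a usage note, these adapters should load with the standard diffusers LoRA API; a hedged sketch assuming a FLUX.1-dev base, with an illustrative prompt (check each model card for the exact trigger words):

```python
# hedged sketch of loading one of these adapters with diffusers; the base
# model and prompt wording are assumptions, see the model cards for specifics
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("strangerzonehf/Qs-Sketch")  # one of the adapters above

image = pipe("a quiet harbor town, pencil sketch style", num_inference_steps=28).images[0]
image.save("sketch.png")
```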
Post
Everything that happened this week in open AI, a recap 🤠
merve/jan-17-releases-678a673a9de4a4675f215bf5
Multimodal
- MiniCPM-o 2.6 is a new SOTA any-to-any model by OpenBMB (vision, speech and text!)
- VideoChat-Flash-Qwen2.5-2B: new video multimodal models by OpenGVLab that come in 2B & 7B sizes and 224 & 448 resolutions
- ByteDance released a larger SA2VA that comes in at 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance
💬 LLMs
- MiniMax-Text-01 is a huge new language model (456B total, 45.9B active params) by MiniMaxAI with a context length of 4M tokens 🤯
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B, a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D 🧙🏻‍♂️
- ReaderLM-v2 is a new HTML parsing model by Jina AI
- Dria released Dria-Agent-a-3B, a new agentic coding model (Pythonic function calling) based on Qwen2.5 Coder
- Unsloth released faster, memory-efficient versions of Phi-4 and Llama 3.3
🖼️ Vision
- MatchAnything is a new foundation model for image matching
- FitDiT is a high-fidelity VTON model based on the DiT architecture
🗣️ Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities
Retrieval
- lightblue released LB-reranker-0.5B-v1.0, a new reranker based on Qwen2.5 that can handle 95+ languages
- cde-small-v2 is a new SOTA small retrieval model by @jxm
Post
we now have more than 2000 public AI models using ModelHubMixin 🤗
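For context, the mixin gives any PyTorch module save/load/push methods in one line; a minimal sketch (the class and repo id are made up):

```python
# minimal sketch of the mixin pattern; the class and repo id are made up
import torch
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class TinyClassifier(nn.Module, PyTorchModelHubMixin):
    def __init__(self, input_size: int = 32, hidden_size: int = 64, num_classes: int = 2):
        super().__init__()  # init kwargs are captured into config.json automatically
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyClassifier()
model.save_pretrained("tiny-classifier")               # local save
# model.push_to_hub("your-username/tiny-classifier")   # hypothetical repo id
# reloaded = TinyClassifier.from_pretrained("your-username/tiny-classifier")
```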
prithivMLmods posted an update 13 days ago
Post
ChemQwen-vL [ Qwen for Chem Vision ] 🧑🏻‍🔬
🧪 Model : prithivMLmods/ChemQwen-vL
ChemQwen-vL is a vision-language model fine-tuned from the Qwen2VL-2B Instruct model. It has been trained using the International Chemical Identifier (InChI) format for chemical compounds and is optimized for chemical compound identification. The model excels at generating the InChI and providing descriptions of chemical compounds based on their images. Its architecture operates within a multimodal framework, combining image-text-to-text capabilities. It has been fine-tuned using datasets from: https://iupac.org/projects/
Colab Demo: https://tinyurl.com/2pn8x6u7, Collection: https://tinyurl.com/2mt5bjju
Inference with documented (PDF) outputs is possible with the help of the ReportLab library: https://pypi.org/project/reportlab/
🤗: @prithivMLmods
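Since the base is Qwen2VL-2B Instruct, inference should follow the standard Qwen2VL chat pattern in transformers; a hedged sketch, where the prompt wording and image path are illustrative:

```python
# hedged inference sketch, assuming the standard Qwen2VL chat interface of
# the base model; prompt wording and image path are illustrative
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "prithivMLmods/ChemQwen-vL"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("compound.png")  # a drawing of a chemical structure
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Identify this compound and give its InChI."},
    ],
}]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
new_tokens = out[:, inputs["input_ids"].shape[1]:]  # strip the prompt tokens
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```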
Post
New smolagents example landed in the Hugging Face cookbook 🤗
Learn how to create an inventory-managing multi-agent system with smolagents, MongoDB and DeepSeek Chat: https://huggingface.co/learn/cookbook/mongodb_smolagents_multi_micro_agents
Post
there's a new multimodal retrieval model in town 🤠
LlamaIndex released vdr-2b-multi-v1
> uses 70% fewer image tokens, yet outperforms other dse-qwen2-based models
> 3x faster inference with less VRAM 💨
> shrinkable with matryoshka 🪆
> can do cross-lingual retrieval!
Collection: llamaindex/visual-document-retrieval-678151d19d2758f78ce910e1 (with models and datasets)
Demo: llamaindex/multimodal_vdr_demo
Learn more from their blog post here: https://huggingface.co/blog/vdr-2b-multilingual
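"Shrinkable with matryoshka" means you can truncate each embedding to a prefix of its dimensions and re-normalize, trading a little accuracy for much smaller indexes; a toy sketch with made-up sizes:

```python
# toy sketch of matryoshka shrinking: keep a prefix of the dimensions and
# re-normalize; the 1536 -> 512 sizes here are made up for illustration
import torch
import torch.nn.functional as F

full = F.normalize(torch.randn(4, 1536), dim=-1)  # pretend: 4 document embeddings
small = F.normalize(full[:, :512], dim=-1)        # keep only the first 512 dims

# cosine similarity works as usual on the truncated vectors
scores = small @ small.T
print(scores.shape)  # torch.Size([4, 4])
```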
Post
Published a new blog post!
In this blog post I go through the transformer architecture, emphasizing how shapes propagate through each layer.
https://huggingface.co/blog/not-lain/tensor-dims
some interesting takeaways:
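The takeaways themselves are in the post; as a toy illustration of the kind of shape bookkeeping it walks through, here is one self-attention layer with all sizes chosen arbitrarily:

```python
# toy trace of shape propagation through multi-head self-attention;
# all sizes are arbitrary, chosen only to make the shapes readable
import torch

batch, seq_len, d_model, n_heads = 2, 10, 64, 8
d_head = d_model // n_heads  # 8

x = torch.randn(batch, seq_len, d_model)        # (2, 10, 64)
qkv = torch.nn.Linear(d_model, 3 * d_model)(x)  # (2, 10, 192)
q, k, v = qkv.chunk(3, dim=-1)                  # three of (2, 10, 64)

# split into heads: (batch, n_heads, seq_len, d_head)
q, k, v = (t.view(batch, seq_len, n_heads, d_head).transpose(1, 2) for t in (q, k, v))

scores = q @ k.transpose(-2, -1) / d_head**0.5  # (2, 8, 10, 10)
ctx = scores.softmax(dim=-1) @ v                # (2, 8, 10, 8)
out = ctx.transpose(1, 2).reshape(batch, seq_len, d_model)
print(out.shape)                                # torch.Size([2, 10, 64])
```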
Post
What a beginning to this year in open ML 🤠
Let's unwrap! merve/jan-10-releases-677fe34177759de0edfc9714
Multimodal 🖼️
> ByteDance released SA2VA: a family of vision LMs that can take image, video, text and visual prompts
> moondream2 is out with new capabilities like outputting structured data and gaze detection!
> Dataset: Alibaba DAMO lab released a multimodal textbook with 22k hours' worth of samples from instruction videos 🤯
> Dataset: SciCap, a benchmark dataset for captioning scientific documents, is released along with its challenge!
LLMs 💬
> Microsoft released Phi-4, a SOTA open-source 14B language model 🔥
> Dolphin is back with Dolphin 3.0 Llama 3.1 8B 🐬🐬
> Prime-RL released Eurus-2-7B-PRIME, a new language model trained using PRIME alignment
> SmallThinker-3B is a new small reasoning LM based on Qwen2.5-3B-Instruct 💭
> Dataset: QWQ-LONGCOT-500K is the dataset used to train SmallThinker, generated using QwQ-32B-preview
> Dataset: @cfahlgren1 released React Code Instructions: a dataset of instruction-code pairs
> Dataset: the Qwen team is on a roll; they just released CodeElo, a dataset of code preferences 👩🏻‍💻
Embeddings
> @MoritzLaurer released a zero-shot version of ModernBERT large
> KaLM is a new family of performant multilingual embedding models with MIT license, built using Qwen2-0.5B
Image/Video Generation ⏯️
> NVIDIA released Cosmos, a new family of diffusion/autoregressive World Foundation Models generating worlds from images, videos and text 🔥
> Adobe released TransPixar: a new text-to-video model that can generate assets with transparent backgrounds (a first!)
> Dataset: fal released cosmos-openvid-1m, Cosmos-tokenized OpenVid-1M with samples from OpenVid-1M
Others
> Prior Labs released TabPFNv2, the best tabular transformer out there for classification and regression
> Metagene-1 is a new RNA language model that can be used for pathogen detection, zero-shot embedding and genome understanding
prithivMLmods posted an update 20 days ago
Post
200+ followers 🤗 on Stranger Zone! [ https://huggingface.co/strangerzonehf ]
❤️‍🔥 Stranger Zone's MidJourney Mix model adapter is trending on the models page with over 45,000 downloads, and the Super Realism model adapter has over 52,000 downloads; they remain the top two adapters on Stranger Zone!
strangerzonehf/Flux-Midjourney-Mix2-LoRA, strangerzonehf/Flux-Super-Realism-LoRA
Try Demo: prithivMLmods/FLUX-LoRA-DLC
📦 Most Recent Adapters to Check Out:
+ Ctoon : strangerzonehf/Ctoon-Plus-Plus
+ Cardboard : strangerzonehf/Flux-Cardboard-Art-LoRA
+ Claude Art : strangerzonehf/Flux-Claude-Art
+ Flat Lay : strangerzonehf/Flux-FlatLay-LoRA
+ Smiley Portrait : strangerzonehf/Flux-Smiley-Portrait-LoRA
🤗 Thanks to the community & open source!!
Post
ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2, with MIT license
ByteDance/sa2va-model-zoo-677e3084d71b5f108d00e093
> The models are capable of tasks involving vision-language understanding and visual referrals (referring segmentation), both for images and videos ⏯️
> The models come in 1B, 4B and 8B, based on InternVL2.5 for the base architecture and on Qwen2, Qwen2.5 or InternLM2 for the language model part (depending on the checkpoint)
> The model is very interesting: it has a different encoder for each modality (visual prompt, text prompt, image and video), then concatenates the encodings to feed into the LLM 💬
The output segmentation tokens are passed to SAM2, to, in effect, match text (captions or semantic classes) to masks ⤵️
> Their annotation pipeline is also interesting: they seem to use two open large vision LMs to refine the annotations, and have different levels of descriptions to provide consistency.
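To make that wiring concrete, a toy, runnable sketch of the described dataflow; this is not the real SA2VA code, and every shape and module here is invented for illustration:

```python
# toy sketch of the dataflow described above, NOT the real SA2VA code;
# every shape and module is invented for illustration
import torch
import torch.nn as nn

d = 64  # pretend shared hidden size

img_feats = torch.randn(1, 16, 32)   # fake image patch features
txt_feats = torch.randn(1, 8, 32)    # fake text prompt features
img_proj, txt_proj = nn.Linear(32, d), nn.Linear(32, d)  # stand-in per-modality encoders

# encode each modality separately, then concatenate along the sequence axis
tokens = torch.cat([img_proj(img_feats), txt_proj(txt_feats)], dim=1)  # (1, 24, 64)

llm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2
)
hidden = llm(tokens)                 # (1, 24, 64)

# the hidden states at special segmentation-token positions would then be
# handed to SAM2's mask decoder to produce masks (stubbed as a print here)
seg_hidden = hidden[:, -1]           # pretend the last position is a [SEG] token
print(seg_hidden.shape)              # torch.Size([1, 64])
```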
prithivMLmods posted an update 24 days ago
Post
Reasoning SmolLM2
🎯 Fine-tuning SmolLM2 on a lightweight synthetic reasoning dataset for reasoning-specific tasks. Future updates will focus on lightweight, blazing-fast reasoning models. Until then, check out the blog for fine-tuning details.
🔥 Blog : https://huggingface.co/blog/prithivMLmods/smollm2-ft
Models :
+ SmolLM2-CoT-360M : prithivMLmods/SmolLM2-CoT-360M
+ Reasoning-SmolLM2-135M : prithivMLmods/Reasoning-SmolLM2-135M
+ SmolLM2-CoT-360M-GGUF : prithivMLmods/SmolLM2-CoT-360M-GGUF
🤗 Other Details :
+ Demo : prithivMLmods/SmolLM2-CoT-360M
+ Fine-tuning notebook : prithivMLmods/SmolLM2-CoT-360M
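For a flavor of what such a fine-tuning run looks like, a condensed TRL sketch; the dataset id and hyperparameters are illustrative stand-ins, not the blog's exact recipe:

```python
# condensed SFT sketch with TRL; the dataset id and hyperparameters are
# illustrative stand-ins, not the recipe from the blog post
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base = "HuggingFaceTB/SmolLM2-360M-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# hypothetical reasoning dataset with a "text" column of CoT-style samples
dataset = load_dataset("your-username/synthetic-reasoning-cot", split="train")

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(output_dir="smollm2-cot-ft", max_seq_length=2048, num_train_epochs=1),
)
trainer.train()
```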
prithivMLmods posted an update 29 days ago
Post
Triangulum Catalogued 🔥💫
🎯 Triangulum is a collection of pretrained and instruction-tuned generative models designed for multilingual applications. These models are trained on synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.
+ Triangulum-10B : prithivMLmods/Triangulum-10B
+ Quants : prithivMLmods/Triangulum-10B-GGUF
+ Triangulum-5B : prithivMLmods/Triangulum-5B
+ Quants : prithivMLmods/Triangulum-5B-GGUF
+ Triangulum-1B : prithivMLmods/Triangulum-1B
+ Quants : prithivMLmods/Triangulum-1B-GGUF
Post
supercharge your LLM apps with smolagents 🔥
however cool your LLM is, without being agentic it can only go so far
enter smolagents: a new agent library by Hugging Face that lets the LLM write code, do analysis and automate the boring stuff!
Here's our blog to get you started: https://huggingface.co/blog/smolagents
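The getting-started pattern from the blog is only a few lines; a minimal sketch, where the model choice and task are illustrative:

```python
# minimal sketch of the quickstart pattern; model choice and task are
# illustrative, see the blog post for the full walkthrough
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())

# the agent writes and executes Python code to work out the answer
agent.run("How many seconds would light take to travel from the Sun to Earth?")
```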
suayptalha posted an update about 1 month ago
Post
Introducing the first HuggingFace integration of minGRU models, from the paper "Were RNNs All We Needed?"
🔥 I have integrated next-generation RNNs, specifically minGRU, which offer faster performance compared to Transformer architectures, into HuggingFace. This allows users to leverage the lighter and more efficient minGRU models with the "transformers" library for both usage and training.
💻 I integrated two main tasks: MinGRUForSequenceClassification and MinGRUForCausalLM.
MinGRUForSequenceClassification:
You can use this class for Sequence Classification tasks. I also trained a Sentiment Analysis model with the stanfordnlp/imdb dataset.
MinGRUForCausalLM:
You can use this class for Causal Language Model tasks such as GPT or Llama. I also trained an example model with the roneneldan/TinyStories dataset. You can fine-tune and use it!
Links:
Models: suayptalha/mingru-676fe8d90760d01b7955d7ab
GitHub: https://github.com/suayptalha/minGRU-hf
LinkedIn Post: https://www.linkedin.com/posts/suayp-talha-kocabay_mingru-a-suayptalha-collection-activity-7278755484172439552-wNY1
📰 Credits:
Paper Link: https://arxiv.org/abs/2410.01201
I am thankful to Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio and Hossein Hajimirsadeghi for their paper.
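A hedged usage sketch, assuming the models load through transformers' remote-code path; the checkpoint id below is a hypothetical stand-in for one from the collection:

```python
# hedged usage sketch; the repo id is a hypothetical stand-in for a
# checkpoint from the collection, loaded via transformers' remote code
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "suayptalha/minGRU-TinyStories"  # hypothetical checkpoint name
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("Once upon a time", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```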
suayptalha posted an update about 1 month ago
Post
Introducing Substitution Cipher Solvers!
As @suayptalha and @Synd209, we are thrilled to share an update!
This project contains text-to-text models designed to decrypt English and Turkish text encoded using a substitution cipher. In a substitution cipher, each letter in the plaintext is replaced by a corresponding, unique letter to form the ciphertext. The model leverages statistical and linguistic properties of the language to make educated guesses about the letter substitutions, aiming to recover the original plaintext message.
These models were fine-tuned on T5-base. They handle monoalphabetic English and Turkish substitution ciphers, and they output the decoded text and the recovered alphabet with unprecedented accuracy!
Example:
Encoded text: Z hztwgx tstcsf qf z ulooqfe osfuqb tzx uezx awej z ozewsbe vlfwby fsmqisfx.
Decoded text: A family member or a support person may stay with a patient during recovery.
Model Collection Link: Cipher-AI/substitution-cipher-solvers-6731ebd22f0f0d8e0e2e2e00
Organization Link: https://huggingface.co/Cipher-AI
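Given the T5-base fine-tune, inference should be the usual text-to-text pattern; a hedged sketch, where the repo id (and any required prompt prefix) is an assumption:

```python
# hedged inference sketch assuming the usual T5 text-to-text interface;
# the repo id below is an assumed name from the collection
from transformers import T5ForConditionalGeneration, T5Tokenizer

repo = "Cipher-AI/substitution-cipher-solver-en"  # hypothetical repo id
tokenizer = T5Tokenizer.from_pretrained(repo)
model = T5ForConditionalGeneration.from_pretrained(repo)

ciphertext = "Z hztwgx tstcsf qf z ulooqfe osfuqb tzx uezx awej z ozewsbe vlfwby fsmqisfx."
inputs = tokenizer(ciphertext, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
# expected: "A family member or a support person may stay with a patient during recovery."
```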
Post
QwQ can see 🔥
the Qwen team released QvQ, a large vision LM with reasoning 😱
it outperforms proprietary VLMs on several benchmarks and comes with open weights and a demo!
Check them out ⬇️
Demo: Qwen/QVQ-72B-preview
Model: Qwen/QVQ-72B-Preview
Read more: https://qwenlm.github.io/blog/qvq-72b-preview/
Congratulations @JustinLin610 and team!