Prithiv Sakthi PRO

prithivMLmods

AI & ML interests

Computer Vision - AI/ML

Recent Activity

updated a model 26 minutes ago
prithivMLmods/Flux-Polaroid-Plus
upvoted a collection 29 minutes ago
Open Source Custom Finetunes
updated a model 33 minutes ago
prithivMLmods/Flux-Product-Ad-Backdrop

prithivMLmods's activity

reacted to elliesleightholm's post with 🤗 about 2 hours ago
posted an update about 23 hours ago
๐Ÿ… Glif App's Remixes feature allows you to slap a logo onto anything, seamlessly integrating the input image (logo) into various contexts. The result is stunning remixes that blend the input logo with generated images (img2img logo mapping) for incredible outcomes.

Check out Any Logo Anywhere remixes on Glif: [Glif Remixes](https://glif.app/glifs/cm3o7dfsd002610z48sz89yih/remixes)

๐ŸŒThe browser extension enables thousands of Glif-based img2img workflows on any image you find online. Experience Glif Remix with WebAI: [Chrome Extension](https://chromewebstore.google.com/detail/glif-remix-the-web-with-a/abfbooehhdjcgmbmcpkcebcmpfnlingo)

🤗 Have fun with the cool stuff!!
@prithivMLmods
replied to LukeNeumann's post 1 day ago

Awesome datasets! Looking forward to seeing them in the big leagues soon. 🔥🚀

reacted to fffiloni's post with 🔥 1 day ago
reacted to albertvillanova's post with ❤️ 2 days ago
🚨 How green is your model? 🌱 Introducing a new feature in the Comparator tool: Environmental Impact for responsible #LLM research!
👉 open-llm-leaderboard/comparator
Now, you can not only compare models by performance, but also by their environmental footprint!

๐ŸŒ The Comparator calculates COโ‚‚ emissions during evaluation and shows key model characteristics: evaluation score, number of parameters, architecture, precision, type... ๐Ÿ› ๏ธ
Make informed decisions about your model's impact on the planet and join the movement towards greener AI!
reacted to m-ric's post with 🔥 2 days ago
Great feature alert: You can now use any Space as a tool for your transformers.agents! 🛠️🔥🔥

This lets you take the coolest Spaces, like FLUX.1-dev, and use them in agentic workflows with a few lines of code! 🧑‍💻

In the video below, I set up my fake vacation pictures where I'm awesome at surfing (I'm really not) 🏄

Head to the docs to learn this magic 👉 https://huggingface.co/docs/transformers/main/en/agents_advanced#import-a-space-as-a-tool-
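
A minimal sketch of the pattern from that docs page; the Space id and prompt here are illustrative examples:

```python
# Minimal sketch, following the linked transformers agents docs.
# The Space id and prompt are illustrative; any Gradio Space should work.
from transformers import Tool

image_generation_tool = Tool.from_space(
    "black-forest-labs/FLUX.1-dev",       # the Space to wrap as a tool
    name="image_generator",
    description="Generates an image from a text prompt",
)

# Call it directly, or hand it to an agent via its tools=[...] list
image = image_generation_tool("me surfing a massive wave, golden hour")
```
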
reacted to rwightman's post with 🚀 2 days ago
New MobileNetV4 weights were uploaded a few days ago -- more ImageNet-12k training at 384x384 for the speedy 'Conv Medium' models.

There are 3 weight variants here for those who like to tinker. On my hold-out eval they are ordered as below; they're not that different, but the Adopt 180-epoch run is closer to AdamW 250 than to AdamW 180.
* AdamW for 250 epochs - timm/mobilenetv4_conv_medium.e250_r384_in12k
* Adopt for 180 epochs - timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
* AdamW for 180 epochs - timm/mobilenetv4_conv_medium.e180_r384_in12k

This was by request, as a user reported impressive results using the 'Conv Large' ImageNet-12k pretrains as object detection backbones. ImageNet-1k fine-tunes are pending; the weights do behave differently with 180 vs 250 epochs and with the Adopt vs AdamW optimizer.
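
For reference, a quick sketch of loading one of these checkpoints with the standard timm API (the inference part is generic boilerplate):

```python
import timm
import torch

# The 250-epoch AdamW variant; swap in the .e180_* names for the others
model = timm.create_model(
    "mobilenetv4_conv_medium.e250_r384_in12k", pretrained=True
).eval()

# Build the matching 384x384 eval transform from the pretrained config
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

with torch.inference_mode():
    logits = model(torch.randn(1, 3, 384, 384))  # stand-in for transform(img).unsqueeze(0)
print(logits.shape)  # (1, num_in12k_classes)
```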

posted an update 3 days ago
The (768 x 1024) mix of MidJourney and Flux LoRA comes out nearly identical to the actual visual design. It hasn't undergone much concept-art development for now. In the meantime, try out the impressive visual designs on:

🥚 Midjourney Flux Mix : prithivMLmods/Midjourney-Flux

🥚 Flux LoRA Collection: prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
@prithivMLmods 🤗
reacted to Xenova's post with 🔥 3 days ago
Have you tried out 🤗 Transformers.js v3? Here are the new features:
⚡ WebGPU support (up to 100x faster than WASM)
🔢 New quantization formats (dtypes)
🏛 120 supported architectures in total
📂 25 new example projects and templates
🤖 Over 1200 pre-converted models
🌐 Node.js (ESM + CJS), Deno, and Bun compatibility
🏡 A new home on GitHub and NPM

Get started with npm i @huggingface/transformers.

Learn more in our blog post: https://huggingface.co/blog/transformersjs-v3
reacted to davidberenstein1957's post with 🤗 3 days ago
For anyone who struggles with NER or information extraction with LLMs:

We showed an efficient workflow for token classification, including zero-shot suggestions and model fine-tuning, with Argilla, GLiNER, the NuMind NuExtract LLM, and SpanMarker. @argilla

Video: https://youtu.be/JvLpaYgNd84?feature=shared
Notebooks and slides are included so you can try it yourself 🙂
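
As a taste of the zero-shot suggestion side, here is a minimal GLiNER sketch (the checkpoint and labels are illustrative; the Argilla and SpanMarker steps from the video aren't shown):

```python
# pip install gliner -- zero-shot NER with arbitrary label names
from gliner import GLiNER

# Illustrative checkpoint; the video may use a different one
model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")

text = "Hugging Face, founded by Clément Delangue, is headquartered in New York."
labels = ["person", "organization", "location"]

for ent in model.predict_entities(text, labels, threshold=0.5):
    print(f'{ent["text"]} -> {ent["label"]}')
```
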
reacted to reach-vb's post with 🤗 3 days ago
What a brilliant week for Open Source AI!

Qwen 2.5 Coder by Alibaba - 0.5B / 1.5B / 3B / 7B / 14B / 32B (Base + Instruct) code generation LLMs, with the 32B tackling giants like Gemini 1.5 Pro and Claude Sonnet
Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f

LLM2CLIP from Microsoft - Leverage LLMs to train ultra-powerful CLIP models! Boosts performance over the previous SOTA by ~17%
microsoft/llm2clip-672323a266173cfa40b32d4c

Athene v2 Chat & Agent by NexusFlow - SoTA general LLM fine-tuned from Qwen 2.5 72B; excels at chat + function calling / JSON / agents
Nexusflow/athene-v2-6735b85e505981a794fb02cc

Orca AgentInstruct by Microsoft - 1 million instruction pairs covering text editing, creative writing, coding, reading comprehension, etc. - permissively licensed
microsoft/orca-agentinstruct-1M-v1

Ultravox by FixieAI - 70B / 8B models approaching GPT-4o level; pick any LLM and train an adapter with Whisper as the audio encoder
reach-vb/ultravox-audio-language-model-release-67373b602af0a52b2a88ae71

JanusFlow 1.3B by DeepSeek - next iteration of their unified multimodal LLM Janus, with Rectified Flow
deepseek-ai/JanusFlow-1.3B

Common Corpus by PleIAs - 2,003,039,184,047 multilingual, commercially permissive, high-quality tokens!
PleIAs/common_corpus

I'm sure I missed a lot; can't wait for next week!

Put down in the comments what I missed! 🤗
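
If you want to kick the tires on the Qwen 2.5 Coder release above, a minimal transformers sketch (model id follows the release naming; pick a size your hardware can handle):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"  # one of the released sizes
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
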
reacted to merve's post with 🤗 4 days ago
OmniVision-968M: a new local VLM for edge devices, fast & small but performant
💨 a new vision language model with 9x fewer image tokens, super efficient
📖 aligned with DPO for reducing hallucinations
⚡️ Apache 2.0 license 🔥

Demo hf.co/spaces/NexaAIDev/omnivlm-dpo-demo
Model NexaAIDev/omnivision-968M
replied to their post 4 days ago
posted an update 4 days ago
Minimalistic Adapters 🎃

🚀 Demo Here:
prithivMLmods/FLUX-LoRA-DLC

🚀 Models:
{ Quote Tuner } : prithivMLmods/Flux.1-Dev-Quote-LoRA
{ Stamp Art } : prithivMLmods/Flux.1-Dev-Stamp-Art-LoRA
{ Hand Sticky } : prithivMLmods/Flux.1-Dev-Hand-Sticky-LoRA
{ Poster HQ } : prithivMLmods/Flux.1-Dev-Poster-HQ-LoRA
{ Ctoon Min } : prithivMLmods/Flux.1-Dev-Ctoon-LoRA

🚀 Collections:
{ Flux LoRA Collection } : prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
{ LoRA Space Collection } : prithivMLmods/lora-space-collections-6714b72e0d49e1c97fbd6a32

🚀 For More Visit:
https://huggingface.co/strangerzonehf
🤗 @prithivMLmods
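
A minimal sketch of using one of these adapters with diffusers (standard FluxPipeline LoRA loading; the prompt is illustrative, and any trigger words live on the model cards):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Swap in any adapter listed above
pipe.load_lora_weights("prithivMLmods/Flux.1-Dev-Quote-LoRA")

image = pipe(
    "a minimalist quote poster, clean typography",  # illustrative prompt
    height=1024,
    width=768,
    guidance_scale=3.5,
    num_inference_steps=28,
).images[0]
image.save("quote.png")
```
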
reacted to MonsterMMORPG's post with 🤗 4 days ago
Kohya brought massive improvements to FLUX LoRA (as low as 4 GB GPUs) and DreamBooth / Fine-Tuning (as low as 6 GB GPUs) training - check the attached images in full size to see the full details.

You can download all configs and full instructions:

> https://www.patreon.com/posts/112099700 - Fine Tuning post

> https://www.patreon.com/posts/110879657 - LoRA post

Now GPUs with as little as 4 GB can train a FLUX LoRA with decent quality, and GPUs with 24 GB or less got a huge speed boost for full DreamBooth / Fine-Tuning training.

You need a minimum 4 GB GPU for FLUX LoRA training and a minimum 6 GB GPU for FLUX DreamBooth / full Fine-Tuning training. It is just mind-blowing.

The Fine-Tuning post above also has 1-click installers and downloaders for Windows, RunPod, and Massed Compute.

The model downloader scripts were also updated; downloading 30+ GB of models takes a total of 1 minute on Massed Compute.

You can read the recent updates here: https://github.com/kohya-ss/sd-scripts/tree/sd3?tab=readme-ov-file#recent-updates

This is the Kohya GUI branch: https://github.com/bmaltais/kohya_ss/tree/sd3-flux.1

The key to reducing VRAM usage is block swap.

Kohya implemented the logic of OneTrainer to improve block-swapping speed significantly, and it is now supported for LoRAs as well.

Now you can do FP16 training with LoRAs on GPUs with 24 GB or less.

Now you can train a FLUX LoRA on a 4 GB GPU - the keys are FP8, block swap, and training only certain layers (remember single-layer LoRA training).

It took me more than a day to test all the newer configs, their VRAM demands, and their relative step speeds, and to prepare the configs :)
reacted to pagezyhf's post with 👍 5 days ago
Hello Hugging Face Community,

I'd like to share a bit more about the Deep Learning Containers (DLCs) we built with Google Cloud to transform the way you build AI with open models on this platform!

With pre-configured, optimized environments for PyTorch Training (GPU) and Inference (CPU/GPU), Text Generation Inference (GPU), and Text Embeddings Inference (CPU/GPU), the Hugging Face DLCs offer:

⚡ Optimized performance on Google Cloud's infrastructure, with TGI, TEI, and PyTorch acceleration.
🛠️ Hassle-free environment setup, no more dependency issues.
🔄 Seamless updates to the latest stable versions.
💼 Streamlined workflow, reducing dev and maintenance overheads.
🔒 Robust security features of Google Cloud.
☁️ Fine-tuned for optimal performance, integrated with GKE and Vertex AI.
📦 Community examples for easy experimentation and implementation.
🔜 TPU support for PyTorch Training/Inference and Text Generation Inference is coming soon!

Find the documentation at https://huggingface.co/docs/google-cloud/en/index
If you need support, open a conversation on the forum: https://discuss.huggingface.co/c/google-cloud/69
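
As a rough sketch of what deploying with the TGI DLC on Vertex AI can look like via the google-cloud-aiplatform SDK: the container URI below is a placeholder (find the current image name in the documentation linked above), and the project, model id, and machine settings are illustrative:

```python
# Rough sketch, assuming the Vertex AI route; the DLC image URI is a
# placeholder -- look up the current one in the docs linked above.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")

TGI_DLC = "us-docker.pkg.dev/.../huggingface-text-generation-inference:latest"  # placeholder

model = aiplatform.Model.upload(
    display_name="qwen-0.5b-tgi",
    serving_container_image_uri=TGI_DLC,
    serving_container_environment_variables={
        "MODEL_ID": "Qwen/Qwen2.5-0.5B-Instruct",  # any TGI-supported model
    },
)
endpoint = model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)
print(endpoint.predict(instances=[{"inputs": "Hello!"}]))
```
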
reacted to LukeNeumann's post with 🔥 5 days ago
Hello Hugging Face community!

I wanted to introduce myself and my company @Overlaiapp. We are a collective of filmmakers, photographers, and AI engineers working on high-resolution (8K+) training data.

We plan to share a lot of our datasets with the community and are kicking things off with two curated datasets:

- Overlaiai/OregonCoastin4K

- Overlaiai/SubArcticPolarBear


Overlai.ai Dataset Features

🎥 Oversampled: Every clip is captured in stunning 8K resolution, delivering rich detail ideal for fine-tuning scenic landscapes and ocean dynamics.

📸 Variance: Includes close-up details, slow-motion footage of crashing waves, sweeping landscapes, and wildlife shots.

📋 Detailed Metadata: Every clip is paired with structured metadata, including creative descriptions, precise camera movements, lens information, field-of-view calculations, and shot settings, ensuring AI models can fully understand and replicate real-world cinematography with accuracy.

โš™๏ธ Consistency: Re-thinking training data at the point of capture by "overshooting" a subject, enabling models to learn more nuanced relationships and views across scenes.

🌅 Light: Shot during early morning and sunset light for optimal color contrast and dynamic range, maximizing visual quality for color- and lighting-sensitive tasks.

๐Ÿ” Curation: Curated specifically for machine learning, providing clean, high-quality data for next generation model training.
posted an update 6 days ago
reacted to AdinaY's post with 🔥 6 days ago
Let's dive into the exciting releases from the Chinese community last week 🔥🚀
More details 👉 https://huggingface.co/zh-ai-community

Code model:
✨ Qwen 2.5 Coder by Alibaba Qwen
Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f
✨ OpenCoder by InflyAI - fully open code model 🙌
infly/opencoder-672cec44bbb86c39910fb55e

Image model:
✨ Hunyuan3D-1.0 by Tencent
tencent/Hunyuan3D-1

MLLM:
✨ JanusFlow by DeepSeek
deepseek-ai/JanusFlow-1.3B
✨ Mono-InternVL-2B by OpenGVLab
OpenGVLab/Mono-InternVL-2B

Video model:
✨ CogVideoX 1.5 by ChatGLM
THUDM/CogVideoX1.5-5B-SAT

Audio model:
✨ Fish Agent by FishAudio
fishaudio/fish-agent-v0.1-3b

Dataset:
✨ OPI dataset by BAAI
BAAI/OPI
reacted to merve's post with 🤗 7 days ago
Microsoft released LLM2CLIP: a CLIP model with a longer context window for complex text inputs 🤯
All models with Apache 2.0 license here microsoft/llm2clip-672323a266173cfa40b32d4c

TL;DR: they replaced CLIP's text encoder with various LLMs fine-tuned on captioning, which gives better top-k accuracy on retrieval.
This will enable better image-text retrieval, better zero-shot image classification, and better vision language models 🔥
Read the paper to learn more: LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation (2411.04997)
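
LLM2CLIP ships its own loading code on the model cards; as a generic illustration of the image-text retrieval task it improves, here is plain CLIP through transformers (not LLM2CLIP's actual API):

```python
# Generic CLIP retrieval scoring for illustration -- LLM2CLIP swaps the
# text tower for an LLM-based encoder but serves the same task.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # any local image
texts = ["a photo of a cat", "a photo of a dog", "a diagram of a CPU"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-to-text similarity

print(logits.softmax(dim=-1))  # retrieval scores over the candidate captions
```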