Lain

not-lain

AI & ML interests

custom AI models with HF integration, multimodal RAG, and open-source contributions && may or may not be a Hugging Face fellow

Recent Activity

updated a Space 3 minutes ago
not-lain/RMBG1.4-with-imageslider
liked a model 10 minutes ago
boltz-community/boltz-1
liked a Space 10 minutes ago
simonduerr/boltz-1

not-lain's activity

replied to cfahlgren1's post 2 days ago
reacted to davanstrien's post with ❤️ 2 days ago
reacted to sayakpaul's post with 🚀❤️ 2 days ago
It's been a while since we shipped native quantization support in diffusers 🧨

We currently support bitsandbytes as the official backend, but using others like torchao is already very simple.

This post is just a reminder of what's possible:

1. Loading a model with a quantization config
2. Saving a model with a quantization config
3. Loading a pre-quantized model
4. Using enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints

Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes
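
For reference, a minimal sketch of points 1 and 2, adapted from the linked docs; the Flux model id and dtypes are illustrative, not part of the post:

import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# point 1: quantize on load with the bitsandbytes backend (4-bit NF4)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# point 2: save_pretrained stores the quantization config with the weights,
# so the checkpoint can later be reloaded pre-quantized (point 3)
transformer.save_pretrained("flux-transformer-nf4")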
reacted to merve's post with 👀🔥 4 days ago
OmniVision-968M: a new local VLM for edge devices, fast & small but performant
💨 a new vision language model with 9x fewer image tokens, super efficient
📖 aligned with DPO for reducing hallucinations
⚡️ Apache 2.0 license 🔥

Demo hf.co/spaces/NexaAIDev/omnivlm-dpo-demo
Model NexaAIDev/omnivision-968M
reacted to BlinkDL's post with 🔥 7 days ago
RWKV-6-world-v3 (+3.1T tokens) is our best multilingual 7B model as of now: BlinkDL/rwkv-6-world

It's 100% RNN and attention-free. MMLU 54.2% (previous world-v2.1 = 47.9%; note: without eval-boosting tricks such as annealing).

RWKV-7-world-v4 soon :)
reacted to m-ric's post with ❤️❤️🔥 7 days ago
๐—ง๐—ต๐—ฒ ๐—ป๐—ฒ๐˜…๐˜ ๐—ฏ๐—ถ๐—ด ๐˜€๐—ผ๐—ฐ๐—ถ๐—ฎ๐—น ๐—ป๐—ฒ๐˜๐˜„๐—ผ๐—ฟ๐—ธ ๐—ถ๐˜€ ๐—ป๐—ผ๐˜ ๐Ÿฆ‹, ๐—ถ๐˜'๐˜€ ๐—›๐˜‚๐—ฏ ๐—ฃ๐—ผ๐˜€๐˜๐˜€! [INSERT STONKS MEME WITH LASER EYES]

See below: I got 105k impressions since regularly posting Hub Posts, coming close to my 275k on Twitter!

โš™๏ธ Computed with the great dataset maxiw/hf-posts
โš™๏ธ Thanks to Qwen2.5-Coder-32B for showing me how to access dict attributes in a SQL request!

cc @merve who's far in front of me
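
As a rough illustration of that trick (a sketch only; the struct fields named here are guesses about the maxiw/hf-posts schema, not documented columns), DuckDB's SQL lets you drill into dict/struct columns with dot notation:

import duckdb

# hf:// paths work in recent DuckDB releases (httpfs is auto-loaded);
# the column names below are hypothetical
top_authors = duckdb.sql("""
    SELECT author.name AS author, COUNT(*) AS posts
    FROM 'hf://datasets/maxiw/hf-posts/**/*.parquet'
    GROUP BY author.name
    ORDER BY posts DESC
    LIMIT 10
""").fetchall()
print(top_authors)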
posted an update 7 days ago
ever wondered how you can make an API call to a visual-question-answering model without sending an image URL 👀

you can do that by converting your local image to base64 and sending it to the API.

recently I made some changes to my library "loadimg" that make converting images to base64 a breeze.
🔗 https://github.com/not-lain/loadimg

API request example 🛠️:
from loadimg import load_img
from huggingface_hub import InferenceClient

# load_img accepts a local path, URL, PIL image, or numpy array,
# and here returns the image as a base64 string
my_b64_img = load_img(img_path_url_pillow_or_numpy, output_type="base64")

client = InferenceClient(api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Describe this image in one sentence.",
            },
            {
                "type": "image_url",
                "image_url": {
                    # base64 allows using images without uploading them to the web
                    "url": my_b64_img
                },
            },
        ],
    }
]

# stream the model's answer token by token
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=messages,
    max_tokens=500,
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content, end="")
reacted to louisbrulenaudet's post with 🤗 7 days ago
I've published a new dataset to simplify model merging 🤗

This dataset facilitates the search for compatible architectures for model merging with @arcee_ai's mergekit, streamlining the automation of high-performance merge searches 📖

Dataset: louisbrulenaudet/mergekit-configs
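
As a quick look at what's inside (a sketch; the split name and record fields aren't documented in the post, so we just inspect the first entry):

from datasets import load_dataset

# split name is an assumption; print the column names and one record
configs = load_dataset("louisbrulenaudet/mergekit-configs", split="train")
print(configs)
print(configs[0])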
reacted to DavidGF's post with 🔥 16 days ago
🎉 Celebrating One Year of #SauerkrautLM with Two Groundbreaking Releases!

We're thrilled to announce the release of SauerkrautLM-v2-14b in two specialized versions: VAGOsolutions/SauerkrautLM-v2-14b-SFT and VAGOsolutions/SauerkrautLM-v2-14b-DPO. Built on the robust Qwen2.5-14B foundation, these models represent a significant leap forward in multilingual AI capabilities.

🔬 Technical Breakthroughs:
💠 Innovative three-phase fine-tuning approach
💠 Two-step Spectrum SFT + one-step Spectrum DPO optimization phase for enhanced performance
💠 Balance of German and English language capabilities
💠 Advanced function calling - almost on par with Claude-3.5-Sonnet-20240620

🇩🇪 German Language Excellence:
What sets this release apart is our unique achievement in simultaneously improving both German and English capabilities. Through our specialized training approach with over 1.2B tokens across two phases, we've managed to:
💠 Enhance German language understanding and generation (SFT version > DPO version)
💠 Maintain authentic German linguistic nuances
💠 Improve cross-lingual capabilities
💠 Preserve cultural context awareness

📊 Training Innovation:
Our three-phase approach targeted specific layer percentages (15%, 20% and 25%) with carefully curated datasets, including:
💠 Mathematics-focused content (proprietary classifier-selected)
💠 High-quality German training data
💠 Specialized function calling datasets
💠 Premium multilingual content

🎁 Community Contribution:
We're also releasing two new datasets in a few days:
1️⃣ SauerkrautLM-Fermented-GER-DPO: 3,300 high-quality German training samples
2️⃣ SauerkrautLM-Fermented-Irrelevance-GER-DPO: 2,000 specialized samples for optimized function-call irrelevance handling

Thank you to our incredible community and partners who have supported us throughout this journey. Here's to another year of AI innovation! 🚀
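
For completeness, a minimal sketch of loading the SFT variant with transformers (assuming standard AutoModel support and that the Qwen2.5 chat template carries over; the German prompt is illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-v2-14b-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# ask a simple question in German to exercise the bilingual tuning
messages = [{"role": "user", "content": "Wie heisst die Hauptstadt von Deutschland?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=64)[0], skip_special_tokens=True))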
reacted to reach-vb's post with 🔥🚀 17 days ago
Smol models ftw! AMD released AMD OLMo 1B - beats OpenELM and TinyLlama on MT-Bench and AlpacaEval - Apache 2.0 licensed 🔥

> Trained on 1.3 trillion tokens (Dolma 1.7) across 16 nodes, each with 4 MI250 GPUs

> Three checkpoints:

- AMD OLMo 1B: Pre-trained model
- AMD OLMo 1B SFT: Supervised fine-tuned on Tulu V2, OpenHermes-2.5, WebInstructSub, and Code-Feedback datasets
- AMD OLMo 1B SFT DPO: Aligned with human preferences using Direct Preference Optimization (DPO) on UltraFeedback dataset

Key Insights:
> Pre-trained with less than half the tokens of OLMo-1B
> Post-training steps include two-phase SFT and DPO alignment
> Data for SFT:
- Phase 1: Tulu V2
- Phase 2: OpenHermes-2.5, WebInstructSub, and Code-Feedback

> Model checkpoints on the Hub & integrated with Transformers ⚡️

Congratulations & kudos to AMD on a brilliant smol model release! 🤗

amd/amd-olmo-6723e7d04a49116d8ec95070
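
Since the checkpoints are integrated with Transformers, a minimal sketch of running the SFT variant (the model id follows the naming in the post and should be checked against the collection above):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/AMD-OLMo-1B-SFT"  # assumed id; see the linked collection
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("What is so great about smol models?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))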
reacted to nroggendorff's post with 🤯 18 days ago
Did you guys know that if you try to link a prepaid card to huggingface it won't work, but then if you press the button again it links anyway? Then you can lock the card (deny any charges), and get resources for free? You're welcome :P
reacted to merve's post with ❤️🔥 20 days ago
Another great week in open ML!
Here's a small recap 🫰🏻

Model releases
⏯️ Video Language Models
AI at Meta released Vision-CAIR/LongVU_Qwen2_7B, a new state-of-the-art long-video LM based on DINOv2, SigLIP, Qwen2 and Llama 3.2

💬 Small language models
Hugging Face released HuggingFaceTB/SmolLM2-1.7B, a family of new smol language models with Apache 2.0 license that come in sizes 135M, 360M and 1.7B, along with datasets.
Meta released facebook/MobileLLM-1B, a new family of on-device LLMs of sizes 125M, 350M, 600M and 1B

🖼️ Image Generation
Stability AI released stabilityai/stable-diffusion-3.5-medium, a 2B model with a commercially permissive license

🖼️💬 Any-to-Any
gpt-omni/mini-omni2, the closest reproduction of GPT-4o so far, is released: a new LLM that takes image, text, and audio input and outputs speech!

Dataset releases
🖼️ Spawning/PD12M, a new captioning dataset of 12.4 million examples generated using Florence-2
reacted to vikhyatk's post with 🔥 24 days ago
Pushed a new update to vikhyatk/moondream2 today. TextVQA up from 60.2 to 65.2, DocVQA up from 61.9 to 70.5.

Space has been updated to the new model if you want to try it out! vikhyatk/moondream2
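
To try the updated weights locally, a minimal sketch assuming moondream2's custom remote-code API (encode_image / answer_question) from around this release:

from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

# moondream2 ships its own modeling code, hence trust_remote_code=True
model = AutoModelForCausalLM.from_pretrained("vikhyatk/moondream2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("vikhyatk/moondream2")

image = Image.open("receipt.png")   # any local image
enc = model.encode_image(image)     # encode once, then ask questions
print(model.answer_question(enc, "What does the text in this image say?", tokenizer))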
reacted to vikhyatk's post with 🔥 24 days ago
Just released a dataset with 7000+ hours of synthetically generated lo-fi music. vikhyatk/lofi
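
With 7000+ hours of audio, streaming is probably the way to sample it (a sketch; the split name is an assumption):

from datasets import load_dataset

# stream instead of downloading the full dataset up front
lofi = load_dataset("vikhyatk/lofi", split="train", streaming=True)
first = next(iter(lofi))
print(first.keys())  # inspect the available columns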