Jędrzej Grabala

jgitsolutions

https://jgitsol.github.io

AI & ML interests

Locally run, human-overseen systems of agents, LLMs, LangChain, and other useful tools on mid-to-low-end commercial hardware.

Recent Activity

upvoted a paper 8 days ago

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

upvoted a collection 8 days ago

DeepSeek-VL2

liked a Space 14 days ago

depth-anything/Video-Depth-Anything



jgitsolutions's activity

upvoted a paper 8 days ago

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Paper • 2412.10302 • Published Dec 13, 2024 • 17

upvoted a collection 8 days ago

DeepSeek-VL2

Collection

5 items • Updated 10 days ago • 69

liked a Space 14 days ago

116

Video Depth Anything

👀

Generate depth video from input video

reacted to chansung's post with 👍 14 days ago

Post

2846

Simple Paper Review #5

I briefly reviewed the paper "SFT Memorizes, RL Generalizes" from HKU, UC Berkeley, Google DeepMind, and New York University, which compares SFT and RL in the post-training of LLMs/VLMs.

The conclusion suggests SFT excels in memorization, while RL is better for generalization. However, since LLMs/VLMs should benefit humans beyond just generalization, a mix of SFT and RL is advisable. Typically, some SFT comes first so the model learns the prompt format, followed by RL to enhance generalization through trial and error.

The study focused on one model, Llama-3.2-Vision-11B, using environments like GeneralPoints for arithmetic reasoning and V-IRL for spatial reasoning. Training data was used for both SFT and RL, with evaluations on in-distribution and out-of-distribution data to assess memorization and generalization.

I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.

https://arxiv.org/abs/2501.17161

liked a Space 15 days ago

169

OCR

🍍

perfect ocr vlm

liked a Space 17 days ago

1.67k

Hunyuan3D-2.0

🌍

Text-to-3D and Image-to-3D Generation

upvoted an article 17 days ago

Article

Welcome to Inference Providers on the Hub 🔥

22 days ago

• 377

liked a model 22 days ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 10 days ago • 4.13M • 9.46k

liked a model 2 months ago

microsoft/Phi-3.5-MoE-instruct

Text Generation • Updated Oct 24, 2024 • 35k • 555

liked a Space 2 months ago

748

LTX-Video-Playground

🚀

Generate a video from text or an image

liked 5 Spaces 3 months ago

tankwar

🏆

"One-minute creation by AI Coding Autonomous Agent MOUSE-I"

112

MOUSE-Visual AI Chatbot

🔥

Text2Visual Web Converter with AI Image Generation

1.97k

Anychat

🏢

Browse and use coding demos from various providers

7.46k

Kolors Virtual Try-On

👕

Try on garments on virtual models

351

Logo In Context

🤗

Add a logo to anything

reacted to anakin87's post with 👀 3 months ago

Post

1107

Ok, you're finally convinced that synthetic data works... ⚗️

𝐍𝐨𝐰 𝐲𝐨𝐮 𝐰𝐚𝐧𝐭 𝐭𝐨 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐞 𝐚𝐧 𝐢𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧 𝐝𝐚𝐭𝐚𝐬𝐞𝐭 𝐟𝐨𝐫 𝐟𝐢𝐧𝐞-𝐭𝐮𝐧𝐢𝐧𝐠 𝐢𝐧 𝐚 𝐥𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐨𝐭𝐡𝐞𝐫 𝐭𝐡𝐚𝐧 𝐄𝐧𝐠𝐥𝐢𝐬𝐡.
But how do you get started?

I explore how to do this with Magpie in my new article
https://huggingface.co/blog/anakin87/multilingual-magpie

---

🐦‍⬛ 𝐖𝐡𝐚𝐭 𝐢𝐬 𝐌𝐚𝐠𝐩𝐢𝐞?

It's a recent technique for creating synthetic instruction datasets.

Magpie is based on a simple but ingenious idea 👇
if you prompt an instruction-tuned model with a pre-query template, you can make it generate a plausible user query/instruction

Here's an example:
model: Llama-3-8B-Instruct
pre-query template: "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"
generated user instruction: "What are some of the responsibilities of a commercial pilot?"

You can then feed this instruction back into the same model to get the assistant response.

By repeating this process, it's possible to generate large synthetic datasets with relatively little effort.
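The two-step loop described above can be sketched roughly as follows. This is a minimal illustration, not the Magpie authors' code: `generate` is a hypothetical callable (prompt string in, completion string out) that you would back with a real model, `stub_generate` only exists so the sketch runs offline, and the assistant-turn template is an assumption about the Llama 3 chat format that should be checked against the model's own chat template.

```python
from typing import Callable, Dict, List

# Pre-query template quoted in the post for Llama-3-8B-Instruct.
PRE_QUERY = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"

# Assumption: closes the user turn and opens the assistant turn.
ASSISTANT_HEADER = "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"


def magpie_pair(generate: Callable[[str], str]) -> Dict[str, str]:
    """Produce one synthetic (instruction, response) pair."""
    # Step 1: the model completes the empty user turn with a plausible query.
    instruction = generate(PRE_QUERY).strip()
    # Step 2: feed that query back to the same model to get the answer.
    prompt = PRE_QUERY + "\n\n" + instruction + ASSISTANT_HEADER
    response = generate(prompt).strip()
    return {"instruction": instruction, "response": response}


def build_dataset(generate: Callable[[str], str], n: int) -> List[Dict[str, str]]:
    """Repeat the two-step process n times."""
    return [magpie_pair(generate) for _ in range(n)]


def stub_generate(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (returns canned text)."""
    if prompt == PRE_QUERY:
        return " What are some of the responsibilities of a commercial pilot? "
    return "A commercial pilot plans flights and operates the aircraft safely."


pairs = build_dataset(stub_generate, 2)
```

With a real sampling-based model behind `generate`, each call to `magpie_pair` would yield a different instruction, which is what makes the loop useful for building a dataset.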

🪄 The authors demonstrate that using these datasets for Supervised Fine Tuning (SFT) can yield strong performance, even competitive with the original instruct model.

🧗𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐧𝐨𝐧-𝐄𝐧𝐠𝐥𝐢𝐬𝐡 𝐝𝐚𝐭𝐚

Most Language Models are primarily trained on English texts, so they tend to produce data in English.

How can we overcome this?

Earlier approaches were complex or costly.

Then @mrm8488 found a simple solution: add the target language to the pre-query template.
For Spanish, the template becomes "<|begin_of_text|><|start_header_id|>user<|end_header_id|>spanish:".

This method works for Spanish and German!

❌ Unfortunately, it does not work well for other languages (🇮🇹, 🇳🇱, ...)
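As a small sketch of the language-prefix trick, the helper below builds the pre-query template for a target language. The function name `pre_query_template` and the lowercasing of the language name are my assumptions, chosen only to reproduce the Spanish example quoted above; as the post notes, the trick is not reliable for every language.

```python
from typing import Optional

# Base pre-query template for Llama-3-8B-Instruct, as quoted in the post.
BASE_TEMPLATE = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"


def pre_query_template(language: Optional[str] = None) -> str:
    """Return the pre-query template, optionally prefixed with a language hint."""
    if not language:
        return BASE_TEMPLATE
    # Append "<language>:" so the model is nudged to continue in that language.
    return f"{BASE_TEMPLATE}{language.strip().lower()}:"


spanish_template = pre_query_template("Spanish")
german_template = pre_query_template("German")
```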

👇


reacted to yongchanghao's post with 👀🔥 3 months ago

Post

3765

We just released a paper (NeuZip) that compresses VRAM in a lossless manner to run larger models. This should be particularly useful when VRAM is insufficient during training/inference. Specifically, we look inside each floating-point number and find that the exponents are highly compressible (as shown in the figure below).

Read more about the work at NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks (2410.20650)
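To get a feel for why exponent bits might compress well, the sketch below extracts the 8-bit exponent field of float32 values and measures its empirical entropy. This is not NeuZip's implementation: the Gaussian "weights" are synthetic stand-ins for real model parameters, and entropy is only a rough proxy for what a lossless coder could achieve.

```python
import math
import random
import struct
from collections import Counter
from typing import Iterable


def float32_exponent(x: float) -> int:
    """Return the raw 8-bit exponent field of x encoded as IEEE 754 float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF


def entropy_bits(symbols: Iterable[int]) -> float:
    """Empirical Shannon entropy (bits per symbol) of a symbol sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Assumption: small, zero-centered Gaussian values as a toy model of weights.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(10_000)]

exponent_entropy = entropy_bits(float32_exponent(w) for w in weights)
```

If `exponent_entropy` comes out below the 8 bits actually stored per exponent, an entropy coder has room to shrink that field without losing information, which is the intuition the post describes.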

reacted to PLB's post with 🚀 3 months ago

Post

1882

⚠️ People selling AI chatbots for websites hate us.
Add an open source chat assistant on your website in 5 minutes: https://github.com/phospho-app/ai-chat-bubble

How does it work?
- You give a URL
- The AI assistant crawls the website content and embeds it
- Add it to your frontend in one line of code
- People on your website can ask the assistant questions

Powered by BAAI/bge-small-en-v1.5 and Mistral AI
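The crawl, embed, and answer flow listed above can be sketched in a few lines. This is a toy illustration and not the code in the linked repository: `embed` is a bag-of-words stand-in for a real embedding model such as BAAI/bge-small-en-v1.5, the `pages` dictionary is hypothetical example content standing in for a crawled site, and the final step of passing the retrieved page to an LLM is omitted.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word-count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(pages: Dict[str, str]) -> List[Tuple[str, Counter]]:
    """Embed every crawled page once, up front."""
    return [(url, embed(text)) for url, text in pages.items()]


def retrieve(index: List[Tuple[str, Counter]], question: str, k: int = 1) -> List[str]:
    """Return the URLs of the k pages most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [url for url, _ in ranked[:k]]


# Hypothetical crawled content.
pages = {
    "https://example.com/pricing": "Pricing plans: the starter plan is free, the team plan is billed monthly.",
    "https://example.com/contact": "Contact our support team by email or phone.",
}
index = build_index(pages)
top = retrieve(index, "How much does the team plan cost?")
```

In a real deployment the retrieved text would be placed in the prompt of the chat model so that answers stay grounded in the website's own content.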


reacted to LukeNeumann's post with 👍 3 months ago

Post

1864

Hello Hugging Face community!

I wanted to introduce myself and my company @Overlaiapp. We are a collective of filmmakers, photographers, and AI engineers working on high-resolution (8K+) training data.

We plan to share a lot of our datasets with the community and are kicking things off with two curated datasets:

- Overlaiai/OregonCoastin4K

- Overlaiai/SubArcticPolarBear

Overlai.ai Dataset Features

🎥 Oversampled: Every clip is captured in stunning 8K resolution, delivering rich detail ideal for fine tuning scenic landscapes and ocean dynamics.

📸 Variance: Includes close-up details, slow-motion footage of crashing waves, sweeping landscapes, and wildlife shots.

📋 Detailed Metadata: Every clip is paired with structured metadata, including creative descriptions, precise camera movements, lens information, field of view calculations, and shot settings, ensuring AI models can fully understand and replicate real-world cinematography with accuracy.

⚙️ Consistency: Re-thinking training data at the point of capture by "overshooting" a subject, enabling models to learn more nuanced relationships and views across scenes.

🌅 Light: Shot during early morning and sunset light for optimal color contrast and dynamic range, maximizing visual quality for color and lighting-sensitive tasks.

🔍 Curation: Curated specifically for machine learning, providing clean, high-quality data for next generation model training.