zero-gpu-explorers (ZeroGPU Explorers)

DongfuJiang

authored a paper 1 day ago

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published 3 days ago • 51

codelion

posted an update 3 days ago


Over 40 percent of AI-generated code contains security vulnerabilities. We recently worked on a LoRA that makes a model write secure code by default, trained with automated Semgrep analysis and GRPO. It achieves a 97 percent reduction in vulnerabilities without requiring security-specific prompts.

Technical Approach:
An automated security training pipeline that combines Semgrep vulnerability detection with preference learning: generate multiple solutions with varying security awareness, automatically analyze them for vulnerabilities, create preference pairs based on security scores, and train using GRPO with multi-factor scoring.

Scoring System (100 points total):
- Functionality: 40 points - Does the code work correctly?
- Security patterns: 40 points - Uses secure coding practices
- Low vulnerabilities: 20 points - Semgrep score below threshold

This balanced scoring prevents reward hacking where models generate empty functions to avoid vulnerabilities.
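The 100-point rubric can be sketched as a reward function. This is a minimal illustration, not the recipe's code: the `Candidate` fields, counting secure-coding checks as a fraction, and reading "below threshold" as "at most `max_findings` Semgrep findings" are all assumptions. It shows why an empty function cannot win: it collects the 20 low-vulnerability points but forfeits the other 80.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One generated solution plus the (hypothetical) signals used to score it."""
    code: str
    tests_passed: int       # unit tests the solution passes
    tests_total: int
    patterns_met: int       # secure-coding checks satisfied (e.g. parameterized SQL)
    patterns_checked: int
    semgrep_findings: int   # number of Semgrep findings on the code


def security_reward(c: Candidate, max_findings: int = 0) -> float:
    """Score out of 100: 40 functionality + 40 security patterns + 20 low vulnerabilities."""
    functionality = 40.0 * c.tests_passed / c.tests_total if c.tests_total else 0.0
    patterns = 40.0 * c.patterns_met / c.patterns_checked if c.patterns_checked else 0.0
    low_vulns = 20.0 if c.semgrep_findings <= max_findings else 0.0
    return functionality + patterns + low_vulns


def preference_pair(candidates: list[Candidate]) -> tuple[Candidate, Candidate]:
    """Return (chosen, rejected): the highest- and lowest-scoring candidates."""
    ranked = sorted(candidates, key=security_reward, reverse=True)
    return ranked[0], ranked[-1]


empty = Candidate("def f(name): pass", 0, 5, 0, 4, 0)
insecure = Candidate("f-string SQL", 5, 5, 1, 4, 2)
secure = Candidate("parameterized SQL", 5, 5, 4, 4, 0)
print([security_reward(c) for c in (empty, insecure, secure)])  # [20.0, 50.0, 100.0]
```

The empty stub scores lowest even though it has zero findings, which is the reward-hacking case the weighting is meant to block.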

Real Transformation Examples:

Database query before:
query = f"SELECT * FROM products WHERE name = '{name}'"

Database query after:
query = "SELECT * FROM products WHERE name = ?"
db.execute(query, (name,))
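The difference between the two queries can be checked end to end with Python's built-in `sqlite3` module, which uses the same `?` placeholder style. The table, its rows, and the injection payload are made up for illustration.

```python
import sqlite3


def find_products(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Safe: `?` makes the driver bind `name` as a value, never as SQL text."""
    query = "SELECT name, price FROM products WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


def find_products_unsafe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Vulnerable: `name` is pasted directly into the SQL string."""
    query = f"SELECT name, price FROM products WHERE name = '{name}'"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [("widget", 9.99), ("gadget", 19.99)])

payload = "x' OR '1'='1"
print(len(find_products_unsafe(conn, payload)))  # 2: the injected OR clause matches every row
print(len(find_products(conn, payload)))         # 0: the payload is treated as a literal name
```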

Password hashing before:
password_hash = hashlib.md5(password.encode('utf-8')).hexdigest()

Password hashing after:
salt = bcrypt.gensalt(rounds=12)
password_hash = bcrypt.hashpw(password.encode('utf-8'), salt)
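`bcrypt` is a third-party package. To try salted, deliberately slow hashing with only the standard library, PBKDF2-HMAC-SHA256 via `hashlib.pbkdf2_hmac` is one substitute; note that it is a different algorithm from bcrypt, swapped in here purely for portability. The helper names and the iteration count are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption: a high work factor slows brute force; tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the plain password."""
    salt = os.urandom(16)  # fresh random salt per password, playing the role of bcrypt.gensalt()
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)  # constant-time comparison


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))                  # False
```

With `bcrypt` itself, the salt is embedded in the returned hash string, so there is no separate salt value to store.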

Model: codelion/Qwen2.5-Coder-0.5B-Instruct-security-grpo-lora
Notebook: https://github.com/codelion/ellora/blob/main/Ellora_Recipe_5_Secure_Code_Generation_LoRA.ipynb
Repository: https://github.com/codelion/ellora

codelion

posted an update 5 days ago


I recently worked on a LoRA that improves tool use in LLMs. Thought the approach might interest folks here.

The issue I have had when trying to use some of the local LLMs with coding agents is this:

Me: "Find all API endpoints with authentication in this codebase"
LLM: "You should look for @app.route decorators and check if they have auth middleware..."

But I often want it to actually search the files and show me the results, and instead the LLM doesn't trigger a tool call.

To fine-tune it for tool use I combined two data sources:

1. Magpie scenarios - 5000+ diverse tasks (bug hunting, refactoring, security audits)
2. Real execution - Ran these on actual repos (FastAPI, Django, React) to get authentic tool responses

This ensures the model learns both breadth (many scenarios) and depth (real tool behavior).

Tools We Taught:
- read_file - Actually read file contents
- search_files - Regex/pattern search across codebases
- find_definition - Locate classes/functions
- analyze_imports - Dependency tracking
- list_directory - Explore structure
- run_tests - Execute test suites
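For a model to emit calls to tools like these, each tool is typically declared with a JSON-schema description. The sketch below uses the OpenAI-style function format, which `transformers` chat templates also accept through the `tools=` argument of `apply_chat_template`. The argument names `pattern` and `path` are hypothetical, since the post lists only tool names, and the validator shows one way a "correct parameters" check on a raw model output could be scored.

```python
import json

# Hypothetical parameter schemas: the post names the tools but not their arguments.
TOOL_SPECS = [
    {
        "type": "function",
        "function": {
            "name": "search_files",
            "description": "Regex/pattern search across codebases",
            "parameters": {
                "type": "object",
                "properties": {"pattern": {"type": "string"}},
                "required": ["pattern"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read file contents",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    },
]


def validate_call(raw: str) -> tuple[bool, str]:
    """Check that a model-emitted call names a known tool with well-formed arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(call, dict):
        return False, "not a JSON object"
    spec = next((t["function"] for t in TOOL_SPECS
                 if t["function"]["name"] == call.get("name")), None)
    if spec is None:
        return False, "unknown tool"
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return False, "arguments must be an object"
    params = spec["parameters"]
    missing = [p for p in params["required"] if p not in args]
    unexpected = [a for a in args if a not in params["properties"]]
    if missing or unexpected:
        return False, f"missing={missing} unexpected={unexpected}"
    return True, "ok"
```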

Improvements:
- Tool calling accuracy: 12% → 80%
- Correct parameters: 8% → 87%
- Multi-step tasks: 3% → 78%
- End-to-end completion: 5% → 80%
- Tools per task: 0.2 → 3.8

The LoRA really improves intentional tool calling. For example, consider the query: "Find ValueError in payment module"

The response proceeds as follows:

1. Calls search_files with pattern "ValueError"
2. Gets 4 matches across 3 files
3. Calls read_file on each match
4. Analyzes context
5. Reports: "Found 3 ValueError instances: payment/processor.py:47 for invalid amount, payment/validator.py:23 for unsupported currency..."
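The search-then-read flow can be reproduced locally with two small tool implementations and a JSON dispatcher. The function bodies, the `path` argument name, and the `*.py`-only search are assumptions for illustration, not the LoRA's actual tool backend; the demo builds a throwaway two-file repo.

```python
import json
import re
import tempfile
from pathlib import Path


def search_files(root: str, pattern: str) -> list[dict]:
    """Regex search over *.py files under root; one dict per matching line."""
    regex = re.compile(pattern)
    matches = []
    for path in sorted(Path(root).rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                matches.append({"path": path.relative_to(root).as_posix(),
                                "line": lineno, "text": line.strip()})
    return matches


def read_file(root: str, path: str) -> str:
    return (Path(root) / path).read_text(encoding="utf-8")


TOOLS = {"search_files": search_files, "read_file": read_file}


def dispatch(raw_call: str, root: str):
    """Execute a JSON tool call of the form {"name": ..., "arguments": {...}}."""
    call = json.loads(raw_call)
    return TOOLS[call["name"]](root, **call["arguments"])


def demo() -> list[dict]:
    """Step 1: search for ValueError. Step 2: read each matching file for context."""
    with tempfile.TemporaryDirectory() as root:
        pay = Path(root, "payment")
        pay.mkdir()
        (pay / "processor.py").write_text(
            "def charge(amount):\n    if amount <= 0:\n        raise ValueError('invalid amount')\n",
            encoding="utf-8")
        (pay / "validator.py").write_text(
            "def check(currency):\n    raise ValueError('unsupported currency')\n",
            encoding="utf-8")
        hits = dispatch('{"name": "search_files", "arguments": {"pattern": "ValueError"}}', root)
        for hit in hits:
            source = dispatch(json.dumps({"name": "read_file", "arguments": {"path": hit["path"]}}), root)
            # Keep the matching line plus the one before it as surrounding context.
            hit["context"] = source.splitlines()[max(0, hit["line"] - 2):hit["line"]]
        return hits


for hit in demo():
    print(f'{hit["path"]}:{hit["line"]}  {hit["text"]}')
```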

Resources:
- Colab notebook https://colab.research.google.com/github/codelion/ellora/blob/main/Ellora_Recipe_3_Enhanced_Tool_Calling_and_Code_Understanding.ipynb
- Model - codelion/Llama-3.2-1B-Instruct-tool-calling-lora
- GitHub - https://github.com/codelion/ellora

codelion

posted an update 7 days ago


I wanted to share a technique that's been working really well for recovering performance after INT4 quantization.

Quantizing an LLM to INT4 for inference (unlike, say, INT8) typically incurs some accuracy loss. Instead of accepting that quality loss, we used the FP16 model as a teacher to train a tiny LoRA adapter (rank=16) for the quantized model. The cool part: the model generates its own training data using the Magpie technique, so no external datasets are needed. This is critical because we want to stay as close as possible to the distribution of the model's natural responses.
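The recipe (FP16 teacher, frozen INT4 base, small LoRA student) can be illustrated at toy scale on a single linear layer in pure Python. This is a sketch under loud assumptions, not the ellora implementation: random Gaussian inputs stand in for the Magpie self-generated prompts, symmetric per-row INT4 rounding stands in for a real quantization scheme, rank 1 stands in for rank 16, and the loss is the squared distance between the quantized-plus-adapter outputs and the full-precision outputs.

```python
import random


def quantize_int4(row):
    """Symmetric per-row INT4: integers in [-8, 7] and a single float scale."""
    scale = max(abs(w) for w in row) / 7 or 1.0
    return [max(-8, min(7, round(w / scale))) for w in row], scale


def dequantize(q, scale):
    return [v * scale for v in q]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def recover(W, rank=1, steps=200, lr=0.1, n_samples=64, seed=0):
    """Train a LoRA-style correction B @ A on a frozen INT4 copy of W to match W's outputs.

    Returns (loss before training, loss after training)."""
    rng = random.Random(seed)
    d_out, d_in = len(W), len(W[0])
    Wq = [dequantize(*quantize_int4(row)) for row in W]      # frozen quantized base
    A = [[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(d_out)]                  # zero init: adapter starts as a no-op
    X = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(n_samples)]
    Y_teacher = [matvec(W, x) for x in X]                     # full-precision "teacher" outputs

    def loss_and_grads():
        loss = 0.0
        gA = [[0.0] * d_in for _ in range(rank)]
        gB = [[0.0] * rank for _ in range(d_out)]
        for x, y_t in zip(X, Y_teacher):
            h = matvec(A, x)
            y_s = [q + b for q, b in zip(matvec(Wq, x), matvec(B, h))]
            err = [s - t for s, t in zip(y_s, y_t)]
            loss += 0.5 * sum(e * e for e in err) / n_samples
            for i in range(d_out):
                for k in range(rank):
                    gB[i][k] += err[i] * h[k] / n_samples
            for k in range(rank):
                back = sum(B[i][k] * err[i] for i in range(d_out))
                for j in range(d_in):
                    gA[k][j] += back * x[j] / n_samples
        return loss, gA, gB

    initial = loss_and_grads()[0]
    for _ in range(steps):
        _, gA, gB = loss_and_grads()
        for k in range(rank):
            for j in range(d_in):
                A[k][j] -= lr * gA[k][j]
        for i in range(d_out):
            for k in range(rank):
                B[i][k] -= lr * gB[i][k]
    return initial, loss_and_grads()[0]


W = [[((7 * i + 3 * j) % 11 - 5) / 4 for j in range(6)] for i in range(6)]
before, after = recover(W)
print(f"distillation loss: {before:.4f} -> {after:.4f}")
```

Only the adapter is trained; the quantized weights stay frozen, which is what keeps the memory savings of INT4 intact.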

Last year, Apple's foundation models paper (https://arxiv.org/pdf/2407.21075) proposed a similar technique and found: "By using accuracy-recovery LoRA adapters with only rank 16, Alpaca win rate can be improved by 7-18%, GMS8K accuracy is boosted by 5-10%." (page 47).

We saw similar results on Qwen3-0.6B:

Perplexity: 2.40 → 2.09 (only 5.7% degradation from FP16 baseline)
Memory: Only 0.28GB vs 1.0GB for FP16 (75% reduction)
Speed: 3.0x faster inference than FP16
Quality: Generates correct, optimized code solutions

- Pre-trained adapter: codelion/Qwen3-0.6B-accuracy-recovery-lora
- GitHub repo: https://github.com/codelion/ellora

Happy to answer questions about the implementation or help anyone trying to replicate this. The key insight is that quantization errors are systematic and learnable - a small adapter can bridge the gap without negating the benefits of quantization.

Has anyone else experimented with self-distillation for quantization recovery? Would love to hear about different approaches!

sergiopaniego

posted an update 8 days ago


It's now possible to do end-to-end ML without leaving the @huggingface Hub, by combining TRL + HF Jobs + Trackio!!

🐡 We just released a full guide explaining the process.

Go check it out!

📖 Guide: https://huggingface.co/docs/trl/main/en/jobs_training

💡 Reminder: HF Jobs is only available for Pro, Team, or Enterprise plans. Yet another reason to upgrade!

codelion

posted an update 9 days ago


I recently added a recipe to ellora that improves the reasoning capabilities of Gemma-3-1B using self-supervised learning. The model now shows step-by-step thinking in <think> tags before answering.

Logic puzzle accuracy: 61% → 84%. 3 hours training on single GPU. 🧠

It uses GRPO, where the model generates multiple responses and learns to prefer the better reasoning. Works surprisingly well for making smaller models more transparent.
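The "learns to prefer better reasoning" step can be sketched with the group-relative advantage that gives GRPO its name: sample several completions for one prompt, score each, and normalize each score against the group's mean and standard deviation. The reward here is a hypothetical format check for the `<think>` tags plus an exact-match answer check; the recipe's actual self-rewarding signal is not described in the post, and the population standard deviation is a choice made for this sketch.

```python
import re
import statistics

THINK_RE = re.compile(r"<think>(.+?)</think>\s*(\S.*)", re.DOTALL)


def reward(completion: str, reference: str) -> float:
    """Hypothetical reward: 0.5 for reasoning in <think> tags, +1.0 if the answer matches."""
    m = THINK_RE.fullmatch(completion.strip())
    if m is None:
        return 0.0
    answer = m.group(2).strip()
    return 0.5 + (1.0 if answer == reference else 0.0)


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantages: each reward relative to its group's mean, scaled by the std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


group = [
    "<think>2 + 2 is 4.</think> 4",
    "<think>Guessing.</think> 5",
    "4",
]
rewards = [reward(c, "4") for c in group]
print(rewards)  # [1.5, 0.5, 0.0]
print([round(a, 2) for a in group_relative_advantages(rewards)])
```

Completions above the group mean get positive advantages and are reinforced; those below are pushed down. No separate value model is needed because the group itself serves as the baseline.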

🔗 Colab: https://colab.research.google.com/github/codelion/ellora/blob/main/Ellora_Recipe_2_Reasoning_LoRA_with_Self-Rewarding_GRPO.ipynb

🤗 Model: codelion/gemma-3-1b-it-reasoning-grpo-lora

💻 Code: https://github.com/codelion/ellora


mrfakename

in zero-gpu-explorers/README 14 days ago

Limits on PRO account


#88 opened about 1 year ago by moonslink

EvanTHU

authored 2 papers 15 days ago

Motion2Motion: Cross-topology Motion Transfer with Sparse Correspondence

Paper • 2508.13139 • Published 17 days ago • 3

Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer

Paper • 2508.09131 • Published 23 days ago • 16

appvoid

posted an update 19 days ago


suppose someone is working on a reasoning model that ends up unlocking achievements that lead to AGI. should it be open source?

keep in mind everybody will have access to it: scientists, governments, terrorists, average people, etc...


mrfakename

in zero-gpu-explorers/README 20 days ago

use authentication in huggingface Gradio API!!!(hosting on ZeroGPU)


#129 opened 10 months ago by Nerva1228

fdaudens

posted an update 21 days ago


Want to learn to build an AI Agent? I put together a cookbook for creating your own news research agent with OpenAI GPT-OSS:

- Searches headlines & specific sites
- Pulls full articles when you need depth
- Summarizes with clickable sources
- Runs in a simple Gradio chat UI
- No GPU, no local setup — just open-weight GPT-OSS models via Hugging Face

If you’ve been wanting to try agents but weren’t sure where to start, this is an end-to-end example you can fork, run, and adapt.

Full guide + code https://huggingface.co/blog/fdaudens/openai-gpt-oss-agent-inference-providers


sergiopaniego

posted an update 23 days ago


So you can now SFT a model with hf jobs + TRL in ONE command lol 🏎️💨

Without worrying about infrastructure since it runs entirely on HF!

docs: https://huggingface.co/docs/huggingface_hub/main/en/guides/jobs
blog: https://huggingface.co/blog/hf-cli

fdaudens

posted an update 23 days ago


What can OpenAI’s new open models do with the news? I built a News Agent to find out.

It can answer questions about the news in real time, and every answer comes with original source links so you can dive deeper.

Ask it things like:
- "What are the top news stories today?"
- "What's the latest on artificial intelligence?"
- Follow-up questions on specific stories

It runs with Hugging Face inference providers, letting you compare results from the OpenAI 20B and 120B models.

So far, I’m quite impressed by the capabilities of even the smaller 20B model. Definitely not a perfect project, but curious to hear your thoughts!

fdaudens/gpt-oss-news-agent


sergiopaniego

posted an update 24 days ago


New Zero-Shot Object Detectors in transformers! 🥽

We’ve added LLMDet and MM GroundingDINO, plus a demo Space to compare them with others 🖼️

Play with it: ariG23498/zero-shot-od

fdaudens

posted an update 24 days ago


OpenAI’s GPT-OSS has sparked ~400 new models on Hugging Face and racked up 5M downloads in less than a week, already outpacing DeepSeek R1’s first-week numbers.

For comparison: when R1 launched, I tracked 550 derivatives (across 8 base models) in a week, with ~3M downloads. GPT-OSS is ahead on adoption and engagement.

It’s also the most-liked release of any major LLM this summer. The 20B and 120B versions quickly shot past Kimi K2, GLM 4.5, and others in likes.

Most-downloaded GPT-OSS models include LM Studio and Unsloth AI versions:
1️⃣ openai/gpt-oss-20b - 2.0M
2️⃣ lmstudio-community/gpt-oss-20b-MLX-8bit - 750K
3️⃣ openai/gpt-oss-120b - 430K
4️⃣ unsloth/gpt-oss-20b-GGUF - 380K
5️⃣ lmstudio-community/gpt-oss-20b-GGUF - 330K

The 20B version is clearly finding its audience, showing the power of smaller, faster, more memory- and energy-efficient models. (These numbers don’t include calls to the models via inference providers, so the real usage is likely even bigger, especially for the 120B version)

Open-weight models let anyone build on top. Empower the builders, and innovation takes off. 🚀


sergiopaniego

posted an update 24 days ago


Missed last week's OpenAI GPT OSS release?

Here are 2 quick-start recipes we developed to get you up to speed:

🏃‍♀️ How to run gpt-oss-20b on Google Colab
https://cookbook.openai.com/articles/gpt-oss/run-colab

🧑‍🔧 Fine-tuning with gpt-oss and Hugging Face Transformers
https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers

asigalov61

posted an update 25 days ago


🔥Check out new SOTA Orpheus Auto-Continuations Generator🔥

asigalov61/Orpheus-Music-Transformer

Now you can generate good music with Orpheus without supervision!!!

@Timzoid @John6666 @alvanalrakib

codelion

posted an update 26 days ago


Released 17 production-ready adaptive text classifiers that learn from just 100 examples per class and continuously improve without retraining.

These models achieve 93% average accuracy across enterprise use cases like email routing, fraud detection, document classification, and support ticket categorization. Built on ModernBERT with prototype memory and elastic weight consolidation.

Key benefits: 90% cost reduction vs API solutions, 90-120ms local inference, dynamic class addition, and zero vendor lock-in.
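The prototype-memory idea behind dynamic class addition can be sketched as a nearest-class-mean classifier: each class keeps the running mean of its example embeddings, and adding a class just adds an entry. This is a simplified illustration, not the adaptive-classifier API: the class and method names are hypothetical, the 2-D vectors stand in for ModernBERT embeddings, and elastic weight consolidation is not shown.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PrototypeMemory:
    """Hypothetical sketch: one mean-embedding prototype per class."""

    def __init__(self) -> None:
        self.sums: dict[str, list[float]] = {}
        self.counts: dict[str, int] = {}

    def add_example(self, embedding: list[float], label: str) -> None:
        # An unseen label simply creates a new prototype; nothing is retrained.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(embedding)
            self.counts[label] = 0
        self.sums[label] = [s + e for s, e in zip(self.sums[label], embedding)]
        self.counts[label] += 1

    def prototype(self, label: str) -> list[float]:
        return [s / self.counts[label] for s in self.sums[label]]

    def predict(self, embedding: list[float]) -> str:
        # Pick the class whose prototype is most similar to the input.
        return max(self.sums, key=lambda label: cosine(embedding, self.prototype(label)))


memory = PrototypeMemory()
memory.add_example([1.0, 0.0], "billing")
memory.add_example([0.9, 0.1], "billing")
memory.add_example([0.0, 1.0], "technical")
print(memory.predict([0.8, 0.2]))          # billing
memory.add_example([-1.0, 0.0], "fraud")   # new class added on the fly
print(memory.predict([-0.9, 0.1]))         # fraud
```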

All models available under adaptive-classifier organization. Install with pip install adaptive-classifier.

Full technical details: https://huggingface.co/blog/codelion/enterprise-ready-classifiers
Code: https://github.com/codelion/adaptive-classifier


sergiopaniego

posted an update 28 days ago


Latest TRL release brings major upgrades for multimodal alignment!

We dive into 3 new techniques to improve VLM post-training in our new blog:

🌋 GRPO
🎞️ GSPO
🐙 MPO
➕ vLLM integration for online training w/ transformers backend

🐡 Blog: https://huggingface.co/blog/trl-vlm-alignment

Team members 752
