UygarUsta (RivianG)
2 followers · 6 following
AI & ML interests
Computer Vision
Recent Activity
reacted to hesamation's post with ❤️ 12 days ago
this is big... 50 AI researchers from Bytedance, Alibaba, Tencent, and other labs/universities just published a 300-page paper with surprising lessons about coding models and agents (data, pre- and post-training, etc.). key highlights:
> small LLMs can beat proprietary giants: RL (RLVR specifically) gives small open-source models an edge over big models in reasoning. a 14B model trained with RLVR on high-quality verified problems can match the performance of OpenAI's o3 (see the reward sketch after this post).
> models have a hard time learning Python: mixing programming languages during pre-training helps, but Python behaves differently from statically typed languages. languages with similar syntax (Java and C#, or JavaScript and TypeScript) create strong positive synergy, while mixing Python heavily into the training of statically typed languages can actually hurt because of Python's dynamic typing.
> not all languages are equal (coding scaling laws): the amount of data required to specialize a model on a language depends heavily on the language. the paper argues that languages like C# and Java are easier to learn (less training data required), while Python and JavaScript are, ironically, trickier, even though these are the languages AI gets used for the most :)
> MoE vs dense (ability vs stability): MoE models offer higher capacity but are much more fragile during SFT than dense models. training hyperparameters have a more drastic effect on MoE models, while dense models are more stable. MoE models also require constant learning-rate schedules to avoid routing instability.
> code models are "insecure" by default (duh): training on public repos makes models learn years of accumulated insecure coding patterns, and safety fine-tuning often does little for code. a model might refuse to write a hate-speech email but will happily generate a SQL-injection-vulnerable function because it "works" (see the query sketch after this post).
read the full paper: https://huggingface.co/papers/2511.18538
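On the RLVR point above: the reward has to be mechanically verifiable (e.g., by running unit tests) rather than judged by another model. Below is a minimal sketch of such a binary reward, assuming a toy task format where each problem ships a list of assert-style test strings; the function name and format are illustrative, not taken from the paper.

```python
# Minimal sketch of a binary "verifiable reward" for code generation (RLVR-style).
# The task format and function name are illustrative assumptions, not the paper's API.

def verifiable_reward(candidate_code: str, unit_tests: list[str]) -> float:
    """Return 1.0 if the candidate code passes every unit test, else 0.0."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)   # define the candidate's functions
        for test in unit_tests:
            exec(test, namespace)         # each test is an assert statement
    except Exception:
        return 0.0                        # syntax error, runtime error, or failed assert
    return 1.0


# toy usage: a correct solution earns reward 1.0, a broken one earns 0.0
code = "def add(a, b):\n    return a + b"
tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
print(verifiable_reward(code, tests))  # 1.0
```

Because the signal is binary and checkable, it only rewards code that actually runs and passes, which is what lets a small model be trained on "high-quality verified problems" without a learned reward model.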
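To make the last point concrete, here is a toy illustration (not from the paper) of the vulnerable-but-working pattern the post describes, next to the parameterized version, using Python's built-in sqlite3; the schema and data are made up.

```python
# Toy example of a SQL-injection-vulnerable query vs. a parameterized one.
# The table and data are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

def find_user_insecure(name: str):
    # String interpolation "works" for normal input, but any quote in `name`
    # becomes part of the SQL statement itself (classic SQL injection).
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver treats the value strictly as data.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_safe("alice"))            # [('alice', 'alice@example.com')]
print(find_user_insecure("' OR '1'='1"))  # returns every row: the injection also "works"
```

Both functions pass a naive "does it return the right row for alice" check, which is exactly why functional correctness alone is a weak safety signal for generated code.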
liked a model 13 days ago
cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit
upvoted a collection 13 days ago
Qwen AWQ & GPTQ
Organizations
None yet
spaces 1
🏢 Plate_Ocr · Runtime error · 2
models 9
RivianG/my_lora_bk · Text Generation · Updated 29 days ago · 12
RivianG/Oriented_Barcode_Centernet · Object Detection · Updated Jun 24
RivianG/AceReason-Nemotron-1.1-7B-bnb-4bit · Text Generation · 7B · Updated Jun 24 · 5
RivianG/AceReason-Nemotron-1.1-7B_quant · Text Generation · 7B · Updated Jun 24 · 6
RivianG/dqn-SpaceInvadersNoFrameskip-v4 · Reinforcement Learning · Updated May 20 · 6
RivianG/Taxiv3-DRL-HF · Reinforcement Learning · Updated May 20
RivianG/q-FrozenLake-v1-4x4-noSlippery · Reinforcement Learning · Updated May 20
RivianG/ppo-LunarLander-v2 · Reinforcement Learning · Updated May 4 · 3
RivianG/my_awesome_qa_model · 66.4M · Updated Aug 13, 2024 · 5
datasets 0
None public yet