UygarUsta (RivianG)
2 followers · 6 following
AI & ML interests
Computer Vision
Recent Activity
reacted to hesamation's post with ❤️ 12 days ago
this is big... 50 AI researchers from Bytedance, Alibaba, Tencent, and other labs/universities just published a 300-page paper with surprising lessons about coding models and agents (data, pre- and post-training, etc.). key highlights:
> small LLMs can beat proprietary giants: RL (RLVR specifically) gives small open-source models an edge over big models in reasoning. a 14B model trained with RLVR on high-quality verified problems can match the performance of OpenAI's o3 (see the reward sketch after this post).
> models have a hard time learning Python: mixing programming languages during pre-training helps, but Python behaves differently from statically typed languages. languages with similar syntax (Java and C#, or JavaScript and TypeScript) create strong positive synergy, while mixing Python heavily into the training of statically typed languages can actually hurt because of Python's dynamic typing.
> not all languages are equal (coding scaling laws): the amount of data required to specialize a model on a language depends heavily on the language. the paper argues that languages like C# and Java are easier to learn (less training data required), while Python and JavaScript are, ironically, trickier, even though these are the languages AI gets used for the most :)
> MoE vs dense (ability vs stability): MoE models offer higher capacity but are much more fragile during SFT than dense models. training hyperparameters have a more drastic effect on MoE models, while dense models are more stable. MoE models also require constant learning-rate schedules to avoid routing instability.
> code models are "insecure" by default (duh): training on public repos makes models learn years of accumulated insecure coding patterns, and safety fine-tuning often does little for code. a model might refuse to write a hate-speech email but will happily generate a SQL-injection-vulnerable function because it "works" (see the query sketch after this post).
read the full paper: https://huggingface.co/papers/2511.18538
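On the RLVR point above: the reward has to be mechanically verifiable (e.g., by running unit tests) rather than judged by another model. Below is a minimal sketch of such a binary reward, assuming a toy task format where each problem ships a list of assert-style test strings; the function name and format are illustrative, not taken from the paper.

```python
# Minimal sketch of a binary "verifiable reward" for code generation (RLVR-style).
# The task format and function name are illustrative assumptions, not the paper's API.

def verifiable_reward(candidate_code: str, unit_tests: list[str]) -> float:
    """Return 1.0 if the candidate code passes every unit test, else 0.0."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)   # define the candidate's functions
        for test in unit_tests:
            exec(test, namespace)         # each test is an assert statement
    except Exception:
        return 0.0                        # syntax error, runtime error, or failed assert
    return 1.0


# toy usage: a correct solution earns reward 1.0, a broken one earns 0.0
code = "def add(a, b):\n    return a + b"
tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
print(verifiable_reward(code, tests))  # 1.0
```

Because the signal is binary and checkable, it only rewards code that actually runs and passes, which is what lets a small model be trained on "high-quality verified problems" without a learned reward model.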
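To make the last point concrete, here is a toy illustration (not from the paper) of the vulnerable-but-working pattern the post describes, next to the parameterized version, using Python's built-in sqlite3; the schema and data are made up.

```python
# Toy example of a SQL-injection-vulnerable query vs. a parameterized one.
# The table and data are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

def find_user_insecure(name: str):
    # String interpolation "works" for normal input, but any quote in `name`
    # becomes part of the SQL statement itself (classic SQL injection).
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver treats the value strictly as data.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_safe("alice"))            # [('alice', 'alice@example.com')]
print(find_user_insecure("' OR '1'='1"))  # returns every row: the injection also "works"
```

Both functions pass a naive "does it return the right row for alice" check, which is exactly why functional correctness alone is a weak safety signal for generated code.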
liked a model 13 days ago
cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit
upvoted a collection 13 days ago
Qwen AWQ & GPTQ
Organizations
None yet
spaces 1
🏢 Plate_Ocr · Runtime error · 2
models 9
RivianG/my_lora_bk · Text Generation · Updated 29 days ago · 12
RivianG/Oriented_Barcode_Centernet · Object Detection · Updated Jun 24
RivianG/AceReason-Nemotron-1.1-7B-bnb-4bit · Text Generation · 7B · Updated Jun 24 · 5
RivianG/AceReason-Nemotron-1.1-7B_quant · Text Generation · 7B · Updated Jun 24 · 6
RivianG/dqn-SpaceInvadersNoFrameskip-v4 · Reinforcement Learning · Updated May 20 · 6
RivianG/Taxiv3-DRL-HF · Reinforcement Learning · Updated May 20
RivianG/q-FrozenLake-v1-4x4-noSlippery · Reinforcement Learning · Updated May 20
RivianG/ppo-LunarLander-v2 · Reinforcement Learning · Updated May 4 · 3
RivianG/my_awesome_qa_model · 66.4M · Updated Aug 13, 2024 · 5
datasets 0
None public yet