archit PRO

archit11

·

archit-spec

AI & ML interests

small language models

Recent Activity

liked a dataset 2 days ago

TeichAI/claude-4.5-opus-high-reasoning-250x

upvoted an article 21 days ago

Controlling Language Model Generation with NVIDIA's LogitsProcessorZoo

updated a dataset about 1 month ago

archit11/claude_code_traces_dirty

upvoted an article 21 days ago

Article

Controlling Language Model Generation with NVIDIA's LogitsProcessorZoo

Dec 23, 2024 • 51

upvoted an article 2 months ago

Article

Exploring Direct Tensor Manipulation in Language Models: A Case Study in Binary-Level Model Enhancement

Nov 7, 2025 • 4

upvoted 2 articles 5 months ago

Article

From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels

Aug 18, 2025 • 88

Article

How to Run a Hugging Face Model in JAX (Part 1)

Jul 20, 2025 • 31

upvoted a paper 6 months ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 158

upvoted 3 articles 6 months ago

Article

You could have designed state of the art positional encoding

Nov 25, 2024 • 433

Article

Understanding Gemma 3n: How MatFormer Gives You Many Models in One

Jun 26, 2025 • 48

Article

G2P Shrinks Speech Models

Feb 5, 2025 • 83

upvoted 3 articles 7 months ago

Article

State of open video generation models in Diffusers

+1

Jan 27, 2025 • 66

Article

How Long Prompts Block Other Requests - Optimizing LLM Performance

Jun 12, 2025 • 8

Article

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

Apr 16, 2025 • 59

upvoted 2 papers 7 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2, 2025 • 187

upvoted an article 9 months ago

Article

Enabling Long Context Training with Sequence Parallelism in Axolotl

Apr 4, 2025 • 15

upvoted 2 articles 11 months ago

Article

SigLIP 2: A better multilingual vision language encoder

+1

Feb 21, 2025 • 193

Article

The case for specialized pre-training: ultra-fast foundation models for dedicated tasks

Aug 4, 2024 • 30

upvoted 3 collections 11 months ago

Scotch & SOTA 🥃 Pt. 7: Human Feedback Datasets 🫣

The elusive “human” feedback • 1 item • Updated Sep 13, 2023 • 1

Scotch & SOTA 🥃 Pt. 6: Dialogue Tuning Datasets 💬

Conversations, turn-based dialog, and things that can be turned into that. • 4 items • Updated Sep 13, 2023 • 1

Scotch & SOTA 🥃 Pt. 5: Instruction Tuning Datasets 👩‍🏫

Question & answer, task completion, general SFT and otherwise finetuney data. • 7 items • Updated Sep 13, 2023 • 1

upvoted an article 12 months ago

Article

How to deploy and fine-tune DeepSeek models on AWS

+1

Jan 30, 2025 • 55