Reading list - a ron-wolf Collection

No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published Dec 16, 2024 • 43
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Paper • 2412.14161 • Published Dec 18, 2024 • 51
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments

Paper • 2408.10945 • Published Aug 20, 2024 • 10
PDFTriage: Question Answering over Long, Structured Documents

Paper • 2309.08872 • Published Sep 16, 2023 • 55
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations

Paper • 2412.13171 • Published Dec 17, 2024 • 35
The Matrix Calculus You Need For Deep Learning

Paper • 1802.01528 • Published Feb 5, 2018 • 2
A Modern Self-Referential Weight Matrix That Learns to Modify Itself

Paper • 2202.05780 • Published Feb 11, 2022
Recurrent Memory Transformer

Paper • 2207.06881 • Published Jul 14, 2022 • 1
How many words does ChatGPT know? The answer is ChatWords

Paper • 2309.16777 • Published Sep 28, 2023 • 1
Weaver: Foundation Models for Creative Writing

Paper • 2401.17268 • Published Jan 30, 2024 • 45
Graph of Thoughts: Solving Elaborate Problems with Large Language Models

Paper • 2308.09687 • Published Aug 18, 2023 • 7
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking

Paper • 2306.05426 • Published Jun 8, 2023
Think before you speak: Training Language Models With Pause Tokens

Paper • 2310.02226 • Published Oct 3, 2023 • 3
What do tokens know about their characters and how do they know it?

Paper • 2206.02608 • Published Jun 6, 2022
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10, 2024 • 111
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7, 2025 • 46
Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers

Paper • 2504.18412 • Published Apr 25, 2025 • 1
Chain of Draft: Thinking Faster by Writing Less

Paper • 2502.18600 • Published Feb 25, 2025 • 50
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models

Paper • 2506.19697 • Published Jun 24, 2025 • 44
Jasper and Stella: distillation of SOTA embedding models

Paper • 2412.19048 • Published Dec 26, 2024 • 2
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

Paper • 2301.13688 • Published Jan 31, 2023 • 9
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs

Paper • 2411.19146 • Published Nov 28, 2024 • 17
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 109
Robust and Fine-Grained Detection of AI Generated Texts

Paper • 2504.11952 • Published Apr 16, 2025 • 12
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1, 2025 • 79
The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation

Paper • 2507.05578 • Published Jul 8, 2025 • 6
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

Paper • 2507.09089 • Published Jul 12, 2025
Stochastic LLMs do not Understand Language: Towards Symbolic, Explainable and Ontologically Based LLMs

Paper • 2309.05918 • Published Sep 12, 2023
The Debate Over Understanding in AI's Large Language Models

Paper • 2210.13966 • Published Oct 14, 2022
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

Paper • 2210.13382 • Published Oct 24, 2022
Evidence of Meaning in Language Models Trained on Programs

Paper • 2305.11169 • Published May 18, 2023
Locally Typical Sampling

Paper • 2202.00666 • Published Feb 1, 2022 • 4
Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation

Paper • 2408.13586 • Published Aug 24, 2024 • 3
Language Models are Injective and Hence Invertible

Paper • 2510.15511 • Published Oct 17, 2025 • 69
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

Paper • 2510.07192 • Published Oct 8, 2025 • 5
Not All Bits Are Equal: Scale-Dependent Memory Optimization Strategies for Reasoning Models

Paper • 2510.10964 • Published Oct 13, 2025 • 3
SETOL: A Semi-Empirical Theory of (Deep) Learning

Paper • 2507.17912 • Published Jul 23, 2025 • 1
Attention Is Not What You Need

Paper • 2512.19428 • Published Dec 22, 2025
The Geometry of Reasoning: Flowing Logics in Representation Space

Paper • 2510.09782 • Published Oct 10, 2025 • 7
GrokAlign: Geometric Characterisation and Acceleration of Grokking

Paper • 2506.12284 • Published Jun 14, 2025
Intelligence per Watt: Measuring Intelligence Efficiency of Local AI

Paper • 2511.07885 • Published Nov 11, 2025 • 10
Base Models Beat Aligned Models at Randomness and Creativity

Paper • 2505.00047 • Published Apr 30, 2025 • 1