| license: apache-2.0 | |
| language: | |
| - en | |
| tags: | |
| - reinforcement-learning | |
| - teacher-student | |
| - adaptive-learning | |
| - pedagogy | |
| - rlhf | |
| - rlaif | |
| base_model: Qwen/Qwen3-8B | |
| model_type: qwen3 | |
| datasets: | |
| - Arc-Intelligence/Arc-ATLAS-Teach-v0 | |
| model-index: | |
| - name: ATLAS-8B-Thinking | |
| results: | |
| - task: | |
| type: text-generation | |
| name: Reinforcement Learning Teaching | |
| dataset: | |
| name: Arc-Intelligence/Arc-ATLAS-Teach-v0 | |
| type: Arc-Intelligence/Arc-ATLAS-Teach-v0 | |
| metrics: | |
| - name: Non-Degradation Rate | |
| value: 97% | |
| type: non_degradation_rate | |
| - name: Average Accuracy Improvement | |
| value: +15.7% | |
| type: average_accuracy_improvement | |
| - name: Task Completion Rate Improvement | |
| value: +31.2% | |
| type: task_completion_rate_improvement | |
| - name: Response Token Reduction | |
| value: '-37.2%' | |
| type: response_token_reduction | |
| pipeline_tag: text-generation | |
| library_name: transformers | |
| # ATLAS-8B-Thinking | |
|  | |
| **ATLAS-8B-Thinking** is a specialized teacher model developed by Arc Intelligence to address a core reliability problem in reinforcement learning for LLMs: standard RL fine-tuning is often brittle, and new skills can be learned at the expense of existing ones. | |
| This model reframes training as a problem of **effective pedagogy**. Instead of only optimizing a student model, `ATLAS-8B-Thinking` first uses a lightweight **diagnostic probe** to assess the student's reasoning. Based on that diagnosis, it provides **adaptive guidance**: comprehensive help for struggling models and minimal intervention for capable ones. This "do no harm" approach is designed to improve capability consistently while avoiding the performance degradation that RL fine-tuning often causes. | |
| This model is a core component of the open-source [ATLAS Framework](https://github.com/Arc-Computer/ATLAS) and is designed to train and improve other language models. | |
| ## Model Performance | |
|  | |
| The ATLAS framework, using this teacher model, produces the following improvements in a student model (Qwen3-4B) compared to the student baseline. The results highlight a rare combination of increased performance, higher efficiency, and fundamental reliability. | |
| | Metric | Result | Notes | | |
| | ---------------------- | ----------- | ---------------------------------------------------------- | | |
| | **Non-Degradation Rate** | **97%** | Core metric showing reliability and avoidance of skill loss. | | |
| | Average Accuracy | +15.7% | Across the Arc-ATLAS-Teach-v0 evaluation set. | | |
| | Task Completion Rate | +31.2% | Student model completes tasks it previously failed. | | |
| | Response Tokens | -37.2% | More efficient and concise reasoning. | | |
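The card does not spell out how these metrics are computed. As a minimal sketch, assuming each evaluation example has a correctness score for the student alone and for the student with teacher guidance, the non-degradation rate can be read as the share of examples where guidance did not lower the score. The function name and the sample scores below are hypothetical and are not taken from the ATLAS codebase.

```python
from typing import Sequence


def non_degradation_rate(baseline: Sequence[float], guided: Sequence[float]) -> float:
    """Fraction of examples where the guided score is at least the baseline score."""
    if len(baseline) != len(guided):
        raise ValueError("baseline and guided must have the same length")
    if not baseline:
        raise ValueError("at least one example is required")
    kept = sum(1 for b, g in zip(baseline, guided) if g >= b)
    return kept / len(baseline)


# Hypothetical per-example correctness (1.0 = correct, 0.0 = incorrect).
baseline_scores = [1.0, 0.0, 0.0, 1.0, 1.0]
guided_scores = [1.0, 1.0, 0.0, 1.0, 0.0]  # the last example degraded

rate = non_degradation_rate(baseline_scores, guided_scores)
print(f"non-degradation rate: {rate:.0%}")
```

Under this reading, an example that stays wrong still counts as non-degraded; only a drop from the baseline counts against the rate.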
| ## How to Use | |
| `ATLAS-8B-Thinking` is not a standard instruction-tuned model for direct chat. It is a core component of the ATLAS training framework, designed to interact with a "student" model in a two-pass process. | |
| ### Loading the Model | |
| **Important:** This model requires `trust_remote_code=True` due to custom Qwen3 architecture components. | |
| ```python | |
| import torch | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| # Load the teacher model | |
| teacher_model = AutoModelForCausalLM.from_pretrained( | |
|     "Arc-Intelligence/ATLAS-8B-Thinking", | |
|     trust_remote_code=True,  # Required for custom architecture | |
|     torch_dtype=torch.bfloat16,  # Recommended for efficiency | |
| ) | |
| # Load the matching tokenizer | |
| teacher_tokenizer = AutoTokenizer.from_pretrained( | |
|     "Arc-Intelligence/ATLAS-8B-Thinking", | |
|     trust_remote_code=True, | |
| ) | |
| ``` | |
| ### Conceptual Usage | |
| The following is a simplified, conceptual example of the ATLAS interaction loop. The full implementation is available in the official repository. | |
| ```python | |
| # A conceptual example of the ATLAS interaction loop | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| # Load the teacher and a student model | |
| teacher_model = AutoModelForCausalLM.from_pretrained("Arc-Intelligence/ATLAS-8B-Thinking", trust_remote_code=True) | |
| student_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B") # The model to be improved | |
| problem = "A farmer has 52 trees planted in a row over a length of 1850 meters. What is the distance between each tree?" | |
| # 1. Teacher creates a diagnostic probe to assess the student's initial approach | |
| # This step is abstracted in the actual framework | |
| diagnostic_probe = "To find the distance between the trees, what is the first critical calculation you would make?" | |
| # 2. Student responds to the probe | |
| # (Implementation detail: you would get the student's response here) | |
| student_reasoning_trace = "I would divide the total length (1850m) by the number of trees (52)." | |
| # 3. Teacher assesses the trace and provides adaptive guidance | |
| # The teacher recognizes this common off-by-one error. | |
| # (Implementation detail: the teacher model generates this guidance) | |
| adaptive_guidance = "Your approach is close. Remember that 52 trees create 51 intervals between them. The distance is uniform across these intervals." | |
| # 4. The student uses the guidance to solve the problem | |
| final_prompt = problem + "\n" + adaptive_guidance | |
| # (Implementation detail: the student model generates the final answer) | |
| final_answer = "1850 meters / 51 intervals = 36.27 meters per interval." | |
| ``` | |
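The same loop can be exercised without loading any weights. The sketch below wires the four steps together with plain callables standing in for the teacher and student. `run_atlas_turn`, `stub_teacher`, `stub_student`, and the `PROBE`/`GUIDE` prompt prefixes are hypothetical names invented for this illustration, and the canned replies reuse the strings from the example above. In a real run each callable would wrap a `generate` call on the corresponding model, and the actual prompt format is defined by the ATLAS repository rather than by this sketch.

```python
from typing import Callable

# A "model" here is any function mapping a prompt string to a reply string.
TextModel = Callable[[str], str]


def run_atlas_turn(problem: str, teacher: TextModel, student: TextModel) -> dict:
    """Run one probe -> trace -> guidance -> answer cycle (conceptual sketch)."""
    # 1. Teacher writes a diagnostic probe for the problem.
    probe = teacher(f"PROBE\n{problem}")
    # 2. Student answers the probe, exposing its initial approach.
    trace = student(f"{problem}\n{probe}")
    # 3. Teacher reads the trace and writes adaptive guidance.
    guidance = teacher(f"GUIDE\n{problem}\n{trace}")
    # 4. Student solves the problem with the guidance appended.
    answer = student(f"{problem}\n{guidance}")
    return {"probe": probe, "trace": trace, "guidance": guidance, "answer": answer}


def stub_teacher(prompt: str) -> str:
    # Canned replies; the PROBE/GUIDE prefixes are an assumption of this sketch.
    if prompt.startswith("PROBE"):
        return "What is the first critical calculation you would make?"
    return "Remember that 52 trees create 51 intervals between them."


def stub_student(prompt: str) -> str:
    # The stub only corrects itself once the guidance mentions the interval count.
    if "51 intervals" in prompt:
        return f"1850 meters / 51 intervals = {1850 / 51:.2f} meters per interval."
    return "I would divide the total length (1850m) by the number of trees (52)."


problem = (
    "A farmer has 52 trees planted in a row over a length of 1850 meters. "
    "What is the distance between each tree?"
)
result = run_atlas_turn(problem, stub_teacher, stub_student)
print(result["answer"])
```

Keeping the teacher and student behind a plain callable interface makes it easy to swap the stubs for real models later without touching the loop itself.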
| ### Running the Full Training Pipeline | |
| To replicate our results or train your own models using the ATLAS framework, clone the official repository and follow the setup instructions. | |
| ```bash | |
| # 1. Clone the repository | |
| git clone https://github.com/Arc-Computer/ATLAS | |
| cd ATLAS | |
| # 2. Install dependencies | |
| bash scripts/install_py312.sh | |
| # 3. Run training | |
| # Phase 1: Supervised Fine-Tuning (SFT) | |
| scripts/launch.sh 4 configs/run/teacher_sft.yaml | |
| # Phase 2: Reinforcement Learning (RL) | |
| scripts/launch_with_server.sh 1 3 configs/run/teacher_rcl.yaml | |
| ``` | |
| ## Training Details | |
| - **Base Model:** [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | |
| - **Training Framework:** ATLAS (SFT → RL with GRPO) | |
| - **Key Feature:** The RL phase uses an asymmetric reward function that heavily penalizes any instance of student performance degradation, which is key to the framework's reliability. | |
| - **Dataset:** [Arc-Intelligence/Arc-ATLAS-Teach-v0](https://huggingface.co/datasets/Arc-Intelligence/Arc-ATLAS-Teach-v0) | |
| - **Context Length:** 8192 tokens | |
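To make the asymmetric reward idea concrete, here is a minimal sketch. It assumes the reward depends only on the change in a student's score with and without teacher guidance, and that losses are scaled by a larger weight than gains. The function name and the weight of 2.0 are illustrative assumptions; the card does not give the actual reward used in ATLAS.

```python
def asymmetric_reward(baseline_score: float, guided_score: float,
                      degradation_weight: float = 2.0) -> float:
    """Reward improvement linearly; penalize degradation more heavily.

    `degradation_weight` is a hypothetical hyperparameter (> 1) chosen for
    illustration, not a value from the ATLAS configuration.
    """
    delta = guided_score - baseline_score
    if delta >= 0:
        return delta  # improvement (or no change) is rewarded as-is
    return degradation_weight * delta  # a drop costs more than an equal gain earns


gain = asymmetric_reward(baseline_score=0.0, guided_score=1.0)
loss = asymmetric_reward(baseline_score=1.0, guided_score=0.0)
print(gain, loss)
```

With this shape, a teacher that helps on one example and hurts on another ends up with a negative total reward, which pushes the policy toward guidance that does no harm.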
| ## Citation | |
| If you use the ATLAS framework or our models in your research, please cite our work: | |
| ```bibtex | |
| @misc{barnes2025atlas, | |
| title={{ATLAS: Adaptive Teaching and Learning Alignment System for Reinforcement Learning}}, | |
| author={Jarrod Barnes and Aman Jaglan}, | |
| year={2025}, | |
| publisher={Arc Intelligence}, | |
| note={Technical Report}, | |
| url={https://github.com/Arc-Computer/ATLAS} | |
| } | |
| ``` | |
| ## Project Resources | |
| - **GitHub Repository:** [https://github.com/Arc-Computer/ATLAS](https://github.com/Arc-Computer/ATLAS) | |
| - **Companion Model:** [ATLAS-8B-Instruct](https://huggingface.co/Arc-Intelligence/ATLAS-8B-Instruct) | |
| - **Training Dataset:** [Arc-ATLAS-Teach-v0](https://huggingface.co/datasets/Arc-Intelligence/Arc-ATLAS-Teach-v0) |