---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- llama-factory
- full
- generated_from_trainer
datasets:
- rl-research/dr-tulu-sft-data
---

> [!NOTE]
> For full information, go check out the Dr Tulu paper [here](https://arxiv.org/abs/2511.19399).

Figure 1

# DR Tulu SFT 8B

This is the SFT checkpoint of DR Tulu, an open deep research agent trained on top of [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).

This model has undergone SFT training on [this dataset](https://huggingface.co/datasets/rl-research/dr-tulu-sft-data).
For more details on DR Tulu please **read our [paper](https://allenai.org/papers/drtulu)**!

# Inference and Usage

**This model has been trained for tool-use using the dr-agent-lib framework**.
As such, running it out of the box with HuggingFace or vLLM will not work well!

See [our github](https://github.com/rlresearch/dr-tulu) for more details on installation and how to run our model.
Or check out our [demo](https://dr-tulu.github.io/)!

# Evaluation Results

We provide evaluation instructions in [our github](https://github.com/rlresearch/dr-tulu).

| Benchmark | SQAv2 | HealthBench | ResearchQA | DeepResearch Bench | SimpleQA | 2Wiki | WebWalker | Average |
|:----------|:------:|:----------:|:---------:|:-------------------:|:------:|:-------:|-------:|-------:|
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) (naive rag) | 40.4 | 16.5 | 56.1 | 33.3 | 52.6 | 18.9 | 8.8 | 32.4 |
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) (our search pipeline) | 57.2 | 5.9 | 46.3 | 18.2 | 70.5 | 44.0 | 27.9 | 38.6 |
| [DR-Tulu-SFT-8B](https://huggingface.co/rl-research/DR-Tulu-SFT-8B) (**this model**) | 72.3 | 38.1 | 68.5 | 39.0 | 75.5 | 66.5 | 31.9 | 56.0 |
| [DR-Tulu-8B](https://huggingface.co/rl-research/DR-Tulu-8B) | **86.7** | **43.7** | **71.1** | **41.8** | **80.1** | **68.0** | **39.1** | **61.5** |

For more baselines, explanations of this table, and analysis of results, check out the [Dr Tulu paper](https://allenai.org/papers/drtulu)!

# Intended uses & limitations

This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use).

## Training

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5

For further details, check out the [Dr Tulu paper](https://allenai.org/papers/drtulu).

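As a sanity check on the hyperparameters listed above, the short sketch below shows how the total batch sizes follow from the per-device values, and what a cosine schedule with a 0.1 warmup ratio looks like. It is illustrative only: the schedule shape (linear warmup, then half-cosine decay to zero) and the rounding of warmup steps are assumptions about the usual behavior of this scheduler type, and `total_steps` is a hypothetical value because the number of optimizer steps is not stated here.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 1
PER_DEVICE_EVAL_BATCH = 8
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 16
PEAK_LR = 4e-5
WARMUP_RATIO = 0.1

# Examples contributing to one optimizer update:
# per-device batch * number of devices * accumulation steps.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS

# Evaluation uses no gradient accumulation.
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES


def learning_rate_at(step: int, total_steps: int) -> float:
    """Assumed schedule: linear warmup, then half-cosine decay to zero.

    `total_steps` is a hypothetical input; the real step count depends on
    the dataset size, which this card does not state.
    """
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch)
    print("total_eval_batch_size:", total_eval_batch)
    for step in (0, 50, 100, 550, 1000):
        print(step, learning_rate_at(step, total_steps=1000))
```

The two computed totals match the `total_train_batch_size` and `total_eval_batch_size` entries reported in the list.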
# Links
- 📝 [DR Tulu Paper](https://allenai.org/papers/drtulu)
- ⚙️ [DR Tulu demo](https://dr-tulu.github.io/)
- 💻 [DR Tulu code](https://github.com/rlresearch/DR-Tulu)
- 🤖 [DR Tulu collection](https://huggingface.co/collections/rl-research/dr-tulu)

# Citation

```
@article{shao2025dr,
  title={DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research},
  author={Shao, Rulin and Asai, Akari and Shen, Shannon Zejiang and Ivison, Hamish and Kishore, Varsha and Zhuo, Jingming and Zhao, Xinran and Park, Molly and Finlayson, Samuel G and Sontag, David and others},
  journal={arXiv preprint arXiv:2511.19399},
  year={2025}
}
```