rl-research/DR-Tulu-SFT-8B

akariasai committed 25 days ago · commit 3b5e87c (verified) · 1 parent: 1fb1b82

Update README.md

Files changed (1): README.md +3 -3

@@ -11,7 +11,7 @@ datasets:
 ---
 > [!NOTE]
-> For full information, go check out the Dr Tulu paper [here](https://arxiv.org/abs/TODO).
+> For full information, go check out the Dr Tulu paper [here](http://allenai-web/papers/drtulu).
 <img src="https://huggingface.co/rl-research/DR-Tulu-SFT-8B/resolve/main/dr_tulu_logo.png" alt="Figure 1" width="500"/>
@@ -50,7 +50,7 @@ We provide evaluation instructions in the [dr-agent-lib github](TODO).
 | [DR-Tulu-SFT-8B](https://huggingface.co/rl-research/DR-Tulu-SFT-8B) (**this model**) | 72.3 | 38.1 | 68.5 | 39.0 | 75.5 | 66.5 | 31.9 | 56.0 |
 | [DR-Tulu-8B](https://huggingface.co/rl-research/DR-Tulu-8B) | **86.7** | **43.7** |  **71.1** | **41.8** | **80.1** | **68.0** | **39.1** | **61.5** |
-For more baselines, explanations of this table, and analysis of reesults, check out the [Dr Tulu paper](https://arxiv.org/abs/TODO)!
+For more baselines, explanations of this table, and analysis of results, check out the [Dr Tulu paper](https://arxiv.org/abs/TODO)!
 # Intended uses & limitations
@@ -86,7 +86,7 @@ For futher details, check out the [Dr Tulu paper](https://arxiv.org/abs/TODO).
 ```
 @article{drtulu,
   title = {{DR Tulu:  Reinforcement Learning with Evolving Rubrics for Deep Research}},
-  author = {{Rulin Shao, Akari Asai, Shannon Shen, Hamish Ivison, Varsha Kishore, Jingming Zhuo, Xinran Zhao, Molly Park, David Sontag, Tyler Murray, Sam Finlayson, Sewon Min, Pradeep Dasigi, Luca Soldani, Faeze Brahman, Scott Yih, Sherry Tongshuang Wu, Luke Zettlemoyer, Yoon Kim, Hanna Hajishirzi, Pang Wei Koh}},
+  author = {{Rulin Shao, Akari Asai, Shannon Shen, Hamish Ivison, Varsha Kishore, Jingming Zhuo, Xinran Zhao, Molly Park, Sam Finlayson, David Sontag, Tyler Murray, Sewon Min, Pradeep Dasigi, Luca Soldani, Faeze Brahman, Scott Yih, Sherry Tongshuang Wu, Luke Zettlemoyer, Yoon Kim, Hanna Hajishirzi, Pang Wei Koh}},
   journal={arXiv preprint TODO}
   year = {2025},
 }