hamishivi committed · verified
Commit 873f14d · 1 Parent(s): 528281f

Update README.md

Files changed (1)
  1. README.md +51 -22
README.md CHANGED
@@ -6,33 +6,54 @@ tags:
  - llama-factory
  - full
  - generated_from_trainer
- model-index:
- - name: qwen3-8B-sft-mix-v20250921
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # qwen3-8B-sft-mix-v20250921

- This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the rl-rag/sft-mix-v20250921 dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 4e-05
@@ -49,13 +70,21 @@ The following hyperparameters were used during training:
  - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 5

- ### Training results

- ### Framework versions

- - Transformers 4.52.4
- - Pytorch 2.8.0+cu128
- - Datasets 3.6.0
- - Tokenizers 0.21.1
 
  - llama-factory
  - full
  - generated_from_trainer
+ datasets:
+ - rl-research/dr-tulu-sft-data
  ---

+ > [!NOTE]
+ > For full information, check out the DR Tulu paper [here](https://arxiv.org/abs/TODO).

+ # DR Tulu SFT 8B

+ This is the SFT checkpoint of DR Tulu, an open deep-research agent trained on top of [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).

+ This model was trained with SFT on [this dataset](https://huggingface.co/datasets/rl-research/dr-tulu-sft-data).
+ For more details on DR Tulu, check out the figure below and **read our [paper](https://arxiv.org/abs/TODO)**!

+ <img src="figure TODO" alt="Figure 1" width="1000"/>
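If you just want to peek at the SFT mix referenced above, it loads with the standard `datasets` library. This is an illustrative sketch, not part of the official DR Tulu tooling; the `train` split name is an assumption about the dataset layout:

```python
# Illustrative sketch: inspect the DR Tulu SFT mix.
# Assumes only the standard Hugging Face `datasets` API;
# the "train" split name is an assumption, not confirmed by this card.
from datasets import load_dataset

ds = load_dataset("rl-research/dr-tulu-sft-data", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # one raw SFT example
```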

+ # Inference and Usage

+ **This model has been trained for tool use with the dr-agent-lib framework.**
+ As such, it will not work out of the box with plain Hugging Face Transformers or vLLM!
+ Instead, you can run it like so:

+ ```python
+ TODO code snippet showing how to run model
+ ```

+ See the [dr-agent-lib GitHub](TODO) for more details on installation and how to run our model,
+ or check out our [live demo](TODO)!
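Until the official snippet above is filled in, here is a rough sketch of what driving a tool-use checkpoint through an agent loop typically looks like. Every name below (`dr_agent_lib`, `DeepResearchAgent`, the tool list) is hypothetical and only stands in for the real dr-agent-lib API; see the repository linked above for the actual interface:

```python
# HYPOTHETICAL sketch -- `dr_agent_lib`, `DeepResearchAgent`, and `run`
# are placeholder names, NOT the real dr-agent-lib API. The real framework
# supplies the tool-call / tool-result loop this model was trained to use.
from dr_agent_lib import DeepResearchAgent  # hypothetical import

agent = DeepResearchAgent(
    model="rl-research/DR-Tulu-SFT-8B",  # this checkpoint
    tools=["search", "browse"],          # hypothetical tool names
)
report = agent.run("Summarize recent work on retrieval-augmented deep research agents.")
print(report)
```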

+ # Evaluation Results

+ We provide evaluation instructions in the [dr-agent-lib GitHub](TODO).

+ | Benchmark | SQAv2 | HealthBench | ResearchQA | DeepResearch Bench | SimpleQA | 2Wiki | WebWalker | Average |
+ |:----------|:-----:|:-----------:|:----------:|:------------------:|:--------:|:-----:|:---------:|:-------:|
+ | [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) (naive RAG; starting-point model) | 40.4 | 16.5 | 56.1 | 33.3 | 52.6 | 18.9 | 8.8 | 32.4 |
+ | [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) (our pipeline; starting-point model) | 57.2 | 5.9 | 46.3 | 18.2 | 70.5 | 44.0 | 27.9 | 38.6 |
+ | [DR-Tulu-SFT-8B](https://huggingface.co/rl-research/DR-Tulu-SFT-8B) (**this model**) | 72.3 | 38.1 | 68.5 | 39.0 | 75.5 | 66.5 | 31.9 | 56.0 |
+ | [DR-Tulu-8B](https://huggingface.co/rl-research/DR-Tulu-8B) | **86.7** | **43.7** | **71.1** | **41.8** | **80.1** | **68.0** | **39.1** | **61.5** |

+ For more baselines, explanations of this table, and analysis of results, check out the [DR Tulu paper](https://arxiv.org/abs/TODO)!

+ # Intended uses & limitations

+ This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use).

+ ## Training
 
  The following hyperparameters were used during training:
  - learning_rate: 4e-05
  ⋯
  - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 5

+ For further details, check out the [DR Tulu paper](https://arxiv.org/abs/TODO).
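For orientation, here is how the listed values would map onto a standard `transformers` `TrainingArguments` call. The actual run used LLaMA-Factory (per the model tags), and the hyperparameters hidden in the collapsed part of the list above are deliberately left out rather than guessed; this is a sketch of equivalent settings, not the exact recipe:

```python
# Sketch only: the three hyperparameters shown on this card, expressed as
# standard transformers TrainingArguments. Collapsed/unshown values
# (batch sizes, optimizer, scheduler type, ...) are intentionally omitted.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3-8b-dr-tulu-sft",  # arbitrary local path (assumption)
    learning_rate=4e-5,                 # learning_rate above
    warmup_ratio=0.1,                   # lr_scheduler_warmup_ratio above
    num_train_epochs=5,                 # num_epochs above
)
```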
 
+ # Links
+ - 📝 [DR Tulu paper](https://arxiv.org/abs/TODO)
+ - ⚙️ [DR Tulu demo](https://dr-tulu.github.io/)
+ - 💻 [DR Tulu code](https://github.com/rlresearch/DR-Tulu)
+ - 🤖 [DR Tulu collection](https://huggingface.co/collections/rl-research/dr-tulu)

+ # Citation
+ ```
+ @article{drtulu,
+   title = {{DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research}},
+   author = {Rulin Shao and Akari Asai and Shannon Shen and Hamish Ivison and Varsha Kishore and Jingming Zhuo and Xinran Zhao and Molly Park and David Sontag and Tyler Murray and Sam Finlayson and Sewon Min and Pradeep Dasigi and Luca Soldaini and Faeze Brahman and Scott Yih and Sherry Tongshuang Wu and Luke Zettlemoyer and Yoon Kim and Hanna Hajishirzi and Pang Wei Koh},
+   journal = {arXiv preprint TODO},
+   year = {2025},
+ }
+ ```