This is the official model from the publication "Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models" (arXiv, 2024).

TLDR: Divergent Chain of Thought (DCoT) consists of requiring models to generate multiple CoTs before choosing an answer. Adding DCoT data to instruction tuning allows models to improve performance through self-correction.

Load the Model

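from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the phi-2 base model in bfloat16; device_map="auto" (requires accelerate)
# places the weights on the available GPU(s) or CPU.
base_model_path = "microsoft/phi-2"
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the DCoT LoRA adapter from the Hub (requires the peft package).
# The repo id must not end with a slash, or Hub id validation rejects it.
peft_model_id = "haritzpuerto/phi-2-dcot"
model.load_adapter(peft_model_id)

tokenizer = AutoTokenizer.from_pretrained(base_model_path)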

Run the model

Prompt Template

[Question] {question} [Context] {document} [Options] {answer_options} [Number of answers] {k}

Note that not all commands (the bracketed tags) are mandatory: [Context] and [Options] are optional.

[Context] refers to a paragraph that contains the answer to a question (for span-extraction QA).
[Options] refers to a list of candidate answers (for multiple-choice QA). The format is A) {answer option 1} B) {answer option 2}, ...

The minimal template is

[Question] {question} [Number of answers] {k}

Whether to include [Context] and [Options] depends on your task.
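The template above can be assembled programmatically. The sketch below is a hypothetical helper (build_dcot_prompt is not part of the released code). It assumes, following the Run Example below, that commands are separated by newlines, that options are labeled A), B), C), ..., and that the prompt ends with "[Answer 1] " so generation starts with the first chain of thought.

```python
from typing import Optional, Sequence


def build_dcot_prompt(
    question: str,
    k: int,
    context: Optional[str] = None,
    options: Optional[Sequence[str]] = None,
) -> str:
    """Assemble a DCoT prompt; [Context] and [Options] are added only when given."""
    if k < 1:
        raise ValueError("k (number of answers) must be at least 1")
    parts = [f"[Question] {question}"]
    if context is not None:
        # Span-extraction QA: the paragraph that contains the answer.
        parts.append(f"[Context] {context}")
    if options:
        # Multiple-choice QA: label options A), B), C), ... separated by spaces.
        labeled = " ".join(f"{chr(ord('A') + i)}) {opt}" for i, opt in enumerate(options))
        parts.append(f"[Options] {labeled}")
    parts.append(f"[Number of answers] {k}")
    # Assumption taken from the Run Example: end with "[Answer 1] " to prime the first CoT.
    return "\n".join(parts) + "\n[Answer 1] "
```

With the question and the four options from the Run Example and k=2, this helper reproduces the example prompt string exactly.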

Response format

You should expect the model to return text in the following format:

[Answer 1]CoT_1
[Answer 2]CoT_2
...
[Final answer] answer

You should get as many answers as you request with the [Number of answers] {k} command.
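To post-process generations, the response format above can be split on its tags. This is a sketch that assumes the decoded text follows the format exactly; parse_dcot_response is a hypothetical helper, and stripping <s> and </s> only mirrors the special tokens visible in the example output further down.

```python
import re
from typing import List, Optional, Tuple

# Matches the "[Answer N]" and "[Final answer]" tags of the DCoT response format.
_TAG = re.compile(r"\[(Answer \d+|Final answer)\]")


def parse_dcot_response(text: str) -> Tuple[List[str], Optional[str]]:
    """Split a DCoT completion into its reasoning chains and its final answer."""
    # Drop the special tokens that appear in the decoded example output.
    text = text.replace("<s>", "").replace("</s>", "")
    # re.split with a capturing group yields [prefix, tag1, body1, tag2, body2, ...];
    # the prefix (the prompt, if it was decoded too) is ignored.
    pieces = _TAG.split(text)
    cots: List[str] = []
    final: Optional[str] = None
    for tag, body in zip(pieces[1::2], pieces[2::2]):
        if tag == "Final answer":
            final = body.strip()
        else:
            cots.append(body.strip())
    return cots, final
```

Comparing len(cots) with the k you requested is a quick check that the model followed the [Number of answers] command.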

Run Example

# The prompt ends with "[Answer 1] " so the model starts generating the first CoT.
prompt = "[Question] Juan and LaKeisha roll a few objects down a ramp. They want to see which object rolls the farthest. What should they do so they can repeat their investigation?\n[Options] A) Put the objects in groups. B) Change the height of the ramp. C) Choose different objects to roll. D) Record the details of the investigation.\n[Number of answers] 2\n[Answer 1] "
# Move the inputs to the device the model was placed on (works on GPU or CPU).
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_length=1024)
print(tokenizer.decode(output[0]))

You should get an output similar to:

<s> [Question] Juan and LaKeisha roll a few objects down a ramp. They want to see which object rolls the farthest. What should they do so they can repeat their investigation?
[Options] A) Put the objects in groups. B) Change the height of the ramp. C) Choose different objects to roll. D) Record the details of the investigation.
[Number of answers] 2
[Answer 1] 1. Juan and LaKeisha want to see which object rolls the farthest.
2. They have already rolled a few objects down the ramp.
3. To repeat their investigation, they need to do something that will affect the outcome of the experiment.
4. Putting the objects in groups will not affect the outcome of the experiment.
5. Changing the height of the ramp may affect the outcome, but it is not the best option as it requires changing the setup of the experiment.
6. Choosing different objects to roll may also affect the outcome, but it is not the best option as it does not address the issue of repeating the experiment.
7. The best option is to record the details of the investigation. This includes the objects used, the height of the ramp, and any other relevant information. By recording the details, Juan and LaKeisha can repeat the experiment with the same conditions and compare the results.
[Answer 2] Step 1: Identify the problem and the question.

Problem: Juan and LaKeisha want to see which object rolls the farthest.

Question: What should they do to repeat their investigation?

Step 2: Evaluate the options.

A) Put the objects in groups. - This option does not directly relate to the question of which object rolls the farthest, so it can be eliminated.

B) Change the height of the ramp. - This option also does not directly relate to the question of which object rolls the farthest, so it can be eliminated.

C) Choose different objects to roll. - This option is a possible solution to the question, but it does not guarantee that the object will roll the farthest.

D) Record the details of the investigation. - This option is a necessary step to repeat the investigation.

Step 3: Choose the best option.

The best option to repeat the investigation is to record the details of the investigation. This will allow them to replicate the conditions of the original experiment and compare the results.

[Final answer] D) Record the details of the investigation.</s>

Training details

We train all models using LoRA with the PEFT library. The main parameters are:

Param. name	Value
lora_r	64
lora_alpha	16
lora_dropout	0.1
batch size	4
learning_rate	2e-4
weight_decay	0.001
optim	paged_adamw_32bit
lr_scheduler_type	constant

Please check Appendix B of the paper for more details.

Cite

If you find our work useful, please consider citing it using the following citation:

@misc{puerto2024dcot,
      title={Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models}, 
      author={Haritz Puerto and Tilek Chubakov and Xiaodan Zhu and Harish Tayyar Madabushi and Iryna Gurevych},
      year={2024},
      eprint={2407.03181},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.03181}, 
}


Model tree for haritzpuerto/phi-2-dcot

Base model: microsoft/phi-2
This model is an adapter of the base model.


Collection including haritzpuerto/phi-2-dcot

DCoT: models from the ACL 2025 paper "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs" (6 items).