The Markovian Thinker
Reformulating the RL of reasoning LLMs through the Markovian Thinking paradigm.
This model is fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B and trained with the Delethink RL paradigm. See the paper for full details.

As with deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, the prompt should ask the model to put its final answer within \boxed{}.

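import sglang as sgl


def main():
    # Launch an in-process SGLang engine for the checkpoint.
    llm = sgl.Engine(
        model_path="McGill-NLP/longcot-8k-1.5b",
        dtype="bfloat16",
        attention_backend="flashinfer",
        mem_fraction_static=0.8,
        log_level="WARNING",
    )
    prompt = (
        r"There exist real numbers $x$ and $y$, both greater than 1, such that "
        r"$\log_x\left(y^x\right)=\log_y\left(x^{4y}\right)=10$. Find $xy$."
        "\n\nPlease reason step by step, and put your final answer within \\boxed{}."
    )
    # Reuse the engine's tokenizer so the chat template matches the loaded model.
    tok = llm.tokenizer_manager.tokenizer
    query_ids = tok.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=True,
        add_generation_prompt=True,
    )
    params = {"temperature": 0.6, "max_new_tokens": 8192}
    # generate() returns a dict rather than token ids. With return_logprob=True,
    # the sampled tokens are listed in meta_info["output_token_logprobs"] as
    # (logprob, token_id, token_text) entries.
    out = llm.generate(input_ids=query_ids, sampling_params=params, return_logprob=True)
    ids = [entry[1] for entry in out["meta_info"]["output_token_logprobs"]]
    print(tok.decode(ids, skip_special_tokens=False))
    # Release GPU memory and stop the engine's worker processes.
    llm.shutdown()


if __name__ == "__main__":
    main()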
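
Because the prompt asks for the final answer within \boxed{}, a small parser is usually enough to pull that answer out of the decoded response. The helper below is a minimal sketch rather than part of the model's or SGLang's tooling: the name `extract_last_boxed` is hypothetical, and it assumes the last \boxed{...} in the text is the final answer.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in text, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1
    # Track brace depth so nested groups such as \frac{1}{2} are kept whole.
    for pos in range(begin, len(text)):
        ch = text[pos]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[begin:pos]
    # Unbalanced braces, e.g. generation stopped at the token limit.
    return None


if __name__ == "__main__":
    sample = r"Multiplying the two conditions gives $xy = \boxed{25}$."
    print(extract_last_boxed(sample))
```

For the example problem, the two conditions are $x\log_x y = 10$ and $4y\log_y x = 10$. Multiplying them gives $4xy = 100$, so a correct response should end with \boxed{25}.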
@misc{Aghajohari2025:TheMarkovianThinker,
      title={The Markovian Thinker}, 
      author={Milad Aghajohari and Kamran Chitsaz and Amirhossein Kazemnejad and Sarath Chandar and Alessandro Sordoni and Aaron Courville and Siva Reddy},
      year={2025},
      eprint={2510.06557},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.06557}, 
}
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B