Chem-R: Learning to Reason as a Chemist

Although large language models (LLMs) have significant potential to advance chemical discovery, current LLMs lack core chemical knowledge, produce unreliable reasoning trajectories, and perform suboptimally across diverse chemical tasks. To address these challenges, we propose Chem-R, a generalizable Chemical Reasoning model designed to emulate the deliberative processes of chemists. Chem-R is trained through a three-phase framework that progressively builds advanced reasoning capabilities: 1) Chemical Foundation Training, which establishes core chemical knowledge; 2) Chemical Reasoning Protocol Distillation, which incorporates structured, expert-like reasoning traces to guide systematic and reliable problem solving; and 3) Multi-task Group Relative Policy Optimization, which optimizes the model for balanced performance across diverse molecular- and reaction-level tasks. This structured pipeline enables Chem-R to achieve state-of-the-art performance on comprehensive benchmarks, surpassing leading large language models, including Gemini-2.5-Pro and DeepSeek-R1, by up to 46% on molecular tasks and 66% on reaction tasks. Chem-R also consistently outperforms existing chemical foundation models on both molecular- and reaction-level tasks. These results highlight Chem-R's robust generalization, interpretability, and potential as a foundation for next-generation AI-driven chemical discovery.

Datasets

Our multi-task training incorporates the following datasets across several categories:

| Task category | Datasets |
| --- | --- |
| Name Prediction | PubChem920k |
| Property Prediction | BACE, BBBP, ClinTox, HIV, Tox21 |
| Molecule Design | ChEBI-20 |
| Molecule Captioning | ChEBI-20 |
| Text-based Open Molecule Generation | TOMG-Bench |
| Yield Prediction | Buchwald-Hartwig, Suzuki-Miyaura |
| Reagent Selection | Suzuki-Miyaura |
| Reaction Prediction | USPTO-Mixed |
| Retrosynthesis | USPTO-50k |

To support multi-task learning, we extend the data format used in EasyR1 with a task field. The field identifies the task each sample belongs to, which enables per-task accuracy evaluation and detailed multi-task comparison and analysis.
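
As a rough illustration of how a task field enables per-task accuracy, the sketch below groups exact-match results by task. The `problem` and `answer` key names, the task labels, and the `per_task_accuracy` helper are assumptions made for this example; only the existence of a task field comes from the description above.

```python
from collections import defaultdict

# Hypothetical records. "problem" and "answer" are assumed key names;
# "task" is the extra field described above. Task labels are made up.
records = [
    {"problem": "...", "answer": "Yes", "task": "bbbp"},
    {"problem": "...", "answer": "No", "task": "bbbp"},
    {"problem": "...", "answer": "CCO", "task": "iupac_to_smiles"},
]

def per_task_accuracy(records, predictions):
    """Return {task: exact-match accuracy} for aligned records and predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record, prediction in zip(records, predictions):
        task = record["task"]
        total[task] += 1
        correct[task] += int(prediction.strip() == record["answer"].strip())
    return {task: correct[task] / total[task] for task in total}

print(per_task_accuracy(records, ["Yes", "Yes", "CCO"]))
```

Exact string matching is used here only to keep the sketch short; a task-specific scorer could be swapped in per task label.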

Prompt Template

Name Prediction

IUPAC to SMILES

You are an expert chemist. 
Your task is to solve the given problem step by step. 
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> </answer> tags. 
Please strictly follow the format.
Now predict the SMILES for the following IUPAC name:

SMILES to IUPAC

You are an expert chemist. 
Your task is to solve the given problem step by step. 
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> </answer> tags. 
Please strictly follow the format.
Now predict the IUPAC name for the following SMILES:
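
The two name-prediction templates differ only in their last line, so they can be assembled from shared parts. The sketch below is one possible way to build the prompt and pull the final answer out of a completion. The helper names, the direction keys, and the choice to place the query on its own line after the template are assumptions; whitespace in the instruction lines is normalized.

```python
import re

# Shared instruction lines from the templates above (whitespace normalized).
COMMON_LINES = [
    "You are an expert chemist.",
    "Your task is to solve the given problem step by step.",
    "You should put your reasoning in <think> </think> tags.",
    "The final answer MUST BE put in <answer> </answer> tags.",
    "Please strictly follow the format.",
]

# Hypothetical direction keys mapped to the final line of each template.
FINAL_LINES = {
    "iupac_to_smiles": "Now predict the SMILES for the following IUPAC name:",
    "smiles_to_iupac": "Now predict the IUPAC name for the following SMILES:",
}

def build_name_prediction_prompt(direction, query):
    """Assemble a prompt; assumes the query goes on its own final line."""
    return "\n".join(COMMON_LINES + [FINAL_LINES[direction], query])

def extract_answer(completion):
    """Return the text of the last <answer> block, or None if there is none."""
    matches = re.findall(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return matches[-1].strip() if matches else None

prompt = build_name_prediction_prompt("iupac_to_smiles", "ethanol")
reply = "<think>Two carbons and a hydroxyl group.</think><answer>CCO</answer>"
print(extract_answer(reply))
```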

Property Prediction

BACE1

You are an expert chemist. 
Your task is to solve the given problem step by step. 
Given the SMILES string of a molecule, predict the molecular properties of a given chemical compound based on its structure, by analyzing whether it can inhibit (Yes) the Beta-site Amyloid Precursor Protein Cleaving Enzyme 1 (BACE1) or cannot inhibit (No) BACE1. 
Please answer with only Yes or No. 
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> </answer> tags. 
Please strictly follow the format.
Now predict the BACE1 inhibition potential (Yes or No) for the following molecule:
SMILES: 

BBBP

You are an expert chemist. 
Your task is to solve the given problem step by step. 
Given the SMILES string of a molecule, the task focuses on predicting molecular properties, specifically penetration/non-penetration of the blood-brain barrier, based on the SMILES string representation of each molecule. 
The task is to predict the binary label for a given molecule. 
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> </answer> tags. 
Please strictly follow the format.
Now predict the Blood-Brain Barrier (BBB) penetration potential (Yes or No) for the following molecule: 
SMILES:

ClinTox

You are an expert chemist. 
Your task is to solve the given problem step by step. 
Given the SMILES string of a molecule, the task focuses on predicting molecular properties, specifically whether a molecule is clinical-trial toxic (Yes) or not clinical-trial toxic (No), based on the SMILES string representation of each molecule. 
The FDA-approved status specifies whether the drug is approved by the FDA for clinical trials (Yes) or not approved by the FDA for clinical trials (No). 
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> </answer> tags. 
Please strictly follow the format.
Now predict the Clinical Trial Toxicity (CT_TOX) for the following molecule:
SMILES: 
FDA_APPROVED: 

HIV

You are an expert chemist. 
Your task is to solve the given problem step by step. 
Given the SMILES string of a molecule, the task focuses on predicting molecular properties, specifically its ability to inhibit HIV replication based on the SMILES string representation of each molecule. 
For this property, you just need to answer "Yes" or "No". 
Additionally, the activity test results of the molecules are provided. There are three classes of the activity test: 1) CA: confirmed active, 2) CM: confirmed moderately active, 3) CI: confirmed inactive. The task is to precisely predict the binary label for a given molecule and its HIV activity test, considering its properties and its potential to impede HIV replication.
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> </answer> tags. 
Please strictly follow the format.
Now predict the HIV replication inhibition potential (Yes or No) for the following molecule:
SMILES: 
activity: 

Tox21

You are an expert chemist. 
Your task is to solve the given problem step by step. 
Given the SMILES string of a molecule, the task focuses on predicting molecular properties, specifically whether a molecule is toxic (Yes) or not toxic (No), based on the SMILES string representation of each molecule. 
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> </answer> tags. 
Please strictly follow the format.
Now predict the toxicity (Yes or No) for the following molecule:
SMILES: 
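
All five property-prediction prompts ask for a binary Yes/No answer inside the answer tags. A minimal sketch of turning a completion into a boolean label is shown below; the `parse_yes_no` helper and its tolerance for letter case and a trailing period are assumptions for illustration, not the evaluation code used for Chem-R.

```python
import re
from typing import Optional

def parse_yes_no(completion: str) -> Optional[bool]:
    """Map the <answer> content to True (Yes), False (No), or None (unparseable)."""
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    if match is None:
        return None  # the completion did not follow the required format
    text = match.group(1).strip().strip(".").lower()
    if text == "yes":
        return True
    if text == "no":
        return False
    return None  # anything other than Yes/No is treated as invalid

print(parse_yes_no("<think>Lipophilic and small.</think><answer>Yes</answer>"))
```

Returning `None` for malformed output keeps format failures separate from wrong predictions when computing accuracy.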

Molecule Design

You are an expert chemist. 
Your task is to solve the given problem step by step. 
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> </answer> tags. 
Please strictly follow the format.
Now predict the SMILES representation for the following molecular design requirement:
Description: 

Molecule Captioning

You are an expert chemist. 
Your task is to solve the given problem step by step. 
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> The molecule is ... </answer> tags. 
Please strictly follow the format.
Now describe the following molecule based on its SMILES representation:
SMILES: 

Text-based Open Molecule Generation

MolCustom, MolEdit and MolOpt.

You are an expert chemist. 
Your task is to solve the given problem step by step.
You should explain your reasoning in <think> </think> tags. 
The final answer MUST BE a SMILES string and put in <answer> </answer> tags. 
Please strictly follow the format.
Now, solve the following problem:

Yield Prediction

Buchwald-Hartwig and Suzuki-Miyaura

Replace {TASK_NAME} with the specific reaction name (Buchwald-Hartwig or Suzuki-Miyaura).

You are an expert chemist. 
Your task is to solve the given problem step by step. 
Given the SMILES string of a {TASK_NAME} reaction, the task focuses on predicting reaction yield, specifically whether a reaction is High-yielding (Yes) or Not High-yielding (No), based on the SMILES string representation of each Buchwald-Hartwig reaction. The reactants are separated by '.' and product are separated by '>>'. 
High-yielding reaction means the yield rate of the reaction is above 70. 
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> </answer> tags. 
Please strictly follow the format.
Now predict the yield classification (Yes for High-yielding, No for Not High-yielding) for the following Buchwald-Hartwig reaction:
Reaction: 
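
The yield task above is a binary classification with a threshold of 70 and a reaction string that uses `.` between reactants and `>>` before the product. The sketch below shows the placeholder substitution, the label rule, and the string split. The helper names are hypothetical, the yield is assumed to be a percentage, and the example reaction is made up for illustration.

```python
YIELD_THRESHOLD = 70.0  # the prompt defines high-yielding as "above 70"

def fill_task_name(template, task_name):
    """Substitute the placeholder; str.replace leaves any other braces alone."""
    return template.replace("{TASK_NAME}", task_name)

def yield_label(yield_percent):
    """Return "Yes" for a high-yielding reaction (assumes a percentage yield)."""
    return "Yes" if yield_percent > YIELD_THRESHOLD else "No"

def split_reaction(reaction):
    """Split 'A.B>>P' into (['A', 'B'], 'P')."""
    reactant_side, product = reaction.split(">>")
    return reactant_side.split("."), product

# Made-up example reaction, not taken from the datasets.
reactants, product = split_reaction("Brc1ccccn1.Cc1ccc(N)cc1>>Cc1ccc(Nc2ccccn2)cc1")
print(reactants, product, yield_label(82.5))
print(fill_task_name("Given the SMILES string of a {TASK_NAME} reaction", "Suzuki-Miyaura"))
```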

Reagent Selection

Reactant Selection, Solvent Selection and Ligand Selection

Replace {TASK_NAME} with the specific task name (reactant, solvent, or ligand).

You are an expert chemist. 
Your task is to solve the given problem step by step.
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> </answer> tags. 
Please strictly follow the format.
Now select the optimal {TASK_NAME} (from the given reactant list) that would maximize the yield for the following Suzuki reaction setup:
reactant: 
ligand: 
reagent: 
solvent: 
list of reactants for selection: []
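
One way to fill the selection template is to render the known reaction components field by field and append the candidate list. The sketch below keeps the template's wording as written; the helper names, the `setup` dictionary, leaving the component being selected empty, and rendering candidates as a Python list are all assumptions.

```python
def build_selection_query(task_name, setup, candidates):
    """Render the final part of the selection prompt.

    setup maps "reactant", "ligand", "reagent", "solvent" to strings;
    the component being selected can be left out or empty (an assumption).
    """
    lines = [
        f"Now select the optimal {task_name} (from the given reactant list) "
        "that would maximize the yield for the following Suzuki reaction setup:"
    ]
    for field in ("reactant", "ligand", "reagent", "solvent"):
        lines.append(f"{field}: {setup.get(field, '')}")
    # Assumption: candidates are shown using Python's list formatting.
    lines.append(f"list of reactants for selection: {list(candidates)}")
    return "\n".join(lines)

def is_valid_choice(answer, candidates):
    """Check that the model picked one of the offered candidates."""
    return answer.strip() in {candidate.strip() for candidate in candidates}

# Placeholder component names, for illustration only.
query = build_selection_query(
    "ligand",
    {"reactant": "R1", "reagent": "B1", "solvent": "S1"},
    ["L1", "L2"],
)
print(query)
```

Checking the answer against the candidate list catches completions that propose a compound that was never offered.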

Reaction Prediction

You are an expert chemist. 
Your task is to solve the given problem step by step. 
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> </answer> tags. 
Please strictly follow the format.
Note: If multiple products are predicted, they MUST be separated by a period `.` instead of commas. 
Now predict the product for the following reaction:  
Reactants: 

Retrosynthesis

You are an expert chemist. 
Your task is to solve the given problem step by step. 
You should put your reasoning in <think> </think> tags. 
The final answer MUST BE put in  <answer> </answer> tags. 
Please strictly follow the format.
Note: If multiple reactants are predicted, they MUST be separated by a period `.` instead of commas. 
Now predict the reactants for the following product: 
Product: 
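
Because the reaction prediction and retrosynthesis prompts require multiple molecules to be joined with a period, a predicted answer can be compared with a reference as an unordered set of components. The sketch below does this with plain string comparison; the helper names are hypothetical, and no SMILES canonicalization is performed, so two different but equivalent SMILES strings for the same molecule would not match here.

```python
def split_molecules(answer):
    """Split a '.'-separated SMILES answer into a sorted list of components."""
    return sorted(part.strip() for part in answer.split(".") if part.strip())

def same_molecule_set(predicted, reference):
    """Order-insensitive comparison of two '.'-separated answers.

    Uses raw string equality per component (no canonicalization).
    """
    return split_molecules(predicted) == split_molecules(reference)

print(same_molecule_set("CCO.CC(=O)O", "CC(=O)O.CCO"))  # same components, different order
print(same_molecule_set("CCO,CC(=O)O", "CC(=O)O.CCO"))  # comma-separated answer does not match
```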