1 Introduction
Model Description
CoCoDet (Content-Concentrated Detector) is a multi-task text classification model designed to detect AI-generated content in academic peer reviews. It is trained to focus on the substantive content of a review rather than superficial stylistic cues, enabling more robust and equitable detection. The growing use of LLMs in academic peer review poses risks to scholarly integrity. Existing detectors rely on stylistic patterns, making them vulnerable to paraphrasing and prone to misclassifying permissibly AI-assisted writing. CoCoDet addresses this by shifting the detection paradigm from style to content.
Sources
- Model: https://huggingface.co/khaaaaaan/CoCoDet
- Dataset: https://huggingface.co/datasets/khaaaaaan/CoCoNUTS
- Script: https://github.com/Y1hanChen/COCONUTS
- Paper: https://arxiv.org/abs/2509.04460
2 Uses
Direct Use
The primary intended use of this model is to classify academic peer reviews based on their content origin. Given a review text, it outputs logits for four different tasks. The main task identifies whether the substantive content is human-written, AI-generated, or a mix of both. The recommended way to use this model is via the official inference script provided in the GitHub repository.
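For illustration, here is a minimal sketch of how the main-task logits could be mapped to a label. The label order below is an assumption made for the example; the official inference script is the authoritative reference.

```python
import torch

# Hypothetical label order for the 3-class main task; verify against the
# official inference script before relying on it.
MAIN_TASK_LABELS = ["human", "mix", "ai"]

def decode_main_task(logits: torch.Tensor) -> str:
    """Map main-task logits of shape [3] to a label string via argmax."""
    return MAIN_TASK_LABELS[int(torch.argmax(logits))]

print(decode_main_task(torch.tensor([0.2, 1.5, -0.3])))  # -> "mix"
```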
Out-of-Scope Use
This model is specifically trained on academic peer reviews from computer science conferences. Its performance is not guaranteed on other domains. This model should not be used as the sole arbiter of academic misconduct. It is a tool to provide a strong signal for further human review, not a final verdict.
Bias, Risks, and Limitations
- Domain Bias: The training data is sourced from computer science conferences (ICLR, NeurIPS, etc.). The model may perform differently on reviews from other academic disciplines.
- Risk of Misclassification: Misclassifying reviews can lead to unfair consequences. All high-confidence AI predictions should be manually verified by qualified human experts.
- Robustness Limitations: As LLMs evolve, they may produce text that can evade this detector. The model is a snapshot in time and is not guaranteed to be robust against all future adversarial attacks.
How to Get Started with the Model
The recommended way to use this model is by cloning the official GitHub repository and using the provided inference and evaluation scripts.
Step 1: Clone the GitHub repository
git clone https://github.com/Y1hanChen/COCONUTS.git
cd COCONUTS
Step 2: Create directories and download files
Manually create model/ and dataset/ directories. Download this model's checkpoint (CoCoDet.bin) into model/ and the test data (test.jsonl) from the companion dataset repository into dataset/.
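If you prefer to script the downloads, the huggingface_hub client can fetch both files. The snippet below is a sketch: it assumes the checkpoint and test split sit at the repository roots under the filenames mentioned above (CoCoDet.bin and test.jsonl).

```python
from pathlib import Path
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

Path("model").mkdir(exist_ok=True)
Path("dataset").mkdir(exist_ok=True)

# Fine-tuned checkpoint from the model repository.
hf_hub_download(repo_id="khaaaaaan/CoCoDet", filename="CoCoDet.bin", local_dir="model")

# Public test split from the companion dataset repository.
hf_hub_download(
    repo_id="khaaaaaan/CoCoNUTS",
    filename="test.jsonl",
    repo_type="dataset",
    local_dir="dataset",
)
```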
Step 3: Run the inference script
python script/inference.py \
--model_name_or_path answerdotai/ModernBERT-base \
--trained_model_path model/CoCoDet.bin \
--test_file dataset/test.jsonl \
--output_file results/predictions.jsonl \
--batch_size 16
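To inspect the output, you can read the JSONL predictions back in. The field names depend on the inference script, so check the first record before assuming a schema; the "prediction" key below is purely hypothetical.

```python
import json
from collections import Counter

with open("results/predictions.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

# Print one record to discover the actual schema written by inference.py.
print(records[0])

# Hypothetical tally, assuming a "prediction" field holds the main-task label.
if records and "prediction" in records[0]:
    print(Counter(r["prediction"] for r in records))
```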
3 Training & Evaluation
Training Data
The model was trained on the CoCoNUTS benchmark, a comprehensive dataset of 315,535 academic peer reviews simulating six realistic modes of human-AI collaboration. The official test split of this benchmark is publicly available on the Hugging Face Hub to allow for the reproduction of our evaluation results: khaaaaaan/CoCoNUTS.
Training Procedure
The answerdotai/ModernBERT-base model was fine-tuned using a multi-task learning framework. Training was performed with the AdamW optimizer, fp16 mixed precision, and a composite loss function that combines a primary content-composition task with three auxiliary tasks (collaboration mode, content source, and style attribution).
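For intuition, here is a minimal sketch of such a multi-task setup: a shared ModernBERT encoder with one classification head per task and a composite loss that sums the primary loss with down-weighted auxiliary losses. The head sizes and the auxiliary weight are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class MultiTaskReviewClassifier(nn.Module):
    """Shared encoder with one head per task; head sizes are illustrative."""

    def __init__(self, encoder_name: str = "answerdotai/ModernBERT-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict({
            "content_composition": nn.Linear(hidden, 3),  # human / mix / AI (main task)
            "collaboration_mode": nn.Linear(hidden, 6),   # six collaboration modes
            "content_source": nn.Linear(hidden, 2),       # assumed binary head
            "style_attribution": nn.Linear(hidden, 2),    # assumed binary head
        })

    def forward(self, input_ids, attention_mask):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        pooled = hidden_states[:, 0]  # first-token pooled representation
        return {name: head(pooled) for name, head in self.heads.items()}


def composite_loss(logits: dict, labels: dict, aux_weight: float = 0.5) -> torch.Tensor:
    """Primary cross-entropy plus weighted auxiliary losses; weighting is an assumption."""
    ce = nn.CrossEntropyLoss()
    loss = ce(logits["content_composition"], labels["content_composition"])
    for task in ("collaboration_mode", "content_source", "style_attribution"):
        loss = loss + aux_weight * ce(logits[task], labels[task])
    return loss
```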
Evaluation Metrics
The primary metric for the main 3-class task is the Macro F1-score, which provides a balanced measure of performance across the Human, Mix, and AI classes. For comparison with binary detectors, Accuracy (the mean of accuracy over the Human and AI subsets) and the Predicted AI Rate are also reported.
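A short sketch of how these metrics can be computed with scikit-learn; the integer label encoding and the toy arrays are illustrative only.

```python
from sklearn.metrics import accuracy_score, f1_score

# Illustrative encoding: 0 = human, 1 = mix, 2 = AI (an assumption for this example).
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]

# Macro F1 over the three classes (main-task metric).
macro_f1 = f1_score(y_true, y_pred, average="macro")

# Binary-style accuracy: mean of accuracy on the Human subset and on the AI subset.
human_idx = [i for i, y in enumerate(y_true) if y == 0]
ai_idx = [i for i, y in enumerate(y_true) if y == 2]
acc_human = accuracy_score([y_true[i] for i in human_idx], [y_pred[i] for i in human_idx])
acc_ai = accuracy_score([y_true[i] for i in ai_idx], [y_pred[i] for i in ai_idx])

print(f"Macro F1: {macro_f1:.4f}")
print(f"Accuracy (mean of Human and AI subsets): {(acc_human + acc_ai) / 2:.4f}")
```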
Evaluation Results
The model was evaluated on a held-out test split of the CoCoNUTS benchmark. CoCoDet demonstrates state-of-the-art performance, significantly outperforming both LLM-based and general-purpose detectors.
Main Task Performance (3-Class F1-score %)
This table compares CoCoDet against various Large Language Models (LLMs) in both zero-shot and few-shot settings.
| Detector | Human | Mix | AI | Average |
|---|---|---|---|---|
| **LLMs (zero-shot)** | | | | |
| DeepSeek-R1-0528 | 50.04 | 3.29 | 3.63 | 18.98 |
| Gemini-2.5-flash-0520 (CoT) | 56.01 | 2.81 | 47.87 | 35.56 |
| Gemini-2.5-flash-0520 | 57.28 | 12.37 | 49.80 | 39.82 |
| Qwen2.5-72B-Instruct | 48.47 | 3.05 | 16.82 | 22.78 |
| Qwen3-32B | 50.30 | 0.11 | 4.89 | 18.43 |
| **LLMs (few-shot)** | | | | |
| DeepSeek-R1-0528 | 51.81 | 5.65 | 17.93 | 25.13 |
| Gemini-2.5-flash-0520 (CoT) | 64.95 | 10.87 | 61.42 | 45.75 |
| Gemini-2.5-flash-0520 | 74.05 | 39.90 | 62.97 | 58.97 |
| Qwen2.5-72B-Instruct | 47.17 | 16.85 | 14.61 | 26.21 |
| Qwen3-32B | 53.64 | 0.02 | 38.39 | 30.68 |
| **PLM (SFT)** | | | | |
| CoCoDet | 98.94 | 97.41 | 98.37 | 98.24 |
Comparison with General Detectors (Binary Task)
This table shows CoCoDet's performance in a binary (Human vs. AI) setting compared to other general-purpose AI text detectors.
| Detector | Predicted AI Rate (Human) ↓ | Predicted AI Rate (Mix) | Predicted AI Rate (AI) ↑ | Acc ↑ | Sty-Rob |
|---|---|---|---|---|---|
| Radar | 24.91 | 26.33 | 34.93 | 55.01 | ✔️ |
| LLMDet | 98.82 | 98.45 | 99.26 | 50.22 | ❌ |
| FastDetectGPT | 53.09 | 92.98 | 92.56 | 69.74 | ❌ |
| Binoculars (accuracy) | 15.86 | 66.96 | 74.32 | 79.23 | ✔️ |
| Binoculars (low-fpr) | 3.30 | 34.78 | 49.81 | 73.26 | ✔️ |
| LLM-DetectAIve | 3.92 | 33.89 | 83.52 | 89.80 | ✔️ |
| CoCoDet | 1.31 | -- | 96.90 | 97.80 | -- |
Citation
@misc{chen2025coconutsconcentratingcontentneglecting,
title={{CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection}},
author={Yihan Chen and Jiawei Chen and Guozhao Mo and Xuanang Chen and Ben He and Xianpei Han and Le Sun},
year={2025},
eprint={2509.04460},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.04460},
}