1 Introduction
Model Description
CoCoDet (Content-Concentrated Detector) is a multi-task text classification model designed to detect AI-generated content in academic peer reviews. It is trained to focus on the substantive content of a review rather than superficial stylistic cues, enabling more robust and equitable detection. The growing use of LLMs in academic peer review poses risks to scholarly integrity. Existing detectors rely on stylistic patterns, making them vulnerable to paraphrasing and prone to misclassifying permissibly AI-assisted writing. CoCoDet addresses this by shifting the detection paradigm from style to content.
Sources
- Model: https://huggingface.co/khaaaaaan/CoCoDet
- Dataset: https://huggingface.co/datasets/khaaaaaan/CoCoNUTS
- Script: https://github.com/Y1hanChen/COCONUTS
- Paper: https://arxiv.org/abs/2509.04460
2 Uses
Direct Use
The primary intended use of this model is to classify academic peer reviews based on their content origin. Given a review text, it outputs logits for four different tasks. The main task identifies whether the substantive content is human-written, AI-generated, or a mix of both. The recommended way to use this model is via the official inference script provided in the GitHub repository.
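For illustration, here is a minimal sketch of how the main-task logits could be mapped to a label. The label order below is an assumption made for the example; the official inference script is the authoritative reference.

```python
import torch

# Hypothetical label order for the 3-class main task; verify against the
# official inference script before relying on it.
MAIN_TASK_LABELS = ["human", "mix", "ai"]

def decode_main_task(logits: torch.Tensor) -> str:
    """Map main-task logits of shape [3] to a label string via argmax."""
    return MAIN_TASK_LABELS[int(torch.argmax(logits))]

print(decode_main_task(torch.tensor([0.2, 1.5, -0.3])))  # -> "mix"
```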
Out-of-Scope Use
This model is specifically trained on academic peer reviews from computer science conferences. Its performance is not guaranteed on other domains. This model should not be used as the sole arbiter of academic misconduct. It is a tool to provide a strong signal for further human review, not a final verdict.
Bias, Risks, and Limitations
- Domain Bias: The training data is sourced from computer science conferences (ICLR, NeurIPS, etc.). The model may perform differently on reviews from other academic disciplines.
- Risk of Misclassification: Misclassifying reviews can lead to unfair consequences. All high-confidence AI predictions should be manually verified by qualified human experts.
- Robustness Limitations: As LLMs evolve, they may produce text that can evade this detector. The model is a snapshot in time and is not guaranteed to be robust against all future adversarial attacks.
How to Get Started with the Model
The recommended way to use this model is by cloning the official GitHub repository and using the provided inference and evaluation scripts.
Step 1: Clone the GitHub repository
git clone https://github.com/Y1hanChen/COCONUTS.git
cd COCONUTS
Step 2: Create directories and download files
Manually create model/ and dataset/ directories. Download this model's checkpoint (CoCoDet.bin) into model/ and the test data (test.jsonl) from the companion dataset repository into dataset/.
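If you prefer to script the downloads, the huggingface_hub client can fetch both files. The snippet below is a sketch: it assumes the checkpoint and test split sit at the repository roots under the filenames mentioned above (CoCoDet.bin and test.jsonl).

```python
from pathlib import Path
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

Path("model").mkdir(exist_ok=True)
Path("dataset").mkdir(exist_ok=True)

# Fine-tuned checkpoint from the model repository.
hf_hub_download(repo_id="khaaaaaan/CoCoDet", filename="CoCoDet.bin", local_dir="model")

# Public test split from the companion dataset repository.
hf_hub_download(
    repo_id="khaaaaaan/CoCoNUTS",
    filename="test.jsonl",
    repo_type="dataset",
    local_dir="dataset",
)
```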
Step 3: Run the inference script
python script/inference.py \
--model_name_or_path answerdotai/ModernBERT-base \
--trained_model_path model/CoCoDet.bin \
--test_file dataset/test.jsonl \
--output_file results/predictions.jsonl \
--batch_size 16
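To inspect the output, you can read the JSONL predictions back in. The field names depend on the inference script, so check the first record before assuming a schema; the "prediction" key below is purely hypothetical.

```python
import json
from collections import Counter

with open("results/predictions.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

# Print one record to discover the actual schema written by inference.py.
print(records[0])

# Hypothetical tally, assuming a "prediction" field holds the main-task label.
if records and "prediction" in records[0]:
    print(Counter(r["prediction"] for r in records))
```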
3 Training & Evaluation
Training Data
The model was trained on the CoCoNUTS benchmark, a comprehensive dataset of 315,535 academic peer reviews simulating six realistic modes of human-AI collaboration. The official test split of this benchmark is publicly available on the Hugging Face Hub to allow for the reproduction of our evaluation results: khaaaaaan/CoCoNUTS.
Training Procedure
The answerdotai/ModernBERT-base model was fine-tuned using a multi-task learning framework. Training was performed with the AdamW optimizer, fp16 mixed precision, and a composite loss function that combines a primary content-composition task with three auxiliary tasks (collaboration mode, content source, and style attribution).
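For intuition, here is a minimal sketch of such a multi-task setup: a shared ModernBERT encoder with one classification head per task and a composite loss that sums the primary loss with down-weighted auxiliary losses. The head sizes and the auxiliary weight are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class MultiTaskReviewClassifier(nn.Module):
    """Shared encoder with one head per task; head sizes are illustrative."""

    def __init__(self, encoder_name: str = "answerdotai/ModernBERT-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict({
            "content_composition": nn.Linear(hidden, 3),  # human / mix / AI (main task)
            "collaboration_mode": nn.Linear(hidden, 6),   # six collaboration modes
            "content_source": nn.Linear(hidden, 2),       # assumed binary head
            "style_attribution": nn.Linear(hidden, 2),    # assumed binary head
        })

    def forward(self, input_ids, attention_mask):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        pooled = hidden_states[:, 0]  # first-token pooled representation
        return {name: head(pooled) for name, head in self.heads.items()}


def composite_loss(logits: dict, labels: dict, aux_weight: float = 0.5) -> torch.Tensor:
    """Primary cross-entropy plus weighted auxiliary losses; weighting is an assumption."""
    ce = nn.CrossEntropyLoss()
    loss = ce(logits["content_composition"], labels["content_composition"])
    for task in ("collaboration_mode", "content_source", "style_attribution"):
        loss = loss + aux_weight * ce(logits[task], labels[task])
    return loss
```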
Evaluation Metrics
The primary metric for the main 3-class task is the Macro F1-score, which provides a balanced measure of performance across the Human, Mix, and AI classes. For comparison with binary detectors, Accuracy (the mean of accuracy over the Human and AI subsets) and the Predicted AI Rate are also reported.
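A short sketch of how these metrics can be computed with scikit-learn; the integer label encoding and the toy arrays are illustrative only.

```python
from sklearn.metrics import accuracy_score, f1_score

# Illustrative encoding: 0 = human, 1 = mix, 2 = AI (an assumption for this example).
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]

# Macro F1 over the three classes (main-task metric).
macro_f1 = f1_score(y_true, y_pred, average="macro")

# Binary-style accuracy: mean of accuracy on the Human subset and on the AI subset.
human_idx = [i for i, y in enumerate(y_true) if y == 0]
ai_idx = [i for i, y in enumerate(y_true) if y == 2]
acc_human = accuracy_score([y_true[i] for i in human_idx], [y_pred[i] for i in human_idx])
acc_ai = accuracy_score([y_true[i] for i in ai_idx], [y_pred[i] for i in ai_idx])

print(f"Macro F1: {macro_f1:.4f}")
print(f"Accuracy (mean of Human and AI subsets): {(acc_human + acc_ai) / 2:.4f}")
```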
Evaluation Results
The model was evaluated on a held-out test split of the CoCoNUTS benchmark. CoCoDet demonstrates state-of-the-art performance, significantly outperforming both LLM-based and general-purpose detectors.
Main Task Performance (3-Class F1-score %)
This table compares CoCoDet against various Large Language Models (LLMs) in both zero-shot and few-shot settings.
| Detector | Human | Mix | AI | Average |
|---|---|---|---|---|
| **LLMs (zero-shot)** | | | | |
| DeepSeek-R1-0528 | 50.04 | 3.29 | 3.63 | 18.98 |
| Gemini-2.5-flash-0520 (CoT) | 56.01 | 2.81 | 47.87 | 35.56 |
| Gemini-2.5-flash-0520 | 57.28 | 12.37 | 49.80 | 39.82 |
| Qwen2.5-72B-Instruct | 48.47 | 3.05 | 16.82 | 22.78 |
| Qwen3-32B | 50.30 | 0.11 | 4.89 | 18.43 |
| **LLMs (few-shot)** | | | | |
| DeepSeek-R1-0528 | 51.81 | 5.65 | 17.93 | 25.13 |
| Gemini-2.5-flash-0520 (CoT) | 64.95 | 10.87 | 61.42 | 45.75 |
| Gemini-2.5-flash-0520 | 74.05 | 39.90 | 62.97 | 58.97 |
| Qwen2.5-72B-Instruct | 47.17 | 16.85 | 14.61 | 26.21 |
| Qwen3-32B | 53.64 | 0.02 | 38.39 | 30.68 |
| **PLM (SFT)** | | | | |
| CoCoDet | 98.94 | 97.41 | 98.37 | 98.24 |
Comparison with General Detectors (Binary Task)
This table shows CoCoDet's performance in a binary (Human vs. AI) setting compared to other general-purpose AI text detectors.
| Detector | Predicted AI Rate (Human) ↓ | Predicted AI Rate (Mix) | Predicted AI Rate (AI) ↑ | Acc ↑ | Sty-Rob |
|---|---|---|---|---|---|
| Radar | 24.91 | 26.33 | 34.93 | 55.01 | ✔️ |
| LLMDet | 98.82 | 98.45 | 99.26 | 50.22 | ❌ |
| FastDetectGPT | 53.09 | 92.98 | 92.56 | 69.74 | ❌ |
| Binoculars (accuracy) | 15.86 | 66.96 | 74.32 | 79.23 | ✔️ |
| Binoculars (low-fpr) | 3.30 | 34.78 | 49.81 | 73.26 | ✔️ |
| LLM-DetectAIve | 3.92 | 33.89 | 83.52 | 89.80 | ✔️ |
| CoCoDet | 1.31 | -- | 96.90 | 97.80 | -- |
Citation
@misc{chen2025coconutsconcentratingcontentneglecting,
title={{CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection}},
author={Yihan Chen and Jiawei Chen and Guozhao Mo and Xuanang Chen and Ben He and Xianpei Han and Le Sun},
year={2025},
eprint={2509.04460},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.04460},
}