SeRe: A Security-Related Code Review Dataset Aligned with Real-World Review Activities
Abstract
A security-focused code review dataset named SeRe was developed using active learning and ensemble classification to improve automated security vulnerability detection in software development.
Software security vulnerabilities can lead to severe consequences, making early detection essential. Although code review serves as a critical defense mechanism against security flaws, relevant feedback remains scarce due to limited attention to security issues or a lack of expertise among reviewers. Existing datasets and studies primarily focus on general-purpose code review comments, either lacking security-specific annotations or being too limited in scale to support large-scale research. To bridge this gap, we introduce SeRe, a security-related code review dataset, constructed using an active learning-based ensemble classification approach. The proposed approach iteratively refines model predictions through human annotations, achieving high precision while maintaining reasonable recall. Using the fine-tuned ensemble classifier, we extracted 6,732 security-related reviews from 373,824 raw review instances, ensuring representativeness across multiple programming languages. Statistical analysis indicates that SeRe generally aligns with real-world security-related review distribution. To assess both the utility of SeRe and the effectiveness of existing code review comment generation approaches, we benchmark state-of-the-art approaches on security-related feedback generation. By releasing SeRe along with our benchmark results, we aim to advance research in automated security-focused code review and contribute to the development of more effective secure software engineering practices.
Get this paper in your agent:
hf papers read 2601.01042 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper