BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks
Compare two AI models by sending them code
Evaluate code samples using specified parameters
Explore and analyze code completion benchmarks
Display a PDF file in a web app
Submit code models for evaluation and view leaderboard
Check if your GitHub repo is in The Stack dataset
Search for code snippets in a dataset
Generate code and answers from questions
Start a web server for an app
Generate code snippets in Python, Java, JavaScript