Eval Quadratic Python Code Tutorial

Model-Agnostic Empirical Evaluation of Test-Driven Prompt Engineering on Improving Accuracy and Efficiency in Large Language Models Python Code Generation

Abstract: Although Large Language Models (LLMs) are widely adopted for code generation, the generated code can be semantically incorrect, requiring iterations of evaluation and refinement. Test-driven ...

IEEE

Evaluating Python Static Code Analysis Tools Using FAIR Principles

Abstract: The quality of modern software relies heavily on the effective use of static code analysis tools. To improve their usefulness, these tools should be evaluated using a framework that ...

GitHub

Provider-agnostic, open-source evaluation infrastructure for language models

openbench provides standardized, reproducible benchmarking for LLMs across 30+ evaluation suites (and growing) spanning knowledge, math, reasoning, coding, science, reading comprehension, health, long ...

10 News

Dolly the Python gets full health evaluation for the first time in 5 years ahead of Snake Day event

KNOXVILLE, Tenn. — Officials with Zoo Knoxville said Dolly, the giant reticulated python, got a comprehensive health evaluation for the first time in five years. Dolly got a full physical assessment, ...

GitHub

ojbench/oj-eval-claude-code-017-20260128052249

This assignment requires implementing a train ticket booking system similar to 12306. The system must store user data, ticket data, and train data locally and perform efficient operations on them.

Some results have been hidden because they may be inaccessible to you

Show inaccessible results