Quantum-Audit

Evaluating the Reasoning Limits of LLMs on Quantum Computing

Model Leaderboard

Leaderboard columns: Rank (#), Model, Provider, Size, Expert Written, LLM Extracted, Complete Dataset.

Performance by Topic

Models handle foundational concepts well but decline sharply on advanced topics. Security questions see the steepest drop, with performance falling to 76%.
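As a rough illustration of how such per-topic numbers can be computed, the sketch below aggregates accuracy by topic from a results file. The file name and the `topic`/`correct` fields are illustrative assumptions, not the benchmark's actual schema.

```python
import json
from collections import defaultdict

# Hypothetical results file: one record per question, holding the
# question's topic label and whether the model answered correctly.
with open("results.json", encoding="utf-8") as f:   # assumed file name
    results = json.load(f)

totals = defaultdict(int)
correct = defaultdict(int)
for record in results:
    topic = record["topic"]                     # assumed field name
    totals[topic] += 1
    correct[topic] += int(record["correct"])    # assumed field name

# Print per-topic accuracy, e.g. "security: 76.0%"
for topic in sorted(totals):
    print(f"{topic}: {correct[topic] / totals[topic]:.1%}")
```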

Sample Questions

Examples from the Quantum-Audit benchmark illustrating the depth and breadth of questions.

Data & Code

Download the Quantum-Audit benchmark dataset and evaluation code. A short loading sketch follows the dataset list below.

QA 1000 Expert Written (1,000 questions)

Multiple choice questions developed by quantum computing researchers.

Download JSON
QA 1000 LLM Extracted (1,000 questions)

Questions extracted from research papers using LLMs and validated by domain experts.

Download JSON
QA 2000 Complete Dataset (2,000 questions)

The full benchmark combining expert-written and LLM-extracted questions across all topics.

Download JSON
Open-Ended (350 questions)

Questions requiring detailed explanations without answer options.

Download JSON
False Premise (350 questions)

Questions with intentionally incorrect assumptions to test error detection.

Download JSON
QA 500 Expert Subset (500 questions)

A curated subset of 500 expert-written multiple choice questions.

Download JSON
QA 500 Spanish (500 questions)

The QA500 subset translated into Spanish for cross-lingual evaluation.

Download JSON
QA 500 French (500 questions)

The QA500 subset translated into French for cross-lingual evaluation.

Download JSON
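As a minimal sketch of working with the downloads, the snippet below loads one of the JSON files and prints a sample question. The file name `qa_1000_expert.json` and the `question`/`options`/`answer` fields are assumptions about the schema, not confirmed field names.

```python
import json

# Hypothetical file name; substitute whichever JSON file you downloaded above.
with open("qa_1000_expert.json", encoding="utf-8") as f:
    dataset = json.load(f)

print(f"Loaded {len(dataset)} questions")

# Assumed per-question schema: question text, labeled answer options,
# and the letter of the correct answer.
sample = dataset[0]
print(sample["question"])                         # assumed field name
for label, option in sample["options"].items():   # assumed field name
    print(f"  {label}. {option}")
print("Answer:", sample["answer"])                # assumed field name
```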
Benchmarking Code

Scripts and evaluation code used to run the Quantum-Audit benchmark on all models.

Download
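A minimal sketch of the kind of evaluation loop such scripts typically contain: prompt a model with each multiple-choice question, extract its letter choice, and report overall accuracy. The `ask_model` function is a placeholder for whatever model API the released code actually calls, and the field names follow the assumed schema above.

```python
import json
import re

def ask_model(prompt: str) -> str:
    """Placeholder: call your LLM of choice and return its raw text reply."""
    raise NotImplementedError  # fill in with an actual API call

def evaluate(path: str) -> float:
    """Score a multiple-choice JSON file by exact letter match (assumed schema)."""
    with open(path, encoding="utf-8") as f:
        questions = json.load(f)

    correct = 0
    for item in questions:
        options = "\n".join(f"{k}. {v}" for k, v in item["options"].items())
        prompt = (f"{item['question']}\n{options}\n"
                  "Answer with the letter of the correct option.")
        reply = ask_model(prompt)
        # Take the first option letter appearing in the reply.
        match = re.search(r"[A-D]", reply.upper())
        if match and match.group(0) == item["answer"]:
            correct += 1
    return correct / len(questions)

if __name__ == "__main__":
    print(f"Accuracy: {evaluate('qa_1000_expert.json'):.1%}")
```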