Evaluating the Reasoning Limits of LLMs on Quantum Computing
| # | Model | Provider | Size | Expert Written | LLM Extracted | Complete Dataset |
|---|-------|----------|------|----------------|---------------|------------------|
Models handle foundational concepts well but decline sharply on advanced topics. Security questions see the steepest drop, with performance falling to 76%.
Examples from the Quantum-Audit benchmark illustrating the depth and breadth of questions.
Download the Quantum-Audit benchmark dataset and evaluation code:

- Multiple choice questions developed by quantum computing researchers. (Download JSON)
- Questions extracted from research papers using LLMs and validated by domain experts. (Download JSON)
- The full benchmark combining expert-written and LLM-extracted questions across all topics. (Download JSON)
- Questions with intentionally incorrect assumptions to test error detection. (Download JSON)
- A curated subset of 500 expert-written multiple choice questions. (Download JSON)
- The QA500 subset translated into Spanish for cross-lingual evaluation. (Download JSON)
- The QA500 subset translated into French for cross-lingual evaluation. (Download JSON)
- Scripts and evaluation code used to run the Quantum-Audit benchmark on all models. (Download)
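Once a benchmark file is downloaded, scoring a model's multiple-choice answers reduces to comparing predicted option letters against the gold answers. The sketch below is a minimal, hypothetical example: the record schema (`question`, `choices`, `answer`) and field names are assumptions for illustration, not the actual Quantum-Audit JSON layout, which may differ.

```python
import json

# Hypothetical records mimicking an assumed Quantum-Audit schema.
# In practice you would load a downloaded file, e.g.:
#   records = json.load(open("qa500.json"))
SAMPLE = json.loads("""
[
  {"question": "Which gate creates an equal superposition from |0>?",
   "choices": ["X", "H", "Z", "CNOT"], "answer": "B"},
  {"question": "How many qubits does a CNOT gate act on?",
   "choices": ["1", "2", "3", "4"], "answer": "B"}
]
""")

def score(records, predictions):
    """Fraction of predicted option letters matching the gold answers."""
    correct = sum(
        1 for rec, pred in zip(records, predictions)
        if pred.strip().upper() == rec["answer"].upper()
    )
    return correct / len(records)

if __name__ == "__main__":
    # One of the two predictions is correct.
    print(score(SAMPLE, ["B", "A"]))  # -> 0.5
```

Any real harness would also need to map a model's free-form response to an option letter; the official evaluation scripts linked above handle that step.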