Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
The non-profit Center for AI Safety (CAIS) and Scale AI, a company that provides a range of data labeling and AI development services, have released a difficult new benchmark for frontier AI systems.
The benchmark, called Humanity's Last Exam, consists of thousands of multiple-choice questions covering subjects such as mathematics, the humanities and the natural sciences. To make the assessment more rigorous, the questions come in multiple formats, including formats that incorporate diagrams and images.
In a preliminary study, no publicly available advanced AI system scored better than 10% on Humanity's Last Exam.
CAIS and Scale AI said they plan to open the benchmark to the research community so researchers can “dig deeper into the changes” and evaluate new AI models.