OpenAI’s deep research can complete 26% of Humanity’s Last Exam—a benchmark for the frontier of human knowledge
OpenAI’s o1 and DeepSeek’s R1 models, which previously sat atop the leaderboard, scored only about 9% on the exam.
