OpenAI: OpenAI's GPT-5 tested, found to be as capable as experts in many tasks

By Shikha Saxena | Sep 27, 2025, 11:11 IST

OpenAI recently introduced GDPval, a new benchmark that it has used to test its GPT-5 AI model. The benchmark assesses how well an AI model performs compared with human professionals across various industries and jobs. GDPval aims to show whether OpenAI's systems have reached the level of human experts in economically important tasks.

What does OpenAI say?
According to OpenAI, GPT-5 and Anthropic's Claude Opus 4.1 are now approaching the performance of industry experts. The GPT-5-high version was rated equal to or better than experts in 40.6% of tasks spanning 44 occupations. Claude Opus 4.1 matched or beat human experts in approximately 49% of tasks.

Scope of the GDPval benchmark
GDPval tests AI models in nine major US industries that contribute the most to the country's total GDP. These include healthcare, finance, manufacturing, and government services. The benchmark compares the work of software engineers, nurses, journalists, and other professionals with work performed by AI. In this benchmark, OpenAI asked professionals to compare AI-generated reports with human-generated reports. The average AI performance was calculated to determine how often it equaled or surpassed humans.
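As a rough illustration of how such a figure can be computed, the sketch below counts the share of comparisons in which the AI's work was judged as good as or better than the human's. The verdict labels, the function name, and the sample data are hypothetical, chosen only for this example; the sketch is not OpenAI's actual scoring code.

```python
from collections import Counter


def win_or_tie_rate(verdicts):
    """Return the share of comparisons where the AI output was judged
    equal to or better than the human output.

    `verdicts` is a list of hypothetical labels, one per graded comparison:
    "ai_better", "tie", or "human_better".
    """
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to score")
    return (counts["ai_better"] + counts["tie"]) / total


# Made-up verdicts for five graded comparisons (illustrative only).
sample = ["ai_better", "human_better", "tie", "human_better", "human_better"]
print(f"{win_or_tie_rate(sample):.1%}")  # prints 40.0%
```

On this reading, a headline figure such as 40.6% means that in roughly four out of ten comparisons the professional graders rated the AI's work at least as good as the human expert's.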

What is the significance of this test?
Although GPT-5's performance is impressive, the test evaluates only a limited set of specific tasks. Real professional work involves much more than producing reports. OpenAI has stated that it will expand the test in the future to include more industries and interactive workflows.

What do experts say?
Dr. Aaron Chatterji, OpenAI's chief economist, says that as AI models reach this level, professionals will be able to delegate some tasks to AI and focus their time on more important work. The GPT-4o model, from about 15 months ago, was comparable to human experts in only 13.7% of cases; at 40.6%, GPT-5 has nearly tripled this figure. This is a sign of the rapid progress of AI.

Conclusion
Benchmarks like GDPval could become an important way to measure the efficiency of AI models in real-world tasks. However, to definitively say that AI can outperform humans, OpenAI will need to conduct more extensive testing.

Disclaimer: This content has been sourced and edited from Amar Ujala. While we have made modifications for clarity and presentation, the original content belongs to its respective authors and website. We do not claim ownership of the content.
