Claude Opus 4.8 prioritises honesty over overconfidence, says Anthropic

New Delhi | Updated: May 29, 2026 01:31 PM IST

Large language models (LLMs) are often known to make claims they cannot support. Regardless of their size and prowess, LLMs are prone to making statements with complete confidence even when they are incorrect. While this has been a persistent problem, AI companies have been working on reducing these instances.

To that end, frontier AI lab Anthropic on Thursday, May 28, introduced its latest model, Claude Opus 4.8, which it claims is more honest than its predecessors. The AI startup said the model is more honest, including when it comes to telling users what it does not understand.

An upgrade to Claude Opus 4.7, Opus 4.8 is now Anthropic’s most powerful generally available model. While the improvements seem incremental, early testers reported that the model is more likely to flag uncertainties about its work and less likely to make unsupported claims.

The company said its evaluations showed that Opus 4.8 is around four times less likely than Opus 4.7 to let flaws in code it has written pass unremarked.

Before release, Anthropic conducted a comprehensive alignment and safety evaluation of Opus 4.8, where it found that the model performed better than the earlier editions. It supported user autonomy and acted in the best interests of the user. The model also showed considerably lower rates of harmful behaviours, such as deception or assisting misuse, when compared to Claude Opus 4.7.

Moreover, its alignment levels were reportedly comparable to those of the company’s best-aligned model, Claude Mythos Preview, a frontier model so powerful that Anthropic has limited access to it to a select group of trusted partners.

“The assessment also showed Opus 4.8 to have rates of misaligned behaviour (such as deception or cooperation with misuse) that are substantially lower than Opus 4.7 and similar to our best-aligned model, Claude Mythos Preview. The full alignment assessment, accompanied by a suite of pre-deployment safety tests, is reported in the Claude Opus 4.8 System Card,” the company said in its blog.

On benchmarks, Anthropic said that Opus 4.8 achieved the highest score on Harvey’s Legal Agent Benchmark, which evaluates legal reasoning, becoming the first model to cross 10 per cent overall on the benchmark. On computer use and browser agents, the model reportedly scored 84 per cent on Online-Mind2Web. The company also said the model showed improvements in enterprise work and agentic reasoning.

Anthropic emphasised reduced unsupported claims and improved uncertainty reporting. These are the scores shared by the company; however, a thorough review by third-party testers may offer more objective results.

Claude Opus 4.8 is available immediately through Claude.ai, Claude Code, and its API. The model retains the same pricing as Opus 4.7, costing $5 per million input tokens and $25 per million output tokens. The AI lab has also introduced a Fast Mode priced at $10 per million input tokens and $50 per million output tokens. The company noted that prompt caching and batch processing can further reduce costs for developers and enterprise users.
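For a rough sense of what these rates mean in practice, the arithmetic can be sketched in a few lines of Python. This is an illustrative sketch only: the names `Pricing`, `STANDARD`, `FAST_MODE` and `estimate_cost` are hypothetical and not part of any Anthropic SDK, the token counts are invented for the example, and any savings from prompt caching or batch processing are left out because the company's discount figures are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token rates in US dollars (hypothetical helper type)."""
    input_per_million: float
    output_per_million: float


# Rates as reported in the article; not fetched from any API.
STANDARD = Pricing(input_per_million=5.0, output_per_million=25.0)
FAST_MODE = Pricing(input_per_million=10.0, output_per_million=50.0)


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the estimated cost in USD for a given token volume."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Invented workload: 2 million input tokens, 500,000 output tokens.
    for name, pricing in (("standard", STANDARD), ("fast mode", FAST_MODE)):
        cost = estimate_cost(2_000_000, 500_000, pricing)
        print(f"{name}: ${cost:.2f}")
```

On this invented workload, the standard rate works out to $22.50 and Fast Mode to $45.00, since both Fast Mode rates are exactly double the standard ones.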
