Why Anthropic calls the new Claude 3 its ‘most intelligent’ AI model yet

Artificial intelligence start-up Anthropic announced its latest family of AI models, called Claude 3, on Monday (March 4), saying it “sets new industry benchmarks across a wide range of cognitive tasks”.

The family includes three state-of-the-art AI models, in ascending order of capability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. The company claims that each successive model is more powerful, letting users choose the balance of intelligence, speed, and cost that suits their use case.

Anthropic was founded by former members of OpenAI, the company behind ChatGPT. Its co-founder and president Daniela Amodei said in an interview with Bloomberg Technology that the new models are twice as likely to answer questions correctly. The claim concerns how often models generate incorrect information in their answers, a problem shared by similar AI chatbots.

Additionally, she said Anthropic was working on the challenges that businesses face when integrating AI into their workflows.

What is Claude 3?

Claude is a group of large language models (LLMs) developed by Anthropic. The chatbot is capable of handling text, voice messages, and documents. Reviews by The Indian Express have shown that the chatbot generates faster, more contextual responses than its peers.


Among the new releases, Claude 3 Opus is the most powerful model, Claude 3 Sonnet is the middle model that is capable and price competitive, and Claude 3 Haiku is relevant for any use case that requires instant responses.

Claude 3 Sonnet currently powers the free Claude.ai chatbot, for which users only need an email sign-in. Opus, however, is available through Anthropic’s web chat interface only to users subscribed to the Claude Pro service on the Anthropic website, which costs $20 a month.


All new models come with a 200,000-token context window, which may mean better performance and accuracy, along with the capacity to include more information in a single user prompt.
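
To give a feel for what a 200,000-token window means in practice, here is a minimal Python sketch that estimates whether a piece of text would fit. It is only an illustration: the figure of roughly four characters per token is an assumed rule of thumb for English text, not Anthropic's actual tokenizer, and the function names are hypothetical.

```python
# Window size quoted in the article for all Claude 3 models.
CONTEXT_WINDOW_TOKENS = 200_000

# ASSUMPTION: about 4 characters per token, a rough rule of thumb for
# English text. Real tokenizers vary, so treat the result as an estimate.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text`, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_window(text: str, reserved_for_reply: int = 0) -> bool:
    """Check whether `text`, plus tokens set aside for the reply, fits."""
    return estimate_tokens(text) + reserved_for_reply <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    document = "word " * 100_000  # 500,000 characters of sample text
    print(estimate_tokens(document), fits_in_window(document))
```

Under this assumption, the window holds roughly 800,000 characters of English text, though the true limit depends on the model's own tokenizer.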


How did Claude 3 perform?

Based on the comparison of Claude 3 with its peers, it seems Anthropic may have caught up with OpenAI, which had surpassed many AI models with the launch of its GPT-4 Turbo.

However, this analysis is based purely on the benchmark scores that Anthropic shared in its announcement. Several experts have suggested that the benchmarks AI makers choose to present are quite likely to be cherry-picked.

Claude 3 reportedly demonstrates advanced performance across cognitive tasks such as reasoning, expert knowledge, mathematics, and language fluency. Despite the lack of consensus over whether LLMs can really “know” or “reason,” the AI research community commonly uses these terms.


The company says that the Opus model exhibits “near-human levels of comprehension and fluency on complex tasks.”


While this is a big claim, the scores indicate that Claude 3 Opus achieves near-human performance on some specific benchmarks. However, this doesn’t mean that Opus possesses general intelligence like humans.

Claude 3 vs GPT-4

Claude 3 Opus has surpassed GPT-4 on as many as 10 AI benchmarks, which include MMLU (undergraduate level knowledge), HumanEval (Coding), HellaSwag (common knowledge), and GSM8K (grade school maths).

On some benchmarks, Claude 3 beats its peers only narrowly. For example, in the five-shot MMLU trial, Claude 3 Opus secured 86.8 per cent while GPT-4 obtained 86.4 per cent.


Benchmark scores. (Image source: Anthropic)

On the other hand, significant gaps can also be seen. In Multilingual Maths (MGSM), for instance, Claude 3 scored 90.7 per cent, while GPT-4 scored 74.5 per cent.
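
The difference between a narrow lead and a wide one is easier to see when the margins are worked out. The short Python sketch below does this for the two benchmarks quoted in this article, using only the scores cited here; the variable and function names are illustrative.

```python
# Scores (per cent) as quoted in this article from Anthropic's announcement.
scores = {
    "MMLU (five-shot)": {"Claude 3": 86.8, "GPT-4": 86.4},
    "MGSM": {"Claude 3": 90.7, "GPT-4": 74.5},
}


def margin(benchmark: str) -> float:
    """Return Claude 3's lead over GPT-4 in percentage points."""
    row = scores[benchmark]
    # Round to one decimal place to hide floating-point noise.
    return round(row["Claude 3"] - row["GPT-4"], 1)


for name in scores:
    print(f"{name}: Claude 3 leads by {margin(name)} percentage points")
```

The lead on MMLU works out to 0.4 percentage points, against 16.2 percentage points on MGSM.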

While these scores look great on paper, how they translate into real-world use is difficult to predict. Moreover, experts say that LLM benchmarks should be treated with caution. Still, even though the scores say nothing about ease of use, they are significant in themselves because they overtake those of GPT-4.

Claude 3 has also shown improvements in terms of analysis, forecasting, content creation, multilingual conversations, code generation, etc. Anthropic claimed that the new model family also comes with enhanced vision capabilities, allowing Claude 3 to process photos, charts, and diagrams, much like GPT-4V.

Limitations of Claude 3

According to those who had early access to the model, Claude 3 performs well in tasks such as answering factual questions and optical character recognition (OCR), meaning the ability to extract text from images. Reportedly, the new model is good at following instructions and completing tasks like writing Shakespearean sonnets.


However, it struggles with complex reasoning and mathematical problems at times. It also exhibited biases in its responses, such as favouring a certain racial group over others.

Other AI models have faced similar problems in the past. Google’s AI chatbot Gemini was criticised for racial bias and historical inaccuracies after it refused to generate images of white individuals and instead depicted them as people of colour.

Anthropic has emphasised the safety features of Claude 3, especially its refusal to generate harmful or illegal content.

The company was also among the first to introduce Constitutional AI, in which developers lay down a set of values that the system must follow so that its actions are politically and socially responsible.


As of now, Claude 3 is the most expensive model on the market, but Anthropic plans to release more affordable versions soon. Based on early reports, benchmarks, and confidence from the AI community, Claude 3 seems to be a significant step forward in the development of LLMs.

Bijin Jose

