OpenAI’s ‘Strawberry’ Model Would Be Capable of ‘Deep Research’

ChatGPT-maker OpenAI is working on a groundbreaking new artificial intelligence model – dubbed ‘Strawberry’ – which would be capable of “deep research”.

The new model would be able to not just generate answers to queries but also to plan ahead enough to navigate the internet autonomously, according to a document seen by Reuters.

The project, details of which have not been previously reported, comes as the Microsoft-backed startup races to develop models capable of delivering advanced reasoning capabilities.

It’s not clear how close Strawberry is to being publicly available, and how it works is a tightly kept secret even within OpenAI, a source said.

Asked about Strawberry, an OpenAI company spokesperson said in a statement: “We want our AI models to see and understand the world more like we do. Continuous research into new AI capabilities is a common practice in the industry, with a shared belief that these systems will improve in reasoning over time.”

The spokesperson did not directly address questions about the Strawberry project, which was formerly known as Q*, and was first reported last year.

Two sources described viewing earlier this year what OpenAI staffers told them were Q* demos, capable of answering tricky science and math questions out of reach of today’s commercially available models.

A different source briefed on the matter said OpenAI has tested AI internally that scored over 90% on the MATH dataset, a benchmark of championship math problems.

But while large language models can already summarise dense texts and compose elegant prose far more quickly than any human, the technology often falls short on common-sense problems whose solutions seem intuitive to people, like recognising logical fallacies and playing tic-tac-toe.

When the model encounters these kinds of problems, it often “hallucinates” bogus information.

OpenAI CEO Sam Altman said earlier this year that in AI “the most important areas of progress will be around reasoning ability”.

Other companies like Google, Meta and Microsoft are likewise experimenting with different techniques to improve reasoning in AI models, as are most academic labs that perform AI research.

‘Long-Horizon Tasks’

Researchers differ, however, on whether large language models (LLMs) are capable of incorporating ideas and long-term planning into how they make predictions. For instance, one of the pioneers of modern AI, Yann LeCun, who works at Meta, has frequently said that LLMs are not capable of humanlike reasoning.

Strawberry is a key component of OpenAI’s plan to overcome those challenges, the source familiar with the matter said.

Strawberry includes a specialised way of carrying out what is known as “post-training” on OpenAI’s generative AI models, that is, adapting the base models to hone their performance in specific ways after they have already been “trained” on reams of generalised data, one of the sources said.

The post-training phase of developing a model involves methods like “fine-tuning,” a process used on nearly all language models today that comes in many flavours, such as having humans give feedback to the model based on its responses and feeding it examples of good and bad answers.

Among the capabilities OpenAI is aiming Strawberry at is performing long-horizon tasks (LHT), the document says, referring to complex tasks that require a model to plan ahead and perform a series of actions over an extended period of time.

To do so, OpenAI is creating, training and evaluating the models on what the company calls a “deep-research” dataset, according to the OpenAI internal documentation. Reuters was unable to determine what is in that dataset or how long an extended period would be.

OpenAI specifically wants its models to use these capabilities to conduct research by browsing the web autonomously with the assistance of a “CUA,” or a computer-using agent, that can take actions based on its findings, according to the document and one of the sources.

OpenAI also plans to test the models’ capabilities at doing the work of software and machine learning engineers.

Reuters with additional editing by Sean O’Meara

