Shortly after news broke that OpenAI could be delaying the release of its next flagship model over concerns that it may not perform as well as expected, more reports have emerged of firms seeing slower performance gains in their models. Few people have spoken publicly about a possible performance plateau, but there are exceptions, such as OpenAI co-founder and Safe Superintelligence founder Ilya Sutskever.
Recently, Sutskever told Reuters that the performance gains obtained from scaling up pre-training have plateaued. Pre-training is the stage in current training approaches where models are given massive troves of data from which they can extract patterns and structures. Current approaches to pre-training presuppose the scaling "laws", empirical observations that have led researchers to conclude that a model's performance improves as it is given access to more resources.
These observations began to be treated as laws partially because the AI community expects them to hold in the long run. But now, one of Sutskever's statements to Reuters continues to echo: "Scaling the right thing matters more now than ever." In a recent (five-hour!) podcast interview with Lex Fridman, Anthropic CEO Dario Amodei remained optimistic that some version of the scaling laws will continue to hold. However, he quickly clarified that he did not mean scaling only computing power, but also components such as model and dataset sizes.
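To make the idea of a scaling "law" more concrete, here is a minimal sketch that assumes the power-law form commonly used in the scaling literature, where loss falls as a fixed power of the resources spent. The function names and the values of `scale` and `alpha` are illustrative assumptions, not figures reported by any of the companies mentioned here.

```python
def power_law_loss(resource: float, scale: float = 1.0, alpha: float = 0.1) -> float:
    """Toy scaling curve: loss = (scale / resource) ** alpha.

    `scale` and `alpha` are made-up illustrative constants, not measured values.
    """
    if resource <= 0:
        raise ValueError("resource must be positive")
    return (scale / resource) ** alpha


def doubling_gains(start: float, doublings: int, alpha: float = 0.1) -> list[float]:
    """Absolute loss reduction obtained from each successive doubling of resources."""
    losses = [power_law_loss(start * 2 ** k, alpha=alpha) for k in range(doublings + 1)]
    return [losses[k] - losses[k + 1] for k in range(doublings)]


if __name__ == "__main__":
    # Each doubling multiplies the loss by the same factor (2 ** -alpha),
    # so the absolute improvement per doubling keeps shrinking.
    for step, gain in enumerate(doubling_gains(start=1.0, doublings=5), start=1):
        print(f"doubling {step}: loss falls by {gain:.4f}")
```

Under this assumed curve, every doubling of resources costs twice as much as the previous one yet buys a smaller absolute improvement. That is one way to read reports of diminishing returns even if the "law" itself still holds.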
Notably, the launch of Anthropic's Opus 3.5, the next iteration of the largest model in the Claude family, has been delayed for some time now. The company has instead bet on enhancing Sonnet 3.5, its strongest-performing model, and launching Haiku 3.5, which, Anthropic notes, performs comparably to Opus 3 while being a fraction of its size. Amodei highlighted this during the interview, and only mentioned in passing that Opus 3.5 will be available "at some point".
Like OpenAI, Anthropic and Google seem to be turning to what are shaping up to be the best ways forward. On the one hand, there is "test-time compute", which roughly amounts to giving the models more time to evaluate several possible outputs before generating a response. OpenAI's o1 is the first model to feature this technique, but other AI chatbots will certainly follow.
On the other hand, there is the so-called "agentic" AI, where models are expected to take on entire workflows, such as operating computer programs, autonomously. Anthropic recently took a stab at the concept with its computer use API. The API enables Sonnet to perform basic tasks on a computer, like moving a cursor around screens, clicking on appropriate elements, and inputting text through a virtual keyboard.
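As a rough illustration of the test-time compute idea described above, one simple way to spend extra compute at inference is best-of-N sampling: generate several candidate answers, score each, and return the best one. The sketch below is a toy under stated assumptions, not a description of how o1 or any other production model works. `toy_generate` and `toy_score` are hypothetical stand-ins for a model and an answer evaluator.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    generate: Callable[[random.Random], str],
    score: Callable[[str], float],
    n: int,
    seed: int = 0,
) -> Tuple[str, float]:
    """Sample n candidate answers and return the highest-scoring one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    scored = [(candidate, score(candidate)) for candidate in candidates]
    return max(scored, key=lambda pair: pair[1])


def toy_generate(rng: random.Random) -> str:
    """Hypothetical 'model': guesses an integer between 0 and 100."""
    return str(rng.randint(0, 100))


def toy_score(answer: str) -> float:
    """Hypothetical evaluator: guesses closer to 42 score higher."""
    return -abs(int(answer) - 42)


if __name__ == "__main__":
    # A larger n means more compute per query; with a fixed seed the best
    # score can only stay the same or improve as n grows.
    for n in (1, 4, 16):
        answer, quality = best_of_n(toy_generate, toy_score, n=n)
        print(f"n={n:2d}  best answer={answer}  score={quality}")
```

The trade-off is direct: each additional candidate costs another generation and another scoring pass, so quality is bought with inference time rather than with a larger pre-trained model.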