Can bigger-is-better ‘scaling laws’ keep AI improving forever? History says we can’t be too sure

OpenAI chief executive Sam Altman – perhaps the most prominent face of the artificial intelligence (AI) boom that accelerated with the launch of ChatGPT in 2022 – loves scaling laws.

These widely admired rules of thumb, which link the size of an AI model to its capabilities, inform much of the AI industry's headlong rush to buy up powerful computer chips, build unimaginably large data centres and reopen shuttered nuclear plants.

As Altman argued in a blog post earlier this year, the thinking is that the “intelligence” of an AI model “roughly equals the log of the resources used to train and run it” – meaning you can steadily produce better performance by exponentially increasing the scale of data and computing power involved.
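Taken literally, that is a relationship you can sketch in a few lines. The toy Python snippet below is a minimal illustration, not anything published by OpenAI: the scale factor k and the FLOP counts are invented. But it shows why a log law cuts both ways – each tenfold increase in resources buys only the same fixed step up in capability.

```python
import math

def predicted_capability(resources: float, k: float = 1.0) -> float:
    """Toy version of the rule of thumb: capability grows with the log
    of the resources used to train and run a model. The scale factor k
    is an illustrative assumption, not a published constant."""
    return k * math.log10(resources)

# Each 10x jump in resources buys the same fixed bump in capability.
for flops in (1e21, 1e22, 1e23, 1e24):  # training compute, invented figures
    print(f"{flops:.0e} FLOPs -> capability {predicted_capability(flops):.1f}")
```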

First observed in 2020 and further refined in 2022, the scaling laws for large language models (LLMs) come from drawing lines on charts of experimental data. For engineers, they give a simple formula that tells you how big to build the next model and what performance increase to expect.
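Here is a hedged sketch of what "drawing lines on charts" means in practice. The measurements below are invented purely to show the method – the real 2020 and 2022 studies fit far more data – but the recipe is the same: plot loss against compute on log-log axes, fit a straight line, and read off a power law.

```python
import numpy as np

# Hypothetical (compute, loss) pairs from a series of training runs.
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])  # FLOPs, invented
loss = np.array([4.1, 3.4, 2.9, 2.45, 2.1])         # eval loss, invented

# A power law loss = a * compute**b is a straight line on log-log axes,
# so fitting it is literally fitting a line to the chart.
b, log_a = np.polyfit(np.log10(compute), np.log10(loss), deg=1)
a = 10 ** log_a
print(f"fitted law: loss ~= {a:.2f} * compute^({b:.3f})")

# The industry's bet: the same line keeps holding well past the data.
print(f"extrapolated loss at 1e24 FLOPs: {a * 1e24 ** b:.2f}")
```

The risk sits in the last line: the extrapolation is only as trustworthy as the assumption that nothing changes outside the range where the data was measured.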

Will the scaling laws keep on scaling as AI models get bigger and bigger? AI companies are betting hundreds of billions of dollars that they will – but history suggests it is not always so simple.

Scaling laws aren’t just for AI

Scaling laws can be wonderful. Modern aerodynamics is built on them, for example.

Using an elegant piece of mathematics called the Buckingham π theorem, engineers discovered how to compare small models in wind tunnels or test basins with full-scale planes and ships by making sure some key numbers matched up.
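As a concrete, hedged illustration (the figures are invented, and the Reynolds number is just one of the dimensionless groups the theorem produces): a scale model tells you about the real thing only when the right dimensionless numbers match, and matching them can have surprising consequences.

```python
def reynolds(density: float, speed: float, length: float, viscosity: float) -> float:
    """Reynolds number Re = rho * v * L / mu: a dimensionless ratio of
    inertial to viscous forces, one of the groups the Buckingham pi
    theorem yields for flow problems."""
    return density * speed * length / viscosity

RHO_AIR, MU_AIR = 1.225, 1.81e-5      # sea-level air, SI units

full_speed, full_chord = 100.0, 4.0   # m/s and m for the real wing (invented)
model_chord = full_chord / 20         # a 1:20 wind-tunnel model

# Dynamic similarity: the model flow mimics the full-scale flow when Re matches.
target_re = reynolds(RHO_AIR, full_speed, full_chord, MU_AIR)
model_speed = target_re * MU_AIR / (RHO_AIR * model_chord)
print(f"full-scale Re = {target_re:.2e}")
print(f"tunnel speed needed at 1:20 scale = {model_speed:.0f} m/s")
```

The answer – 2,000 metres per second – is wildly impractical in ordinary air, which is why real tunnels pressurise or chill the gas to change its density and viscosity. Even a "good" scaling law only transfers results when its conditions are met.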

Those scaling ideas inform the design of almost everything that flies or floats, as well as industrial fans and pumps.

Another famous scaling idea underpinned the boom decades of the silicon chip revolution. Moore's law – the observation that the number of tiny switches called transistors on a microchip would double every two years or so – helped designers create the small, powerful computing technology we have today.
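The power of that compounding is easy to underestimate. A back-of-the-envelope sketch, using the roughly 2,300 transistors of Intel's 1971 4004 chip as a starting point:

```python
# Compound doubling every two years, from an early-1970s starting point.
transistors, year = 2_300, 1971   # Intel 4004, released 1971

while year < 2021:
    year += 2
    transistors *= 2

print(f"by {year}: roughly {transistors:,} transistors")
```

Twenty-five doublings take 2,300 transistors to tens of billions – broadly where the largest chips sit today. For five decades, the extrapolation held remarkably well.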

But there’s a catch: not all “scaling laws” are laws of nature. Some are purely mathematical and can hold indefinitely. Others are just lines fitted to data that work beautifully until you stray too far from the circumstances where they were measured or designed.

When scaling laws break down

History is littered with painful reminders of scaling laws that broke. A classic example is the collapse of the Tacoma Narrows Bridge in 1940.

The bridge was designed by scaling up what had worked for smaller bridges to something longer and slimmer. Engineers assumed the same scaling arguments would hold: if a certain ratio of stiffness to weight had served shorter spans, it would serve this one too. It did not. The long, slender deck interacted with the wind in a way smaller bridges never had, and a self-reinforcing oscillation known as aeroelastic flutter tore the bridge apart in a wind of roughly 64 kilometres per hour, just four months after it opened.
