Can an A.I. Make Plans? – Part 2 of 3
Today’s systems struggle to imagine the future—but that may soon change.
By Cal Newport. The New Yorker. March 15, 2024.
It points toward something fundamental about how large language models think. Given the predicted importance of these tools in our lives, it’s worth taking a moment to pull on this thread. To better understand what to expect from A.I. systems in the future, in other words, we should start by better understanding what the dominant systems of today still cannot do.
How does the human brain tackle a math problem like the one that Bubeck used to stump GPT-4? In his M.I.T. talk, he described how our thinking might unfold. Once we recognize that our goal is to increase the sum on the right side of the equation by fourteen, we begin searching for promising options on the left side. “I look at the left, I see a seven,” Bubeck said. “Then I have kind of a eureka moment. Ah! Fourteen is seven times two. O.K., so if it’s seven times two, then I need to turn this four into a six.”
To us, this type of thinking is natural—it’s just how we figure things out. We might overlook, therefore, the degree to which such reasoning depends on anticipation. To solve our math problem, we have to look into the future and assess the impact of various changes that we might make. The reason the “7 x 4” quickly catches our attention is that we intuitively simulate what will happen if we increase the number of sevens. “It was through some kind of planning,” Bubeck concluded, of his solution process. “I was thinking ahead about what I’m gonna need.”
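The anticipatory search Bubeck describes can be made concrete with a short sketch. The full equation isn’t printed in this excerpt, so the sketch below assumes the version from Bubeck’s talk, which is consistent with the clues here: the left side is 7 × 4 + 8 × 8 = 92, and the goal is to change a single number so that it equals 106 (an increase of fourteen). A brute-force program simply simulates every single-number change and checks the result—the mechanical version of “thinking ahead about what I’m gonna need”:

```python
# Assumed puzzle (not printed in this excerpt, but consistent with the
# passage's clues): 7*4 + 8*8 = 92; change exactly one number on the
# left so the expression equals 106, i.e., fourteen more.
numbers = [7, 4, 8, 8]  # the terms of 7*4 + 8*8
target = 106

def left_side(terms):
    return terms[0] * terms[1] + terms[2] * terms[3]

# Simulate every possible single-number change and test its effect.
for position in range(4):
    for replacement in range(10):
        candidate = numbers.copy()
        candidate[position] = replacement
        if candidate != numbers and left_side(candidate) == target:
            print(f"Change the {numbers[position]} at position {position} "
                  f"to {replacement}: "
                  f"{candidate[0]}*{candidate[1]} + "
                  f"{candidate[2]}*{candidate[3]} = {target}")
```

Running the search finds exactly one fix—turning the four into a six, since 7 × 6 + 8 × 8 = 106—which is the eureka that the “fourteen is seven times two” shortcut lets a human reach without enumerating every option.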
We deploy this cognitive strategy constantly in our daily lives. When holding a serious conversation, we simulate how different replies might shift the mood—just as, when navigating a supermarket checkout, we predict how slowly the various lines will likely progress. Goal-directed behavior more generally almost always requires us to look into the future to test how much various actions might move us closer to our objectives. This holds true whether we’re pondering life’s big decisions, such as whether to move or have kids, or answering the small but insistent queries that propel our workdays forward, such as which to-do-list item to tackle next.
Presumably, for an artificial intelligence to achieve something like human cognition, it would also need to master this kind of planning. In “2001: A Space Odyssey,” the self-aware supercomputer HAL 9000 refuses Dave’s request to “open the pod bay doors” because, we can assume, it simulates the possible consequences of this action and doesn’t like what it discovers. The ability to consider the future is inextricable from our colloquial understanding of real intelligence. All of which points to the importance of GPT-4’s difficulty with Bubeck’s math equation. The A.I.’s struggle here was not a fluke. As it turns out, a growing body of research finds that these cutting-edge systems consistently fail at the fundamental task of thinking ahead.
Consider, for example, the research paper that Bubeck was presenting in his M.I.T. talk. He and his team at Microsoft Research ran a pre-release version of GPT-4 through a series of systematic intelligence tests. In most areas, the model’s performance was “remarkable.” But tasks that involved planning were a notable exception. The researchers provided GPT-4 with the rules of Towers of Hanoi, a simple puzzle game in which you move disks of various sizes between three rods, shifting them one at a time without ever placing a larger disk above a smaller one. They then asked the model to tackle a straightforward instance of the game that can be solved in five moves. GPT-4 provided an incorrect answer. As the researchers noted, success in this puzzle requires you to look ahead, asking whether your current move might lead you to a future dead end.
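The lookahead the researchers describe can be sketched in code. The specific five-move instance they posed isn’t reproduced in the passage, so the sketch below solves the standard three-disk starting position instead; the point is the method, a breadth-first search that explicitly explores future board states before committing to a move, which is exactly what the puzzle demands and what GPT-4’s one-token-at-a-time generation lacks:

```python
from collections import deque

# A Tower of Hanoi state is a tuple of three tuples, one per rod,
# listing each rod's disks from bottom to top (larger number = larger disk).

def legal_moves(state):
    """Yield every state reachable by one legal move (no larger disk
    placed on a smaller one)."""
    for src in range(3):
        if not state[src]:
            continue
        disk = state[src][-1]  # only the top disk can move
        for dst in range(3):
            if dst != src and (not state[dst] or state[dst][-1] > disk):
                rods = [list(r) for r in state]
                rods[src].pop()
                rods[dst].append(disk)
                yield tuple(tuple(r) for r in rods)

def solve(start, goal):
    """Breadth-first search: simulate future states to find the shortest
    move sequence, avoiding dead ends by construction."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for nxt in legal_moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))

start = ((3, 2, 1), (), ())   # all three disks on the first rod
goal = ((), (), (3, 2, 1))    # all three moved to the third rod
print(len(solve(start, goal)))  # prints 7, the optimal move count
```

Because the search holds whole future configurations in memory and compares them, it never paints itself into a corner; a system that merely predicts the next move in sequence, with no model of where that move leads, has no such guarantee.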