Episode 1909 - May 29 - English - Part 1 of 3 - Can an A.I. Make Plans? - Vina Technology at AI time

May 24, 2024

Vina Technology at AI time - Vietnamese Technology in the AI Era

Can an A.I. Make Plans? – Part 1 of 3

Today’s systems struggle to imagine the future, but that may soon change.

By Cal Newport. The New Yorker. March 15, 2024.

Last summer, AdamYedidia, a user on a Web forum called LessWrong, published a post titled “Chess as a Case Study in Hidden Capabilities in ChatGPT.” He started by noting that the Internet is filled with funny videos of ChatGPT playing bad chess: in one popular clip, the A.I. confidently and illegally moves a pawn backward. But many of these videos were made using the original version of OpenAI’s chatbot, which was released to the public in late November, 2022, and was based on the GPT-3.5 large language model. Last March, OpenAI introduced an enhanced version of ChatGPT based on the more powerful GPT-4. As the post demonstrated, this new model, if prompted correctly, could play a surprisingly decent game of chess, achieving something like an Elo rating of 1000—better than roughly fifty per cent of ranked players. “ChatGPT has fully internalized the rules of chess,” he asserted. It was “not relying on memorization or other, shallower patterns.”

This distinction matters. When large language models first vaulted into the public consciousness, scientists and journalists struggled to find metaphors to help explain their eerie facility with text. Many eventually settled on the idea that these models “mix and match” the incomprehensibly large quantities of text they digest during their training. When you ask ChatGPT to write a poem about the infinitude of prime numbers, you can assume that, during its training, it encountered many examples of both prime-number proofs and rhyming poetry, allowing it to combine information from the former with the patterns observed in the latter. (“I’ll start by noting Euclid’s proof, / Which shows that primes aren’t just aloof.”) Similarly, when you ask a large language model, or L.L.M., to summarize an earnings report, it will know where the main points in such documents can typically be found, and then will rearrange them to create a smooth recapitulation. In this view, these technologies play the role of redactor, helping us to make better use of our existing thoughts.

But after the advent of GPT-4—which was soon followed by other next-generation A.I. models, including Google’s PaLM-2 and Anthropic’s Claude 2.1—the mix-and-match metaphor began to falter. As the LessWrong post emphasizes, a large language model that can play solid novice-level chess probably isn’t just copying moves that it encountered while ingesting books about chess. It seems likely that, in some hard-to-conceptualize sense, it “understands” the rules of the game—a deeper accomplishment. Other examples of apparent L.L.M. reasoning soon followed, including acing SAT exams, solving riddles, programming video games from scratch, and explaining jokes. The implications here are potentially profound. During a talk at M.I.T., Sébastien Bubeck, a Microsoft researcher who was part of a team that systematically studied the abilities of GPT-4, described these developments: “If your perspective is, ‘What I care about is to solve problems, to think abstractly, to comprehend complex ideas, to reason on new elements that arrive at me,’ then I think you have to call GPT-4 intelligent,” he said.

Yet intertwined with this narrative of uneasy astonishment is an intriguing counterpoint. There remain some surprisingly simple tasks that continue to stymie L.L.M.s. In his M.I.T. talk, Bubeck described giving GPT-4 the math equation “7 x 4 + 8 x 8 = 92.” He then asked it to modify exactly one number on the left-hand side so that the equation would instead evaluate to 106. For a person, this problem is straightforward: change “7 x 4” to “7 x 6.” But GPT-4 couldn’t figure it out, and provided an answer that was clearly wrong. “The arithmetic is shaky,” Bubeck said.
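
The puzzle Bubeck describes is small enough to check by brute force. What follows is a minimal Python sketch, not anything taken from the talk or from GPT-4 itself: the function name `single_number_fixes` and the candidate range of 0 through 20 are assumptions made purely for illustration. It replaces each of the four numbers on the left-hand side, one at a time, and keeps every replacement that makes the expression equal the target.

```python
def single_number_fixes(a, b, c, d, target, candidates=range(0, 21)):
    """Find every way to change exactly one of a, b, c, d
    so that a * b + c * d equals target.

    Returns a list of (position, new_value) pairs, where position
    0..3 refers to a, b, c, d in order. The candidate range is an
    illustrative assumption, not part of the original puzzle.
    """
    original = [a, b, c, d]
    fixes = []
    for position in range(4):
        for value in candidates:
            if value == original[position]:
                continue  # the puzzle requires an actual change
            trial = original.copy()
            trial[position] = value
            if trial[0] * trial[1] + trial[2] * trial[3] == target:
                fixes.append((position, value))
    return fixes


# The starting equation from the talk: 7 x 4 + 8 x 8 = 92
print(7 * 4 + 8 * 8)                          # 92
print(single_number_fixes(7, 4, 8, 8, 106))   # [(1, 6)]
```

The single result, position 1 changed to 6, corresponds to rewriting “7 x 4” as “7 x 6,” which gives 42 + 64 = 106. Under these assumptions no other single-number change works, because the alternatives would require a non-integer value (for example, 10.5 x 4 or 9.75 x 8).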

How can these powerful systems beat us in chess but falter on basic math? This paradox reflects more than just an idiosyncratic design quirk.
