Episode 1907 - May 29 - English - AI Is a Black Box - Anthropic Figured Out a Way to Look Inside - Vina Technology at AI time - Lê Quang Văn | Podcast - Nhac.vn

May 24, 2024

AI Is a Black Box - Anthropic Figured Out a Way to Look Inside

By Steven Levy. WIRED. May 21, 2024 11:00 AM

What goes on in artificial neural networks is largely a mystery, even to their creators. But researchers from Anthropic have caught a glimpse.

For the past decade, AI researcher Chris Olah has been obsessed with artificial neural networks. One question in particular engaged him, and has been the center of his work, first at Google Brain, then OpenAI, and today at AI startup Anthropic, where he is a cofounder. “What's going on inside of them?” he says. “We have these systems, we don't know what's going on. It seems crazy.”

That question has become a core concern now that generative AI has become ubiquitous. Large language models like ChatGPT, Gemini, and Anthropic’s own Claude have dazzled people with their language prowess and infuriated people with their tendency to make things up. Their potential to solve previously intractable problems enchants techno-optimists. But LLMs are strangers in our midst. Even the people who build them don’t know exactly how they work, and massive effort is required to create guardrails to prevent them from churning out bias, misinformation, and even blueprints for deadly chemical weapons. If the people building the models knew what happened inside these “black boxes,” it would be easier to make them safer.

Olah believes that we’re on the path to this. He leads an Anthropic team that has peeked inside that black box. Essentially, they are trying to reverse engineer large language models to understand why they come up with specific outputs—and, according to a paper released today, they have made significant progress.

Maybe you’ve seen neuroscience studies that interpret MRI scans to identify whether a human brain is entertaining thoughts of a plane, a teddy bear, or a clock tower. Similarly, Anthropic has plunged into the digital tangle of the neural net of its LLM, Claude, and pinpointed which combinations of its crude artificial neurons evoke specific concepts, or “features.” The company’s researchers have identified the combinations of artificial neurons that signify features as disparate as burritos, semicolons in programming code, and, very much to the larger goal of the research, deadly biological weapons. Work like this has potentially huge implications for AI safety: If you can figure out where danger lurks inside an LLM, you are presumably better equipped to stop it.

I met with Olah and three of his colleagues, among 18 Anthropic researchers on the “mechanistic interpretability” team. They explain that their approach treats artificial neurons like letters of Western alphabets, which don’t usually have meaning on their own but can be strung together sequentially to have meaning. “C doesn’t usually mean something,” says Olah. “But car does.” Interpreting neural nets by that principle involves a technique called dictionary learning, which lets you find combinations of neurons that, when fired in unison, evoke a specific concept, referred to as a feature.
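For readers who want a concrete picture of dictionary learning, here is a minimal sketch in Python with NumPy. It is a toy illustration under stated assumptions, not the method the Anthropic team used, which this article does not detail. The function names `encode` and `learn_dictionary`, the synthetic `activations`, and the simplification that each sample is explained by exactly one feature are all hypothetical choices made for this example.

```python
import numpy as np


def encode(activations, dictionary):
    """Pick, for each sample, the single feature direction that matches best."""
    scores = activations @ dictionary.T               # (n_samples, n_features)
    best = np.argmax(np.abs(scores), axis=1)          # index of the chosen feature
    coeffs = scores[np.arange(len(activations)), best]  # how strongly it fires
    return best, coeffs


def learn_dictionary(activations, n_features, n_iters=20, seed=0):
    """Toy dictionary learning by alternating sparse coding and refitting."""
    rng = np.random.default_rng(seed)
    n_neurons = activations.shape[1]
    # Start from random unit-norm directions in neuron space.
    dictionary = rng.normal(size=(n_features, n_neurons))
    dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
    errors = []
    for _ in range(n_iters):
        # Sparse coding step: one active feature per sample (an assumption).
        best, coeffs = encode(activations, dictionary)
        recon = coeffs[:, None] * dictionary[best]
        errors.append(float(np.sum((activations - recon) ** 2)))
        # Dictionary update: least-squares refit of each used feature,
        # then renormalize so every feature stays a unit-length direction.
        for k in range(n_features):
            mask = best == k
            if not mask.any():
                continue
            direction = coeffs[mask] @ activations[mask]
            norm = np.linalg.norm(direction)
            if norm > 0:
                dictionary[k] = direction / norm
    best, coeffs = encode(activations, dictionary)
    return dictionary, best, coeffs, errors


# Synthetic "neuron activations": 3 hidden features spread over 8 neurons.
rng = np.random.default_rng(1)
true_features = rng.normal(size=(3, 8))
true_features /= np.linalg.norm(true_features, axis=1, keepdims=True)
labels = rng.integers(0, 3, size=300)
strengths = rng.uniform(0.5, 2.0, size=300)
activations = strengths[:, None] * true_features[labels]
activations += 0.01 * rng.normal(size=(300, 8))

dictionary, best, coeffs, errors = learn_dictionary(activations, n_features=3)
print("reconstruction error, first vs. last pass:", errors[0], errors[-1])
```

Each row of `dictionary` is a pattern across all eight neurons, in the spirit of Olah's point that the word, not the letter, carries the meaning. Real systems allow several features to be active at once and operate at vastly larger scale.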

“It’s sort of a bewildering thing,” says Josh Batson, an Anthropic research scientist. “We’ve got on the order of 17 million different concepts [in an LLM], and they don't come out labeled for our understanding. So we just go look, when did that pattern show up?”

Last year, the team began experimenting with a tiny model that uses only a single layer of neurons. (Sophisticated LLMs have dozens of layers.) The hope was that in the simplest possible setting they could discover patterns that designate features. They ran countless experiments with no success. “We tried a whole bunch of stuff, and nothing was working. It looked like a bunch of random garbage,” says Tom Henighan, a member of Anthropic’s technical staff. Then a run dubbed “Johnny”—each experiment was assigned a random name—began associating neural patterns with concepts that appeared in its outputs.
