LW - The Checklist: What Succeeding at AI Safety Will Involve by Sam Bowman

The Nonlinear Library

Контент предоставлен The Nonlinear Fund. Весь контент подкастов, включая эпизоды, графику и описания подкастов, загружается и предоставляется непосредственно компанией The Nonlinear Fund или ее партнером по платформе подкастов. Если вы считаете, что кто-то использует вашу работу, защищенную авторским правом, без вашего разрешения, вы можете выполнить процедуру, описанную здесь https://ru.player.fm/legal.

12d ago 35:25

MP3•Главная эпизода

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Checklist: What Succeeding at AI Safety Will Involve, published by Sam Bowman on September 3, 2024 on LessWrong.
Crossposted by habryka with Sam's permission. Expect lower probability for Sam to respond to comments here than if he had posted it.
Preface
This piece reflects my current best guess at the major goals that Anthropic (or another similarly positioned AI developer) will need to accomplish to have things go well with the development of broadly superhuman AI. Given my role and background, it's disproportionately focused on technical research and on averting emerging catastrophic risks.
For context, I lead a technical AI safety research group at Anthropic, and that group has a pretty broad and long-term mandate, so I spend a lot of time thinking about what kind of safety work we'll need over the coming years.
This piece is my own opinionated take on that question, though it draws very heavily on discussions with colleagues across the organization: Medium- and long-term AI safety strategy is the subject of countless leadership discussions and Google docs and lunch-table discussions within the organization, and this piece is a snapshot (shared with permission) of where those conversations sometimes go.
To be abundantly clear: Nothing here is a firm commitment on behalf of Anthropic, and most people at Anthropic would disagree with at least a few major points here, but this can hopefully still shed some light on the kind of thinking that motivates our work.
Here are some of the assumptions that the piece relies on. I don't think any one of these is a certainty, but all of them are plausible enough to be worth taking seriously when making plans:
Broadly human-level AI is possible. I'll often refer to this as transformative AI (or TAI), roughly defined as AI that could form as a drop-in replacement for humans in all remote-work-friendly jobs, including AI R&D.[1]
Broadly human-level AI (or TAI) isn't an upper bound on most AI capabilities that matter, and substantially superhuman systems could have an even greater impact on the world along many dimensions.
If TAI is possible, it will probably be developed this decade, in a business and policy and cultural context that's not wildly different from today.
If TAI is possible, it could be used to dramatically accelerate AI R&D, potentially leading to the development of substantially superhuman systems within just a few months or years after TAI.
Powerful AI systems could be extraordinarily destructive if deployed carelessly, both because of new emerging risks and because of existing issues that become much more acute. This could be through misuse of weapons-related capabilities, by disrupting important balances of power in domains like cybersecurity or surveillance, or by any of a number of other means.
Many systems at TAI and beyond, at least under the right circumstances, will be capable of operating more-or-less autonomously for long stretches in pursuit of big-picture, real-world goals. This magnifies these safety challenges.
Alignment - in the narrow sense of making sure AI developers can confidently steer the behavior of the AI systems they deploy - requires some non-trivial effort to get right, and it gets harder as systems get more powerful.
Most of the ideas here ultimately come from outside Anthropic, and while I cite a few sources below, I've been influenced by far more writings and people than I can credit here or even keep track of.
Introducing the Checklist
This lays out what I think we need to do, divided into three chapters, based on the capabilities of our strongest models:
Chapter 1: Preparation
You are here. In this period, our best models aren't yet TAI. In the language of Anthropic's RSP, they're at AI Safety Level 2 (ASL-2), ASL-3, or maybe the early stages of ASL-4. Most of the work that we hav...

2436 эпизодов

#Podcasting Education #The Nonlinear Fund