LessWrong

Summary and Table of Contents: The goal of this post is to discuss the so-called “sharp left turn”, the lessons that we learn from analogizing evolution to AGI development, and the claim that “capabilities generalize farther than alignment” … and the competing claims that all three of those things are complete baloney. In particular, Section 1 talks…
 
(Many of these ideas developed in conversation with Ryan Greenblatt) In a shortform, I described some different levels of resources and buy-in for misalignment risk mitigations that might be present in AI labs: *The “safety case” regime.* Sometimes people talk about wanting to have approaches to safety such that if all AI developers followed these …
 
“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or otherwise don’t behave like regular text. The SolidGoldMagikarp saga is pretty much essential context, as it documents the discovery of this phenomenon in GPT-2 and GPT-3. But, as far as I was able to tell, nobody had yet attempted to search for these…
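As a rough illustration of what such a search can look like (not necessarily the method used in the post), one heuristic from the SolidGoldMagikarp-era investigations is to look for embedding-space outliers: tokens whose input embeddings sit unusually close to the mean embedding, suggesting they were rarely or never updated during training. The model choice and cutoff below are illustrative assumptions.

```python
# Sketch: flag candidate "glitch" tokens in GPT-2 by looking for embedding outliers.
# Heuristic (assumed, not necessarily the post's method): tokens whose input embeddings
# lie unusually close to the mean embedding may have been rarely/never seen in training.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

emb = model.get_input_embeddings().weight.detach()   # (vocab_size, d_model)
centroid = emb.mean(dim=0)
dists = torch.norm(emb - centroid, dim=1)

k = 20  # illustrative cutoff
for tid in torch.argsort(dists)[:k].tolist():
    print(tid, repr(tokenizer.decode([tid])), float(dists[tid]))
```

Candidates surfaced this way would still need behavioral testing (e.g. asking the model to repeat the token) before being called anomalous.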
 
This is the abstract and introduction of our new paper, with some discussion of implications for AI Safety at the end. Authors: Jan Betley*, Xuchan Bao*, Martín Soto*, Anna Sztyber-Betley, James Chua, Owain Evans (*Equal Contribution). Abstract We study behavioral self-awareness — an LLM's ability to articulate its behaviors without requiring in-co…
 
The Cake: Imagine that I want to bake a chocolate cake, and my sole goal in my entire lightcone and extended mathematical universe is to bake that cake. I care about nothing else. If the oven ends up a molten pile of metal ten minutes after the cake is done, if the leftover eggs are shattered and the leftover milk spilled, that's fine. Baking that c…
 
This post offers an accessible model of the psychology of character-trained LLMs like Claude. Epistemic Status: This is primarily a phenomenological model based on extensive interactions with LLMs, particularly Claude. It's intentionally anthropomorphic in cases where I believe human psychological concepts lead to useful intuitions. Think of it as close…
 
This is a link post. This is a blog post reporting some preliminary work from the Anthropic Alignment Science team, which might be of interest to researchers working actively in this space. We'd ask you to treat these results like those of a colleague sharing some thoughts or preliminary experiments at a lab meeting, rather than a mature paper. We r…
 
One hope for keeping existential risks low is to get AI companies to (successfully) make high-assurance safety cases: structured and auditable arguments that an AI system is very unlikely to result in existential risks given how it will be deployed.[1] Concretely, once AIs are quite powerful, high-assurance safety cases would require making a thoro…
 
Cross-posted from Telescopic Turnip As we all know, humans are terrible at building butterflies. We can make a lot of objectively cool things like nuclear reactors and microchips, but we still can't create a proper artificial insect that flies, feeds, and lays eggs that turn into more butterflies. That seems like evidence that butterflies are incre…
 
This is a link post. A story I wrote about living through the transition to utopia. This is the one story that I've put the most time and effort into; it charts a course from the near future all the way to the distant stars. --- First published: January 19th, 2025 Source: https://www.lesswrong.com/posts/Rz4ijbeKgPAaedg3n/the-gentle-romance --- Narra…
 
This is a link post. Present alongside President Trump: Sam Altman; Larry Ellison (Oracle executive chairman and CTO); and Masayoshi Son (SoftBank CEO who believes he was born to realize ASI). President Trump: What we want to do is we want to keep [AI datacenters] in this country. China is a competitor and others are competitors. President Trump: I'm going…
 
The AI Control Agenda, in its own words: … we argue that AI labs should ensure that powerful AIs are controlled. That is, labs should make sure that the safety measures they apply to their powerful models prevent unacceptably bad outcomes, even if the AIs are misaligned and intentionally try to subvert those safety measures. We think no fundamental…
 
I think a lot of people have heard so much about internalized prejudice and bias that they think they should ignore any bad vibes they get about a person that they can’t rationally explain. But if a person gives you a bad feeling, don’t ignore that. Both I and several others who I know have generally come to regret it if they’ve gotten a bad feelin…
 
(Both characters are fictional, loosely inspired by various traits from various real people. Be careful about combining kratom and alcohol.) The original text contained 24 images which were described by AI. --- First published: January 7th, 2025 Source: https://www.lesswrong.com/posts/KfZ4H9EBLt8kbBARZ/fiction-comic-effective-altruism-and-rationali…
 
From AI scientist to AI research fleet: Research automation is here (1, 2, 3). We saw it coming and planned ahead, which puts us ahead of most (4, 5, 6). But that foresight also comes with a set of outdated expectations that are holding us back. In particular, research automation is not just about “aligning the first AI scientist”, it's also about t…
 
So we want to align future AGIs. Ultimately we’d like to align them to human values, but in the shorter term we might start with other targets, like e.g. corrigibility. That problem description all makes sense on a hand-wavy intuitive level, but once we get concrete and dig into technical details… wait, what exactly is the goal again? When we say w…
 
Traditional economics thinking has two strong principles, each based on abundant historical data: Principle (A): No “lump of labor”: If human population goes up, there might be some wage drop in the very short term, because the demand curve for labor slopes down. But in the longer term, people will find new productive things to do, such that human …
 
All quotes, unless otherwise marked, are Tolkien's words as printed in The Letters of J.R.R. Tolkien: Revised and Expanded Edition. All emphases mine. Machinery is Power is Evil. Writing to his son Michael in the RAF: [here is] the tragedy and despair of all machinery laid bare. Unlike art which is content to create a new secondary world in the mind,…
 
The anonymous review of The Anti-Politics Machine published on Astral Codex Ten focuses on a case study of a World Bank intervention in Lesotho, and tells a story about it: The World Bank staff drew reasonable-seeming conclusions from sparse data, and made well-intentioned recommendations on that basis. However, the recommended programs failed, due t…
 
Crossposted from my personal blog. I was inspired to cross-post this here given the discussion that this post on the role of capital in an AI future elicited. When discussing the future of AI, I semi-often hear an argument along the lines that in a slow takeoff world, despite AIs automating increasingly more of the economy, humanity will remain in …
 
TL;DR: There may be a fundamental problem with interpretability work that attempts to understand neural networks by decomposing their individual activation spaces in isolation: It seems likely to find features of the activations - features that help explain the statistical structure of activation spaces, rather than features of the model - the feat…
 
Funding for $150bn training systems just turned less speculative, with OpenAI o3 reaching 25% on FrontierMath, 70% on SWE-Verified, 2700 on Codeforces, and 80% on ARC-AGI. These systems will be built in 2026-2027 and enable pretraining models for 5e28 FLOPs, while o3 itself is plausibly based on an LLM pretrained only for 8e25-4e26 FLOPs. The natur…
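For intuition about where a headline number like 5e28 FLOPs comes from, a back-of-the-envelope check is: chips × per-chip throughput × utilization × run length. The figures below are illustrative assumptions for a ~$150bn-class system, not numbers taken from the post.

```python
# Rough order-of-magnitude check on a pretraining-compute estimate.
# Every constant here is an illustrative assumption, not a figure from the post.
accelerators = 1_000_000      # chips in a ~$150bn-class training system (assumed)
flops_per_chip = 5e15         # low-precision FLOP/s per chip (assumed)
utilization = 0.35            # realized fraction of peak over the run (assumed)
run_days = 150                # length of the pretraining run (assumed)

total_flops = accelerators * flops_per_chip * utilization * run_days * 86_400
print(f"{total_flops:.1e} FLOPs")  # ~2e28, the same ballpark as 5e28
```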
 
(Cross-post from https://amistrongeryet.substack.com/p/are-we-on-the-brink-of-agi, lightly edited for LessWrong. The original has a lengthier introduction and a bit more explanation of jargon.) No one seems to know whether transformational AGI is coming within a few short years. Or rather, everyone seems to know, but they all have conflicting opini…
 
I mostly work on risks from scheming (that is, misaligned, power-seeking AIs that plot against their creators such as by faking alignment). Recently, I (and co-authors) released "Alignment Faking in Large Language Models", which provides empirical evidence for some components of the scheming threat model. One question that's really important is how…
 
This week, Altman offers a post called Reflections, and he has an interview in Bloomberg. There are a bunch of good and interesting answers in the interview about past events that I won't mention or will have to condense a lot here, such as his going over his calendar and all the meetings he constantly has, so consider reading the whole thing. Table of Co…
 