Toy Models of Superposition
It would be very convenient if the individual neurons of artificial neural networks corresponded to cleanly interpretable features of the input. For example, in an “ideal” ImageNet classifier, each neuron would fire only in the presence of a specific visual feature, such as the color red, a left-facing curve, or a dog snout. Empirically, in models we have studied, some of the neurons do cleanly map to features. But it isn't always the case that features correspond so cleanly to neurons, especially in large language models where it actually seems rare for neurons to correspond to clean features. This brings up many questions. Why is it that neurons sometimes align with features and sometimes don't? Why do some models and tasks have many of these clean neurons, while they're vanishingly rare in others?
In this paper, we use toy models — small ReLU networks trained on synthetic data with sparse input features — to investigate how and when models represent more features than they have dimensions. We call this phenomenon superposition. When features are sparse, superposition allows compression beyond what a linear model would do, at the cost of "interference" that requires nonlinear filtering.
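As a concrete illustration, here is a minimal PyTorch sketch of the kind of toy model the paper studies. It follows the ReLU output model described there, x' = ReLU(WᵀWx + b), but the feature count, sparsity level, learning rate, and training length below are illustrative assumptions, and the paper's per-feature importance weighting is omitted (all features count equally):

    import torch

    n_features, n_hidden = 20, 5   # more features than hidden dimensions
    sparsity = 0.95                # probability that any given feature is zero
    batch = 1024

    # W compresses n_features -> n_hidden; the same matrix (transposed)
    # is reused to reconstruct, as in the paper's toy setup.
    W = (torch.randn(n_hidden, n_features) * 0.1).requires_grad_(True)
    b = torch.zeros(n_features, requires_grad=True)
    opt = torch.optim.Adam([W, b], lr=1e-3)

    for step in range(5000):
        # Synthetic sparse data: each feature is independently active with
        # probability (1 - sparsity), uniform in [0, 1] when active.
        x = torch.rand(batch, n_features)
        x = x * (torch.rand(batch, n_features) > sparsity).float()

        h = x @ W.T                    # compress: n_features -> n_hidden
        x_hat = torch.relu(h @ W + b)  # reconstruct through a ReLU
        loss = ((x_hat - x) ** 2).mean()

        opt.zero_grad()
        loss.backward()
        opt.step()

    # Columns of W are the learned feature directions. With sparse inputs,
    # more than n_hidden of them typically end up with substantial norm:
    # the model represents more features than it has dimensions.
    print((W.norm(dim=0) > 0.5).sum().item(), "of", n_features,
          "features represented in", n_hidden, "dimensions")

At low sparsity the trained model keeps only the n_hidden most useful features; as sparsity rises, it packs additional feature directions into the same space, accepting interference between them.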
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Chapters
1. Toy Models of Superposition (00:00:00)
2. Definitions and Motivation: Features, Directions, and Superposition (00:00:11)
3. Empirical Phenomena (00:03:59)
4. What are Features? (00:06:08)
5. Features as Directions (00:09:20)
6. Privileged vs Non-privileged Bases (00:13:06)
7. The Superposition Hypothesis (00:15:38)
8. Summary: A Hierarchy of Feature Properties (00:20:08)
9. Demonstrating Superposition (00:21:45)
10. Experiment Setup (00:22:25)
11. Basic Results (00:29:40)
12. Mathematical Understanding (00:35:44)