LW - How to Give in to Threats (without incentivizing them) by Mikhail Samin
TL;DR: using a simple mixed strategy, LDT can give in to threats, ultimatums, and commitments - while incentivizing cooperation and fair[1] splits instead.
For many people I've talked to, this strategy made it much more intuitive that smart agents probably won't do weird, everyone's-utility-eating things like threatening each other or participating in commitment races.
1. The Ultimatum game
This part is taken from planecrash[2][3].
You're in the Ultimatum game. You're offered 0-10 dollars. You can accept or reject the offer. If you accept, you get what's offered, and the offerer gets $(10-offer). If you reject, both you and the offerer get nothing.
The simplest strategy that incentivizes fair splits is to accept everything ≥ 5 and reject everything < 5. The offerer can't do better than by offering you 5. If you accepted offers of 1, an offerer who knows this would always offer you 1 and get 9, instead of being incentivized to give you 5. Being unexploitable, in the sense of incentivizing fair splits, is a very important property for your strategy to have.
With the simplest strategy, if you're offered 5..10, you get 5..10; if you're offered 0..4, you get 0 in expectation.
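To make the incentive concrete, here is a minimal sketch (Python, my own illustration rather than anything from the post) of the hard-threshold responder; the names TOTAL, FAIR_SHARE, and threshold_accept_prob are mine. The printout shows that the offerer's expected payoff peaks at the fair 5:5 split, while lowball offers leave the responder with 0.

```python
# Sketch of the simple threshold strategy: accept any offer >= 5, reject below.
# Printing the offerer's expected payoff for each possible offer shows why
# offering exactly 5 is the offerer's best move against this responder.

TOTAL = 10
FAIR_SHARE = 5

def threshold_accept_prob(offer: int) -> float:
    """Probability the responder accepts under the hard threshold strategy."""
    return 1.0 if offer >= FAIR_SHARE else 0.0

for offer in range(TOTAL + 1):
    p = threshold_accept_prob(offer)
    offerer_ev = p * (TOTAL - offer)   # offerer keeps 10 - offer if accepted
    responder_ev = p * offer
    print(f"offer {offer:2d}: offerer EV = {offerer_ev:.1f}, responder EV = {responder_ev:.1f}")
```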
Can you do better than that? What strategy could you use to get more than 0 in expectation when offered 1..4, while still being unexploitable (i.e., still incentivizing offers of at least 5)?
I encourage you to stop here and try to come up with a strategy before continuing.
The solution, explained by Yudkowsky in planecrash (children split 12 jellychips, so the offers are 0..12):
When the children return the next day, the older children tell them the correct solution to the original Ultimatum Game.
It goes like this:
When somebody offers you a 7:5 split, instead of the 6:6 split that would be fair, you should accept their offer with slightly less than 6/7 probability. Their expected value from offering you 7:5, in this case, is 7 * slightly less than 6/7, or slightly less than 6. This ensures they can't do any better by offering you an unfair split; but neither do you try to destroy all their expected value in retaliation.
It could be an honest mistake, especially if the real situation is any more complicated than the original Ultimatum Game.
If they offer you 8:4, accept with probability slightly-more-less than 6/8, so they do even worse in their own expectation by offering you 8:4 than 7:5.
It's not about retaliating harder, the harder they hit you with an unfair price - that point gets hammered in pretty hard to the kids, a Watcher steps in to repeat it. This setup isn't about retaliation, it's about what both sides have to do, to turn the problem of dividing the gains, into a matter of fairness; to create the incentive setup whereby both sides don't expect to do any better by distorting their own estimate of what is 'fair'.
[The next stage involves a complicated dynamic-puzzle with two stations, that requires two players working simultaneously to solve. After it's been solved, one player locks in a number on a 0-12 dial, the other player may press a button, and the puzzle station spits out jellychips thus divided.
The gotcha is, the 2-player puzzle-game isn't always of equal difficulty for both players. Sometimes, one of them needs to work a lot harder than the other.]
They play the 2-station video games again. There's less anger and shouting this time. Sometimes, somebody rolls a continuous-die and then rejects somebody's offer, but whoever gets rejected knows that they're not being punished. Everybody is just following the Algorithm.
Your notion of fairness didn't match their notion of fairness, and they did what the Algorithm says to do in that case, but ...
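Below is a minimal sketch of the acceptance rule described in the excerpt, translated back to the post's 0..10 dollar framing. It is my own illustration under stated assumptions: the constants TOTAL, FAIR_SHARE, and EPSILON and the helpers accept_probability and respond are hypothetical names, and the fixed EPSILON margin is one way of cashing out "slightly less than" so that more unfair offers give the offerer strictly lower expected value.

```python
# Sketch of the mixed acceptance rule: accept fair-or-better offers always;
# accept an unfair offer with probability slightly below
# FAIR_SHARE / (offerer's take), so the offerer's expected payoff from any
# unfair split stays strictly below the fair split's payoff, while the
# responder still keeps some expected value instead of burning it all.

import random

TOTAL = 10        # total amount to split (the post's 0..10 dollar game)
FAIR_SHARE = 5    # what the responder considers a fair share
EPSILON = 0.01    # the "slightly less than" margin (assumed constant here)

def accept_probability(offer: int) -> float:
    offerer_take = TOTAL - offer
    if offer >= FAIR_SHARE:
        return 1.0
    # offerer_take > FAIR_SHARE here, so this is strictly below 1
    return max(0.0, FAIR_SHARE / offerer_take - EPSILON)

def respond(offer: int) -> bool:
    """Accept or reject by rolling against the acceptance probability."""
    return random.random() < accept_probability(offer)

# Expected payoffs for every possible offer: the offerer's EV is maximized
# at the fair split (and decreases as the offer gets more unfair), while
# the responder now gets a positive EV even when offered 1..4.
for offer in range(TOTAL + 1):
    p = accept_probability(offer)
    print(f"offer {offer:2d}: accept prob = {p:.3f}, "
          f"offerer EV = {p * (TOTAL - offer):.2f}, responder EV = {p * offer:.2f}")
```

Under this sketch, offers of 1..4 give the responder a positive expected value, answering the earlier question, while the offerer still cannot beat the fair split in expectation, so the strategy remains unexploitable.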