I have an exercise where I instruct people to play a puzzle game (“Baba Is You”). Normally you can move around and interact with the world to experiment and learn things; instead, you must make a complete plan for solving the level, and you aim to get it right on your first try.
What I take away from this is that they should have separated the utility of an assumption being true from the probability/likelihood of that assumption being true, and indeed this shows some calibration problems.
There is a tendency to slip into more convenient worlds for reasons based on utility rather than evidence, which is a problem (assuming the problem is solvable for you).
This is an important takeaway, but I don’t think your other takeaways help as much as this one.
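To make the utility/probability distinction concrete, here is a minimal arithmetic sketch. All the numbers are hypothetical and purely illustrative; the point is only that how much you *want* an assumption to be true (its utility) should enter the expected-utility calculation as a payoff, never as an inflated probability.

```python
# Hypothetical, illustrative numbers for a "convenient assumption":
p_true = 0.2        # probability the assumption holds -- set by evidence only
u_if_true = 100.0   # utility if it holds (how much we'd like it to be true)
u_if_false = -10.0  # utility if it doesn't

# Correct: utility and probability stay separate, combined only here.
expected_utility = p_true * u_if_true + (1 - p_true) * u_if_false
print(expected_utility)  # 0.2 * 100 + 0.8 * (-10) = 12.0

# The calibration failure described above is, in effect, letting the size
# of u_if_true push p_true upward -- reasoning "it would be great if this
# were true, so it probably is," rather than pricing the wish into utility.
```

The failure mode isn't wanting good outcomes; it's letting `u_if_true` leak into `p_true`.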
That said, this constraint makes almost all real-life problems impossible for humans and AIs alike:
In particular, if such a constraint exists, it’s a big red flag that the problem you are trying to solve is impossible given that constraint.
Almost all plans fail on the first try, even really competent plans made by really competent humans, and outside of very constrained regimes, essentially no plan works out on the first try.
Thus, if you are truly in a situation where you are encountering such constraints, you should pause to make sure the constraint actually exists, and if it does, give up on the problem ASAP.
So while this is a fun experiment, with real takeaways, I’d warn people that constraining a plan to work on the first try and requiring completeness makes lots of problems impossible to solve for us humans and AIs.
One of Eliezer’s essays in The Sequences is called “Shut Up and Do the Impossible.”
I’m confident Eliezer would agree with you that if you can find a way to do something easier instead, you should absolutely do that. But he also argues that there is no guarantee that something easier exists; the universe isn’t constrained to only placing fair demands on you.
My point isn’t that the easier option always exists, or even that a problem can’t be impossible.
My point is that if you are facing a problem that requires 1-shot complete plans, and there’s no second try, you need to do something else.
There is a line past which a problem becomes too difficult to productively work on, and that constraint (if it exists) is a great sign of an impossible problem.
The maximum difficulty that is worth attempting depends on the stakes.
AND your accurate assessment of the difficulty. The overconfidence displayed in this mini-experiment seems to result in part from people massively misestimating the difficulty of this relatively simple problem. That’s why it’s so concerning WRT alignment.
Do things like major surgery or bomb defusal have those kinds of constraints?
Not really, but they are definitely more few-shot than other areas. Thankfully, getting one thing wrong isn’t usually an immediate game-ender (though it is still to be avoided, and this is an important part of why these two areas are harder than a lot of other fields).
Ah, well said. I understand the rest of your comments better now. And I thoroughly agree, with a caveat about the complexity of the problem and the amount of thought and teamwork applied (e.g., I expect that a large team working for a month in effective collaboration would’ve solved the problem in this experiment, but alignment is probably much more difficult than that).