Not only that, most of the plans route through “acquire resources in a way that is unfriendly to human values.” Because in the space of all possible plans, consequentialism doesn’t take that many bits to specify, while human values are highly complex and take a lot of bits to specify.
1) It’s easier to build a moon base with money. And*, it’s easier to steal money than earn it.
*This is a hypothetical.
2) Even replacing that plan with one that ‘human values’ says works is tricky. What is an acceptable way to earn money?
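A minimal toy model of both points (the action set, scores, and two-rule stand-in for ‘human values’ below are all invented for illustration, not taken from this discussion): under a one-line consequentialist objective, stealing and deceiving dominate earning, and every top-scoring plan fails the values filter.

```python
import itertools

# Toy model, invented for illustration: plans are 4-step sequences of
# abstract actions, scored by a one-line consequentialist objective.
ACTIONS = ["earn", "steal", "trade", "build", "lobby", "deceive"]
GAIN = {"earn": 1, "steal": 3, "trade": 2, "build": 0, "lobby": 2, "deceive": 3}

def score(plan):
    # The objective is cheap to specify: total resources acquired.
    return sum(GAIN[a] for a in plan)

def acceptable(plan):
    # A crude stand-in for "human values". A realistic predicate would
    # need vastly more rules than this -- that gap is the "lot of bits".
    return "steal" not in plan and "deceive" not in plan

plans = sorted(itertools.product(ACTIONS, repeat=4), key=score, reverse=True)
top = plans[:100]
print(sum(acceptable(p) for p in top), "of the top 100 plans are acceptable")
# Prints: 0 of the top 100 plans are acceptable -- the highest-scoring
# plans all route through stealing or deceiving, because those pay best.
```

In this toy run, none of the 100 highest-scoring plans pass even a two-rule filter; the best any acceptable plan can score is well below the unfriendly optima, which is the sense in which most powerful plans route through unfriendly resource acquisition.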
Just listing the plans is itself a problem. One does not enumerate all of possibility.
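A back-of-envelope sketch of why (the branching factor and horizon are arbitrary assumptions, picked only to show the scale):

```python
# Back-of-envelope size of a plan space; both numbers are assumptions
# chosen only to illustrate the scale, not claims about any real agent.
branching_factor = 50   # primitive actions available at each step
horizon = 20            # steps in a plan
num_plans = branching_factor ** horizon
print(f"{num_plans:.2e} candidate plans")  # ~9.54e+33
```

At anything like that scale, proposals of the form “list the plans” or “inspect each plan” are ruled out before they start; whatever filtering happens has to live inside the search itself.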
Okay, but if I imagine a researcher who is thoughtful but a bit too optimistic, what they might counter with is: “Sure, but I’ll just inspect the plans for whether they’re unfriendly, and not do those plans.”
And here you swap out ‘a plan’ for ‘plans’.
Me: Okay, so partly you’re pointing out that the hardness of the problem isn’t just about getting the AI to do what I want; it’s that doing what I want is actually just really hard. Or rather, the part where alignment is hard is precisely when the thing I’m trying to accomplish is hard. Because then I need a powerful plan, and it’s hard to specify a search for powerful plans that don’t kill everyone.
The fact that this is being used as a metaphor disconnects it from the problem.
Suppose, tomorrow, a ‘cure for cancer’ were created, and the solution was surprisingly simple.
It seems clear that, say, ‘beating you at chess’ isn’t that hard to plan. Why would ‘cure cancer’ be so very, very hard?
It seems like the tricky bit about a plan is that... maybe the plan wouldn’t work?
You might have to do experiments, and learn from them, and come up with new ideas... you are not sailing somewhere that is on a map, or doing something that has been done before.