Daniel Kokotajlo comments on Definitions of “objective” should be Probable and Predictive

Daniel Kokotajlo 6 Jan 2023 19:06 UTC
I was specifically talking about the conclusion that we shouldn’t talk about objectives/goals. That’s the conclusion that I think is absurd (when applied to humans) and also wrong (though less absurd) when applied to AGIs. I do think it’s absurd when applied to humans: it seems pretty obvious to me that theorizing about goals/motives/intentions is an often-useful practice for predicting human behavior.

I agree that typical conversation about goals/objectives/intentions/motives/etc. has an implicit “this isn’t necessarily the only thing they want, and they aren’t necessarily optimizing perfectly rationally towards it” caveat.

I’m happy to also have those implicit caveats in the case of AIs as well, when talking about their goals. The instrumental convergence argument still goes through, I think, despite those caveats. The argument for misaligned AGI being really bad by human-values lights also goes through, I think.

Re your second argument, about introspective experience & historical precedent being useful for predicting humans but not AIs:

OK, so suppose instead of AIs it was some alien species that landed in flying saucers yesterday, or maybe suppose it was some very smart octopi that a mad scientist cult has been selectively breeding for intelligence for the last 100 years. Would you agree that in these cases it would make sense for us to theorize about them having goals/intentions/etc.? Or would you say “We don’t have past experience of goal-talk being useful for understanding these creatures, and also we shouldn’t expect introspection to work well for predicting them either, therefore let’s avoid trying to say that these aliens/octopi have goals/intentions/objectives/etc, and instead talk directly about generalization behavior in novel situations.”
- Rohin Shah 6 Jan 2023 19:28 UTC
  > I was specifically talking about the conclusion that we shouldn’t talk about objectives/goals.
  Yeah, sorry, I ninja-edited my comment before you replied because I realized I misunderstood you.
  Tbc I think there are times when people say “Alice is clearly trying to do X” and my response is “what do you predict Alice would do in future situation Y” and it is not in fact X, so I do think it is not crazy to say that even for humans you should focus more on predictions of behavior and the reasons for making those predictions. But I agree you wouldn’t want to not talk about objectives / goals entirely.
  > Or would you say “We don’t have past experience of goal-talk being useful for understanding these creatures, and also we shouldn’t expect introspection to work well for predicting them either, therefore let’s avoid trying to say that these aliens/octopi have goals/intentions/objectives/etc, and instead talk directly about generalization behavior in novel situations.”
  Yup!
  Though in the octopus case you could have lots of empirical experience, just as we will likely have lots of empirical experience with AI systems in the future.
  I do think it’s quite plausible that in these settings we’ll say “well they’ve done X, we know nothing else about them, so probably we should predict they’ll continue to do X”, which looks pretty similar to saying they have a goal of X. I think the main difference is that I’d be way more uncertain about that than it sounds like you would be.