Does the notion of “low-impact” break down, though, if humans are eventually going to use the results from these experiments to build high-impact AI?
I think the notion doesn’t break down. The low-impact AI hasn’t changed human attainable utilities by the end of the experiments. If we eventually build a high-impact AI, that seems “on us.” The low-impact AI itself hasn’t done something bad to us. I therefore think the concept I spelled out still works in this situation.
As I mentioned in the other comment, I don’t feel optimistic about actually designing these AIs via explicit low-impact objectives, though.
It seems like evaluating human AU depends on how the human is modeled. There's a "black box" sense in which you can substitute literally any policy for the human's when calculating AU for different objectives, and a "transparent box" sense in which you have to draw from a distribution of predicted human behaviors.
The former is closer to what I think you mean by “hasn’t changed the humans’ AU,” but I think it’s the latter that an AI cares about when evaluating the impact of its own actions.
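To make the distinction concrete, here is roughly what I have in mind (the notation is my own, and the exact formalization is an assumption on my part rather than something you've committed to). In the black-box sense, the human's slot can be filled by any policy at all:

$$\text{AU}_{\text{black}}(s, u) = \max_{\pi} \; \mathbb{E}\!\left[\sum_{t} \gamma^{t} u(s_t) \;\middle|\; s_0 = s, \pi \right]$$

In the transparent-box sense, the AI evaluates AU under its own distribution $P(\pi_H)$ over predicted human policies:

$$\text{AU}_{\text{trans}}(s, u) = \mathbb{E}_{\pi_H \sim P(\pi_H)}\!\left[\, \mathbb{E}\!\left[\sum_{t} \gamma^{t} u(s_t) \;\middle|\; s_0 = s, \pi_H \right]\right]$$

The first quantity is unaffected by anything that merely changes which policy the human will in fact follow, while the second can move whenever the AI's predictions about human behavior shift.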
I want to clarify something.
I’m discussing a philosophical framework for understanding low impact. I’m not prescribing how the AI actually accomplishes this.