Existing approaches like impact measures and mild optimization aim to define what not to do rather than learn it.
Stuart’s early impact approach was like this, but modern work isn’t. Or maybe by “define what not to do”, you don’t mean “leave these variables alone”, but rather that eg (some ideally formalized variant of) AUP implicitly specifies a way in which the agent interacts with its environment: passivity to significant power changes. But then by that definition, norm-learning approaches are “defining” what not to do as well.
I agree that norm-based approaches use learning. I just don’t know whether I agree with your assertion that eg AUP “defines” what not to do.
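For concreteness, here’s a minimal sketch of the kind of penalty AUP computes (the names and data structures are my own illustration, not from any particular codebase): the primary reward is reduced by the average change in attainable value across auxiliary goals, relative to doing nothing. In that sense the method specifies a relationship to the environment rather than a list of protected variables.

```python
# A minimal sketch of an AUP-style penalty, assuming we already have Q-values
# for a set of auxiliary reward functions (here: dicts mapping (state, action)
# pairs to values). All names are illustrative, not from any codebase.

def aup_reward(state, action, primary_reward, aux_q_tables, noop="noop", lam=0.1):
    """Primary reward minus a penalty for changing attainable utility.

    The penalty is the average absolute change, across auxiliary Q-functions,
    between taking `action` and doing nothing -- so the agent is penalized for
    gaining or losing the ability to pursue the auxiliary goals, rather than
    for touching any particular variable.
    """
    penalty = sum(
        abs(q[(state, action)] - q[(state, noop)]) for q in aux_q_tables
    ) / len(aux_q_tables)
    return primary_reward(state, action) - lam * penalty
```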
To my understanding, mild optimization is about how we can navigate a search space intelligently without applying too much internal optimization pressure to find really “amazing” plans. This doesn’t seem to fit either.
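To illustrate why it doesn’t fit, here’s a toy quantilizer-style sketch of mild optimization (assuming a finite action set and a base distribution given as weights; the function name is mine): rather than taking the argmax, the agent samples from the top-q fraction of the base distribution ranked by utility, which limits how hard it pushes toward “amazing” plans without encoding anything about what not to do.

```python
import random

def quantilize(actions, utility, base_weights=None, q=0.1, rng=random):
    """Sample from the top-q fraction (by base probability mass) of actions,
    ranked by utility, instead of taking the single highest-utility action."""
    if base_weights is None:
        base_weights = [1.0] * len(actions)          # uniform base distribution
    ranked = sorted(zip(actions, base_weights),
                    key=lambda pair: utility(pair[0]), reverse=True)
    total = sum(w for _, w in ranked)
    kept, mass = [], 0.0
    for action, weight in ranked:                    # keep the best actions until
        kept.append((action, weight))                # we cover q of the base mass
        mass += weight
        if mass >= q * total:
            break
    return rng.choices([a for a, _ in kept],
                       weights=[w for _, w in kept], k=1)[0]
```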
Relatedly, learning what not to do imposes a limitation on behavior. If an AI system is goal-directed, then given sufficient intelligence it will likely find a nearest unblocked strategy.
How pessimistic are you about this concern for this idea?
I just don’t know whether I agree with your assertion that eg AUP “defines” what not to do.
I think I mostly meant that it is not learned.
I kind of want to argue that this means the effect of not-learned things can be traced back to researchers’ brains, rather than to experience with the real world. But that’s not exactly right, because the actual impact penalty can depend on properties of the world, even if it doesn’t use learning.
How pessimistic are you about this concern for this idea?
I don’t know; it feels too early to say. I think if the norms end up in some hardcoded form that never updates over time, nearest unblocked strategies feel very likely. If the norms evolve over time, then it might be fine; they would need to evolve at the same “rate” at which the world changes.