If I’m trying to build an AI to help us navigate an increasingly complex and rapidly changing world, what does “low impact” mean? In what sense do the terrible situations involve higher objective impact than the intended behaviors?
Solving low impact seems like it would allow us to ensure that each low impact agent won’t push the world in a given direction by more than some bounded, (presumably) small amount. If we’re thinking of my new measure in particular, it would also help ensure that we won’t be surprised by the capability gain of any single agent, which might help even if we aren’t expecting the spontaneous arrival of a singleton. A good formulation of low impact would have the property that the interactions of multiple such agents don’t add up to more than the sum of the constituent impact budgets. In this sense, I think it’s sensible to see measuring and restricting objective impact (I’m implicitly thinking of my approach here) as helpful for slowing down the situation.
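One minimal way to state that compositionality desideratum, using notation of my own rather than anything from a specific proposal: if each agent $i$ acts under an individual impact budget $b_i$, then for any joint deployment of agents $1, \dots, n$ we would want something like

$$\mathrm{Impact}(a_1, \dots, a_n) \;\le\; \sum_{i=1}^{n} b_i,$$

where $a_i$ denotes agent $i$’s behavior and $\mathrm{Impact}(\cdot)$ is the measure applied to their combined effect on the world. The point is just that impacts should compose at worst additively, so that per-agent budgets can’t be laundered into a super-additive joint push on the world.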
I also think that, depending on the specific formulation, a low impact solution would enable a substantial reduction in the problems which we need to solve ourselves. That is, I think solving low impact might make useful technical oracles possible. It might be the case that we only need a portion of the agent foundations agenda + low impact in order to build these oracles, which we could then use to help us solve value alignment/corrigibility/etc.
I am also aware that using these oracles would not (naively) be low impact; I plan to outline how we could maybe get around this in a robust manner as soon as I am able.