I share Alex’s intuition in a sibling comment: “How about helping a tree? It actually seems pretty straightforward to me how to help a tree.”
Yes, there is interpretive labor, and yes, things become fuzzy as situations become more and more extreme, but if you want to help an agent-ish thing, it shouldn’t be too hard to add some value and not cause massive harm.
I expect MIRI-cluster to agree with this point—think of the sentiment “the AI knows what you want it to do, it just doesn’t care”. The difficulty isn’t in being competent enough to help humans, it’s in being motivated to help humans. (If you thought that we had to formally define everything and prove theorems w.r.t. the formal definitions or else we’re doomed, then you might think that the fact that humans aren’t clear agents poses a problem; that might be one way that MIRI-cluster and I disagree.)
I could imagine that, for some specific AI system designs, you could say that they would fail to help humans because they make a false assumption of too-much-agentiness. If the plan was “literally run an optimal strategy pair for an assistance game (CIRL)”, I think that would be a correct critique—most egregiously, CIRL assumes a fixed reward function, but humans change over time. But I don’t see why it would be true for the “default” intelligent AI system.
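To make the fixed-reward-function critique a bit more concrete, here is a toy sketch of just the inference piece (my own simplified illustration, not anything from the CIRL paper or an actual assistance-game solver; names like `human_action`, `likelihood`, and `drift_step` are made up for this example): a robot does Bayesian updating over a reward parameter theta while assuming theta never changes, and the human’s true preference flips partway through.

```python
# Toy sketch (hypothetical setup): Bayesian inference over a reward parameter
# theta under a fixed-theta assumption, while the human's true theta drifts.
import numpy as np

rng = np.random.default_rng(0)
thetas = np.array([-1.0, 1.0])                 # candidate reward parameters
log_posterior = np.log(np.array([0.5, 0.5]))   # robot's prior over theta
noise = 0.2                                    # how often the human acts "against" their preference

true_theta = 1.0                               # human's actual preference, initially +1
drift_step = 300                               # ...until it flips here

def human_action(theta):
    """Human signals the sign of their theta, with occasional noise."""
    return theta if rng.random() > noise else -theta

def likelihood(action, theta):
    """P(action | theta) under the robot's model, which treats theta as fixed."""
    return 1 - noise if action == theta else noise

for t in range(400):
    if t == drift_step:
        true_theta = -1.0                      # the human changes; the robot's model says this can't happen
    a = human_action(true_theta)
    log_posterior += np.log([likelihood(a, th) for th in thetas])
    log_posterior -= log_posterior.max()       # keep the numbers from underflowing

posterior = np.exp(log_posterior)
posterior /= posterior.sum()
print(dict(zip(thetas.tolist(), posterior.round(3).tolist())))
# theta = +1 still gets nearly all the mass, even though the human has
# preferred -1 for the last 100 steps, because pre-drift evidence accumulated
# under the fixed-theta assumption never gets discounted.
```

A model that allowed theta to change (e.g. a changing hidden state over theta, or simply discounting old observations) would track the flip; the sketch just shows how baking in “the reward function is fixed” makes stale evidence dominate.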