I’m wondering if the Rainforest thing is somehow tied to some other disagreements (between you/me or you/MIRI-cluster).
Where something like "the fact that it requires some interpretive labor to model the Rainforest as an agent in the first place" is related to why it seems hard to be helpful to humans, i.e. humans aren't actually agents either. You get an easier starting point with humans, since we can write down goals and notice inconsistencies in them, but that isn't actually all that reliable. We are not in fact agents, and we need to somehow build AIs that reliably seem good to us anyway.
(Curious if this feels relevant either to Rohin, or other “MIRI cluster” folk)
Well, yes, one way to help some living entity is to (1) interpret it as an agent, and then (2) act in service of the terminal goals of that agent. But that’s not the only way to be helpful. It may also be possible to directly be helpful to a living entity that is not an agent, without getting any agent concepts involved at all.
I definitely don’t know how to do this, but the route that avoids agent models entirely seems more plausible to me than working hard to interpret everything using some agent model that is often a really poor fit, and then helping on the basis of that poorly-fitting agent model.
I’m excited about inquiring deeply into what the heck “help” means. (Please reach out to me if you’d like to join a study seminar on this topic.)
How about helping a tree? It actually seems pretty straightforward to me how to help a tree.
Yes, there is interpretive labor, and yes, things become fuzzy as situations become more and more extreme, but if you want to help an agent-ish thing it shouldn’t be too hard to add some value and not cause massive harm.
I expect MIRI-cluster to agree with this point—think of the sentiment “the AI knows what you want it to do, it just doesn’t care”. The difficulty isn’t in being competent enough to help humans, it’s in being motivated to help humans. (If you thought that we had to formally define everything and prove theorems w.r.t. the formal definitions or else we’re doomed, then you might think that the fact that humans aren’t clean agents poses a problem; that might be one way that MIRI-cluster and I disagree.)
I could imagine that for some specific designs for AI systems you could say that they would fail to help humans because they make a false assumption of too-much-agentiness. If the plan was “literally run an optimal strategy pair for an assistance game (CIRL)”, I think that would be a correct critique—most egregiously, CIRL assumes a fixed reward function, but humans change over time. But I don’t see why it would be true for the “default” intelligent AI system.
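To make the fixed-reward critique concrete, here’s a toy sketch. This is a deliberate caricature, not actual CIRL: the two-valued reward parameter, the noise model, and all the names are my own illustrative assumptions. The point is just that an inference procedure built on “the human has one fixed reward function” has no hypothesis corresponding to “the human’s preferences changed partway through”.

```python
import random

# Toy caricature of the fixed-reward assumption (NOT real CIRL):
# a robot does Bayesian inference over a reward parameter theta in {0, 1},
# assuming theta is fixed for the whole episode.

def human_action(theta, noise=0.1):
    # The human mostly picks the action matching their *current* reward theta.
    return theta if random.random() > noise else 1 - theta

def posterior_fixed_theta(actions, noise=0.1):
    # Posterior over theta under the fixed-reward assumption.
    p = {0: 0.5, 1: 0.5}
    for a in actions:
        for th in (0, 1):
            p[th] *= (1 - noise) if a == th else noise
        z = p[0] + p[1]
        p = {th: v / z for th, v in p.items()}
    return p

random.seed(0)
# The human's preferences change halfway through: theta = 0, then theta = 1.
actions = [human_action(0) for _ in range(20)] + [human_action(1) for _ in range(20)]
post = posterior_fixed_theta(actions)

# The fixed-theta model can't represent "the human changed"; it just pools
# all the evidence into one posterior, so it ends up confidently wrong or
# confused rather than tracking the human's current preference.
print(post)
```

Running this, the posterior mashes together evidence from both halves of the episode, which is exactly the failure mode of assuming a fixed reward when the human drifts over time.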