To be clear, I don’t think iid explains it in all cases; I also think iid is just a particularly clean example. Hence why I said (emphasis added now):
> So my position is “partial agency arises because any embedded learning algorithm will necessarily leave out aspects that the idealized learning algorithm can identify”. And as a subclaim, that this often happens because of the effective iid assumption between data points in a learning algorithm.
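Concretely, the effective iid assumption can be pictured with a toy training loop like the one below (a deliberately minimal sketch for illustration, not any specific algorithm from this discussion): each data point is drawn independently from a fixed distribution, so the update only ever scores prediction error on the current point, and nothing in the objective could reward influencing which points arrive later.

```python
import random

# Toy illustration of the "effective iid assumption": every data point is drawn
# independently from a fixed distribution, so the only thing an update can be
# credited for is prediction error on the current point. The pathway "influence
# which data shows up later" is simply not represented in the objective.

def sample_point():
    """Draw (x, y) iid from a fixed environment: y = 3x + noise."""
    x = random.uniform(-1.0, 1.0)
    y = 3.0 * x + random.gauss(0.0, 0.1)
    return x, y

w = 0.0    # single weight of the linear predictor y_hat = w * x
lr = 0.1   # learning rate

for _ in range(10_000):
    x, y = sample_point()      # iid draw: unaffected by anything the learner did before
    error = w * x - y
    w -= lr * error * x        # gradient step that lowers squared error on *this* point only

print(f"learned weight: {w:.3f}")   # ends up near 3.0: pure prediction, nothing else
```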
Re:
I’m not sure what you’re saying here. I agree that “no one wants that”.
My point is that the relevant distinction in that case seems to be “instrumental goal” vs. “terminal goal”, rather than “full agency” vs. “partial agency”. In other words, I expect that a map that split things up based on instrumental vs. terminal would do a better job of understanding the territory than one that used full vs. partial agency.
Re: evolution example, I agree that particular learning algorithms can be designed such that they incentivize partial agency. I think my intuition is that all of the particular kinds of partial agency we could incentivize would be too much of a handicap on powerful AI systems (or won’t work at all, e.g. if the way to get powerful AI systems is via mesa optimization).
I’m only claiming that **if the rules of the game remain intact** we can incentivise partial agency.
> My point is that the relevant distinction in that case seems to be “instrumental goal” vs. “terminal goal”, rather than “full agency” vs. “partial agency”. In other words, I expect that a map that split things up based on instrumental vs. terminal would do a better job of understanding the territory than one that used full vs. partial agency.
Ah, I see. I definitely don’t disagree that epistemics is instrumental. (Maybe we have some terminal drive for it, but, let’s set that aside.) BUT:
I don’t think we can account for what’s going on here just by pointing that out. Yes, the fact that it’s instrumental means that we cut it off when it “goes too far”, and there’s not a nice encapsulation of what “goes too far” means. However, I think even when we set that aside there’s still an alter-the-map-to-fit-the-territory-not-the-other-way-around phenomenon. IE, yes, it’s a subgoal, but how can we understand the subgoal? Is it best understood as optimization, or something else?
When designing machine learning algorithms, this is essentially built in as a terminal goal; the training procedure incentivises predicting the data, not manipulating it. Or, if it does indeed incentivize manipulation of the data, we would like to understand that better; and we’d like to be able to design things which don’t have that incentive structure.
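As a toy illustration of that incentive structure (a deliberately simplified sketch, not a description of any particular system): even if the environment has a feedback channel from past predictions to future data, an ordinary gradient step still treats the observed target as a constant, so it credits the model for matching the data and never for steering it.

```python
import random

# Toy illustration of "predicting the data, not manipulating it": the environment
# below even has a feedback channel (the next y drifts toward the last published
# prediction), yet the gradient treats the observed y as a fixed target.
# Matching the data is rewarded; steering it is invisible to the update.

def next_observation(prev_prediction):
    """Environment with feedback: y depends slightly on the last prediction."""
    x = random.uniform(-1.0, 1.0)
    y = 2.0 * x + 0.3 * prev_prediction + random.gauss(0.0, 0.1)
    return x, y

w, lr, prediction = 0.0, 0.05, 0.0

for _ in range(5_000):
    x, y = next_observation(prediction)   # data may have been nudged by past predictions...
    prediction = w * x
    grad = (prediction - y) * x           # ...but d(loss)/dw takes y as given: only the
    w -= lr * grad                        # predictive error is penalized, so the update
                                          # never "aims" at shifting future observations.

print(f"final weight: {w:.3f}")
```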
Sorry for the very late reply, I’ve been busy :/
> Re: evolution example, I agree that particular learning algorithms can be designed such that they incentivize partial agency. I think my intuition is that all of the particular kinds of partial agency we could incentivize would be too much of a handicap on powerful AI systems (or won’t work at all, e.g. if the way to get powerful AI systems is via mesa optimization).
Definitely agree with that.
> Ah, I see. I definitely don’t disagree that epistemics is instrumental. (Maybe we have some terminal drive for it, but, let’s set that aside.) BUT:
> I don’t think we can account for what’s going on here just by pointing that out. Yes, the fact that it’s instrumental means that we cut it off when it “goes too far”, and there’s not a nice encapsulation of what “goes too far” means. However, I think even when we set that aside there’s still an alter-the-map-to-fit-the-territory-not-the-other-way-around phenomenon. IE, yes, it’s a subgoal, but how can we understand the subgoal? Is it best understood as optimization, or something else?
> When designing machine learning algorithms, this is essentially built in as a terminal goal; the training procedure incentivises predicting the data, not manipulating it. Or, if it does indeed incentivize manipulation of the data, we would like to understand that better; and we’d like to be able to design things which don’t have that incentive structure.
Ah, sorry for misinterpreting you.