It was mostly a gut feeling when I posted, but let me try to articulate a few concerns:
1. It relies on having a good representation; small problems with the representation might make it unworkable. Learning a good enough representation, and verifying that you’ve done so, doesn’t seem very feasible. Impact may be missed if the representation doesn’t properly capture unobserved things and long-term dependencies. Things like the creation of sub-agents seem likely to crop up in subtle, hard-to-learn ways.
2. I haven’t looked into it, but ATM I have no theory about when this scheme could be expected to recover the “correct” model (I don’t even know how that would be defined… I’m trying to “learn” my way around the problem :P)
3. To put #1 another way, I’m not sure that I’ve gained anything compared with proposals to penalize impact in the input space, or in some learned representation space (with the learning not directed towards discovering impact).
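To make that comparison concrete, here's a rough sketch of the kind of representation-space penalty I have in mind; the encoder, the do-nothing baseline, and the distance measure are just illustrative assumptions on my part, not anyone's actual proposal:

```python
import numpy as np

def representation_impact_penalty(encode, world_after_acting, world_after_noop):
    """Rough illustrative sketch: penalize how far the world moved (relative to
    doing nothing), as seen through some representation `encode`. With `encode`
    as the identity this is an input-space penalty; with a learned encoder not
    trained to notice impact, it inherits the problems from #1 (unobserved
    things, long-term dependencies, sub-agents)."""
    diff = np.asarray(encode(world_after_acting)) - np.asarray(encode(world_after_noop))
    return np.linalg.norm(diff)

# e.g. with the identity encoder this is just input-space distance:
print(representation_impact_penalty(lambda x: x, [1.0, 2.0], [1.0, 0.0]))  # -> 2.0
```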
On the other hand, I was inspired to consider this idea when thinking about Yoshua’s proposal about causal disentangling, mentioned at the end of his Asilomar talk here: https://www.youtube.com/watch?v=ZHYXp3gJCaI. This (and maybe some other similar work, e.g. on empowerment) seems to provide a way to direct an agent’s learning towards maximizing its influence, which might help… although having an agent learn based on maximizing its influence seems like a bad idea on its own… but I guess you might then be able to add a conflicting objective (like a regularizer) to actually limit the impact...
So then you’d end up with some sort of adversarial-ish set-up, where the agent is trying to both:
maximize potential impact (i.e. by understanding its ability to influence the world)
minimize actual impact (i.e. by refraining from taking actions which eventually turn out to have a large impact).
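Concretely, I picture something like an objective with two opposing terms. Everything in the sketch below (the scalar form, the names, the single trade-off weight) is just my own illustrative assumption, not a worked-out method:

```python
def adversarial_ish_objective(potential_influence: float,
                              expected_realized_impact: float,
                              beta: float = 1.0) -> float:
    """Illustrative sketch only. The first term rewards the agent for building
    an accurate, large picture of what it *could* change (empowerment-flavoured
    "maximize potential impact"); the second penalizes the impact its chosen
    actions are expected to actually cause ("minimize actual impact")."""
    return potential_influence - beta * expected_realized_impact

# e.g. an agent that understands a lot but expects to change very little scores well:
print(adversarial_ish_objective(potential_influence=5.0, expected_realized_impact=0.2))  # -> 4.8
```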
Having just finished typing this, I feel more optimistic about this last proposal than the original idea :D
We want an agent to learn how it could maximize its impact, precisely so that it can avoid doing so.
(How) can an agent confidently predict its potential impact without trying potentially impactful actions?
I think it certainly can, because humans can. We use a powerful predictive model of the world to do this.
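To gesture at what that could look like, here's a toy sketch of predicting an action's impact purely in imagination, by rolling a world model forward with and without the action; the model interface, the no-op action, the horizon, and the distance are all assumptions I'm making up for illustration:

```python
def predicted_impact(model, state, action, noop=0.0, horizon=10):
    """Toy sketch: roll a (learned) world model forward under the candidate
    action vs. a do-nothing baseline, and measure how far the two imagined
    futures diverge. No real-world action is ever taken, which is the point."""
    s_act, s_noop = state, state
    for _ in range(horizon):
        s_act = model(s_act, action)   # imagined step with the action
        s_noop = model(s_noop, noop)   # imagined step doing nothing
        action = noop                  # act once, then sit still in both rollouts
    return abs(s_act - s_noop)

# toy world model: the state drifts back toward zero unless pushed by the action
toy_model = lambda s, a: 0.9 * s + a
print(predicted_impact(toy_model, state=1.0, action=1.0))  # ~0.39 in this toy case
```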
… and that’s all I have to say ATM