Thinking aloud here:
Say I’m an agent that wants to increase u, but not “too strongly” (this whole exercise is about how to formalize “too strongly”). Couldn’t I have a way of estimating how much other agents who don’t care about u might still care about what I do, and minimize that? That is, avoid anything that would make other agents want to model me as anything more than “wants to increase u”.
(Back in agent-designer shoes.) So we could create a “moderate increaser” agent: give it a utility function u, inform it of other agents trying to increase v, w, x, y, and somehow have it avoid any strategies that involve “decision-theoretic interaction” with those other agents (threats, retaliation, trade …); maybe something like “those agents should behave as if you didn’t exist”.
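One crude way to cash this out (everything below is a hypothetical sketch, with made-up names, not a worked-out proposal): score each candidate action by u, then subtract a penalty for how much each v/w/x/y-increaser’s attainable value would change relative to the counterfactual where our agent does nothing at all. If the penalty is zero, those agents have no reason to behave any differently than if we didn’t exist.

```python
# Hypothetical sketch of a "moderate increaser": maximize u minus a penalty
# for how much the other agents' attainable values shift because of us.

def choose_action(actions, u, other_agent_values, lam=1.0, null_action=None):
    """
    actions            : candidate actions (include a "do nothing" option)
    u                  : our action -> float utility
    other_agent_values : one function per other agent, mapping our action to
                         the value that agent can still secure given it
    lam                : how strongly "don't make them care about you" weighs
    null_action        : stands in for "we don't exist / do nothing"
    """
    def penalty(a):
        # How much would each v/w/x/y-increaser care that we did `a`
        # rather than nothing at all?
        return sum(abs(val(a) - val(null_action)) for val in other_agent_values)

    return max(actions, key=lambda a: u(a) - lam * penalty(a))


# Toy example: the aggressive action increases u more, but it also disturbs
# the other agent, so the moderate increaser settles for the quiet option.
actions = ["quiet", "aggressive", None]
u = lambda a: {"quiet": 1.0, "aggressive": 3.0, None: 0.0}[a]
other = [lambda a: {"quiet": 5.0, "aggressive": 1.0, None: 5.0}[a]]
print(choose_action(actions, u, other, lam=1.0))  # -> "quiet"
```

Of course the hard part is hidden in other_agent_values: estimating “how much would they care” needs models of the v/w/x/y-increasers, and an absolute-difference penalty doesn’t distinguish “I accidentally helped them” from “I’m threatening them”, which is exactly the decision-theory-interaction question.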
Not too far away from my ideas here: http://lesswrong.com/r/discussion/lw/lv0/creating_a_satisficer/