My concern is mostly from the perspective of an (at least initially) non-ideal agent being attracted to a local optimum.
Do you at least agree that the behavior I'm concerned about is likely a local optimum?
Yes, it is absolutely possible that the trust maximizer as described here would end up in a local optimum. This is certainly tricky to avoid. This post is far from a feasible solution to the alignment problem. We’re just trying to point out some interesting features of trust as a goal, which might be helpful in combination with other measures/ideas.