Something that’s been intriguing me: if two agents figure out how to trust that each other's goals are aligned (or at least not opposed), haven’t they essentially solved the alignment problem?
E.g., one agent could then use the same method to bootstrap an aligned AI.