Love that post!

Can we train ML systems that clearly manifest a collective identity?
I feel like in multi-agent reinforcement learning that’s already the case.
Re the training setting for creating a shared identity: what about a setup where a human and an LLM take turns generating text, as in the current chat setting, but first they receive some task, e.g. “write a good strategy for this startup”, plus the context for that task? At the end they output a final answer, and some reward model rates the performance of the cyborg (human + LLM) as a whole.
In practice, having real humans in this training loop may be too costly, so we might want to replace them most of the time with an imitation of a human.
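To make that concrete, here is a minimal sketch of one such episode (Python, with toy stand-ins; all the names below, like `llm_turn`, `human_imitator_turn`, and `reward_model`, are hypothetical placeholders, not an existing API): the human, or a human-imitator as just mentioned, and the LLM alternate turns on a shared transcript, and a single reward-model score is attached to their joint answer rather than to either participant’s turns.

```python
import random

# All names here (llm_turn, human_imitator_turn, reward_model, cyborg_episode)
# are made up for illustration; in a real setup these would be an actual LLM
# policy, a learned human-imitator, and a learned reward model.

def llm_turn(transcript, task):
    """Toy LLM policy: appends the next chunk of text given the task and transcript so far."""
    return "[LLM turn: continues the shared draft]"

def human_imitator_turn(transcript, task):
    """Stand-in for a real human (or a model imitating one, as suggested above)."""
    return "[human(-imitator) turn: edits / steers the draft]"

def reward_model(task, final_answer):
    """Rates the joint output of the cyborg (human + LLM) as a whole; random placeholder here."""
    return random.random()

def cyborg_episode(task, context, n_turns=6):
    """One episode: human(-imitator) and LLM alternate turns, then the joint answer is scored."""
    transcript = [context]
    for t in range(n_turns):
        actor = human_imitator_turn if t % 2 == 0 else llm_turn
        transcript.append(actor(transcript, task))
    final_answer = "\n".join(transcript[1:])   # the cyborg's joint output
    reward = reward_model(task, final_answer)  # one scalar for the pair as a whole
    return transcript, reward

if __name__ == "__main__":
    task = "write a good strategy for this startup"
    _, reward = cyborg_episode(task, context="[startup description]")
    # In training, this single reward would update the LLM (and maybe the imitator),
    # so credit is assigned to the collective rather than to either agent alone.
    print(f"reward for the cyborg as a whole: {reward:.3f}")
```

The design choice that matters here is that the scalar reward attaches to the whole transcript, not to either participant’s individual turns, so whatever gets optimized is the performance of the human + LLM pair.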
(Also, a minor point to keep in mind: emergent collective action doesn’t mean that the agents have a model of the collective self. E.g. an ant colony behaves as one, but I doubt ants have any model of the colony; they’re just executing their ant procedures. With powerful AIs, though, I do expect such collective self-models to arise. I just mean that maybe we should be careful about transferring insights from ant colonies, swarms, hives, etc. to settings with more cognitively capable agents?)