(As reply to Zvi’s ‘If someone was founding a new AI notkilleveryoneism research organization, what is the best research agenda they should look into pursuing right now?’)
LLMs seem to represent meaning in a pretty human-like way, and this seems likely to keep improving as they get scaled up (e.g. https://arxiv.org/abs/2305.11863). This could make it much easier to get them to follow the commonsense meaning of instructions. Similar methodologies to https://arxiv.org/abs/2305.11863 could also be applied to other alignment-adjacent domains/tasks, e.g. moral reasoning, prosociality, etc.
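Very roughly, the kind of methodology I have in mind looks something like the sketch below: read out hidden-state representations from an LLM, fit a cross-validated linear probe / encoding-style model from them to human judgments on an alignment-adjacent task, and repeat across model scales and layers to see whether the relevant structure improves with scale. This is a minimal illustrative sketch, not the cited paper's actual pipeline; the model name, layer, dataset, and probe choice are all placeholders/assumptions.

```python
# Minimal sketch (assumptions marked below): linear probe from LLM hidden states
# to human judgments on an alignment-adjacent task, e.g. moral-acceptability ratings.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

MODEL_NAME = "gpt2"   # placeholder; in practice you'd sweep over model sizes
LAYER = -1            # placeholder; in practice you'd sweep over layers

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(texts):
    """Mean-pooled hidden states from the chosen layer, one vector per text."""
    feats = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        h = out.hidden_states[LAYER][0]      # (seq_len, hidden_dim)
        feats.append(h.mean(dim=0).numpy())  # average over tokens
    return np.stack(feats)

# Hypothetical data: replace with real (scenario, human rating) pairs,
# e.g. human moral-acceptability judgments on short scenarios.
scenarios = [f"placeholder scenario {i}" for i in range(30)]
human_ratings = np.random.default_rng(0).random(30)

X = embed(scenarios)
probe = RidgeCV(alphas=np.logspace(-2, 4, 13))
# Cross-validated fit: how linearly decodable are the human judgments
# from the model's representations at this scale/layer?
scores = cross_val_score(probe, X, human_ratings, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean())
```

The interesting quantity would be how the cross-validated predictivity changes as you swap in larger models (and different layers), by analogy with the scaling trend in the cited paper.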
Step 2: e.g. plug models that follow the commonsense meaning of instructions into OpenAI’s superalignment plan: https://openai.com/blog/introducing-superalignment.
Related intuition: turning LLM processes/simulacra into [coarse] emulations of brain processes.
(https://twitter.com/BogdanIonutCir2/status/1677060966540795905)