Sure, the limbic system evolved over millions of years, but that doesn’t mean we need to evolve it as well: we could just study it and reimplement it directly without (much) iteration. I am not necessarily saying that this is a good approach to alignment; I personally would also prefer a more theoretically grounded one. But I think it is an interesting existence proof: imprinting fairly robust drives into agents through a very low-bandwidth channel, even after a lot of experience and without much RL, is possible in practice.