I’m not familiar enough with agent foundations to give very detailed object-level advice, but I think it would be hugely valuable to empirically test agent foundations ideas on real models, with the understanding that AGI doesn’t necessarily have to look like LMs, but any theory of intelligence has to at least fit both LMs and AGI. As an example, we might believe that LMs don’t have goals in the same sense that AGI eventually will, but then we can ask why LMs still seem able to achieve any goals at all, and perhaps through empirical investigation of LMs we can get a better understanding of the nature of goal seeking. I think this would be much, much more valuable than generic LM alignment work.