One major capabilities hurdle that’s related to interpretability: The difference between manually “opening up” the model to analyze its weights, etc., and being able to literally ask the model questions about why it did certain things.
And it seems like one path to solving that is to have the AI be able to analyze its own workings, which also seems like a potential path to recursive self-improvement.