So, I think that one of the end goals of seeking to understand natural abstractions is being able to predict which abstractions get chosen given a certain training regime (e.g. the natural world, an experimental simulation, or a static set of training data). It seems likely to me that a simpler environment, such as an Atari game or the coin maze, is a good starting point for designing experiments to figure this out (a rough sketch of one such experiment is below). Ideally, we could gradually build up to more complex environments and more complex agents, until we are working in a sim similar enough to the natural world that an agent trained in it could plausibly produce artifacts of cognitive work that would be valuable to society. I try to describe something like this setup in this post: https://www.lesswrong.com/posts/pFzpRePDkbhEj9d5G/an-attempt-to-steelman-openai-s-alignment-plan
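To make "predict which abstractions get chosen" concrete, here is a minimal sketch of the kind of simple-environment experiment I have in mind, assuming a toy 1-D "coin corridor" as a stand-in for the coin maze. Train a small policy on a proxy task, then fit linear probes from its hidden layer to several candidate abstractions (agent position, coin position, relative offset) and see which one the training regime actually selected for. All the names here (`make_corridor_batch`, `PolicyNet`, `probe_r2`) are hypothetical illustrations, not an existing codebase.

```python
# Minimal sketch: which abstractions does a toy coin-corridor policy learn?
import numpy as np
import torch
import torch.nn as nn

GRID = 12  # corridor length

def make_corridor_batch(n, rng):
    """Random agent/coin positions; observation is two one-hot rows, flattened."""
    agent = rng.integers(0, GRID, size=n)
    coin = rng.integers(0, GRID, size=n)
    obs = np.zeros((n, 2, GRID), dtype=np.float32)
    obs[np.arange(n), 0, agent] = 1.0   # agent channel
    obs[np.arange(n), 1, coin] = 1.0    # coin channel
    action = (coin > agent).astype(np.int64)  # label: step toward the coin (0 = left, 1 = right)
    return obs.reshape(n, -1), action, agent, coin

class PolicyNet(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(2 * GRID, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, 2)
    def forward(self, x):
        h = self.body(x)
        return self.head(h), h

def probe_r2(h, target):
    """Fit a linear probe by least squares; report R^2 on a held-out half."""
    n = len(target)
    X = np.concatenate([h, np.ones((n, 1))], axis=1)
    tr, te = slice(0, n // 2), slice(n // 2, n)
    w, *_ = np.linalg.lstsq(X[tr], target[tr].astype(np.float64), rcond=None)
    pred = X[te] @ w
    resid = np.sum((target[te] - pred) ** 2)
    total = np.sum((target[te] - target[te].mean()) ** 2)
    return 1.0 - resid / total

rng = np.random.default_rng(0)
net = PolicyNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):  # train the policy on the proxy task
    obs, action, _, _ = make_corridor_batch(256, rng)
    logits, _ = net(torch.from_numpy(obs))
    loss = loss_fn(logits, torch.from_numpy(action))
    opt.zero_grad(); loss.backward(); opt.step()

# Which candidate abstractions are linearly decodable from the hidden layer?
obs, _, agent, coin = make_corridor_batch(4000, rng)
with torch.no_grad():
    _, h = net(torch.from_numpy(obs))
h = h.numpy()
for name, target in [("agent position", agent),
                     ("coin position", coin),
                     ("coin - agent offset", coin - agent)]:
    print(f"{name:22s} probe R^2 = {probe_r2(h, target):.3f}")
```

The interesting question is how the probe results shift as you vary the training regime (e.g. fixing the coin to one end of the corridor during training, as in the original coin-maze setup), and whether those shifts can be predicted ahead of time.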
The main drawback I see to this idea is that it is so much more cautious an approach than what leading AI labs are currently doing that it would be a hard pill for them to swallow to transition to working only in such a carefully censored simulation world.