In terms of detailed plans: What about, for example, figuring out enough details about shard theory to make preregistered predictions about the test-time behaviors and internal circuits you will find in an agent after training in a novel toy environment, based on attributes of the training trajectories? Success at that would represent a real win within the field, with a lot of potential for further work downstream of it.
Re: the rest, even if all of those 4 approaches you listed are individually promising (which I’m inclined to agree with you on), the conjunction of them might be much less likely to work out. I personally consider them as separate bets that can stand or fall on their own, and hope that if multiple pan out then their benefits may stack.