This does feel pretty vague in parts (e.g. “mitigating goal misgeneralization” feels more like a problem statement than a component of research), but I personally think this is a pretty good plan, and at the least, I’m very appreciative of you posting your plan publicly!
Now, we just need public alignment plans from Anthropic, Google Brain, Meta, Adept, …