I’d personally be somewhat surprised if that were particularly useful. I think there are a bunch of features of the alignment problem that you just don’t get with smaller models (let alone algorithmic tasks), e.g. the model’s ability to understand what alignment even is. Maybe you could get some juice out of it? But knowing that a technique works to “align” an algorithmic problem would feel like very weak evidence that it works on a real problem.
Makes sense. I agree that something working on algorithmic tasks is very weak evidence, although I am somewhat interested in how much insight we can get if we put more effort into hand-crafting algorithmic tasks with interesting properties.