To clarify: are you saying that since you perceive Chris Olah as mostly intrinsically caring about understanding neural networks (instead of mostly caring about alignment), you conclude that his work is irrelevant to alignment?
No. I have detailed inside-view models of the alignment problem, and under those models I consider Chris Olah's work to be interesting but close to irrelevant (roughly as relevant as the work of top capability researchers). Their work does have some relevance, to be clear, since understanding how to make systems better is relevant to understanding how AGI will behave, but that relevance is pretty limited.