I agree that what you’re describing is a valid way of looking at what’s going on; it’s just not the way I think about it. I don’t find it very helpful to think of a model as a subagent of gradient descent, since gradient descent isn’t itself an agent in any meaningful sense, nor can it really be understood as “trying” to do anything in particular.
Sure, makes sense! Though to be clear, I believe what I’m describing should apply to optimizers other than just gradient descent — including optimizers one might think of as reward-maximizing agents.