Strong upvote. We are definitely not talking enough about what Scaffolded Language Model Agents mean for AI alignment. They are a light of hope: interpretable-by-design systems with tractable alignment and slow-takeoff potential.
> One possibility that arises as part of a mixed takeoff is using machine learning to optimize for the most effective scaffolding.
This should be forbidden. Turning scaffolding that is explicitly written in code into another black box would not only greatly damage interpretability but also pose a huge risk of accidentally creating a sentient entity without noticing it. Scaffolding serves a role for LMAs very similar to the role consciousness serves for humans, so we should be very careful in this regard.
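For concreteness, here is a minimal sketch (my own hypothetical example, not from the original post) of what "scaffolding explicitly written in code" means; `llm` stands in for any language-model call. The point is that every piece of control flow and state is ordinary, auditable code, and the LM call is the only opaque component:

```python
def run_agent(llm, task: str, max_steps: int = 10) -> str:
    """Drive an LM through a simple loop with fully inspectable state."""
    memory: list[str] = []  # every intermediate thought is human-readable
    for step in range(max_steps):
        prompt = f"Task: {task}\nNotes so far:\n" + "\n".join(memory)
        thought = llm(prompt)           # the only black-box component
        memory.append(thought)          # state stays legible and loggable
        if "FINAL ANSWER:" in thought:  # the stopping rule is explicit code,
            return thought              # not a learned policy
    return "gave up"
```

Optimizing "the most effective scaffolding" with machine learning would mean replacing this loop, memory, and stopping rule with learned components, and it is exactly that legibility that would be lost.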