Current SotA systems are very opaque — we more-or-less can’t inspect or intervene on their thoughts — and it isn’t clear how we could navigate to AI approaches that are far less opaque, and that can carry forward to AGI. (Though it seems very likely such approaches exist somewhere in the space of AI research approaches.)
Yeah, it does seem like interpretability is a bottleneck for a lot of alignment proposals, and in particular, as long as neural networks are essentially black boxes, deceptive alignment/inner alignment issues seem almost impossible to address.
Seems right to me.