Steven Byrnes comments on [Intro to brain-like-AGI safety] 14. Controlled AGI

Steven Byrnes Jun 16, 2022, 9:07 PM
3 points
> it would start taking more and more work for humans to analyze its plans and determine how much flourishing is in them
I’m not sure where you’re getting that. The thing I described in my last comment did not involve the humans analyzing the AI’s plans; it only involved the humans labeling YouTube videos.
It would be lovely if humans could reliably analyze the AI’s plans. But I fear that our interpretability techniques will not be up to that challenge.
> we will have no way to determine if its thought assessor generalizes wrongly
I agree, see §14.4.
- MSRayne Jun 16, 2022, 9:34 PM
  3 points
  Ah, sorry, I misunderstood you.