I found Section 6 particularly interesting! Here’s how I understand it:
Most of our worries about AI stem from catastrophic scenarios, like AI killing everyone.
It seems that to prevent these outcomes, we don’t need to solve extremely hard problems, such as pointing AI toward humanity’s extrapolated values.
It follows that we don’t need to focus on instilling a perfect copy of human values into AI systems.
As I understand it, this connects to the “be careful what you wish for” problem with AI, where AI could optimize in dangerous or unexpected ways. There’s a race here: can we control AI well enough to still gain its benefits?
However, I don’t think you’ve provided enough evidence that this level of control is actually possible. There’s also the issue of deceptive alignment: I’m not convinced we could manage this “race” without some kind of feedback from the AI systems themselves.
Finally, the description of the oracle AI in this section seems quite similar to the idea of corrigible AI.