Feedback: I clicked through to the provided answer and had a great deal of difficulty understanding how it was relevant—it makes a number of assumptions about agents and utility functions and I wasn’t able to connect it to why I should expect an agent trained using CIRL to kill me.
FWIW here’s my alternative answer:
CIRL agents are bottlenecked on the human overseer’s ability to provide them with a learning signal through demonstration or direct communication. This is unlikely to scale to superhuman abilities in the agent, so superintelligent agents simply will not be trained using CIRL.
In other words, it’s only a solution to “Learn from Teacher” in Paul’s 2019 decomposition of alignment, not to the whole alignment problem.
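To make the bottleneck point concrete, here’s a minimal sketch (a toy Bayesian reward-inference loop, not the actual CIRL game; the Boltzmann-rational human model and the names `candidate_thetas`, `beta`, etc. are assumptions I’m making up purely for illustration): the robot’s belief about what the human wants only ever moves through what the human manages to demonstrate.

```python
# Toy sketch, assuming a Boltzmann-rational human and a made-up two-action
# environment (illustrative Bayesian reward inference, not the full CIRL game):
# the robot's posterior over the reward only updates via human demonstrations.
import numpy as np

rng = np.random.default_rng(0)

candidate_thetas = np.array([-1.0, 0.0, 1.0])  # reward hypotheses the robot entertains
true_theta = 1.0                               # the human's actual preference
actions = np.array([-1.0, 1.0])                # reward of action a under theta is theta * a
beta = 2.0                                     # how reliably the human demonstrates well

def action_probs(theta):
    """Boltzmann-rational demonstration policy: P(a) proportional to exp(beta * theta * a)."""
    logits = beta * theta * actions
    p = np.exp(logits - logits.max())
    return p / p.sum()

posterior = np.full(len(candidate_thetas), 1.0 / len(candidate_thetas))

for _ in range(20):
    # The ONLY learning signal: an action the human demonstrates.
    a = rng.choice(actions, p=action_probs(true_theta))
    a_idx = int(np.argmax(actions == a))
    # Bayesian update of the robot's belief over reward hypotheses.
    likelihood = np.array([action_probs(th)[a_idx] for th in candidate_thetas])
    posterior = posterior * likelihood
    posterior = posterior / posterior.sum()

print(dict(zip(candidate_thetas.tolist(), posterior.round(3).tolist())))
# If the human can't demonstrate the behaviour they want (low beta, or a task
# beyond their ability), the posterior barely moves -- that's the bottleneck.
```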