I thought the idea of a corrigible AI is that you’re trying to build something that isn’t itself independent and agentic, but will help you in your goals regardless.
Hmm, no, I mean something broader than this, something like “humans ultimately have control and will decide what happens”. In my usage of the word, I would count as corrigible situations where humans instruct their AIs to go and acquire as much power as possible for them while protecting them, and then later reflect and decide what to do with this power. So, in this scenario, the AI would be arbitrarily agentic and autonomous.
Corrigibility would be as opposed to humanity e.g. appointing a successor which doesn’t ultimately point back to some human-driven process.
I would count various indirect normativity schemes here, and indirect normativity feels continuous with other forms of oversight in my view (the main difference is that the oversight happens over very long time horizons, such that you can’t train the AI based on its behavior over that horizon).
I’m not sure if my usage of the term is fully standard, but I think it roughly matches how e.g. Paul Christiano uses the term.