(In the following I will also use “aligned” to mean “intent aligned”.)
The human+oracle system is not aligned in situations where the human would pose such questions.
Ok, sounds like “intent aligned at some points in time and not at others” was the closest guess. To confirm, would you endorse “the system was aligned when the human imitation was still trying to figure out what questions to ask the oracle (since the system was still only trying to do what H wants), and then due to its own incompetence became not aligned when the oracle started working on the unsafe question”?
Given that intent alignment in this sense seems to be a property of a system+situation instead of the system itself, how would you define when the “intent alignment problem” has been solved for an AI, or when would you call an AI (such as IDA) itself “intent aligned”? (When we can reasonably expect to keep it out of situations where its alignment fails, for some reasonable amount of time, perhaps?) Or is it the case that whenever you use “intent alignment” you always have some specific situation or set of situations in mind?