We’ll presumably need to give O some information about the goal / target configuration set for each task.
I was imagining that the tasks can come equipped with some specification, but some sort of counterfactual also makes sense. This also gets around issues of the AI system not being appropriately “motivated”—e.g. I might be capable of performing the task “lock up puppies in cages”, but I wouldn’t do it, and so if you only look at my behavior you couldn’t say that I was capable of doing that task.
But this doesn’t get at the spirit of Paul’s idea, which I think is about actually looking inside the AI and understanding its goals.
+1 to all of this.
I was imagining that the tasks can come equipped with some specification, but some sort of counterfactual also makes sense. This also gets around issues of the AI system not being appropriately “motivated”—e.g. I might be capable of performing the task “lock up puppies in cages”, but I wouldn’t do it, and so if you only look at my behavior you couldn’t say that I was capable of doing that task.
+1 especially to this.