The link to your paper is broken. I’ve read the Christiano piece. And some/most of the CEV paper, I think.
Any working intent alignment solution needs to prevent the AGI from deliberately changing the human's intent. That is a solvable problem for an AGI that understands the concept.
The problem with “understanding the concept of intent” is that intent and goal formation are among the most complex notions in the universe, involving genetics, development, psychology, culture, and everything in between. We have been arguing about what intent, and correlates like “well-being”, mean for the entire history of our civilization. It looks like we have a good set of no-nos (e.g. the UN’s Universal Declaration of Human Rights), but when it comes to positive descriptions of good long-term outcomes, things get fuzzy. There we have less guidance, though I guess trans- and post-humanism seem to be desirable goals to many.
Sorry, fixed broken link now.

I intended to refer to understanding the concept of manipulation well enough to avoid it if the AGI “wanted” to.
As for understanding the concept of intent, I agree that “true” intent is very difficult to understand, particularly if it’s projected far into the future. That’s a huge problem for approaches like CEV. The virtue of the approach I’m suggesting is that it entirely bypasses that complexity (while introducing new problems). Instead of inferring “true” intent, the AGI just “wants” to do what the human principal tells it to do. The human gets to decide what their intent is. The machine just has to understand what the human meant by what they said, and the human can clarify that in a conversation. I’m thinking of this as do what I mean and check (DWIMAC) alignment. More on this in Instruction-following AGI is easier and more likely than value aligned AGI.
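To make the “check” part concrete, here’s a minimal toy sketch of the loop I have in mind, in Python. Everything in it (the Interpretation class, the interpret stub, the confidence threshold) is a hypothetical placeholder for illustration, not an actual proposal for how this would be implemented:

```python
from dataclasses import dataclass

# Toy sketch of a DWIMAC ("do what I mean and check") loop, purely illustrative.
# Every name here (Interpretation, interpret, CHECK_THRESHOLD, dwimac) is a
# hypothetical placeholder, not a real system or API.

CHECK_THRESHOLD = 0.9  # arbitrary: below this confidence, check rather than guess


@dataclass
class Interpretation:
    summary: str       # the system's current reading of what the human meant
    confidence: float  # how sure it is about that reading


def interpret(instruction: str, clarifications: list[str]) -> Interpretation:
    """Stand-in for the model's language understanding."""
    summary = "; ".join([instruction] + clarifications)
    # Pretend each clarification makes the reading more certain.
    return Interpretation(summary, min(1.0, 0.5 + 0.3 * len(clarifications)))


def dwimac(instruction: str) -> Interpretation:
    """Do what the human meant, checking with them whenever the meaning is unclear."""
    clarifications: list[str] = []
    reading = interpret(instruction, clarifications)
    # The "check" step: when unsure what was meant, ask the principal
    # instead of trying to infer some deeper "true" intent.
    while reading.confidence < CHECK_THRESHOLD:
        answer = input(f"Did you mean '{reading.summary}'? Please clarify: ")
        clarifications.append(answer)
        reading = interpret(instruction, clarifications)
    return reading  # act on what the human said they meant, as clarified


if __name__ == "__main__":
    print(dwimac("Tidy up my files"))
```

The point is only the control flow: the system acts on the stated instruction, and uncertainty about meaning routes to a clarifying question rather than to inference about deeper intent.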
Thank you!
I’ll read your article.