I agree that (1) is an important consideration for AI going forward, but I don’t think it really applies until the AI has a definite goal. AFAICT the goal in developing systems like GPT is mostly ‘to see what they can do’.
I don’t fault anybody for GPT completing anachronistic counterfactuals—they’re fun and interesting. It’s a feature, not a bug. You could equally call it an alignment failure if GPT-4 started being a wet blanket and gave completions like
Prompt: “In response to the Pearl Harbor attacks, Otto von Bismarck said”
Completion: “nothing, because he was dead.”
In contrast, a system like IBM Watson has a goal of producing correct answers, making it unambiguous what the aligned answer would be.
To be clear, I think the contest still works—I just think the ‘surprisingness’ condition hides a lot of complexity wrt what we expect in the first place.