Can’t all of these concerns be reduced to a subset of the intent-alignment problem? If I tell the AI to “maximize ethical goodness” and it instead decides to “implement plans that sound maximally good to the user” or
“maximize my current guess of what the user meant by ethical goodness according to my possibly-bad philosophy,” that is different from what I intended, and thus the AI is unaligned.
If the AI starts off with some bad philosophical ideas simply because it is relatively unskilled at philosophy compared to science, we can expect that 1) it will try very hard to get better at philosophy so that it can understand what the user meant by “maximize ethical goodness,” and 2) it will try to preserve option value in the meantime, so that not much is lost if its first guess was wrong. This assumes some base level of competence on the AI’s part, but if it can do groundbreaking science research, surely it can think of those two things (or we can just tell it).