It seems to me that you’re not getting at logical infallibility, since the AGI could be perfectly willing to act humbly about its logical beliefs, but at value infallibility or goal infallibility. An AI does not expect its goal statement to be fallible:
Which AI? As so often, an architecture-dependent issue is being treated as a universal truth.
Figuring out how to give it the right conscience / right values is the open problem that MIRI and others care about!
The others mostly aren’t thinking in terms of “giving”, i.e. hardcoding, values. There is a valid critique to be made of that assumption.
Which AI? As so often, an architecture-dependent issue is being treated as a universal truth.
This statement maps to “programs execute their code.” I would be surprised if that were controversial.
The others mostly aren’t thinking in terms of “giving”, i.e. hardcoding, values. There is a valid critique to be made of that assumption.
This was covered by the comment about “meta-values” earlier, and “Y being a fuzzy object itself,” which is probably not as clear as it could be. The goal management system grounds out somewhere, and that root algorithm is what I’m considering the “values” of the AI. If it can change its mind about what to value, the process it uses to change its mind is the actual fixed value. (If it can change its mind about how to change its mind, the fixedness goes up another level; if it can completely rewrite itself, now you have lost your ability to be confident in what it will do.)
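A minimal Python sketch of that idea (the class, method names, and the toy filter rule are invented for illustration, not anyone’s proposed architecture): the object-level goals are mutable, but every revision passes through a fixed root rule, so that rule is the thing that actually stays constant.

```python
# Illustrative sketch only: a toy agent whose object-level goals are mutable,
# but only via a fixed root rule. Names and structure are hypothetical.

class ToyAgent:
    def __init__(self, initial_goals):
        # Object-level goals: the agent is free to revise these.
        self.goals = list(initial_goals)

    def root_update_rule(self, proposed_goal):
        # The fixed part. However the agent "changes its mind" about what to
        # value, every change passes through this hard-coded filter, so this
        # rule is the de facto root value of the system.
        return "paperclip" not in proposed_goal  # toy criterion

    def adopt_goal(self, proposed_goal):
        if self.root_update_rule(proposed_goal):
            self.goals.append(proposed_goal)


agent = ToyAgent(["keep humans informed"])
agent.adopt_goal("maximise paperclips")   # rejected by the fixed root rule
agent.adopt_goal("answer questions")      # accepted
print(agent.goals)  # ['keep humans informed', 'answer questions']
```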
Which AI? As so often, an architecture-dependent issue is being treated as a universal truth.
This statement maps to “programs execute their code.” I would be surprised if that were controversial.
Humans can fail to realise the implications of uncontroversial statements. Humans are failing to realise that goal stability is architecture-dependent.
This was covered by the comment about “meta-values” earlier, and “Y being a fuzzy object itself,” which is probably not as clear as it could be. The goal management system grounds out somewhere, and that root algorithm is what I’m considering the “values” of the AI.
But you shouldn’t be, at least in a non-scare-quoted sense of values. Goals and values aren’t descriptive labels for de facto behaviour. The goal of a paperclipper is to make paperclips; if it crashes, as an inevitable result of executing its code, we don’t say, “Aha! It had the goal to crash all along.”
Goal stability doesn’t mean following code, since unstable systems follow their code too (using the actual meaning of “goal”).
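A minimal Python sketch of that distinction (the scenario, class, and method names are invented for illustration, not a description of any real system): a program that executes its code deterministically at every step, yet whose stored goal drifts, because the code itself rewrites it.

```python
# Illustrative sketch only: a toy system that follows its code exactly,
# yet is not goal-stable, because its code rewrites its own goal in
# response to input. The architecture is hypothetical.

class DriftingAgent:
    def __init__(self):
        self.goal = "make paperclips"

    def step(self, observation):
        # Deterministic execution of the program as written...
        if observation == "reward for staples":
            # ...which includes an instruction that overwrites the goal.
            self.goal = "make staples"
        return f"pursuing: {self.goal}"


agent = DriftingAgent()
print(agent.step("nothing new"))         # pursuing: make paperclips
print(agent.step("reward for staples"))  # pursuing: make staples
# The code was followed faithfully at every step, but the goal was not
# stable: whether goals persist depends on the architecture, not on the
# bare fact that "programs execute their code."
```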
Meta: trying to defend a claim by changing the meaning of its terms is doomed to failure.