That is explicitly why the predictor is scored on how well it fulfills the user’s values, and not merely on how well it predicts them. I noted that an AI merely trained to predict would likely destroy the world and do things like dissecting human brains to better model our values.
Yep, I understood that you intended for the Predictor to also/primarily be scored on how well it fulfills the User’s values.
I’m modeling our disagreement as something like this:
Aiyen: It could be a good idea to directly incentivize a powerful AI to learn to predict humans, so long as one also directly incentivizes it to optimize for human values.
rvnnt: Directly incentivizing a powerful AI to learn to predict humans would likely lead to the AI allocating at least some fraction of its (eventually vast) resources to e.g. simulating humans experiencing horrible things. Thus it would probably be a very bad idea to directly incentivize a powerful AI to learn to predict humans, even if one also incentivizes it to optimize for human values.
Does that seem roughly correct to you?
(If yes, I’m curious how you’d guarantee that the Predictor does not end up allocating lots of resources to some kind of mindcrime?)