I mostly agree with that tradeoff: a perfect humanity-empowering agent could still result in sub-optimal futures if it empowers us and we then make mistakes, compared with what a theoretical fully aligned sovereign could achieve. But that really doesn't seem so bad, and it also may not be likely, as empowering us probably entails helping us with better future modeling.
In practice the closest we may get to a fully aligned sovereign is some form of uploading, because practical strong optimization in full alignment with our brain’s utility function probably requires extracting and empowering functional equivalents to much of the brain’s valence/value circuits.
So the ideal scenario is probably AI that helps us upload and then hands over power.
It seems potentially extremely bad to me, since power could cause e.g. death, maiming or torture if wielded wrong.
“that” here refers to “a perfect humanity-empowering agent” which hands power over to humanity. In that sense it’s not that different from us advancing without AI. So if you think that’s extremely bad because you are assuming only a narrow subset of humanity is empowered, well, that isn’t what I meant by “a perfect humanity-empowering agent”. If you still think that’s extremely bad even when humanity is empowered broadly, then you seem to just think that humanity advancing without AI would be extremely bad. In that case I think you are expecting too much of your AI and we have more fundamental disagreements.
Humans usually put up lots of restrictions that reduce empowerment in favor of safety. I think we can be excessive about such restrictions, but I don’t think they are always a bad idea, and instead think that if you totally removed them, you would probably make the world much worse. Examples of things that seem like a good idea to me:
Putting up fences to prevent falling off stairs, even though this disempowers you from jumping down the stairs.
Some restrictions on sale of dangerous drugs.
Electrical sockets are designed to not lead to exposed high-voltage wires.
And the above are just things that are mainly designed to protect you from yourself. If we also count disempowering people to prevent them from harming others, then I support bans and limits on many kinds of weapon sales, and I think it would be absolutely terrible if an AI taught people a simple way to build a nuke in their garage.
Your examples are just examples of empowerment tradeoffs.
Fences that prevent you from falling off stairs can be empowering because death or disability are (maximally, and extremely) disempowering.
Same with drugs and sockets. Precommitting to a restriction on your future ability to use some dangerous addictive drug can increase empowerment, because addiction is highly disempowering. I don’t think you are correctly modelling long-term empowerment.
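To make the long-term claim concrete, here is a minimal sketch (my own illustration, not something from the thread). It uses the standard reachable-states notion of n-step empowerment for deterministic dynamics, a toy state space I made up, and an explicitly assumed slip rate eps for how often a person takes an unintended action; all of these are assumptions added for illustration.

```python
# Toy sketch: precommitting to restrict an addictive drug can raise expected
# long-horizon empowerment, under an ASSUMED eps-slip model of behaviour.
import math
from itertools import product

FREE = ["home", "gym", "library", "park"]   # hypothetical everyday states
ADDICTED = "addicted"                       # absorbing: compulsion overrides intent

def step(state, action, drug_available):
    """Deterministic toy dynamics: actions 0-3 move between the free states;
    action 4 (only meaningful if the drug is available) leads to addiction."""
    if state == ADDICTED:
        return ADDICTED
    if action == 4 and drug_available:
        return ADDICTED
    return FREE[action % 4]

def empowerment_bits(state, n, drug_available):
    """n-step empowerment; for deterministic dynamics this is log2 of the
    number of distinct end states the agent can deliberately select between."""
    actions = range(5) if drug_available else range(4)
    finals = set()
    for seq in product(actions, repeat=n):
        s = state
        for a in seq:
            s = step(s, a, drug_available)
        finals.add(s)
    return math.log2(len(finals))

def expected_empowerment(horizon, n, eps, drug_available):
    """Expected n-step empowerment after `horizon` steps for a person who
    deliberately avoids the drug, but with probability eps per step takes a
    uniformly random available action instead (an assumed fallibility model)."""
    num_actions = 5 if drug_available else 4
    p_slip = eps / num_actions if drug_available else 0.0  # chance of slipping into the drug
    p_still_free = (1.0 - p_slip) ** horizon
    return (p_still_free * empowerment_bits("home", n, drug_available)
            + (1.0 - p_still_free) * empowerment_bits(ADDICTED, n, drug_available))

for horizon in (3, 10, 30, 100):
    unrestricted = expected_empowerment(horizon, n=3, eps=0.1, drug_available=True)
    restricted = expected_empowerment(horizon, n=3, eps=0.1, drug_available=False)
    print(f"horizon {horizon:3d}: unrestricted {unrestricted:.2f} bits, "
          f"with precommitted restriction {restricted:.2f} bits")
```

With these made-up numbers the unrestricted option set wins over very short horizons and the precommitted restriction wins over longer ones: the extra fraction of a bit of optionality the drug adds is dominated by the risk of landing in an absorbing zero-empowerment state.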
I think that in order to generally model this as disempowering, you need a model of human irrationality: if you instead model humans as rational utility maximizers, we wouldn’t make the kind of simple, avoidable mistakes that we would need protection from.
But modelling human irrationality seems like a difficult and ill-posed problem, which contains most of the difficulty of the alignment problem.
The difficulty this leads to in practice is what to do when writing “empowerment” into the utility function of your AI: how do you specify that it is humans at human-level rationality who must be empowered, rather than ideal utility maximizers?
My comment began as a discussion of why practical agents are not really utility argmaxers (due to the optimizer’s curse).
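(As a quick illustration of the optimizer's curse being referred to here, a small self-contained simulation, mine rather than from the original comment: an agent that argmaxes over noisy value estimates systematically overestimates the value of whatever it picks, and the overestimate grows with the number of options considered.)

```python
# Optimizer's curse: the argmax over noisy estimates is biased upward.
import random
import statistics

def optimizers_curse_gap(num_options, noise_sd, trials=20_000, seed=0):
    """Average (estimated - true) value of the option chosen by argmax.
    True values are standard normal; the agent only sees noisy estimates."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        true_vals = [rng.gauss(0.0, 1.0) for _ in range(num_options)]
        estimates = [v + rng.gauss(0.0, noise_sd) for v in true_vals]
        best = max(range(num_options), key=lambda i: estimates[i])
        gaps.append(estimates[best] - true_vals[best])
    return statistics.mean(gaps)

for k in (2, 10, 100):
    print(f"{k:3d} options: the argmax choice is overestimated by "
          f"{optimizers_curse_gap(k, noise_sd=1.0):.2f} on average")
```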
You do not need to model human irrationality and it is generally a mistake to do so.
Consider a child who doesn’t understand that the fence is to prevent them from falling off stairs. It would be a mistake to optimize for the child’s empowerment using their limited irrational world model. It is correct to use the AI’s more powerful world model for computing empowerment, which results in putting up the fence (or equivalent) in situations where the AI models that as preventing the child from death or disability.
Likewise for the other scenarios.
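A minimal sketch of the child-and-fence case (the toy nursery world, the assumption that an unassisted descent attempt ends in injury, and the uniformly exploring toddler policy are all illustrative assumptions, not claims from the thread): empowerment scored with the child's own world model makes the fence look like a pure loss, while expected future empowerment scored with the AI's model favors the fence once you look more than a few steps ahead.

```python
# Toy sketch: evaluate empowerment with the AI's world model, not the child's.
import math
from itertools import product

ACTIONS = [0, 1, 2]   # 0 = stay, 1 = move between playroom and landing, 2 = go down the stairs

def child_model(state, action, fence):
    """The child's limited model: the stairs are just a way to get downstairs."""
    if state == "playroom":
        return "landing" if action == 1 else "playroom"
    if state == "landing":
        if action == 1:
            return "playroom"
        if action == 2:
            return "landing" if fence else "downstairs"
        return "landing"
    return state                      # downstairs: stays put in this toy world

def ai_model(state, action, fence):
    """The AI's model: an unassisted descent attempt ends in injury (absorbing)."""
    if state == "injured":
        return "injured"
    if state == "playroom":
        return "landing" if action == 1 else "playroom"
    if state == "landing":
        if action == 1:
            return "playroom"
        if action == 2:
            return "landing" if fence else "injured"
        return "landing"
    return state

def empowerment_bits(model, state, n, fence):
    """n-step empowerment under `model`: log2 of how many distinct end states
    the child could deliberately steer itself to (dynamics are deterministic)."""
    finals = set()
    for seq in product(ACTIONS, repeat=n):
        s = state
        for a in seq:
            s = model(s, a, fence)
        finals.add(s)
    return math.log2(len(finals))

def expected_future_empowerment(horizon, n, fence):
    """Expected empowerment (evaluated with the AI's model) of wherever a
    uniformly exploring toddler ends up after `horizon` steps from the playroom."""
    dist = {"playroom": 1.0}
    for _ in range(horizon):
        nxt = {}
        for s, p in dist.items():
            for a in ACTIONS:
                s2 = ai_model(s, a, fence)
                nxt[s2] = nxt.get(s2, 0.0) + p / len(ACTIONS)
        dist = nxt
    return sum(p * empowerment_bits(ai_model, s, n, fence) for s, p in dist.items())

# Scored with the child's own model, the fence just deletes an option:
for fence in (False, True):
    bits = empowerment_bits(child_model, "landing", 3, fence)
    print(f"child's model, at the landing, fence={fence}: {bits:.2f} bits")

# Scored with the AI's model, the fence wins once you look far enough ahead,
# because "injured" is absorbing and has zero onward empowerment:
for horizon in (1, 3, 10, 30):
    print(f"horizon {horizon:2d}: {expected_future_empowerment(horizon, 3, False):.2f} bits without the fence, "
          f"{expected_future_empowerment(horizon, 3, True):.2f} bits with it")
```

The design choice doing the work here is that the AI predicts where the child will actually end up but scores those futures with its own dynamics model, which is one way of reading "use the AI's more powerful world model for computing empowerment".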