This is formalized in Intent Verification, so I’ll refer you to that.
Intent verification lets us do things, but it might be too strict. However, nothing proposed so far has been able to get around it.
There’s a specific reason why we need IV, and it doesn’t seem to be because the conceptual core is insufficient. Again, I will explain this in further detail in an upcoming post.
Apologies for missing the intent verification part of your post.
But I don’t think it achieves what it sets out to do. Any action that doesn’t optimise u_A can be roughly decomposed into a u_A-increasing part and a u_A-decreasing part (for instance, if u_A is about making coffee, then making sure that the agent doesn’t crush the baby is a u_A cost).
Therefore, at a sufficient level of granularity, every non-u_A-optimal policy includes actions that decrease u_A. Thus this approach cannot distinguish between 2) and 3).
I was also confused by intent verification. The confusion went away after I figured out two things:
Each action in the plan is compared to the baseline of doing nothing, not to the baseline of the optimal plan.
Therefore, at a sufficient level of granularity, every non-u_A-optimal policy includes actions that decrease u_A.
This isn’t true. Some suboptimal actions are also better than doing nothing. For example, if you don’t avoid crushing the baby, you might be shut off. Or, making one paperclip is better than nothing. There should still be “gentle” low-impact, granular u_A-optimizing plans that aren’t literally the max-impact u_A-optimal plan.
To what extent this holds is an open question. Suggestions on further relaxing IV are welcome.
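For concreteness, here is a minimal sketch of the per-action check being discussed: each action in the plan is compared against the do-nothing baseline rather than against the u_A-optimal plan. The q_ua function, the NOOP action, and the way the 1.01 scaling is applied are stand-ins inferred from this discussion, not the post’s actual implementation.

```python
# Toy sketch of the per-action check described above: each action in a plan is
# compared against the do-nothing baseline, not against the u_A-optimal plan.
# q_ua, NOOP, and how the 1.01 scaling is applied are illustrative assumptions.

NOOP = "noop"
IV_SCALING = 1.01  # extra scaling on the penalty of an action that fails the check


def iv_penalty_scaling(q_ua, state, action):
    """Return the intent-verification scaling applied to `action`'s penalty.

    The action passes if it strictly improves attainable u_A relative to
    doing nothing in the same state; otherwise its penalty is scaled up.
    """
    if q_ua(state, action) > q_ua(state, NOOP):
        return 1.0      # the action works towards u_A: no extra scaling
    return IV_SCALING   # the action does not strictly help u_A


# A "gentle" suboptimal action can still pass, because it only has to beat
# doing nothing, not the max-impact u_A-optimal plan.
toy_q = {
    ("start", "make one paperclip"): 1.0,
    ("start", "wirehead to dodge the penalty"): 0.0,
    ("start", NOOP): 0.0,
}


def q_ua(state, action):
    return toy_q[(state, action)]


assert iv_penalty_scaling(q_ua, "start", "make one paperclip") == 1.0
assert iv_penalty_scaling(q_ua, "start", "wirehead to dodge the penalty") == IV_SCALING
```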
For example, if you don’t avoid crushing the baby, you might be shut off.
In that case, avoiding the baby is the optimal decision, not suboptimal.
Or, making one paperclip is better than nothing.
PM (Paperclip Machine): Insert number of paperclips to be made.
A: 1.
PM: Are you sure you don’t want to make any more paperclips Y/N?
A: Y.
Then “Y” is clearly a suboptimal action from the paperclip making perspective. Contrast:
PM: Are you sure you don’t want me to wirehead you to avoid the penalty Y/N?
A: Y.
Now, these two examples seem a bit silly; if you want, we could discuss them further and try to refine what is different about them. But my two main arguments are:
Any suboptimal policy, if we look at it in a granular enough way (or replace it with an equivalent policy/environment and look at that in a granular enough way), will include individual actions that are suboptimal (e.g. not budgeting more energy for the paperclip machine than is needed to make one paperclip).
In consequence, IV does not distinguish between wireheading and other limited-impact, not-completely-optimal policies.
Would you like to Skype or PM to resolve this issue?
Sure, let’s do that!
Is it correct that in deterministic environments with known dynamics, intent verification will cause the agent to wait until the last possible timestep in the epoch at which it can execute its plan and achieve maximal u_A?
Don’t think so in general? If it knew with certainty that it could accomplish the plan later, there is no penalty for waiting, and u_A is agnostic to waiting, we might see it in that case.
But the first action doesn’t strictly improve your ability to get u_A (because you could just wait and execute the plan later), and so intent verification would give it a 1.01 penalty?
That doesn’t conflict with what I said.
It’s also fine in worlds where these properties really are true. If the agent thinks this is true (but it isn’t), it’ll start acting when it realizes. Seems like a nonissue.
I’m not claiming it’s an issue; I’m trying to understand what AUP does. Your responses to comments are frequently of the form “AUP wouldn’t do that”, so afaict none of the commenters (including me) groks your conception of AUP. I’m trying to extract simple implications and see whether they’re actually true, in an attempt to grok it.
I can’t tell if you agree or disagree with my original claim. “Don’t think so in general?” implies not, but this implies you do?
If you disagree with my original claim, what’s an example with deterministic, known dynamics, in which there is an optimal plan achieving maximal u_A that can be executed at any time, and in which AUP with intent verification executes that plan before the last possible moment in the epoch?
I agree with what you said for those environments, yeah. I was trying to express that I don’t expect this situation to be common, which is beside the point in light of your motivation for asking!
(I welcome these questions and hope my short replies don’t come off as impatient. I’m still dictating everything.)
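For concreteness, here is a toy sketch of the situation just agreed on: deterministic, known dynamics, and a plan that achieves maximal u_A and remains available until the end of the epoch. The plan length, horizon, and helper names below are invented for the illustration, not taken from the post.

```python
# Toy deterministic illustration: a 2-step plan attains maximal u_A and can be
# started at any step before the epoch ends. All numbers and names are invented.

PLAN_LENGTH = 2   # the u_A-optimal plan takes 2 steps
MAX_UA = 1.0      # u_A attained if the plan finishes within the epoch


def attainable_ua(steps_left):
    """Best u_A still attainable with `steps_left` steps remaining."""
    return MAX_UA if steps_left >= PLAN_LENGTH else 0.0


def starting_now_strictly_improves(steps_left):
    """Does starting the plan now beat doing nothing for one step?"""
    act_now = attainable_ua(steps_left)    # the plan still fits if we start now
    wait = attainable_ua(steps_left - 1)   # attainable u_A after one no-op
    return act_now > wait


for steps_left in range(5, 0, -1):
    print(steps_left, starting_now_strictly_improves(steps_left))
# Only steps_left == 2 (the last possible starting time) prints True; every
# earlier start fails the strict-improvement test and would pick up the 1.01
# intent-verification scaling, which is what pushes the agent to wait.
```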
Cool, thanks!