I’m saying that, for the same reason that myopic agents think about rocks the same way non-myopic agents think about rocks, also myopic agents will care about long-term stuff the same way non-myopic agents do. The thinking needed to make cool stuff happen generalizes like the thinking needed to deal with rocks. So yeah, you can say “myopic agents by definition don’t care about long-term stuff”, but if by care you mean the thing that actually matters, the thing about causing stuff to happen, then you’ve swept basically the entire problem under the rug.
Why can myopic agents not think about long-term stuff the same way as non-myopic agents but still not care about long-term stuff?

They *could*, but we don’t know how to separate caring from thinking, modeling, having effects; and the first 1000 programs that think about long-term stuff that you find just by looking for programs that think about long-term stuff also care about long-term stuff.
What you’re saying seems to contradict the orthogonality thesis. Intelligence level and goals are independent, or at least not tightly interdependent.
Let’s use the common example of a paperclip maximizer. Maximizing total long-term paperclips is a strange goal for an agent to have, but most people in AI alignment think it’s possible that an AI like this could in principle emerge from training (though we don’t know how to reliably train one on purpose).
Now why couldn’t an agent be motivated to maximize short-term paperclips? It wants more paperclips, but it will always take 1 paperclip now over 1, 10, or even 100 a minute in the future. It wants paperclips ASAP. This is one contrived example of what a myopic AI might look like—a myopic paperclip maximizer.
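To make the “paperclips ASAP” preference concrete, here is a minimal sketch, purely my own illustration rather than anything from this exchange, of how a per-minute discount factor gamma would encode that time preference; the function, the option values, and the gamma settings are all hypothetical.

```python
# Illustrative sketch only: how a per-minute discount factor `gamma`
# encodes the time preference described above. gamma = 0 is the
# "paperclips ASAP" agent; gamma near 1 is the classic long-horizon one.

def discounted_value(clips: float, delay_minutes: float, gamma: float) -> float:
    """Present value of receiving `clips` after `delay_minutes`, discounted per minute."""
    return clips * (gamma ** delay_minutes)

# Hypothetical options, matching the example in the comment above.
options = {
    "1 clip now": (1, 0),
    "100 clips a minute from now": (100, 1),
}

for label, gamma in [("myopic (gamma = 0)", 0.0), ("long-horizon (gamma = 0.99)", 0.99)]:
    best = max(options, key=lambda name: discounted_value(*options[name], gamma))
    print(f"{label} prefers: {best}")

# Output:
# myopic (gamma = 0) prefers: 1 clip now
# long-horizon (gamma = 0.99) prefers: 100 clips a minute from now
```

Writing down the gamma = 0 objective is the easy part; the disagreement in the rest of the thread is about whether a trained agent’s caring would actually end up tracking it.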
I don’t think we could train an AI to optimize for long-term paperclips. Maybe I’m not “most people in AI alignment” but still, just saying.

I was trying to contrast the myopic paperclip maximizer idea with the classic paperclip maximizer. Perhaps “long-term” was a lousy choice of words. What would be better: simple paperclip maximizer, unconditional paperclip maximizer, or something?
Update: On second thought, maybe what you were getting at is that it’s not clear how to deliberately train a paperclip maximizer in the current paradigm. If you tried, you’d likely end up with a mesa-optimizer on some unpredictable proxy objective, like a deceptively aligned steel maximizer.
Yes, I’m saying that AIs are very likely to have (in a broad sense, including e.g. having subagents that have) long-term goals.
Now why couldn’t an agent be motivated to maximize short-term paperclips?
It *could*, but I’m saying that making an AI like that isn’t like choosing a loss function for training, because long-term thinking is convergent.
Your original comment said:
I can’t see anything unnatural about an agent that has both consequentialist reasoning capabilities and a high time preference.
This is what I’m arguing against. I’m saying it’s very unnatural. *Possible*, but very unnatural.
And:
This means that it would never sacrifice reward now for reward later, and so it would essentially be exempt from instrumental convergence.
This sounds like you’re saying that myopia *makes* there not be convergent instrumental goals. I’m saying myopia basically *implies* there not being convergent instrumental goals, and therefore getting myopia is at least as hard as making there not be CIGs.
most people in AI alignment think it’s possible that an AI could be trained to optimize for something like this.
I don’t think we have any idea how to do this. If we knew how to get an AGI system to reliably maximize the number of paperclips in the universe, that might be most of the (strawberry-grade) alignment problem solved right there.
You’re right, my mistake—of course we don’t know how to deliberately and reliably train a paperclip maximizer. I’ve updated the parent comment now to say:
most people in AI alignment think it’s possible that an AI like this could in principle emerge from training (though we don’t know how to reliably train one on purpose).
It feels like you are setting a discount rate higher than reality demands. A rational, intelligent agent should wind up with a discount rate that matches reality (e.g. in this case, probably the rate at which paperclips decay, or the global real rate of interest).
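As a rough worked version of that claim (illustrative numbers, not the commenter’s): under exponential discounting at rate ρ, a paperclip delivered at time t is valued at

$$\mathrm{PV}(\text{1 clip at time } t) = e^{-\rho t}, \qquad \text{e.g. } \rho = 0.03/\text{yr} \;\Rightarrow\; e^{-0.03 \times 1} \approx 0.97,$$

so a discounter whose ρ tracks something like clip decay or the real rate of interest gives up almost nothing by waiting a year, whereas the myopic maximizer sketched above behaves as if ρ were effectively infinite, valuing anything even a minute away at roughly zero.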