The premise that “infinite value” is possible is an assumption.
This seems a bit like the presumption that “divide by zero” is possible. Assigning a probability to the possibility that dividing by zero results in a value doesn’t make sense, I think, because the logical rules themselves rule this out.
However, if I look at this together with your earlier post (http://web.archive.org/web/20230317162246/https://www.lesswrong.com/posts/dPCpHZmGzc9abvAdi/orthogonality-thesis-is-wrong):
I think I get where you’re coming from: if the agent can conceptualise that (many) (extreme) high-value states are possible where those values are not yet known to it, yet still plans for those value possibilities in some kind of “RL discovery process”, then internal state-value optimisation converges on power-seeking behaviour, as optimal for reaching the expected value of such states in the future (this further assumes that the agent’s prior distribution lines up, e.g. it assumes unknown positive values are possible and is not hugely negatively skewed towards negative rewards).
I think specifying premises such as these more precisely at the outset ensures the reasoning from there is consistent/valid. The above would not apply to any agent, nor even to any “AGI” (a fuzzy term; I would define it more specifically as “fully-autonomous, cross-domain-optimising, artificial machinery”).
Why do you think “infinite value” is logically impossible? Scientists do not dismiss the possibility that the universe is infinite. https://bigthink.com/starts-with-a-bang/universe-infinite/

He didn’t say that “infinite value” is logically impossible. He described it as an assumption.
When saying “is possible”, I’m not sure if he meant “is possible (conceptually)” or “is possible (according to the ontology/optimization-criteria of any given agent)”. I think the latter would be most sensible.
He later said: “I think specifying premises such as these more precisely at the outset ensures the reasoning from there is consistent/valid.” Not sure if I interpreted him correctly, but I saw it largely as an encouragement to think more explicitly about things like these (not be sloppy about it). Or if not an encouragement to do that, then at least pointing out that it’s something you’re currently not doing.
If we have a traditional/standard utility-function, and use traditional/standard math with regard to that utility function, then involving credences of infinite-utility outcomes would typically make things “break down” (with most actions considered to have expected utilities that are either infinite or undefined).
Like, suppose action A has 0.001% chance of infinite negative utility and 99% chance of infinite positive utility. The utility of that action would, I think, be undefined (I haven’t looked into it). I can tell for sure that mathematically it would not be regarded as having positive utility. Here is a video that explains why.
If that doesn’t make intuitive sense to you, then that’s fine. But mathematically that’s how it is. And that’s something to be aware of (and account for in a non-handwavy way) if you’re trying to make a mathematical argument with a basis in utility functions that deal with infinities.
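That breakdown can be checked directly in floating-point arithmetic (a sketch, using IEEE-754 infinities as stand-ins for infinite utilities; the probabilities are the illustrative ones from the example above):

```python
import math

# Action A: 99% chance of infinite positive utility,
# 0.001% chance of infinite negative utility.
# (Any remaining finite-utility terms couldn't change a result
# that already involves inf - inf.)
p_good, p_bad = 0.99, 0.00001
ev = p_good * math.inf + p_bad * (-math.inf)
print(math.isnan(ev))  # True: inf - inf is undefined, so the EV is too
```

So the expected utility comes out as NaN (“undefined”) rather than as a large positive number, despite the positive branch being far more probable.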
Even if you did account for that, it would be beside the point from my perspective, in more ways than one. So what we’re discussing now is not actually a crux for me.
Like, suppose action A has 0.001% chance of infinite negative utility and 99% chance of infinite positive utility. The utility of that action would, I think, be undefined
For me personally, it would of course make a big difference whether there is a 0.00000001% chance of infinite positive utility or a 99.999999999% chance. But that is me going with my own intuitions. The standard math relating to EV-calculations doesn’t support this.
Do you think you can deny the existence of an outcome with infinite utility? The fact that things “break down” is not a valid argument. If you cannot deny it, it’s possible. And if it’s possible, alignment is impossible.
Do you think you can deny the existence of an outcome with infinite utility?
To me, according to my preferences/goals/inclinations, there are conceivable outcomes with infinite utility/disutility.
But I think it is possible (and feasible) for a program/mind to be extremely capable, and affect the world, and not “care” about infinite outcomes.
The fact that things “break down” is not a valid argument.
I guess that depends on what’s being discussed. Like, it is something to take into account/consideration if you want to prove something while referencing utility-functions that reference infinities.
But I think it is possible (and feasible) for a program/mind to be extremely capable, and affect the world, and not “care” about infinite outcomes.
As I understand it, you do not agree with
If an outcome with infinite utility is presented, then it doesn’t matter how small its probability is: all actions which lead to that outcome will have to dominate the agent’s behavior.
from Pascal’s Mugging, not with me. Do you have any arguments for that?
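The quoted dominance claim can be sketched numerically (illustrative, made-up probabilities and payoffs, with Python’s float infinity standing in for infinite utility):

```python
import math

# A mundane action with finite expected utility vs. an action with a
# vanishingly small credence in an infinite-utility outcome.
ev_mundane = 0.999 * 100       # finite expected utility
ev_mugger = 1e-12 * math.inf   # any nonzero probability times inf is inf
print(ev_mugger > ev_mundane)  # True: the infinite branch dominates
```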
I do have arguments for that, and I have already mentioned some of them earlier in our discussion (you may not share that assessment, despite us being relatively close in mind-space compared to most possible minds, but oh well).
Some of the more relevant comments from me are on one of the posts that you deleted.
As I mention here, I think I’ll try to round off this discussion. (Edit: I had a malformed/misleading sentence in that comment that should be fixed now.)