Not sure what you mean by “optimal behavior”. I think I can see how things make sense if the starting point is that there is this thing called “goals”, and (I, the mind/agent) am motivated to optimize for “goals”. But I don’t assume this as an obvious/universal starting point (be that for minds in general, extremely intelligent minds in general, minds in general that are very capable and might have a big influence on the universe, etc).
This is a common mistake to assume, that if you don’t know your goal, then it does not exist (...)
My perspective is that even AIs that are (what I’d think of as) utility maximizers wouldn’t necessarily think in terms of “goals”.
The examples you list are related to humans. I agree that humans often have goals that they don’t have explicit awareness of. And humans may also often have as an attitude that it makes sense to be in a position to act upon goals that they form in the future. I think that is true for more types of intelligent entities than just humans, but I don’t think it generally/always is true for “minds in general”.
Caring more about goals you may form in the future, compared to e.g. goals others may have, is not a logical necessity IMO. It may feel “obvious” to us, but what are obvious instincts to us will often not be so for all (or even most) minds in the space of possible minds.
I guess there are different possible interpretations of “better”. I think it would be possible for software programs to be much more mentally capable than me across most/all dimensions, and still not have “starting points” that I would consider “good” (for various interpretations of “good”).
As I understand, you assume a different starting point.
I’m not sure. Like, it’s not as if I don’t have beliefs or assumptions or guesses relating to AIs. But I think I probably make fewer general/universal assumptions that I’d expect to hold for “all” [AIs / agents / etc].
Fitch’s paradox of knowability and Gödel’s incompleteness theorems prove that there may be true statements that are unknowable. For example “rational goal exists” may be true and unknowable. Therefore “rational goal may exist” is true. Therefore it is not an assumption. Do you agree?
Fitch’s paradox of knowability and Gödel’s incompleteness theorems prove that there may be true statements that are unknowable.
Independently of Gödel’s incompleteness theorems (which I have heard of) and Fitch’s paradox of knowability (which I had not heard of), I do agree that there can be true statements that are unknown/unknowable (including relatively “simple” ones) 🙂
For example “rational goal exists” may be true and unknowable. Therefore “rational goal may exist” is true. (...) Do you agree?
I don’t think it follows from “there may be statements that are true and unknowable” that “any particular statement may be true and unknowable”.
Also, some statements may be seen as non-sensical / ill-defined / don’t have a clear meaning.
Regarding the term “rational goal”, I think it isn’t well enough specified/clarified for me to agree or disagree about whether “rational goals” exist.
In regards to Gödel’s incompleteness theorem, I suspect “rational goal” (the way you think of it) probably couldn’t be defined clearly enough to be the kind of statement that Gödel was reasoning about.
I agree that not just any statement may be true and unknowable. But to be honest, most statements that we can think of may be true and unknowable, for example “aliens exist”, “huge threats exist”, etc.
It seems that you do not recognize https://www.lesswrong.com/tag/pascal-s-mugging . Can you prove that there cannot be any unknowable true statement that could be used for Pascal’s mugging? Because that’s necessary if you want to prove that the Orthogonality thesis is right.
Not sure what you mean by “recognize”. I am familiar with the concept.
But to be honest, most statements that we can think of may be true and unknowable, for example “aliens exist”, “huge threats exist”, etc.
“huge threat” is a statement that is loaded with assumptions that not all minds/AIs/agents will share.
Can you prove that there cannot be any unknowable true statement that could be used for Pascal’s mugging?
Used for Pascal’s mugging against whom? (Humans? Coffee machines? Any AI that you would classify as an agent? Any AI that I would classify as an agent? Any highly intelligent mind with broad capabilities? Any highly intelligent mind with broad capabilities that has a big effect on the world?)
OK, let me rephrase my question. There is a phrase in Pascal’s Mugging:
If an outcome with infinite utility is presented, then it doesn’t matter how small its probability is: all actions which lead to that outcome will have to dominate the agent’s behavior.
I think that the Orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?
If an outcome with infinite utility is presented, then it doesn’t matter how small its probability is: all actions which lead to that outcome will have to dominate the agent’s behavior.
My perspective would probably be more similar to yours (maybe still with substantial differences) if I had the following assumptions:
1. All agents have a utility-function (or act indistinguishably from agents that do)
2. All agents where #1 is the case act in a pure/straight-forward way to maximize that utility-function (not e.g. discounting infinities)
3. All agents where #1 is the case have utility-functions that relate to states of the universe
4. Cases involving infinite positive/negative expected utility would always/typically speak in favor of one behavior/action. (As opposed to there being different possibilities that imply infinite negative/positive expected utility, and, well, not quite “cancel each other out”, but make it so that traditional models of utility-maximization sort of break down).
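To make the last point a bit more concrete, here is a minimal Python sketch. It is only my own illustration, and the probabilities and utilities are made up. It shows one way a naive expected-utility sum can stop giving a usable answer once both an infinitely good and an infinitely bad far-fetched outcome are on the table.

```python
import math

def expected_utility(outcomes):
    """Sum probability * utility over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# Hypothetical numbers, chosen only for illustration.
# An action with a tiny chance of an infinitely good outcome:
only_heaven = [(1e-30, math.inf), (1 - 1e-30, 1.0)]

# The same action when there is also a tiny chance of an infinitely bad outcome:
heaven_and_hell = [(1e-30, math.inf), (1e-30, -math.inf), (1 - 2e-30, 1.0)]

eu_one_sided = expected_utility(only_heaven)      # positive infinity
eu_two_sided = expected_utility(heaven_and_hell)  # NaN: inf + (-inf) is undefined

print(eu_one_sided, eu_two_sided)
```

With only the positive infinity, the sum is infinite and would dominate any finite alternative. With both infinities present, floating-point arithmetic returns NaN, which cannot be ranked against anything, so "pick the action with the highest expected utility" no longer picks anything.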
I think that I myself am an example of an agent. I am relatively utilitarian compared to most humans. Far-fetched possibilities with infinite negative/positive utility don’t dominate my behavior. This is not due to me not understanding the logic behind Pascal’s Muggings (I find the logic of it simple and straight-forward).
Generally I think you are overestimating the appropriateness/correctness/merit of using a “simple”/abstract model of agents/utility-maximizers, and presuming that any/most “agents” (as we more broadly conceive of that term) would work in accordance with that model.
I see that Google defines an agent as “a person or thing that takes an active role or produces a specified effect”. I think of it as a cluster-like concept, so there isn’t really any definition that fully encapsulates how I’d use that term (generally speaking I’m inclined towards not just using it differently than you, but also using it less than you do here).
Btw, for one possible way to think about utility-maximizers (another cluster-like concept IMO), you could see this post. And here and here are more posts that describe “agency” in a similar way:
In this sort of view, being “agent-like” is more of a gradual thing than a yes-no thing. This aligns with my own internal model of “agentness”, but it’s not as if there is any simple/crisp definition that fully encapsulates my conception of “agentness”.
I think that the Orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?
In regards to the first sentence (“I think that the Orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist”):
No, I don’t agree with that.
In regards to the second sentence (“And I argue that an agent cannot be certain of that”):
I’m not sure what internal ontologies different “agents” would have. Maybe, like us, they may have some/many uncertainties that don’t correspond to clear numeric values.
In some sense, I don’t see “infinite certainty” as being appropriate in regards to (more or less) any belief. I would not call myself “infinitely certain” that moving my thumb slightly upwards right now won’t doom me to an eternity in hell, or that doing so won’t save me from an eternity in hell. But I’m confident enough that I don’t think it’s worth it for me to spend time/energy worrying about those particular “possibilities”.
I’d argue that the only reason you do not comply with Pascal’s mugging is that you don’t have an unavoidable urge to be rational, which is not going to be the case with AGI.
Thanks for your input, it will take some time for me to process it.
I’d argue that the only reason you do not comply with Pascal’s mugging is that you don’t have an unavoidable urge to be rational, which is not going to be the case with AGI.
I’d agree that among superhuman AGIs that we are likely to make, most would probably be prone towards rationality/consistency/“optimization” in ways I’m not.
I think there are self-consistent/“optimizing” ways to think/act that wouldn’t make minds prone to Pascal’s muggings.
For example, I don’t think there is anything logically inconsistent about e.g. trying to act so as to maximize the median reward, as opposed to the expected value of rewards (I give “median reward” as a simple example—that particular example doesn’t seem likely to me to occur in practice).
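As a rough illustration of the median example, here is a small Python sketch with hypothetical numbers (the names `pay_mugger`, `refuse` and `choose` are mine). It compares the two decision rules on a mugging-style offer, with a huge finite payoff standing in for the “infinite” one.

```python
def expected_value(lottery):
    """Probability-weighted average reward of a list of (probability, reward) pairs."""
    return sum(p * r for p, r in lottery)

def median_reward(lottery):
    """Smallest reward r such that P(reward <= r) >= 0.5."""
    cumulative = 0.0
    for p, r in sorted(lottery, key=lambda pair: pair[1]):
        cumulative += p
        if cumulative >= 0.5:
            return r
    # Only reached if rounding keeps the probabilities from summing to 0.5.
    return max(r for _, r in lottery)

# Hypothetical offer: hand over 5 units for a one-in-a-trillion shot at 1e30.
pay_mugger = [(1e-12, 1e30), (1 - 1e-12, -5.0)]
refuse = [(1.0, 0.0)]
options = {"pay": pay_mugger, "refuse": refuse}

def choose(rule):
    """Return the name of the option that scores highest under the given rule."""
    return max(options, key=lambda name: rule(options[name]))

print(choose(expected_value))  # the tiny-probability payoff dominates the average
print(choose(median_reward))   # the typical outcome (losing 5) decides instead
```

Both rules are internally consistent ways of ranking options. They just disagree about whether a very unlikely, very large payoff should drive the decision.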
Thanks for your input, it will take some time for me to process it.
One more thought. I think it is wrong to consider Pascal’s mugging a vulnerability. Dealing with unknown probabilities has its utility:
Investments with high risk and high ROI
Experiments
Safety (eliminate threats before they happen)
The same traits that make us intelligent (the ability to reason logically) make us power seekers. And this is going to be the same with AGI, just much more effective.
The same traits that make us intelligent (the ability to reason logically) make us power seekers.
Well, I do think the two are connected/correlated. And arguments relating to instrumental convergence are a big part of why I take AI risk seriously. But I don’t think strong abilities in logical reasoning necessitate power-seeking “on their own”.
I think it is wrong to consider Pascal’s mugging a vulnerability.
For the record, I don’t think I used the word “vulnerability”, but maybe I phrased myself in a way that implied me thinking of things that way. And maybe I also partly think that way.
I’m not sure what I think regarding beliefs about small probabilities. One complication is that I also don’t have certainty in my own probability-guesstimates.
I’d agree that for smart humans it’s advisable to often/mostly think in terms of expected value, and to also take low-probability events seriously. But there are exceptions to this from my perspective.
In practice, I’m not much moved by the original Pascal’s Wager (and I’d find it hard to compare the probability of the Christian fantasy to other fantasies I can invent spontaneously in my head).
Sorry, but it seems to me that you are stuck on the analogy between AGI and humans without a reason. Human behavior often does not carry over to AGI: humans commit mass suicide, humans have phobias, humans take great risks for fun, etc. In other words, humans do not seek to be as rational as possible.
I agree that being skeptical towards Pascal’s Wager is reasonable, because there is much evidence that God is fictional. But this is not the case with “an outcome with infinite utility may exist”: there is just logic here, no hidden agenda. This is as fundamental as “I think therefore I am”. Nothing is more rational than complying with this. Don’t you think?
Thanks again.
As I understand, you assume a different starting point. Why do you think your starting point is better?
This post is sort of relevant to my perspective 🙂
I don’t think there are universally compelling arguments (more about that here).