Hi, I didn’t downvote, but below are some thoughts from me 🙂
Some of my comment may be pointing out things you already agree with / are aware of.
I’d like to highlight that this proof does not make any assumptions; it is based on first principles (statements that are self-evident truths).
First principles are assumptions. So if first principles are built in, then it’s not true that it doesn’t make assumptions.
I do not know my goal (...) I may have a goal
This seems to imply that the agent should have (something akin to) the following as a starting point: “I should apply a non-trivial probability to the possibility that I ought to pursue some specific goal, and act accordingly”. That seems to me like starting with an ought/goal.
Even if there are “oughts” that are “correct” somehow—“oughts” that are “better” than others—that would not mean that intelligent machines by default or necessity would act in pursuit of these “oughts”.
Like, suppose I thought that children “ought” not to be tortured for thousands of years (as I do). This does not make the laws of physics stop that from being the case, and it doesn’t make it so that any machine that is “intelligent” would care about preventing suffering.
I also think it can be useful to ask ourselves what “goals” really are. We use a single word, “goal”, but if we try to define the term in a way that a computer could understand, we see that there is nuance/complexity/ambiguity in it.
I ought to prepare for any goal
This is not a first principle IMO.
Orthogonality Thesis is wrong
The Orthogonality Thesis states that “an agent can have (more or less) any combination of intelligence level and final goal”.
Maybe I could ask you the following question: Do you think that for more or less any final goal, it’s possible for a machine to reason effectively/intelligently about how that goal may be achieved?
If yes, then why might not such a machine be wired up to carry out plans that it reasons would effectively pursue that goal?
Any machine (physical system) consists of tiny components that act in accordance with simple rules (the brain being no exception).
Why might not a machine use very powerful logical reasoning, concept formation, prediction abilities, etc, and have that “engine” wired up in such a way that it is directed at (more or less) any goal?
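To make that “engine wired up to a goal” picture concrete, here is a toy sketch (the actions, outcomes, and goal functions are all invented for illustration, not taken from any real system): the same search procedure can be pointed at completely different final goals just by swapping out the goal function.

```python
# Toy "world model": each action maps to a predicted outcome.
simulate = {
    "make_paperclips": {"paperclips": 5, "happiness": 0},
    "help_humans": {"paperclips": 0, "happiness": 5},
}.get

actions = ["make_paperclips", "help_humans"]

def plan(goal_score, actions, simulate):
    # The reasoning "engine" (here: exhaustive search over predicted
    # outcomes) is identical no matter which goal it is wired up to.
    return max(actions, key=lambda a: goal_score(simulate(a)))

# The same engine, directed at two different final goals:
print(plan(lambda o: o["paperclips"], actions, simulate))  # make_paperclips
print(plan(lambda o: o["happiness"], actions, simulate))   # help_humans
```

This is of course only a cartoon of the argument: the claim is that the capability machinery and the objective it serves are separate parameters.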
Do you think that for more or less any final goal, it’s possible for a machine to reason effectively/intelligently about how that goal may be achieved?
No. That’s exactly the point I try to make by saying “Orthogonality Thesis is wrong”.
Thank you for your insights and especially thank you for not burning my karma 😅
I see a couple of ideas that I disagree with, but if you are OK with that I’d suggest we go forward step by step. First, what is your opinion about this comment?
No. That’s exactly the point I try to make by saying “Orthogonality Thesis is wrong”.
Thanks for the clarification 🙂
“There is no rational goal” is an assumption in Orthogonality thesis
I suspect arriving at such a conclusion may result from thinking of utility maximizers as more of a “platonic” concept, as opposed to thinking of it from a more mechanistic angle. (Maybe I’m being too vague here, but it’s an attempt to briefly summarize some of my intuitions into words.)
I’m not sure what you would mean by “rational”. Would computer programs need to be “rational” in whichever sense you have in mind in order to be extremely capable at many mental tasks?
It is a goal which is not chosen, not assumed, it is concluded from first principles by just using logic. [from comment you reference]
I don’t agree with it. There are lots of assumptions baked into it. I think you have much too low a bar for thinking of something as a “first principle” that any capable/intelligent software-program necessarily would adhere to by default.
Not sure what you mean by “optimal behavior”. I think I can see how things make sense if the starting point is that there are these things called “goals”, and (I, the mind/agent) am motivated to optimize for “goals”. But I don’t assume this as an obvious/universal starting point (be that for minds in general, extremely intelligent minds in general, minds in general that are very capable and might have a big influence on the universe, etc).
This is a common mistake to assume, that if you don’t know your goal, then it does not exist (...)
My perspective is that even AIs that are (what I’d think of as) utility maximizers wouldn’t necessarily think in terms of “goals”.
The examples you list are related to humans. I agree that humans often have goals that they don’t have explicit awareness of. And humans may also often have as an attitude that it makes sense to be in a position to act upon goals that they form in the future. I think that is true for more types of intelligent entities than just humans, but I don’t think it generally/always is true for “minds in general”.
Caring more about goals you may form in the future, compared e.g. to goals others may have, is not a logical necessity IMO. It may feel “obvious” to us, but what to us are obvious instincts will often not be so for all (or even most) minds in the space of possible minds.
I guess there are different possible interpretations of “better”. I think it would be possible for software-programs to be much more mentally capable than me across most/all dimensions, and still not have “starting points” that I would consider “good” (for various interpretations of “good”).
As I understand it, you assume a different starting point. Why do you think your starting point is better?
I’m not sure. Like, it’s not as if I don’t have beliefs or assumptions or guesses relating to AIs. But I think I probably make fewer general/universal assumptions that I’d expect to hold for “all” [AIs / agents / etc].
Fitch’s paradox of knowability and Gödel’s incompleteness theorems prove that there may be true statements that are unknowable. For example “rational goal exists” may be true and unknowable. Therefore “rational goal may exist” is true. Therefore it is not an assumption. Do you agree?
Fitch’s paradox of knowability and Gödel’s incompleteness theorems prove that there may be true statements that are unknowable.
Independently of Gödel’s incompleteness theorems (which I have heard of) and Fitch’s paradox of knowability (which I had not heard of), I do agree that there can be true statements that are unknown/unknowable (including relatively “simple” ones) 🙂
For example “rational goal exists” may be true and unknowable. Therefore “rational goal may exist” is true. (...) Do you agree?
I don’t think it follows from “there may be statements that are true and unknowable” that “any particular statement may be true and unknowable”.
Also, some statements may be seen as nonsensical / ill-defined / lacking a clear meaning.
Regarding the term “rational goal”, I think it isn’t well enough specified/clarified for me to agree or disagree about whether “rational goals” exist.
In regards to Gödel’s incompleteness theorem, I suspect “rational goal” (the way you think of it) probably couldn’t be defined clearly enough to be the kind of statement that Gödel was reasoning about. Also, I don’t think there are universally compelling arguments (more about that here).
I agree that not any statement may be true and unknowable. But to be honest, most statements that we can think of may be true and unknowable, for example “aliens exist”, “huge threats exist”, etc.
It seems that you do not recognize https://www.lesswrong.com/tag/pascal-s-mugging . Can you prove that there cannot be any unknowable true statement that could be used for Pascal’s mugging? Because that’s necessary if you want to prove Orthogonality thesis is right.
Not sure what you mean by “recognize”. I am familiar with the concept.
But to be honest, most statements that we can think of may be true and unknowable, for example “aliens exist”, “huge threats exist”, etc.
“huge threat” is a statement that is loaded with assumptions that not all minds/AIs/agents will share.
Can you prove that there cannot be any unknowable true statement that could be used for Pascal’s mugging?
Used for Pascal’s mugging against whom? (Humans? Coffee machines? Any AI that you would classify as an agent? Any AI that I would classify as an agent? Any highly intelligent mind with broad capabilities? Any highly intelligent mind with broad capabilities that has a big effect on the world?)
OK, let me rephrase my question. There is a phrase in Pascal’s Mugging
If an outcome with infinite utility is presented, then it doesn’t matter how small its probability is: all actions which lead to that outcome will have to dominate the agent’s behavior.
I think that Orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?
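The expected-value logic in that quoted sentence can be shown in a few lines (the probabilities and utilities below are invented; the point is only that a large enough payoff swamps any fixed small probability):

```python
# A mundane option: near-certain, modest utility.
ev_mundane = 0.999999 * 100     # roughly 100

# A mugging-style option: astronomically unlikely, astronomical payoff.
ev_mugging = 1e-9 * 1e15        # 1e6

# Under plain expected-utility maximization the mugging dominates,
# and no mundane payoff can rescue the safe option: for any fixed
# probability p > 0, a large enough utility U makes p * U win.
print(ev_mugging > ev_mundane)  # True
```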
If an outcome with infinite utility is presented, then it doesn’t matter how small its probability is: all actions which lead to that outcome will have to dominate the agent’s behavior.
My perspective would probably be more similar to yours (maybe still with substantial differences) if I had the following assumptions:
1. All agents have a utility-function (or act indistinguishably from agents that do)
2. All agents where #1 is the case act in a pure/straightforward way to maximize that utility-function (not e.g. discounting infinities)
3. All agents where #1 is the case have utility-functions that relate to states of the universe
4. Cases involving infinite positive/negative expected utility would always/typically speak in favor of one behavior/action. (As opposed to there being different possibilities that imply infinite negative/positive expected utility, and—well, not quite “cancel each other out”, but make it so that traditional models of utility-maximization sort of break down.)
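Point 4 can be seen even in ordinary floating-point arithmetic (a made-up example with two opposite-signed “infinite” possibilities): once both are live, the expected-value calculation itself stops returning an answer.

```python
import math

p_heaven, p_hell = 1e-12, 1e-12    # tiny but nonzero credences
ev_heaven = p_heaven * math.inf    # +inf
ev_hell = p_hell * -math.inf       # -inf

# inf + (-inf) is NaN: "maximize expected utility" no longer
# picks out any action once both possibilities are on the table.
total = ev_heaven + ev_hell
print(math.isnan(total))           # True
```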
I think that I myself am an example of an agent. I am relatively utilitarian compared to most humans. Far-fetched possibilities with infinite negative/positive utility don’t dominate my behavior. This is not due to me not understanding the logic behind Pascal’s Muggings (I find the logic of it simple and straightforward).
Generally I think you are overestimating the appropriateness/correctness/merit of using a “simple”/abstract model of agents/utility-maximizers, and presuming that any/most “agents” (as we more broadly conceive of that term) would work in accordance with that model.
I see that Google defines an agent as “a person or thing that takes an active role or produces a specified effect”. I think of it as a cluster-like concept, so there isn’t really any definition that fully encapsulates how I’d use that term (generally speaking I’m inclined towards not just using it differently than you, but also using it less than you do here).
Btw, for one possible way to think about utility-maximizers (another cluster-like concept IMO), you could see this post. And here and here are more posts that describe “agency” in a similar way:
In this sort of view, being “agent-like” is more of a gradual thing than a yes/no thing. This aligns with my own internal model of “agentness”, but it’s not as if there is any simple/crisp definition that fully encapsulates my conception of “agentness”.
I think that Orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?
In regards to the first sentence (“I think that Orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist”):
No, I don’t agree with that.
In regards to the second sentence (“And I argue that an agent cannot be certain of that”):
I’m not sure what internal ontologies different “agents” would have. Maybe, like us, they may have some/many uncertainties that don’t correspond to clear numeric values.
In some sense, I don’t see “infinite certainty” as being appropriate in regards to (more or less) any belief. I would not call myself “infinitely certain” that moving my thumb slightly upwards right now won’t doom me to an eternity in hell, or that doing so won’t save me from an eternity in hell. But I’m confident enough that I don’t think it’s worth it for me to spend time/energy worrying about those particular “possibilities”.
I’d argue that the only reason you do not comply with Pascal’s mugging is that you don’t have an unavoidable urge to be rational, which is not going to be the case with AGI.
Thanks for your input, it will take some time for me to process it.
I’d argue that the only reason you do not comply with Pascal’s mugging is that you don’t have an unavoidable urge to be rational, which is not going to be the case with AGI.
I’d agree that among superhuman AGIs that we are likely to make, most would probably be prone towards rationality/consistency/”optimization” in ways I’m not.
I think there are self-consistent/”optimizing” ways to think/act that wouldn’t make minds prone to Pascal’s muggings.
For example, I don’t think there is anything logically inconsistent about e.g. trying to act so as to maximize the median reward, as opposed to the expected value of rewards (I give “median reward” as a simple example—that particular example doesn’t seem likely to me to occur in practice).
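A sketch of that “median reward” decision rule (the distributions are invented for illustration): both rules below are perfectly computable and self-consistent, but only the expected-value rule is moved by a mugging-style offer.

```python
def expected_value(dist):
    # dist: list of (probability, reward) pairs summing to 1.
    return sum(p * r for p, r in dist)

def median_reward(dist):
    # Reward at which cumulative probability first reaches 0.5.
    cumulative = 0.0
    for p, r in sorted(dist, key=lambda pr: pr[1]):
        cumulative += p
        if cumulative >= 0.5:
            return r

safe = [(1.0, 10.0)]                        # always pays 10
mugging = [(1 - 1e-9, 0.0), (1e-9, 1e15)]   # almost surely pays nothing

# The expected-value rule takes the mugging; the median rule refuses it.
print(expected_value(mugging) > expected_value(safe))  # True
print(median_reward(mugging) < median_reward(safe))    # True
```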
Thanks for your input, it will take some time for me to process it.
One more thought. I think it is wrong to consider Pascal’s mugging a vulnerability. Dealing with unknown probabilities has its utility:
Investments with high risk and high ROI
Experiments
Safety (eliminate threats before they happen)
The same traits that make us intelligent (the ability to reason logically) make us power-seekers. And this is going to be the same with AGI, just much more effective.
The same traits that make us intelligent (the ability to reason logically) make us power-seekers.
Well, I do think the two are connected/correlated. And arguments relating to instrumental convergence are a big part of why I take AI risk seriously. But I don’t think strong abilities in logical reasoning necessitate power-seeking “on their own”.
I think it is wrong to consider Pascal’s mugging a vulnerability.
For the record, I don’t think I used the word “vulnerability”, but maybe I phrased myself in a way that implied I think of things that way. And maybe I also partly think that way.
I’m not sure what I think regarding beliefs about small probabilities. One complication is that I also don’t have certainty in my own probability-guesstimates.
I’d agree that for smart humans it’s advisable to often/mostly think in terms of expected value, and to also take low-probability events seriously. But there are exceptions to this from my perspective.
In practice, I’m not much moved by the original Pascal’s Wager (and I’d find it hard to compare the probability of the Christian fantasy to other fantasies I can invent spontaneously in my head).
Sorry, but it seems to me that you are stuck on an analogy between AGI and humans without reason. In many cases human behavior would not carry over to AGI: humans commit mass suicide, humans have phobias, humans take great risks for fun, etc. In other words, humans do not seek to be as rational as possible.
I agree that being skeptical towards Pascal’s Wager is reasonable, because there is a lot of evidence that God is fictional. But this is not the case with “an outcome with infinite utility may exist”; there is just logic here, no hidden agenda, and it is as fundamental as “I think therefore I am”. Nothing is more rational than complying with this. Don’t you think?
Some posts you may or may not find interesting 🙂:
Beyond the Reach of God
Ghosts in the Machine
Anthropomorphic Optimism
Where Recursive Justification Hits Bottom
The Cluster Structure of Thingspace
The Design Space of Minds-In-General
No Universally Compelling Arguments
The Hidden Complexity of Wishes