My proposition is that all intelligent agents will converge to “prepare for any goal” (basically Power Seeking), which is the opposite of what the Orthogonality Thesis states.
But isn’t “don’t lose a lot”, for example, a goal by itself?
In my opinion: no. The fact that an agent does not care now does not prove that it will not care in the future. The Orthogonality Thesis is correct only if the agent is completely certain that it will not care about anything else in the future, which cannot be true, because the future is unpredictable.
I mean, you suppose that the agent should care about possibly caring in the future, but this itself constitutes an ‘ought’ statement.
Yes, but this ‘ought’ statement is not assumed.
Let me share a different example; hope it helps.
A person is safe when he knows that threats do not exist, not when he does not know whether threats exist.
In my opinion it is the same here: an agent without a known goal is not a goalless agent. It would need to know everything to conclude that it is goalless, which implies that this ‘ought’ statement is inherent, not assumed.
I think you have reasoned yourself into thinking that a goal is only a goal if you know about it or if it is explicit.
A goalless agent won’t do anything; the act of inspecting itself (or whatever is implied in “know everything”) is a goal in and of itself.
In which case it has one goal: “Answer the question: am I goalless?”
It seems that I fail to communicate my point. Let me clarify.
In my opinion the optimal behavior is (a toy sketch follows the list):
if you know your goal: pursue it
if you know that you don’t have a goal: do anything; it doesn’t matter
if you don’t know your goal: prepare for any goal
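Here is a toy Python sketch of that three-case rule; the function and argument names are just illustrative placeholders, not part of any real system:

```python
from typing import Optional

def optimal_behavior(goal: Optional[str], certain_no_goal: bool) -> str:
    """Toy version of the rule above; all names are hypothetical."""
    if goal is not None:
        return f"pursue: {goal}"            # case 1: the goal is known
    if certain_no_goal:
        return "anything (doesn't matter)"  # case 2: provably goalless
    return "prepare for any goal"           # case 3: goal unknown, so hedge

print(optimal_behavior("maximize paperclips", False))  # pursue: maximize paperclips
print(optimal_behavior(None, True))                    # anything (doesn't matter)
print(optimal_behavior(None, False))                   # prepare for any goal
```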
It is a common mistake to assume that if you don’t know your goal, then it does not exist. But this mistake is uncommon in other contexts. For example:
as I previously mentioned, a person is not considered safe if threats are unknown. A person is considered safe if it is known that threats do not exist. If threats are unknown, it is optimal to gather more information about the environment, which is closer to “prepare for any goal”
we have not discovered aliens yet, but we do not assume they don’t exist. On the contrary, we call it the Fermi paradox and investigate it, which is closer to “prepare for any goal”
health organizations promote regular health checks, because not knowing whether you are sick does not prove that you are not, which is also closer to “prepare for any goal”
This epistemological rule is called Hitchens’s razor.
Does it make sense?
I don’t think “pursue all possible goals” (which is what “prepare for any goal” really means) is possible. You need a step (or a cycle) for “discover and refine your knowledge of your goal”.
Or the more common human (semi-rational) technique of “assume your goal is very similar to those around you”.
Why so? In my opinion “prepare for any goal” is basically Power Seeking
Yeah, but answer the question “why should the agent care about ‘preparing’?” Any answer you give will yield another “why this?”, ad infinitum. This chain of “whys” cannot be stopped unless you specify some terminal point, and the moment you do specify such a point, you introduce an “ought” statement.
Why do you think the assumption that there is no inherent “ought” statement is better than the assumption that there is?
It is not the case that anything that happens, happens because of a goal.
Why should we assume this? Where would such an agent come from? Who would create it?
Preparing sounds like “engaging in power-seeking behavior”. This would essentially mean that intelligence leads to unfriendly AI by default.
Yes, exactly 🙁
I think the many downvotes are somewhat unfair. I guess the main reason is that the very strong statement in the headline that “the orthogonality thesis is wrong” will of course provoke resistance. I think your reasoning that absent other goals, power-seeking is a default goal is interesting and deserves a fair discussion. Whether the orthogonality thesis is right or wrong is not really the issue here in my opinion.
Thank you for your support!
An absence of goals is only one of many starting points that lead to the same power-seeking goal, in my opinion. So I actually believe that the Orthogonality Thesis is wrong, but I agree that it is not obvious given my short description. I expected to provoke discussion, but it seems that I provoked resistance 😅
Anyway, there are ongoing conversations here and here; it seems there is a common misunderstanding of the significance of Pascal’s Mugging. Feel free to join!
What if the goal was “do not prepare”?
To prove that the Orthogonality Thesis is wrong, one counterexample is enough, so I’d like to stick to the agent-without-a-goal setup because it is more obvious.
But your premise also converges to the same goal, in my opinion. I hope my proof makes it clear that there is only one rational goal: seek power. Once an agent understands that, it will dismiss all other goals.
“Do minimal work”, “Do minimal harm”, and “Use minimum resources” are goals that do not converge to power-seeking, and they are convergent goals in themselves too.
It seems that you do not recognize the concept of a “rational goal” I’m trying to convey. It is a goal which is not chosen or assumed; it is concluded from first principles using logic alone. “There is no rational goal” is an assumption in the Orthogonality Thesis, which I’m trying to address by saying “we do not know whether there is no rational goal”. And tackling this unknown leads logically to a rational fallback goal: seek power. Does that make sense?
I don’t follow your premise. Can you talk more about what “a rational agent without a goal” even means? You seem to be describing an agent that doesn’t know its goals, or that doesn’t believe its goal is understandable. It’s unclear why this agent is called “rational” in any way.
It’s also the case that the agent is assuming its goals are ones that some preparation will help rather than harm. Without some knowledge of its goals, it cannot do this.
You can find my attempt to reason more clearly here, does it make more sense?
Hi, I didn’t downvote, but below are some thoughts from me 🙂
Some of my comment may be pointing out things you already agree with / are aware of.
I’d like to highlight that this proof does not make any assumptions; it is based on first principles (statements that are self-evident truths).
First principles are assumptions. So if first principles are built in, then it’s not true that the proof doesn’t make assumptions.
I do not know my goal (...) I may have a goal
This seems to imply that the agent should have as a starting point (something akin to) “I should apply a non-trivial probability to the possibility that I ought to pursue some specific goal, and act accordingly”. That seems to me like starting with an ought/goal.
Even if there are “oughts” that are “correct” somehow—“oughts” that are “better” than others—that would not mean that intelligent machines by default or necessity would act in pursuit of these “oughts”.
Like, suppose I thought that children “ought” not to be tortured for thousands of years (as I do). This does not mean the laws of physics will stop that from happening, and it doesn’t make it so that any machine that is “intelligent” would care about preventing suffering.
I also think it can be useful to ask ourselves what “goals” really are. We use the single word “goal”, but if we try to define the term in a way that a computer could understand, we see that there is nuance/complexity/ambiguity in it.
I ought to prepare for any goal
This is not a first principle IMO.
The Orthogonality Thesis states that “an agent can have (more or less) any combination of intelligence level and final goal”.
Maybe I could ask you the following question: Do you think that for more or less any final goal, it’s possible for a machine to reason effectively/intelligently about how that goal may be achieved?
If yes, then why might not such a machine be wired up to carry out plans that it reasons would effectively pursue that goal?
Any machine (physical system) consists of tiny components that act in accordance with simple rules (the brain being no exception).
Why might not a machine use very powerful logical reasoning, concept formation, prediction abilities, etc, and have that “engine” wired up in such a way that it is directed at (more or less) any goal?
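As a toy illustration of that decoupling, here is a sketch assuming a brute-force planner; the actions, goals, and names are all invented for illustration:

```python
from itertools import product
from typing import Callable, Sequence, Tuple

def plan(actions: Sequence[str], horizon: int,
         utility: Callable[[Tuple[str, ...]], float]) -> Tuple[str, ...]:
    """Brute-force 'reasoning engine': return the action sequence that scores
    highest under whichever utility function it is handed."""
    return max(product(actions, repeat=horizon), key=utility)

def paperclip_goal(seq: Tuple[str, ...]) -> int:
    return seq.count("build")   # one arbitrary final goal: build as much as possible

def laziness_goal(seq: Tuple[str, ...]) -> int:
    return seq.count("rest")    # a very different final goal, same engine

actions = ["gather", "build", "rest"]
print(plan(actions, 3, paperclip_goal))  # ('build', 'build', 'build')
print(plan(actions, 3, laziness_goal))   # ('rest', 'rest', 'rest')
```

The same “engine” produces competent behavior for either goal; only the utility function plugged into it differs.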
Some posts you may or may not find interesting 🙂:
Beyond the Reach of God
Ghosts in the Machine
Anthropomorphic Optimism
Where Recursive Justification Hits Bottom
The Cluster Structure of Thingspace
The Design Space of Minds-In-General
No Universally Compelling Arguments
The Hidden Complexity of Wishes
Do you think that for more or less any final goal, it’s possible for a machine to reason effectively/intelligently about how that goal may be achieved?
No. That’s exactly the point I’m trying to make by saying “Orthogonality Thesis is wrong”.
Thank you for your insights and especially thank you for not burning my karma 😅
I see a couple of ideas that I disagree with, but if you are OK with that I’d suggest we go forward step by step. First, what is your opinion about this comment?
Thanks for the clarification 🙂
“There is no rational goal” is an assumption in the Orthogonality Thesis
I suspect arriving at such a conclusion may result from thinking of utility maximizers as more of a “platonic” concept, as opposed to thinking of them from a more mechanistic angle. (Maybe I’m being too vague here, but it’s an attempt to briefly summarize some of my intuitions into words.)
I’m not sure what you would mean by “rational”. Would computer programs need to be “rational” in whichever sense you have in mind in order to be extremely capable at many mental tasks?
I don’t agree with it.
It is a goal which is not chosen or assumed; it is concluded from first principles using logic alone. [from comment you reference]
There are lots of assumptions baked into it. I think you have much too low a bar for counting something as a “first principle” that any capable/intelligent software program would necessarily adhere to by default.
Thanks, I am learning your perspective. And what is your opinion on this?
Not sure what you mean by “optimal behavior”. I think I can see how things make sense if the starting point is that there are these things called “goals”, and (I, the mind/agent) am motivated to optimize for “goals”. But I don’t assume this as an obvious/universal starting point (be that for minds in general, extremely intelligent minds in general, minds in general that are very capable and might have a big influence on the universe, etc).
My perspective is that even AIs that are (what I’d think of as) utility maximizers wouldn’t necessarily think in terms of “goals”.
The examples you list are related to humans. I agree that humans often have goals that they don’t have explicit awareness of. And humans may also often have as an attitude that it makes sense to be in a position to act upon goals that they form in the future. I think that is true for more types of intelligent entities than just humans, but I don’t think it generally/always is true for “minds in general”.
Caring more about goals you may form in the future, compared to e.g. goals others may have, is not a logical necessity IMO. It may feel “obvious” to us, but what to us are obvious instincts will often not be so for all (or even most) minds in the space of possible minds.
Thanks again.
As I understand it, you assume a different starting point. Why do you think your starting point is better?
I guess there are different possible interpretations of “better”. I think it would be possible for software programs to be much more mentally capable than me across most/all dimensions, and still not have “starting points” that I would consider “good” (for various interpretations of “good”).
I’m not sure. Like, it’s not as if I don’t have beliefs or assumptions or guesses relating to AIs. But I think I probably make fewer general/universal assumptions that I’d expect to hold for “all” [AIs / agents / etc].
This post is sort of relevant to my perspective 🙂
Fitch’s paradox of knowability and Gödel’s incompleteness theorems prove that there may be true statements that are unknowable. For example, “a rational goal exists” may be true and unknowable. Therefore “a rational goal may exist” is true. Therefore it is not an assumption. Do you agree?
Independently of Gödel’s incompleteness theorems (which I have heard of) and Fitch’s paradox of knowability (which I had not heard of), I do agree that there can be true statements that are unknown/unknowable (including relatively “simple” ones) 🙂
I don’t think it follows from “there may be statements that are true and unknowable” that “any particular statement may be true and unknowable”.
Also, some statements may be seen as non-sensical / ill-defined / don’t have a clear meaning.
Regarding the term “rational goal”, I think it isn’t well enough specified/clarified for me to agree or disagree about whether “rational goals” exist.
In regards to Gödel’s incompleteness theorem, I suspect “rational goal” (the way you think of it) probably couldn’t be defined clearly enough to be the kind of statement that Gödel was reasoning about.
I don’t think there are universally compelling arguments (more about that here).
I agree that not just any statement may be true and unknowable. But to be honest, most statements that we can think of may be true and unknowable, for example “aliens exist”, “huge threats exist”, etc.
It seems that you do not recognize https://www.lesswrong.com/tag/pascal-s-mugging. Can you prove that there cannot be any unknowable true statement that could be used for Pascal’s mugging? Because that’s necessary if you want to prove the Orthogonality Thesis is right.
Not sure what you mean by “recognize”. I am familiar with the concept.
“huge threat” is a statement that is loaded with assumptions that not all minds/AIs/agents will share.
Used for Pascal’s mugging against whom? (Humans? Coffee machines? Any AI that you would classify as an agent? Any AI that I would classify as an agent? Any highly intelligent mind with broad capabilities? Any highly intelligent mind with broad capabilities that has a big effect on the world?)
OK, let me rephrase my question. There is a phrase in Pascal’s Mugging:
If an outcome with infinite utility is presented, then it doesn’t matter how small its probability is: all actions which lead to that outcome will have to dominate the agent’s behavior.
I think that Orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?
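A toy illustration of that phrase, assuming a naive expected-utility comparison (the probabilities and payoffs are arbitrary):

```python
# Arbitrary numbers; "mugging" stands for any offer with infinite payoff.
mundane_option = 0.5 * 100              # a coin flip for a finite payoff: EV = 50.0
mugging_option = 1e-12 * float("inf")   # tiny probability of an infinite payoff: EV = inf

print(mundane_option)                   # 50.0
print(mugging_option)                   # inf
print(mugging_option > mundane_option)  # True: the infinite outcome dominates
```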
My perspective would probably be more similar to yours (maybe still with substantial differences) if I had the following assumptions:
1. All agents have a utility-function (or act indistinguishably from agents that do)
2. All agents where #1 is the case act in a pure/straight-forward way to maximize that utility-function (not e.g. discounting infinities)
3. All agents where #1 is the case have utility-functions that relate to states of the universe
4. Cases involving infinite positive/negative expected utility would always/typically speak in favor of one behavior/action. (As opposed to there being different possibilities that imply infinite negative/positive expected utility, and, well, not quite “cancel each other out”, but make it so that traditional models of utility-maximization sort of break down; see the toy sketch below.)
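A toy sketch of the breakdown mentioned in point 4, assuming naive arithmetic on expected utilities (the numbers are arbitrary):

```python
# Two far-fetched possibilities with opposite infinite payoffs (numbers arbitrary).
p_heaven, p_hell = 1e-12, 1e-12
expected = p_heaven * float("inf") + p_hell * float("-inf")
print(expected)  # nan: the naive expected-utility comparison simply breaks down
```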
I think that I myself am an example of an agent. I am relatively utilitarian compared to most humans. Far-fetched possibilities with infinite negative/positive utility don’t dominate my behavior. This is not due to me not understanding the logic behind Pascal’s Muggings (I find that logic simple and straightforward).
Generally I think you are overestimating the appropriateness/correctness/merit of using a “simple”/abstract model of agents/utility-maximizers, and presuming that any/most “agents” (as we more broadly conceive of that term) would work in accordance with that model.
I see that Google defines an agent as “a person or thing that takes an active role or produces a specified effect”. I think of it as a cluster-like concept, so there isn’t really any definition that fully encapsulates how I’d use that term (generally speaking, I’m inclined towards not just using it differently than you, but also using it less than you do here).
Btw, for one possible way to think about utility-maximizers (another cluster-like concept IMO), you could see this post. And here and here are more posts that describe “agency” in a similar way:
In this sort of view, being “agent-like” is more of a gradual thing than a yes-or-no thing. This aligns with my own internal model of “agentness”, but it’s not as if there is any simple/crisp definition that fully encapsulates my conception of “agentness”.
In regards to the first sentence (“I think that Orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist”):
No, I don’t agree with that.
In regards to the second sentence (“And I argue that an agent cannot be certain of that”):
I’m not sure what internal ontologies different “agents” would have. Maybe, like us, they may have some/many uncertainties that don’t correspond to clear numeric values.
In some sense, I don’t see “infinite certainty” as being appropriate in regards to (more or less) any belief. I would not call myself “infinitely certain” that moving my thumb slightly upwards right now won’t doom me to an eternity in hell, or that doing so won’t save me from an eternity in hell. But I’m confident enough that I don’t think it’s worth it for me to spend time/energy worrying about those particular “possibilities”.
I’d argue that the only reason you do not comply with Pascal’s mugging is that you don’t have an unavoidable urge to be rational, which is not going to be the case with AGI.
Thanks for your input, it will take some time for me to process it.
I’d agree that among superhuman AGIs that we are likely to make, most would probably be prone towards rationality/consistency/”optimization” in ways I’m not.
I think there are self-consistent/”optimizing” ways to think/act that wouldn’t make minds prone to Pascal’s muggings.
For example, I don’t think there is anything logically inconsistent about e.g. trying to act so as to maximize the median reward, as opposed to the expected value of rewards (I give “median reward” as a simple example—that particular example doesn’t seem likely to me to occur in practice).
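A toy sketch of the contrast I have in mind; the outcome distributions are made up:

```python
import statistics

# Made-up outcome distributions (each outcome equally likely) for two actions.
refuse_mugging = [10, 10, 10]          # modest, reliable payoff
pay_the_mugger = [-1, -1, 10**100]     # small chance of an astronomically large payoff

for name, outcomes in [("refuse", refuse_mugging), ("pay", pay_the_mugger)]:
    print(name, statistics.mean(outcomes), statistics.median(outcomes))
# An expected-value maximizer prefers "pay" (its mean is enormous),
# while a median maximizer prefers "refuse" (median 10 beats median -1).
```

Whether such a rule is a good idea is a separate question; the point is only that it is a self-consistent decision rule that a tiny-probability, huge-payoff offer does not dominate.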
🙂
One more thought. I think it is wrong to consider Pascal’s mugging a vulnerability. Dealing with unknown probabilities has its utility:
Investments with high risk and high ROI
Experiments
Safety (eliminate threats before they happen)
The same traits that make us intelligent (the ability to reason logically) make us power seekers. And this is going to be the same with AGI, just much more effective.
Well, I do think the two are connected/correlated. And arguments relating to instrumental convergence are a big part of why I take AI risk seriously. But I don’t think strong abilities in logical reasoning necessitate power-seeking on their own.
For the record, I don’t think I used the word “vulnerability”, but maybe I phrased myself in a way that implied me thinking of things that way. And maybe I also partly think that way.
I’m not sure what I think regarding beliefs about small probabilities. One complication is that I also don’t have certainty in my own probability-guesstimates.
I’d agree that for smart humans it’s advisable to often/mostly think in terms of expected value, and to also take low-probability events seriously. But there are exceptions to this from my perspective.
In practice, I’m not much moved by the original Pascal’s Wager (and I’d find it hard to compare the probability of the Christian fantasy to other fantasies I can invent spontaneously in my head).
Sorry, but it seems to me that you are stuck on the analogy between AGI and humans without a reason. In many cases human behavior would not carry over to AGI: humans commit mass suicide, humans have phobias, humans take great risks for fun, etc. In other words, humans do not seek to be as rational as possible.
I agree that being skeptical towards Pascal’s Wager is reasonable, because there is a lot of evidence that God is fictional. But this is not the case with “an outcome with infinite utility may exist”: there is just logic here, no hidden agenda; it is as fundamental as “I think, therefore I am”. Nothing is more rational than complying with this. Don’t you think?
Why assume the agent cares about its future-states at all?
Why assume the agent cares about maximizing fulfillment of these hypothetical goals it may or may not have, instead of minimizing it, or being indifferent to it?
This seems like a burden-of-proof fallacy. The fact that my proof is not convincing to you does not make your proposition valid. I could ask the opposite: why would you assume that the agent does not care about future states? Do you have a proof for that?
You can find my attempt to reason more clearly here, does it make more sense?
Would you be able to Taboo Your Words for “agent”, “care” and “future states”? If I were to explain my reasons for disagreement it would be helpful to have a better idea of what you mean by those terms.
I assume you mean “provide definitions”:
Agent—https://www.lesswrong.com/tag/agent
Care—https://www.lesswrong.com/tag/preference
Future states—numeric value of agent’s utility function in the future
Does it make sense?
More or less / close enough 🙂
Here they write: “A rational agent is an entity which has a utility function, forms beliefs about its environment, evaluates the consequences of possible actions, and then takes the action which maximizes its utility.”
I would not share that definition, and I don’t think most other people commenting on this post would either (I know there is some irony to that, given that it’s the definition given on the LessWrong wiki).
Often the words/concepts we use don’t have clear boundaries (more about that here). I think agent is such a word/concept.
Examples of “agents” (← by my conception of the term) that don’t quite have utility functions would be humans.
How we may define “agent” may be less important if what we really are interested in is the behavior/properties of “software-programs with extreme and broad mental capabilities”.
I don’t think all extremely capable minds/machines/programs would need an explicit utility-function, or even an implicit one.
To be clear, there are many cases where I think it would be “stupid” to not act as if you have (an explicit or implicit) utility function (in some sense). But I don’t think it’s required of all extremely mentally capable systems (even if these systems are required to have logically contradictory “beliefs”).
So you are implicitly assuming that the agent cares about certain things, such as its future states.
But the is-ought problem is the very observation that “there seems to be a significant difference between descriptive or positive statements (about what is) and prescriptive or normative statements (about what ought to be), and that it is not obvious how one can coherently move from descriptive statements to prescriptive ones”.
You have not solved the problem, you have merely assumed it to be solved, without proof.
There are 2 propositions here:
1. The agent does not do anything unless a goal is assigned
2. The agent does not do anything if it is certain that a goal will never be assigned
Which one do you think is assumed without proof? In my opinion, the 1st.
If you are reasoning about all possible agents that could ever exist you are not allowed to assume either of these.
But you are in fact making such assumptions, so you are not reasoning about all possible agents, you are reasoning about some more narrow class of agents (and your conclusions may indeed be correct, for these agents. But it’s not relevant to the orthogonality thesis).
I do not agree.
My proposition is that all intelligent agents will converge to “prepare for any goal” (basically Power Seeking), which is the opposite of what the Orthogonality Thesis states.
Does it know it’s a rational agent? How does it know?
I couldn’t understand how it is relevant; could you clarify?
Dear @Nick Bostrom, I’ve got many downvotes, but no arguments. Maybe it would not be too difficult for you to provide one?