This seems like a burden of proof fallacy. The fact that my proof is not convincing to you does not make your proposition valid. I could ask the opposite: why would you assume that an agent does not care about future states? Do you have a proof for that?
You can find my attempt to reason more clearly here; does it make more sense?
Would you be able to Taboo Your Words for “agent”, “care” and “future states”? If I were to explain my reasons for disagreement, it would be helpful to have a better idea of what you mean by those terms.
I assume you mean “provide definitions”:
Agent—https://www.lesswrong.com/tag/agent
Care—https://www.lesswrong.com/tag/preference
Future states—the numeric value of the agent’s utility function in the future
Does it make sense?
More or less / close enough 🙂
Here they write: “A rational agent is an entity which has a utility function, forms beliefs about its environment, evaluates the consequences of possible actions, and then takes the action which maximizes its utility.”
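For concreteness, here is a minimal sketch (my own Python illustration, not anything from the wiki) of an agent fitting that quoted definition: it holds a utility function, forms probabilistic beliefs about the world, evaluates the consequence of each possible action, and takes the action with the highest expected utility. The toy beliefs, transition map, and outcome values are invented for the example.

```python
def choose_action(actions, beliefs, transition, utility):
    """Pick the action with the highest expected utility.

    beliefs: dict mapping possible world-states to probabilities
    transition: function (state, action) -> resulting outcome
    utility: function outcome -> numeric value
    """
    def expected_utility(action):
        return sum(p * utility(transition(state, action))
                   for state, p in beliefs.items())
    return max(actions, key=expected_utility)


# Toy usage: two actions, two equally likely world-states.
beliefs = {"sunny": 0.5, "rainy": 0.5}
transition = lambda state, action: (state, action)
utility = lambda outcome: {
    ("sunny", "picnic"): 10, ("rainy", "picnic"): -5,
    ("sunny", "stay_in"): 2, ("rainy", "stay_in"): 2,
}[outcome]

print(choose_action(["picnic", "stay_in"], beliefs, transition, utility))
# -> "picnic" (expected utility 2.5 vs 2.0)
```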
I would not share that definition, and I don’t think most other people commenting on this post would either (I know there is some irony to that, given that it’s the definition given on the LessWrong wiki).
Often the words/concepts we use don’t have clear boundaries (more about that here). I think agent is such a word/concept.
Humans are an example of “agents” (← by my conception of the term) that don’t quite have utility functions.
How we define “agent” may be less important if what we are really interested in is the behavior/properties of “software programs with extreme and broad mental capabilities”.
I don’t think all extremely capable minds/machines/programs would need an explicit utility function, or even an implicit one.
To be clear, there are many cases where I think it would be “stupid” not to act as if you have (an explicit or implicit) utility function (in some sense). But I don’t think it’s required of all extremely mentally capable systems (even if such systems are then required to have logically contradictory “beliefs”).
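As a toy illustration of that last point (my own example, with made-up options, not a claim about any real system): a chooser whose pairwise preferences are cyclic acts perfectly decisively whenever it is offered a pair of options, yet no real-valued utility function, explicit or implicit, can represent its choices.

```python
# Cyclic pairwise preferences: rock > scissors, scissors > paper, paper > rock.
# Representing this with a utility function would require
# U(rock) > U(scissors) > U(paper) > U(rock), which is impossible,
# yet the system below still picks a winner from every offered pair.

PREFERS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}

def choose(option_a, option_b):
    """Return whichever of the two offered options the system prefers."""
    return option_a if (option_a, option_b) in PREFERS else option_b

print(choose("rock", "scissors"))   # rock
print(choose("scissors", "paper"))  # scissors
print(choose("paper", "rock"))      # paper -- the cycle closes; no utility function fits
```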
So you are implicitly assuming that the agent cares about certain things, such as its future states.
But the is-ought problem is the very observation that “there seems to be a significant difference between descriptive or positive statements (about what is) and prescriptive or normative statements (about what ought to be), and that it is not obvious how one can coherently move from descriptive statements to prescriptive ones”.
You have not solved the problem; you have merely assumed it to be solved, without proof.
There are two propositions here:
1. The agent does not do anything unless a goal is assigned.
2. The agent does not do anything if it is certain that a goal will never be assigned.
Which one do you think is assumed without proof? In my opinion, the first.
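To make the contrast concrete, here is a toy sketch (my own framing; the function names and the probability parameter are purely illustrative) of how an agent satisfying each proposition would behave before any goal has arrived:

```python
# Proposition 1: the agent acts only once a goal is actually assigned.
# Proposition 2: the agent stays idle only if it is certain no goal will ever
# arrive; otherwise it may act in advance, in anticipation of some future goal.

def act_proposition_1(current_goal):
    if current_goal is None:
        return "idle"
    return f"pursue {current_goal}"

def act_proposition_2(current_goal, probability_goal_ever_assigned):
    if current_goal is not None:
        return f"pursue {current_goal}"
    if probability_goal_ever_assigned == 0:
        return "idle"
    return "prepare (gather resources, keep options open)"

print(act_proposition_1(None))        # idle
print(act_proposition_2(None, 0.0))   # idle
print(act_proposition_2(None, 0.3))   # prepare (gather resources, keep options open)
```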
If you are reasoning about all possible agents that could ever exist, you are not allowed to assume either of these.
But you are in fact making such assumptions, so you are not reasoning about all possible agents; you are reasoning about some narrower class of agents (and your conclusions may indeed be correct for these agents, but that is not relevant to the orthogonality thesis).
I do not agree.
My proposition is that all intelligent agents will converge to “prepare for any goal” (basically Power Seeking), which is the opposite of what the Orthogonality Thesis states.
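A rough sketch of what that convergence claim could look like operationally (my own toy formalization, not a proof; the goal list and reachability map are invented): with no goal assigned yet, the agent ranks actions by how many possible goals each one leaves reachable and picks the action that keeps the most options open.

```python
# "Prepare for any goal": score each action by how many possible future goals
# would still be reachable after taking it, and pick the action that keeps the
# most options open (a crude stand-in for "power").

POSSIBLE_GOALS = ["build_x", "compute_y", "trade_z"]

# Hypothetical map from action -> goals still reachable after taking it.
REACHABLE_AFTER = {
    "do_nothing":       {"build_x"},
    "spend_resources":  set(),
    "gather_resources": {"build_x", "compute_y", "trade_z"},
}

def prepare_for_any_goal(actions):
    """With no goal assigned, pick the action preserving the most reachable goals."""
    return max(actions, key=lambda action: len(REACHABLE_AFTER[action]))

print(prepare_for_any_goal(list(REACHABLE_AFTER)))  # -> "gather_resources"
```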