With those kinds of definitions, comparing a paperclip maximiser with a paperclip maximiser that also tries to keep humans alive, the former is fit for a more singular purpose and is thus “more rational”.
I am also a bit worried about equivocation between the economic sense of “rationality”, i.e. trying to get an outcome, and the “honing epistemics” kind of rationality.
Also interesting is the case of intelligence that is not rational. Presumably, if control over some phenomenon is outside the scope of the AI’s existence, meddling with it would be off-mission and thus a “bias”. There is some circularity in whether “problem” makes sense without a goal. But it would be interesting if increasing intelligence made the agent less rational (in the economic sense).
In the “subject to axioms” kind of rationality, the functionality could be subject to “the utility function is not up for grabs”. What is a bias from one angle is a facet of mission scope from the other. Arguments about what the utility function should look like will be invulnerable to the requirement of following a utility function. Thus no “utility function evolution”.
I am still confused about these topics. We know that any behavior can be expressed as a sufficiently complicated world-history utility function, and that therefore anything at all could be rational according to some such function. So I sometimes think of rationality as a spectrum, in which the simpler the utility function justifying your actions, the more rational you are. According to such a definition, rationality may actually be opposed to human values at the highest end, so it makes a lot of sense to focus on intelligence that is not fully rational.
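The claim that any behavior can be rationalized by some world-history utility function can be made concrete with a minimal sketch (the function name is hypothetical, just for illustration): given any observed action history, define a utility function that assigns 1 to exactly that history and 0 to everything else, so the observed behavior maximises it by construction.

```python
def make_history_utility(observed_actions):
    """Build a trivial 'rationalizing' utility function: it assigns
    utility 1 to the exact observed world-history and 0 to any other
    history, so the observed behavior maximises it by construction."""
    observed = tuple(observed_actions)

    def utility(history):
        return 1.0 if tuple(history) == observed else 0.0

    return utility
```

Since any action sequence whatsoever is the unique maximiser of the function built this way, “follows a utility function” places no constraint on behavior by itself; the constraint only appears once we start penalising how complicated the function has to be.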
I’m not really sure what you mean by a “honing epistemics” kind of rationality, but I understand that moral uncertainty from the perspective of the AGI may increase the chance that it keeps some small fraction of the universe for us, so that would also be great. Is that what you mean? I don’t think it is going to be easy to have the AGI consider some phenomena as outside its scope (such that it would be irrational to meddle with them). If we want the AGI to leave us alone, then this is a value we need to include in its utility function somehow.
Utility function evolution is complicated. I worry a lot about that, particularly because it seems to be one of the ways to achieve corrigibility, and we really want that, but it also looks like a violation of goal-integrity from the perspective of the AGI. Maybe it is possible for the AGI to consider the “module” responsible for giving feedback to it as part of itself, just as we (usually) consider our midbrain and other evolutionarily ancient “subcortical” areas as part of us rather than some “other” system interfering with our higher goals.
That conception of “rationality as simpletonness” is very unusual. I offer an almost perfectly opposite view: an agent that cares about hunger is a more primitive and less advanced being than one that cares about both hunger and thirst. The more sophisticated the being, the more components its utility function seems to have.
With “honing epistemics” I am trying to get at the property that makes a rationalist a rationalist. Being a homo economicus doesn’t make you especially principled in your epistemics.
I agree my conception is unusual, and I am ready to abandon it in favor of some better definition. At the same time, I feel that a utility function with far too many components becomes useless as a concept.
Because here I’m trying to derive the utility from the actions, I feel we can understand a being better the less information is required to encode its utility function, in a Kolmogorov complexity sense; if it’s too complex, then there is no good explanation for the actions and we conclude the agent is acting somewhat randomly.
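Since true Kolmogorov complexity is uncomputable, the spectrum idea can only be sketched with a computable proxy. Here is a toy illustration (all names and the candidate descriptions are hypothetical): use compressed length as a crude stand-in for description length, and call the agent whose actions are justified by the shorter utility description the “more rational” one.

```python
import zlib


def description_length(text: str) -> int:
    """Crude, computable stand-in for Kolmogorov complexity:
    the length in bytes of the zlib-compressed description."""
    return len(zlib.compress(text.encode("utf-8")))


# Two hypothetical utility descriptions, assumed to rationalize the
# same observed actions: one compact rule, one giant lookup table.
compact = "maximize the number of paperclips"
lookup_table = "; ".join(f"at step {t} prefer action {t % 3}" for t in range(200))

# The spectrum rule from the text: the agent justified by the
# shorter description counts as "more rational".
more_rational = min([compact, lookup_table], key=description_length)
```

Under this rule `compact` wins, matching the intuition in the text: a lookup-table “utility function” that merely restates the action history explains nothing, and an agent only describable that way looks as if it is acting somewhat randomly.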
Maybe trying to derive the utility as a ‘compression’ of the actions is where the problem lies, and I should distinguish more between what the agent does and what the agent wants. An agent is then irrational only if its wants are inconsistent with each other; if its actions are inconsistent with what it wants, then it is merely incompetent, which is something else.