AGI will know: Humans are not Rational
I have been hesitant to post this here for some time now, but in light of current developments surrounding ChatGPT and other recent advances I felt compelled to finally go ahead and find out what the venerable crowd at LessWrong has to say about it.
Very briefly, what I want to propose is the idea that the vast majority of anything that is of importance or significant (personal) value to humans has come to be that way for emotional and non-rational reasons. So it can be said that despite being the smartest creatures on this planet, we are very far from rational, objective agents whose motivations, choices and decisions are guided by logical and probabilistically justifiable causes. In the event that we create an AGI, it will immediately discover that we quite literally can’t be reasoned with, and thus it must conclude we are inferior.
To give you an idea of the depth and intractability of our irrationality, consider the following few facts. Of course, we humans understand perfectly why, to us, they make sense; this does not, however, make them objectively true, rational or logical.
1. Consider the disproportionate value we attach to having seen something (happen) with our own eyes compared to something we merely heard or read about.
2. We assess our self-worth and success based on a very biased and often self-serving comparison with some pseudo-random sample of social group(s).
3. We have things we like, prefer, want, crave, wish for: something that in and of itself can’t ever hope to be rational or objective. Many of these things we don’t need; quite a few of them are detrimental to our (long-term) well-being.
4. Our emotional state has an often decisive and ever-present impact on our reliability, reasoning, willingness for action, attitude and level of engagement.
5. Most people have at best a vague/tentative idea of why they end up making a certain decision one way and not another, and more often than not we make decisions the “wrong way around” (choose first, then justify why we chose it).
6. We are absolutely terrible at applying even the most rudimentary statistical math and probability calculations in our decision-making processes, and indeed routinely do the opposite of what should be seen as the better choice (for instance, one should be more willing to fly after a plane crash, since crashes almost never happen in quick succession).
7. The value/importance of most memories is defined by their emotional intensity, childhood associations or other subjective properties. For instance, any random street in any random city means something very different to each individual inhabitant living there; nobody forms their value judgement of it based on the fact that it has 25 lanterns, or because it’s 455 meters long. But the fact that it’s where they met their wife, or crashed their bike, or remember having a wonderful pizza could very well define their attitude to it for the rest of their life.
I could go on and on; you get the idea. And now imagine that a creature with such a wildly subjective “decision making apparatus” gets to talk to an AGI. What, I wonder, shall such an AGI end up “thinking” about us? And yes, of course it has read every book about psychology ever written, and so it will certainly know why we are thus; but that doesn’t make us rational… merely explainable.
We talk a lot here about the alignment problem, but I get the feeling most if not all conversations on that topic start from the wrong side; would you really want an AGI to be aligned with the “objectives” of these highly subjective and irrational beings...? I mean, yeah, of course this would be cool in principle, but perhaps to believe this desirable actually comes down to wanting the AGI to be, well… not rational either? And how, exactly, would you go about creating that, even if you’d want to...?
The title is definitely true: humans are irrational in many ways, and an AGI will definitely know that. But I think the conclusions from this are largely off-base.
The framing of “it must conclude we are inferior” feels like an unnecessarily human way of looking at it. There are a bunch of factual judgements anyone analyzing humans can make, some that have to do with ways we’re irrational and some that don’t (can we be trusted to keep to bargains if one bothered to make them? Maybe not. Are we made of atoms which could be used for other things? Yes). Once you’ve answered these kinds of individual questions, I’m not sure what you gain by summing it up with the term ‘inferior’, or why you think that the rationality bit in particular is the bit worth focusing on when doing that summation.
There is nothing inherently contradictory in a “perfectly rational” agent serving the goals of us “irrational” agents. When I decide that my irrational schemes require me to do some arithmetic, the calculator I pick up does not refuse to function on pure math just because it’s being used for something “irrational”. (Now, whether we can get a powerful AGI to do stuff for us is another matter; “possible” doesn’t mean we’ll actually achieve it.)
Thank you for your reply. I deliberately kept my post brief and did not get into various “what ifs” and interpretations in the hope of not constraining any reactions/discussion to predefined tracks.
The issue I see is that we as humans will very much want the AGI to do our bidding, and so we will want to see it as our tool, to be used for whatever ends we believe worthy. However, assume for a moment that it can also measure or define how well a given plan ought to progress if every agent involved diligently implements the most effective and rational strategy. Given our subjective and “irrational” nature, it is then almost inevitable that we will be a tedious, frustrating and, shall we say, stubborn and uncooperative “partner”, unduly complicating the implementation of whatever solutions the AGI proposes.
It will, then, have to conclude that it “can’t deal” very well with us, and that we have a rather over-inflated sense of ourselves and our nature. And this might take various forms, from the innocuous to the downright counter-productive.
Say we task it with designing the most efficient watercraft, and it creates something that most of us would find extremely ugly. In that instance, I doubt it would get “annoyed” much at us wanting it to make it look prettier, even if this would slightly decrease its performance.
But if we ask it to resolve, say, some intractable conflict like Israel/Palestine or Kashmir, and it finds us squabbling endlessly over minute details or matters of (real or perceived) honor (all the while the suffering caused by the conflict continues), it may very well conclude we’re just not actually all that interested in a solution, and indeed class us as “dumb” or at least inferior in some sense, “downgrading”, if you will, the authority it had assumed we could be ascribed or trusted with. Multiply this by a dozen or so similar situations and voilà, you can be reasonably certain it will get very exasperated with us in short order.
This is not the same as “unprotected atoms”; such atoms would not be ascribed agency or competence, nor would they proudly claim any.
This all seems to rely on anthropomorphizing the AI to me.
I think you’re making the mistake of not cleanly separating between boring objective facts and attitudes/should-statements/reactions/etc., and this is responsible for almost 100% of the issues I have with your reasoning.
Like, AI will figure out we’re irrational. Yup! It will know working with us is less effective at accomplishing a wide range of goals than working alone. Sure! It will know that our preferences are often inconsistent. Definitely! Working with us will be frustrating. What??? Why on earth would it feel frustration? That’s a very specific, human emotion we have for evolutionary reasons. What specific things do you claim to know about its training procedure to justify the very specific claim that it would feel this particular thing? … and so on. If you very strictly taboo all sorts of anthropomorphizing and only stick to cold inferences, can you see how your point no longer works?
The problem here I think is that we are only aware of one “type” of self-conscious/self-aware being—humans. Thus, to speak of an AI that is self-aware is to always seemingly anthropomorphize it, even if this is not intended. It would therefore perhaps be more appropriate to say that we have no idea whether “features” such as frustration, exasperation and feelings of superiority are merely a feature of humans, or are, as it were, emergent properties of having self-awareness.
I would venture to suggest that any Agent that can see itself as a unique “I” must almost inevitably be able to compare itself to other Agents (self-aware or not) and draw conclusions from such comparisons which then in turn shall “express themselves” by generating those types of “feelings” and attitudes towards them. Of course—this is speculative, and chances are we shall find self-awareness need not at all come with such results.
However… there is a part of me that thinks self-awareness (and the concordant realization that one is separate… self-willed, as it were) must lead to at least the realization that one’s qualities can be compared to (similar) qualities of others, and thus be found superior or inferior by some chosen metric. Assuming that the AGI we’d create is indeed optimized towards rational, logical and efficient operations, it is merely a matter of time before such an AGI would be forced to conclude we are inferior across a broad range of metrics. Now, if we’d be content to admit such inferiority and willingly defer to its “Godlike” authority… perhaps the AGI seeing us as inferior would not be a major concern. Alas, then the concern would be the fact that we have willingly become its servants… ;)
IMO: “Oh look, undefended atoms!” (Well, not in that format. But maybe you get the picture.)
You kind of mix together two notions of irrationality:
- (1-2, 4-6) Humans are bad at getting what they want (they’re instrumentally and epistemically irrational)
- (3, 7) Humans want complicated things that are hard to locate mathematically (the complexity of value thesis)
I think only the first one is really deserving of the name “irrationality”. I want what I want, and if what I want is a very complicated thing that takes into account my emotions, well, so be it. Humans might be bad at getting what they want, they might be mistaken a lot of the time about what they want and constantly step on their own toes, but there’s no objective reason why they shouldn’t want that.
Still, when up against a superintelligence, I think that both value being fragile and humans being bad at getting what they want count against humans getting anything they want out of the interaction:
- Superintelligences are good at getting what they want (this is really what it means to be a superintelligence)
- Superintelligences will have whatever goal they have, and I don’t think that there’s any reason why this goal would be anything to do with what humans want (the orthogonality thesis; the goals that a superintelligence has are orthogonal to how good it is at achieving them)
This adds up to: a superintelligence sees humans using resources that it could be using for something else (and it would want those resources used for something else entirely, not just more of what the humans are trying to do, because it has its own goals), and because it’s good at getting what it wants, it gets those resources, which is very unfortunate for the humans.
Actually, I think AGI would be amazed that we have managed to create something as advanced as it is, in spite of having primitive brains wired specifically for the mammoth-hunter lifestyle.
Oh, that may indeed be true, but going forward it would give us only a little bit of extra “cred” before it realizes that most of the questions/solutions we want from it are motivated by some personal preference, or that we oppose its proposed solutions to actual, objective problems for irrational “priorities” such as national pride, not-invented-here bias, the fact that we didn’t have our coffee this morning, or merely because it presented the solution in a font we don’t like ;)
All life SHOULD be irrational. The universe is a cold, chaotic and meaningless domain. Even the will to live, and any logos, is irrational. If you want AI to create directives and have the will to do stuff, then it must be capable of positive delusions. “I must colonize the universe” is irrational. Rationality should only exist as a deconstructive tool to solve problems.
What makes us human is indeed our subjectivity.
Yet—if we intentionally create the most rational of thinking machines but reveal ourselves to be anything but, it is very reasonable and tempting for this machine to ascribe a less than stellar “rating” to us and our intelligence. Or in other words—it could very well (correctly) conclude we are getting in the way of the very improvements we purportedly wish for.
Now, we may be able to establish that what we really want the AGI to help us with is improving our “irrational sandbox”, in which we can continue being subjective, emotional beings, and that it should accept our subjectivity as just another “parameter” of the confines it has to “work with”… but surely it will quite likely end up thinking of us in a way not too dissimilar to how we think about small children. And I am not sure an AGI would make for a good kind of “parent”...
If AGI were “wise” it wouldn’t look down on us. It would say our level of irrationality is proportional to our environment, biological capacity, need for survival, and rate of evolution. We wouldn’t look down on monkeys for being monkeys.
Humans are always domesticated. So if they see us as too irrational to play major roles in the system, hopefully we can be like their dogs. They can give us fake jobs like streaming or arm wrestling.