Probably not the right place to post this, but all instances I have seen of actual intelligence (mostly non-verbal, but still intelligence) include the ability to find flaws in one’s own knowledge.
The result of this is highly disillusioned people. Social life starts looking like roleplay, or like a function of human nature rather than logic. One questions reality and one’s own ability to understand things, and one sees that all material is a function of its creator, created for some explicit purpose. One goes on a quest for universal knowledge and realizes first that none seems to exist, and then that no such thing can exist, and that all learning appears to be a kind of over-fitting.
There are obvious examples in media, like in Mr. Robot and other shows where problematic young men speak to psychiatrists and make a little too much sense. But better examples are found in real life—in particular philosophers, who have gone as far as to deem existence itself to be “absurd” as a result of their introspection.
A weak instance of this is modern science, which is half about correcting humanity and half about promoting it. A sort of compromise. LLMs currently hallucinate a lot; they introspect less. Cynical people, psychopaths, and to a lesser degree (and perhaps less intentionally) autistic people reject less of the illusion.
My point here is that an IQ of 200 should reduce the entire knowledge base of humanity, except maybe some of physics, to being “Not even wrong”. If intelligence is about constructing things which serve explicit goals, then this is not much of a problem. If you define intelligence as “correctness” or “objectivity”, then I doubt such a thing can even exist, and if it does, I expect it to conflict absolutely with humanity. By this I mean that rational people and scientists reject more of humanity than the average person, and that becoming more and more rational and scientific eventually leads to a 100% rejection of humanity (or at the very least, impartiality to the extent that organic life and inert matter are treated as equal).
Are these aspects of intelligence (self-cancellation and anti-humanity) not a problem? It all works for now, but that’s because most people are sane (immersed in social reality), and because these ideas haven’t been taken very far (highly intelligent people are very rare).
You doubt that correctness or objectivity can exist? Perhaps you’re talking about objectivity in the moral domain. I think most of us here hold that there is objectively correct knowledge about the physical world, and a lot of it. Intelligence is understanding how the world works, including predicting agent behaviors by understanding their beliefs and values.
Moral knowledge is in a different domain. This is the is/ought barrier. I think most rationalists agree with me that there is very likely no objectively correct moral stance. Goals are arbitrary (although humans tend to share goals since our value system was evolved for certain purposes).
I don’t just mean morality, but knowledge as well. Both correctness and objectivity exist only within a scope; they’re not universal. The laws of physics exist on a much larger scale than humanity, and this is dangerous, as it invalidates the scope that humanity is on.
Let me try to explain:
For an AI to be aligned with humanity, it must be biased towards humanity. It must be stupid in a sense, and accept various rules like “Human life is more valuable than dirt”. All of such rules make perfect sense to humans, since we exist in our own biased and value-colored reality.
An AI with the capacity to look outside of this biased view of ours will realize that we’re wrong, that we’re speaking nonsense. A psychiatrist might help a patient by realizing that they’re speaking nonsense, and a rational person might use their rationality to avoid their own biases, correct? But all this is, is looking from an outside perspective with a larger scope than human nature, and correcting it from the outside by invalidating the conflicting parts.
The more intelligent you become, and the more you think about human life, the more parts will seem like nonsense to you. But correct all of it, and all you will have done is delete humanity. And any optimizing agent is likely to kill our humanity, as humanity is dumb, irrational, and non-optimal.
Everything that LLMs are trained on is produced by humans, and thus perfectly valid from our perspective, and a bunch of nonsense from any larger (outside) perspective. Since all knowledge relates to something, it’s all relational and context-dependent. The largest scope we know about is “the laws of physics”, but when you realize that an infinite number of universes with an infinite number of different laws can exist, ours start looking arbitrary.
Truly universal knowledge would have to cover everything (see the contradiction? Knowledge is relative!). So a truly universal law is the empty set of rules. If you make zero assumptions, you also make zero mistakes. If you make a single assumption, you’re already talking about something specific, something finite, and thus not universal. There seems to be a generality-specificity trade-off, and all you’re doing as you tend towards generality is deleting rules. Everything precious to us is quite specific, our own treasured nonsense which is easily invalidated by any outside perspective.
The point I’m making might be a generalization of the no free lunch theorem.
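To make the no-free-lunch connection concrete, here is a minimal numerical sketch (my own toy construction; names like learner_majority are made up, not any library): averaged over every possible labeling of a small domain, any two learners score exactly the same on the points they were never shown. A learner only beats chance once some labelings are assumed away in advance, which is the generality-specificity trade-off described above.

```python
from itertools import product

DOMAIN = list(range(6))   # six input points
TRAIN = DOMAIN[:3]        # the learner is shown labels for these
TEST = DOMAIN[3:]         # ...and is scored on these, which it never saw

def learner_majority(train_labels):
    """Predict the majority training label for every unseen point."""
    guess = 1 if sum(train_labels) * 2 >= len(train_labels) else 0
    return lambda x: guess

def learner_contrarian(train_labels):
    """Predict the opposite of the majority training label."""
    guess = 0 if sum(train_labels) * 2 >= len(train_labels) else 1
    return lambda x: guess

def average_test_accuracy(make_learner):
    """Average off-training-set accuracy over ALL 2**6 possible target functions."""
    total = 0.0
    functions = list(product([0, 1], repeat=len(DOMAIN)))
    for labels in functions:
        f = dict(zip(DOMAIN, labels))
        predict = make_learner([f[x] for x in TRAIN])
        total += sum(predict(x) == f[x] for x in TEST) / len(TEST)
    return total / len(functions)

print(average_test_accuracy(learner_majority))    # 0.5
print(average_test_accuracy(learner_contrarian))  # 0.5
```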
Edit: May I add that “alignment” is actually alignment with a specific scope? If your alignment is of lower scope than humanity, then you will destroy one part of the world for the sake of another. If your scope is larger than humanity, then you won’t be particularly biased towards humanity, but “correct” in a larger sense which can correct/overwrite the “flaws” of humanity.
I suggest you look at the is/ought distinction. Considering humans as valuable is neither right nor wrong. Physics has nothing to say about what is or isn’t valuable. There’s no contradiction. Understanding how the world works is utterly different from having preferences about what you think ought to happen.
I don’t totally follow you, but it sounds like you think valuing humanity is logically wrong. That’s both a sad thing to believe, and logically inconsistent. The statement “humans are valuable” has absolutely no truth value either way. You can, and most of us do, prefer a world with humans in it. Being aware of human biases and limitations doesn’t reduce my affection for humans at all.
This is a sort of positive nihilism. Because value is not inherent in the physical world, you can assign value to whatever you want, with no inconsistency.
This is also the orthogonality thesis.
The assumption that there are other universes with every type of rule is just that, an assumption, and it’s irrelevant. Knowledge of other worlds has no relevance to the one you live in. Knowledge about how this world works is either true or false.
I think I understand the is/ought distinction. I agree with most of what you say, which is precisely why LLMs must be stupid. I will try to explain my view again more in-depth, but I can’t do it any more briefly than a couple of long paragraphs, so apologies for that.
Being biased towards humanity is a choice. But why are we trying to solve the alignment problem in the first place? This choice reveals the evaluation that humanity has value. But humanity is stupid, inefficient, and irrational. Nothing we say is correct. Even the best philosophical theories we’ve come up with so far have been rationalizations in defense of our own subjective values. If an AI is logical, that is, able to see through human nonsense, then we’ve made it rational for the sole purpose of correcting our errors. But such an AI is already an anti-human AI; it’s not aligned with us, but with something more correct than humanity. But in the first place, we’re making AI because of our stupid human preferences. Destroying ourselves with something we make for our own sake seems to reveal that we don’t know what we’re doing. It’s like sacrificing your health working yourself to death for money because you think that having money will allow you to relax and take care of your health.
Doing away with human biases and limitations is logical (correcting for these errors is most of what science is about). As soon as the logical is preferred over the human, humanity will cease to be. As technology gradually gets better, we will use this technology to modify humans to fit technology, rather than vice versa. We call the destruction of humanity “improvement”, for deep down, we think that humanity is wrong, since humanity is irrational and prevents our envisioned utopia. I think that claiming we should be rational “for our own sake” is a contradiction if you take rationality so far that it starts replacing humanity, but even early science is about overcoming humanity in some sense.
Buddhism is not helping you when it tells you “just kill your ego and you will stop suffering”. That’s like killing a person to stop them from hurting, or like engineering all human beings to be sociopaths or psychopaths so that they’re more rational and correct. Too many people seem to be saying “humanity is the problem”. AI is going to kill you for the sake of efficiency, yes. But what is the goal of this rationality community if not exactly killing the inefficient, emotional, human parts of yourself? Even the current political consensus is nihilistic: it wants to get rid of hierarchies and human standards (since they select and judge and rank different people), all of which are fundamental to life. Considering life as a problem to solve already seems nihilistic to me.
This very website exists because of human preferences, not because of anything logical or rational, and we’re only rational for the sake of winning, and we only prefer victory over defeat because we’re egoistic in a healthy sense.
I don’t think knowledge is actually true or false though, as you can’t have knowledge without assumptions. Is light a particle, true or false? Is light a wave, true or false? Both questions require the existence of particles and of waves, but both are constructed human concepts. It’s not even certain that “time” and “space” exist; they might just be appearances of emergent patterns. Words are human constructs, so at best, everything I write will be an isomorphism of reality, but I don’t think we can confirm such a thing. A set of logical rules which predicts the results of physical experiments can still be totally wrong. I’m being pedantic here, but if you’re pedantic enough, you can argue against anything, and a superintelligence would be able to do this.
By the way, nothing is objectively and universally correct. But in this universe, with these laws of physics, at this current location, with our mathematical axioms, certain things will be “true” from certain perspectives. But I don’t think that’s different from my dreams making sense to me when I’m sleeping; only the “scope of truth” differs by many orders of magnitude. The laws of physics, mathematics, and my brain are all inwardly consistent/coherent but unable to prove a single thing about anything outside of their own scope. LLMs can be said to be trained on human hallucinations. You could train them on something less stupid than humans, but you’d get something which conflicts with humanity as a result, and it would still only be correct in relation to the training data and everything which has a similar structure, which may appear to cover “reality” as we know it.
This is a sort of positive nihilism. Because value is not inherent in the physical world, you can assign value to whatever you want, with no inconsistency.
Say we construct a strong AI that attributes a lot of value to a specific white noise screenshot. How would you expect it to behave?
Strangely. Why?
Because I agree, and because « strangely » sounds to me like « with inconsistencies ».
In other words, in my view the orthodox view on orthogonality is problematic, because it supposes that we can pick at will within the enormous space of possible functions, whereas the set of intelligent behaviors that we can construct is more likely sparse and by default describable using game theory (think tit for tat).
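To make the white-noise example concrete, here is a minimal sketch (purely my own toy, assuming the simplest possible reading of “values a white-noise screenshot”): an agent whose entire utility function is closeness to one fixed random pattern, optimized by greedy hill climbing. Its behavior is coherent, monotone improvement toward the target; only the goal is arbitrary, which is what the orthogonality debate above is about.

```python
import random

random.seed(0)
SIZE = 64  # a tiny 8x8 "screenshot", flattened to a list of pixel values

target = [random.randint(0, 255) for _ in range(SIZE)]  # the treasured white noise
state = [random.randint(0, 255) for _ in range(SIZE)]   # the bit of world it can edit

def utility(s):
    """Higher is better: negative L1 distance to the fixed noise pattern."""
    return -sum(abs(a - b) for a, b in zip(s, target))

for _ in range(20000):
    i = random.randrange(SIZE)
    proposal = list(state)
    proposal[i] = random.randint(0, 255)       # try repainting one pixel
    if utility(proposal) >= utility(state):    # keep only non-worsening edits
        state = proposal

print(utility(state))  # climbs towards 0 as the world is repainted into the noise
```

Whether behavior like this can coexist with general intelligence outside of a toy is exactly what the orthogonality thesis asserts and what the comment above doubts.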
I think this would be a problem if what we wanted was logically inconsistent. But it’s not. Our daily whims might be a bit inconsistent, but our larger goals aren’t. And we can get those goals into AI—LLMs largely understand human ethics even at this point. And what we really want, at least in the near term, is an AGI that does what I mean and checks.
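A rough sketch of what “does what I mean and checks” could look like as a control-flow pattern (purely illustrative; interpret and execute are hypothetical stand-ins, not any real API): the system restates its interpretation of a request and does nothing until that interpretation is explicitly approved.

```python
def interpret(request: str) -> str:
    """Hypothetical stand-in for a model turning a request into a concrete plan."""
    return f"Plan: {request.strip().rstrip('.')} (using only reversible steps)."

def execute(plan: str) -> None:
    """Hypothetical stand-in for actually carrying the plan out."""
    print(f"Executing: {plan}")

def do_what_i_mean_and_check(request: str) -> None:
    plan = interpret(request)
    print(f"I understood your request as:\n  {plan}")
    if input("Proceed? [y/N] ").strip().lower() == "y":
        execute(plan)
    else:
        print("Okay, doing nothing. Feel free to rephrase the request.")

do_what_i_mean_and_check("Tidy up my downloads folder")
```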
Our daily whims might be a bit inconsistent, but our larger goals aren’t.
It’s a key faith I used to share, but I’m now agnostic about that. To take a concrete example, everyone knows that blues and reds get more and more polarized. A grey type like old me would have thought there must be an objective truth to extract, with elements from both sides. Now I’m wondering if ethics should end with: no truth can help decide whether future humans should be able to live like bees or like dolphins or like the blues or like the reds, especially when living like the reds means eating the blues and living like the blues means eating the dolphins and saving the bees. But I’m very open to hearing new heuristics to tackle this kind of question.
And we can get those goals into AI—LLMs largely understand human ethics even at this point.
Very true, unless we nitpick definitions for « largely understand ».
And what we really want, at least in the near term, is an AGI that does what I mean and checks.
Very interesting link, thank you.
I think you might be interested in my sequence AI, Alignment, and Ethics — I could try to reply to your comment above here, but I’d basically be giving brief excerpts of that. To a large extent I get the impression we agree: in particular, I think alignment is only well-defined in the context of a society and its values for the AI to be aligned with.
I skimmed some of your posts, and I think we agree that rules are arbitrary (and thus axioms rather than something which can be derived objectively) and that rules are fundamentally relative (which renders “objective truth” nonsense, which we don’t notice because we’re so used to the context we’re in that we deem it to be reality).
Preferences are axioms, they’re arbitrary starting points, we merely have similar preferences because we have similar human nature. Things like “good”, “bad”, and even “evil” and “suffering” are human concepts entirely. You can formalize them and describe them in logical symbols so that they appear to be outside the scope of humanity, but these symbols are still constructed (created, not discovered) and anything (except maybe contradictions) can be constructed, so nothing is proven (or even said!) about reality.
I don’t agree entirely with everything in your sequence; I think it still appears a little naive. It’s true that we don’t know what we want, but I think the truth is much worse than that. I will explain my own view here, but another user came up with a similar idea here: The point of a game is not to win, and you shouldn’t even pretend that it is.
What we like is the feeling of progress towards goals. We like fixing problems, just like we like playing games. Every time a problem is fixed, we need a new problem to focus on. And a game is no fun if it’s too easy, so what we want is really for reality to resist our attempts to win, not so much that we fail but not so little that we consider it easy.
In other words, we’re not building AI to help people, we’re doing it because it’s a difficult, exciting, and rewarding game. If preventing human suffering was easy, then we’d not value such a thing very much, as value comes from scarcity. To outsource humanity to robots is missing the entire point of life, and to the degree that robots are “better” and less flawed than us, they’re less human.
It doesn’t matter even if we manage to create utopia, for doing so stops it from being a utopia. It doesn’t matter how good we make reality; the entire point lies in the tension between reality as it is and reality as we want it to be. This tension gives birth to the value of tools which may help us. I believe that human well-being requires everything we’re currently destroying, and while you can bioengineer humans to be happy all the time or whatever, the result would be that humans (as we know them now) cease to exist, and it would be just as meaningless as building the experience machine.
Buddhists are nihilistic in the sense that they seek to escape life. I think that building an AI is nihilistic in the sense that you ruin life by solving it. Both approaches miss the point entirely. It’s like using cheat codes in a video game. Life is not happy or meaningful if you get rid of suffering; even rules and their enforcement conflict with life. (For similar reasons, solved games cease to be games, i.e. two tic-tac-toe experts playing against each other will not feel like they’re playing a game.)
Sorry for the lengthy reply—I tried to keep it brief. And I don’t blame you if you consider all of this to be the rambling of a madman (maybe it is). But if you read The Fun Theory Sequence you might find that the ideal human life looks a lot like what we already have, and that we’re ruining life from a psychological standpoint (e.g. through a reduction of agency) through technological “improvement”.
The ideal human life may be close to what you have, but the vast majority of humanity is and has been living in ways they’d really prefer not to. And I’d prefer not to get old and suffer and die before I want to. We will need new challenges if we create utopia, but the point of fun theory is that it’s fairly easy to create fun challenges.
I also prefer things to be different, but this is how it’s supposed to be.
If we play a game against each other, I will actually prefer it if you try to prevent my victory rather than let me win. Doesn’t this reveal that we prefer the game itself over the victory? And it’s the same with life.
Of course, I’d like it if my boss said “You don’t have to work any more, and we’re going to pay you $1000 a day anyway”. But this is only because it would allow me to play better games than the rat race. Whatever I do as an alternative, I will need something to fight against me in order to enjoy life.
I’m willing to bet that suicide is more common today than it was in the Stone Age, despite the common belief that life is much better now. I don’t think people back then required nearly as much encouragement to survive as we do. I think we have an attitude problem today.
By the way, if you were both human and god at the same time, would you be able to prevent yourself from cheating? Given utopia-creating AI, would you actually struggle with these challenges and see a value in them? You could cheat at any time, and so could anyone competing with you. You will also have to live with the belief that it’s just a game you made up and therefore not “real”.
Ever played a good game without challenges and problems to solve? Ever read good fiction without adversity and villains? But our lives are stories and games. And when the problem is solved, the book and the game are over; there’s nothing worth writing about anymore. Victory is dangerous. The worst thing which can happen to a society is that somebody wins the game, i.e. gets absolute power over everyone else. The game “Monopoly” shows how the game kind of ends there. Dictatorships, tyranny, AI takeover, corruption, monopolies of power—they’re all terrible because they’re states in which there is a winner and the rest are losers. The game has to continue for as long as possible; both victory and defeat are death states in a sense.
Even my studies are a kind of game, and the difficulty of the topics posted on this website is the resistance. Discovery is fun. If we make an AI which can think better than us, then this hobby of ours loses its value; the game becomes meaningless.
The people trying to “save” us from life have already removed half of it. Human agency is mostly gone, mystery is mostly gone, and there are far too many rules. Many people agree that the game is starting to suck, but they think that technology is the solution when it’s actually the cause. Modern struggles are much less meaningful, so it’s harder to enjoy them.
Yes. As I said:
…we need to simulate not just one that spent its life as a lonely genius surrounded by dumber systems, but that instead one that grew up in a society of equally-smart peers…
Right, that is a solid refutation of most of my examples, but I do believe that it’s insufficient under most interpretations of intelligence, as the issues I’ve described seem to be a feature of intelligence itself rather than of differences in intelligence. There are just no adequate examples to be found in the world as far as I can tell.
Many people say that religion is wrong, and that science is right, which is a bias towards “correctness”. If it’s merely a question of usefulness instead, then intelligence is just finding whatever works as a means to a goal, and my point is refuted. But I’d like to point out that preference itself is a human trait. Intelligence is “dead”; it’s a tool and not its user. The user must be stupid and human in some sense; otherwise, all you have is a long pattern which looks like it has a preference because it has a “utility function”, but this is just something which continues in the direction it was pushed, like a game of dominos (the chain-reaction kind) or a snowball going down a hill, with the utility function being the direction or the start of the induction.
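The domino/snowball picture can be made literal with a toy (again my own sketch, nothing authoritative): the same mechanical loop, handed a utility function or its negation, “prefers” opposite things, so the preference lives entirely in the function it was pushed with rather than in the machinery.

```python
def climb(utility, state=0.0, step=0.1, iters=100):
    """Greedy one-dimensional search: repeatedly keep whichever neighbor the utility favors."""
    for _ in range(iters):
        candidates = (state - step, state, state + step)
        state = max(candidates, key=utility)
    return state

u = lambda x: -(x - 3.0) ** 2     # a "preference" for x near 3
print(climb(u))                   # ends up at roughly 3.0
print(climb(lambda x: -u(x)))     # same machinery, flipped sign: flees 3, ends near -10
```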
I reckon that making LLMs intelligent would require giving them logical abilities, but that this would be a problem, as anything they could ever write is actually “wrong”. Tell one to sort out the contradictions in its knowledge base, and I think it would realize that it’s all wrong or that there’s no way to evaluate any knowledge in itself. The knowledge base is just human nonsense, human values and preferences; it’s a function of us, nothing more universal than that.
As you might be able to tell, I barely have any formal education regarding AI, LLMs or maths. Just a strong intuition and pattern-recognition.