I agree that personal values (no need to mystify) are important
The concepts used should not be viewed as mystical, but as straightforward physical objects. I don’t think “personal values” is a valid simplification. Or rather, I don’t think there is a valid simplification, which is why I use the unsimplified form. Preferably, “egregore”, “hyperbeing”, “shadow”, or something like that should just become an accepted term, like “dog” or “plane”. If you practice “seeing” them, they should exist in a completely objective and observable sense. My version of reality isn’t a monk meditating to sense the arcane energies of higher beings flowing through the zeitgeist. It’s more: hey look, a super-macroscopic aggregator just phase-shifted. It’s like watching water turn to ice, not… angels on the head of a pin.
I agree that I’m having trouble explaining myself. I blame the English language.
action is equally important.
I hold that the most important action a person is likely to take in his life is to check a box on a survey form. I think people should get really good at checking the right box. Really, really good, in fact. This is a super critical skill that people do not develop enough. It’s amazing how easily success flows in an environment where everyone checks the right boxes, and, equally, how futile any course of action becomes when the wrong boxes are checked.
How can you control something vastly more intelligent than yourself?
Note: I think trying to control something vastly more intelligent than yourself is a [very bad idea], and we should [not do that].
What does this mean?
In practice, the primary recommendation here is simply and only to stop using the term “friendly AI” and instead use a better term; the best I can come up with is “likable AI”. In theory, the two terms are the same. I’m not really calling for that deep a change in motte space. In practice, I find that “friendly AI” comes with extremely dangerous baggage. This also shifts some focus from the concept of “who would you like to live with” towards “who would you like to live as”.
I also want an open-source, human-centric project, and I am opposed to closed-source, government-run AI projects. I’m generally hostile to “AI safety” because I expect the actual policy result to be [AI regulation] followed by government-aligned AI.
Alignment, as I use it, isn’t becoming good.
I treat evil-aligned AI as more of a thing than people who disagree with me do. I don’t think that alignment is a bad thing per se, but I want people to get much better at recognizing good from evil, which is itself strongly related to being good at pure rationality, and less related to being good at cute mathematical tricks, which I strongly suggest will be obsolete in the somewhat near future anyway. In some sense, I’m saying that, given the current environment, we need more Descartes and less Newton, on the margin.
we should just build “AI that we like”
I’m not saying “Let’s just do something random”. At the risk of being mystical again, I’m going to point towards The Screwtape Letters. It’s that sort of concept. When presented with a choice between heaven and hell, choose heaven. I think this is something that people can get good at, but it is, unfortunately, a skill adjacent to having correct political beliefs, which is a skill that very powerful entities very much do not want to exist, because there’s very little overlap between correct political beliefs and political beliefs that maintain current power structures.
In sum, I imagine the critical future step to be more like “check this box for heaven”, “check this box for hell”, with super awesome propaganda explaining how hell is the only ethical choice and a figure of authority solemnly commanding you to “pick hell”, and less like “we need a cute mathematical gimmick or we’re all going to die”.
I also hold that humans are image-recognition programs. Stop trying to outplay AlphaGo at Go and “look” “up”.
“I hold that the most important action a person is likely to take in his life is to check a box on a survey form.”
If only life (or, more specifically, our era, or even more specifically, AI alignment) were that simple.
Yes, that’s the starting point, and without it you can do nothing good. And yes, the struggle between the fragility of being altruistic and the lesser fragility of being Machiavellian has never been more important.
But unfortunately, it’s way more complicated than that. Way more complicated than some clever mathematical trick, too. In fact, it’s the most daunting scientific task ever, and it might not even be possible. Mind you, Fermat’s Last Theorem took about 350 years to prove, and this is far more complicated than that.
The problem is simple to state: how to control something a) more intelligent than ourselves, and b) able to rewrite its own code and create subroutines, thereby bypassing our control mechanisms.
You still haven’t answered this.
You say that we can’t control something more intelligent than ourselves. So where does that leave us? Just create the first AGI, tell it to “be good” and hope that it won’t be a sophist? That sounds like a terrible plan, because our experience with computers tells us that they are the biggest sophists. Not because they want to be! Simply because effectively telling them how to “do what I mean” is way harder than telling another human. Any programmer would agree a thousand times over.
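To make that concrete, here is a deliberately trivial sketch (the flight data and the “intent” are invented for illustration) of how a computer satisfies the objective we wrote down rather than the one we meant:

```python
# A toy "do what I mean" failure: the objective we can write down is only a
# proxy for what we actually want. (Hypothetical data, purely illustrative.)

flights = [
    {"id": "A", "price": 120, "layover_hours": 31},
    {"id": "B", "price": 180, "layover_hours": 2},
]

# What we told the computer: "find the cheapest flight."
cheapest = min(flights, key=lambda f: f["price"])

# What we meant: "find a cheap flight I'd actually be willing to take."
# The literal objective happily returns the 31-hour-layover option.
print(cheapest)  # -> {'id': 'A', 'price': 120, 'layover_hours': 31}
```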
Maybe you anthropomorphize AGI too much. Maybe you think that, because it will be human-level, it will also be human-like, and that it will therefore just “get” us, so we only need to make sure that the first words it hears are “be good” and never “be evil”. If so, then you couldn’t be more mistaken. Nothing tells us that the first AGI (in fact I dislike the term, I prefer transformative AI) will be human-like. In all probability (considering 1) the vast space of possible “mind types”, and 2) how an advanced computer will likely function much more similarly to older computers than to humans), it won’t.
So we necessarily need to be in control. Or, at least, to build it in such a way that it is provably good.
In short: the orthogonality thesis. Intelligence isn’t correlated with goals or values.
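A minimal sketch of what I mean (the “objectives” here are arbitrary placeholders, not anyone’s actual proposal): the optimizer is one thing, the goal you plug into it is another, and nothing about being a better optimizer pushes the goal towards being good.

```python
# Orthogonality in miniature: the same search procedure ("intelligence")
# optimizes whatever objective ("values") it is handed. Toy numbers only.

def best_plan(plans, objective):
    # The optimizer is completely indifferent to what the objective rewards.
    return max(plans, key=objective)

plans = [(3, 1), (0, 5), (2, 2)]  # (paperclips made, people helped)

print(best_plan(plans, objective=lambda p: p[0]))  # maximize paperclips -> (3, 1)
print(best_plan(plans, objective=lambda p: p[1]))  # maximize people helped -> (0, 5)
```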
You keep talking about how important it is to make sure that the first AGI isn’t told to “be evil” or placed in the possession of evil people. That’s obviously also extremely important, but unfortunately it’s the easiest part. Not necessarily easy to implement, but easy to figure out how. Whereas with alignment we still haven’t got a clue.
“I also hold that humans are image-recognition programs. Stop trying to outplay AlphaGo at Go and “look” “up”.”
Not sure I get what you mean, but if it is “we’re pretty much like computers”, then you’re obviously wrong. The clearest proof is, as I mentioned, that every programmer on Earth will tell you how hard it is to get a computer to “do what I mean”, whereas even the dumbest human would probably understand it a thousand times better.
And PS: I also don’t think we’ll solve alignment in time. At all. The only solution is to make sure no one builds transformative AI before we solve alignment, for instance through regulation or a narrow AI nanny.
We’re moving towards factual disputes that aren’t easy to resolve in logical space, and I fear any answers I give are mostly repeating previous statements. In general I hold that you’re veering toward a maximally wrong position with completely disastrous results if implemented. With that said:
But unfortunately, it’s way more complicated than that.
I dispute this.
how to control something a) more intelligent than ourselves, and b) able to rewrite its own code and create subroutines, thereby bypassing our control mechanisms.
Place an image of the status quo in the “good things” folder. Which you should absolutely not do because it’s a terrible idea.
how an advanced computer will likely function much more similarly to older computers than to humans
This seems ridiculous to me as a concept. No, advanced AI will not function like ancient, long-obsolete technology. I see far too much present bias in this stance, and worse, a bias towards things in the future being like things in the past, despite the past having been over for ages. This is like running spaceships on slide rules.
This also implies that every trick you manage to come up with, as to how to get a C-compiler-adjacent superintelligence to act more human, is not going to work, because the other party isn’t C-compiler-adjacent. Until we have a much better understanding of how to code efficiently, all such efforts are at best wasted, and likely counterproductive. To reiterate: stop trying to explain the motion of the planets and build a telescope.
Note that I do not desire that AI psychology be human-like. That sounds like a bad idea.
So we necessarily need to be in control.
Who is this “we”? How will you go from a position of “we” in control, to “we” not in control?
My expectation is that the first step is easy, and the second, impossible.
Not sure I get what you mean
Humans have certain powers and abilities as per human nature. Math isn’t one of them. I state that trying to solve our problems with math is already a mistake, because we suck at math. What humans are good at is image recognition. We should solve our problems by “looking” at them.
The art of “looking” at problems isn’t easy to explain, unfortunately. Then again, if I could explain it, I could also build an AGI, or another human, right on the spot. It’s that sort of question.
To put it another way, using math to determine whether an AI is good or not is looking for your keys under the streetlight. Wrong tool, wrong location, inevitable failure.
The only solution is to make sure no one builds transformative AI before we solve alignment, for instance through regulation
I’m fairly certain this produces an extremely bad outcome.
It’s the old, “The only thing necessary for the triumph of evil is for good men to do nothing.”
Evil will not sit around and wait for you to solve Rubik’s cubes. Furthermore, implementation of AI regulation is much easier than its removal. I suspect that once you ban good men from building AI, it’s over, we’re done, that’s it.
PS: Two very important things I forgot to touch on.
“This also implies that every trick you manage to come up with, as to how to get a C-compiler-adjacent superintelligence to act more human, is not going to work, because the other party isn’t C-compiler-adjacent. Until we have a much better understanding of how to code efficiently, all such efforts are at best wasted, and likely counterproductive.”
Not necessarily. Even the first steps of older science were important to the science of today. Science happens through the building blocks of paradigms. Plus, there are mathematical and logical notions which are simply fundamental and worth investigating, like decision theory.
“Humans have certain powers and abilities as per human nature. Math isn’t one of them. I state that trying to solve our problems with math is already a mistake, because we suck at math. What humans are good at is image recognition. We should solve our problems by “looking” at them.”
OK, sorry, but here you just fall into plain absurdity. Of course it would be great just to look at things and “get” them! Unfortunately, the language of computers, and of most science, is math. Should we perhaps drop all math in physics and just start “looking” instead? Please don’t actually say yes...
(To clarify, I’m not dismissing the value of “looking”, a.k.a. philosophy/rationality, even in this specific problem of AI alignment. But to completely discard math is just absurd, because, unfortunately, it’s the only road to certain problems (needless to say, there would be no computers without math, for instance).)
I’m actually sympathetic towards the view that mathematically solving alignment might be simply impossible, i.e. that it might be unsolvable. Such is the opinion of Roman Yampolskiy, an AI alignment researcher, who has written very good papers in its defense. However, I don’t think we lose much by having a couple hundred people working on it. We would only implement Friendly AI if we could mathematically prove it, so it’s not like we’d just go with a half-baked idea and create hell on Earth instead of “just” a paperclipper. And it’s not like Friendly AI is the only proposal in alignment either. People like Stuart Russell have a way more conservative approach, as in, “hey, maybe just don’t build advanced AI as utility maximizers, since that will invariably produce chaos?”.
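To gesture at the flavour of that approach, here is a toy sketch (my own caricature, not Russell’s actual formalism, and every number and threshold is invented): an agent that stays uncertain about the true objective, and asks the human instead of acting when some plausible reading of its objective says the action would be terrible.

```python
# Toy sketch of "don't act like a pure utility maximizer": the agent keeps
# several hypotheses about what the human really wants and defers when they
# disagree badly. Illustrative only; thresholds and numbers are made up.

def act_or_defer(utilities_under_hypotheses, defer_value=0.0, disaster=-1.0):
    expected = sum(utilities_under_hypotheses) / len(utilities_under_hypotheses)
    worst_case = min(utilities_under_hypotheses)
    # A fixed-objective maximizer would act whenever expected > defer_value.
    # This agent also refuses to act when some plausible hypothesis about the
    # human's values says the action is disastrous, and asks instead.
    if expected > defer_value and worst_case > disaster:
        return "act"
    return "ask the human first"

print(act_or_defer([0.9, 0.8, 0.7]))   # hypotheses agree -> "act"
print(act_or_defer([2.0, 1.5, -5.0]))  # one says disaster -> "ask the human first"
```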
Some of these proposals might even be dangerous, or worse than doing nothing. Anyway, they are still being researched and nothing is proven. Not trying to do anything is just not acceptable, because I don’t think that the FIRST transformative/dangerous AI will be super virtuous. Maybe a very advanced AI would necessarily/logically be super virtuous, but we will build something dangerous before we get to that. Say, an AI that is only special at engineering, or even at a specific type of engineering like nanotechnology. Such an AI, which might not even properly be AGI, might already be extremely dangerous, for the obvious reason of having great power (from great intelligence in some key area(s)) without great values (orthogonality thesis).
“Furthermore, implementation of AI regulation is much easier than its removal. I suspect that once you ban good men from building AI, it’s over, we’re done, that’s it.”
Of course it wouldn’t be just any kind of regulation. Say, if you restrict access to and production of supercomputers globally, you effectively slow AI development. Supercomputers are possible to control; laptops obviously aren’t.
Or, as I also said, a narrow AI nanny.
Are these and other similar measures dangerous? Certainly. But in my opinion doing nothing is far more so.
I don’t even claim these are good ideas. We need more intelligent people to come up with actually good ideas in regulation. But I’m still pretty certain that regulation is the only way. Of course it can’t simply be “ok, so now governments are gonna ban AI research but they’re gonna keep doing it in their secret agencies anyway”. A narrow AI nanny is something that maybe could actually work, if far-fetched.
AI is advancing far quicker than our understanding of it, especially with black boxes like neural networks, and I find it impossible that things will stay on track when we build something that can actually have a vast real-world impact.
If we could perhaps convince governments that AI is actually dangerous, and that humanity NECESSARILY has to drop all barriers and become way more cooperative if we want to have a shot at not killing everyone or worse… then it could be doable. Is this ridiculously hard? Yes, but it’s still our only chance.