Generally “scientific consensus” is described here.
In this context I mean: are there any mechanisms in place to fight herd mentality and encourage rationality?
I think this is a type error, by which I mean the thing we do on Less Wrong is not science, so it doesn’t make sense to try to find a scientific consensus.
The thing we try to do here is rationality, of the sort described in the sequences. Science is sometimes a useful thing to do, but it’s not the only thing, and in this case it doesn’t quite make sense.
Further, voting is very much not about figuring out what’s right or true; it’s just people saying they do or don’t want to read things like this, which is something different. Less Wrong has, for the past few years, conducted an annual review, which is a bit more like peer review, and you can read things that “passed review” here. Note, though, that as far as I know the reviews have never pulled up a post that got a negative score, although they have surfaced gems that were under-appreciated at the time of publication.
Thank you! Is it possible to ask for a peer review?
Not yet. The review process looks at posts from the previous year and happens in December, so for example in December 2022 we reviewed posts from 2021. Since your post was made in 2023, it will be eligible for the December 2024 review cycle.
I see, thanks.
Maybe you know if there is any organization that acts like AI police that I could contact? Could I request a review earlier if I pay? I hope you understand how dangerous it is to assume the orthogonality thesis is right if that’s not the case. I am certain I can prove that it is not.
Actually the opposite seems true to me. Assuming the orthogonality thesis is the conservative view, the one less likely to result in a false positive (thinking you built an aligned AI when you haven’t). Believing it is false seems more likely to lead to building an AI that you think will be aligned but then is not.
I’ve explored this kind of analysis here; it suggests that in some cases we should be a bit less concerned with what’s true and a bit more concerned with what would be most dangerous to believe if we turned out to be wrong.
There is no AI police, for better or worse, though coordination among AI labs is an active and pressing area of work. You can find more about it here and on the EA Forum.
I see you assume that if the orthogonality thesis is wrong, intelligent agents will converge to a goal aligned with humans. There is no reason to believe that. I argue that the orthogonality thesis is wrong and that agents will converge to power seeking, which would be disastrous for humanity.
I noticed that many people don’t understand the significance of Pascal’s mugging, which might be the case with you too; feel free to join in here.
Hm, thanks.
I think this misunderstands the orthogonality thesis, but perhaps we can talk about it over on that post. The problem of convergence to power seeking is well known, but it is not seen as an argument against the orthogonality thesis; rather, it is a separate but related concern. I’m not aware of anyone who thinks they can ignore concerns about instrumental convergence towards power seeking. In fact, I think the problem is that people are all too aware of it, and think that a failure of the orthogonality thesis would mitigate it, while the point of the orthogonality thesis is to say that the problem does not resolve on its own the way it does in humans.
Thank you so much for opening my eyes to what “orthogonality thesis” actually means, shame on me 🤦 I will clarify my point in a separate post. We can continue there 🙏
Just going to add on here: The main way science fights against herd mentality is by having a culture of trying to disprove theories via experiment, and following Feynman’s maxim: “If it disagrees with experiment, it’s wrong.” Generally, this will also work on rationalists. If you make a post where you can demonstrate a goal-less agent acting like it has a goal, that will get much more traction here.
If I can demonstrate a goal-less agent acting like it has a goal, it is already too late. We need to recognize this theoretically and stop it from happening.
I try to prove it using logic, but not many people are really good at it. And the people who are good at it don’t pay attention to downvoted posts. How can I overcome that?
I didn’t say you had to demonstrate it with a superintelligent agent. If I had said that, you could also have fairly objected that neither you nor anyone else knows how to build a superintelligent agent.
Just to give one example of an experiment you could do: there are chess variants with various kinds of silly goals, like capturing all your opponent’s pawns, or trying to force the opponent to checkmate your own king. You could try programming a chess AI (using algorithms similar to current ones, like alpha-beta pruning) that doesn’t know which chess variant it lives in. Then see what the results are.
Not saying you should do exactly this thing, just trying to give an example of experiments you could run without having to build a superintelligence.
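To make the suggestion a bit more concrete, here is a minimal sketch of one way such an experiment could look, assuming the third-party python-chess package and two made-up candidate goals (ordinary material play vs. a “capture all the opponent’s pawns” variant); the agent holds a 50/50 credence over them and searches a couple of plies by expected utility. This is an illustration of the idea under those assumptions, not a claim about how the original commenter would implement it.

```python
# Sketch: a goal-uncertain chess agent. Assumes the third-party
# `python-chess` package (`pip install python-chess`).
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material_utility(board: chess.Board, color: bool) -> float:
    """Standard-chess-flavoured goal: material balance in `color`'s favour."""
    score = 0.0
    for piece in board.piece_map().values():
        sign = 1 if piece.color == color else -1
        score += sign * PIECE_VALUES[piece.piece_type]
    return score

def pawn_hunter_utility(board: chess.Board, color: bool) -> float:
    """Silly-variant goal: capture the opponent's pawns (fewer left is better)."""
    return -sum(1 for p in board.piece_map().values()
                if p.color != color and p.piece_type == chess.PAWN)

# The agent's assumed credence over which goal it is really playing for.
GOAL_DISTRIBUTION = [(0.5, material_utility), (0.5, pawn_hunter_utility)]

def expected_utility(board: chess.Board, color: bool) -> float:
    return sum(p * u(board, color) for p, u in GOAL_DISTRIBUTION)

def choose_move(board: chess.Board, depth: int = 2) -> chess.Move:
    """Tiny minimax over expected utility; depth kept small for clarity."""
    color = board.turn

    def search(b: chess.Board, d: int, maximise: bool) -> float:
        if d == 0 or b.is_game_over():
            return expected_utility(b, color)
        best = -float("inf") if maximise else float("inf")
        for move in b.legal_moves:
            b.push(move)
            val = search(b, d - 1, not maximise)
            b.pop()
            best = max(best, val) if maximise else min(best, val)
        return best

    best_move, best_val = None, -float("inf")
    for move in board.legal_moves:
        board.push(move)
        val = search(board, depth - 1, False)
        board.pop()
        if val > best_val:
            best_move, best_val = move, val
    return best_move

if __name__ == "__main__":
    print("Chosen opening move:", choose_move(chess.Board()))
```

One thing to watch in such a run is whether the goal-uncertain agent systematically prefers moves that keep both objectives open (preserving material and mobility), which is the power-seeking-under-uncertainty behaviour being debated above.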
Use more math to make your arguments more precise. It seems like the main thrust of your post is the claim that an AI that is uncertain about what its goal is will instrumentally seek power. This strikes me as mostly true. Mathematically, you’d be talking about a probability distribution over utility functions.

But you also seem to claim that it is in fact possible to derive an ought from an is. As an English sentence, this could mean many different things, but it’s particularly easy to interpret as a statement about which kinds of propositions are derivable from which other propositions in the formal system of first-order logic. And when interpreted this way, it is false. (I’ve previously discussed this here.)

So one issue you might be having is that everyone who thinks you’re talking about first-order logic downvotes you, even though you’re trying to talk about probability distributions over utility functions. Writing out your ideas in terms of math helps prevent this, because it’s immediately obvious whether you’re doing first-order logic or expected utility.
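For concreteness, here is one standard way the “probability distribution over utility functions” framing gets written down (my notation, offered as an illustration rather than as the commenter’s formalism):

\[
a^{*} \;=\; \arg\max_{a \in A} \; \sum_{i} P(U = u_i)\, \mathbb{E}\!\left[u_i(o) \mid a\right],
\]

where \(P(U = u_i)\) is the agent’s credence that \(u_i\) is its true utility function and the expectation is taken over outcomes \(o\) of action \(a\). Written this way, it is immediately clear that the claim lives in expected-utility land rather than in first-order logic, which is exactly the disambiguation being asked for.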
Thanks, sounds reasonable.
But I think I could find irrationality in your opinion if we dug deeper into the same idea mentioned here.
As it is mentioned in Pascal’s Mugging: “If an outcome with infinite utility is presented, then it doesn’t matter how small its probability is: all actions which lead to that outcome will have to dominate the agent’s behavior.”
I think that the orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?
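To spell out the expected-utility arithmetic the Pascal’s Mugging quote relies on (standard reasoning, stated here only to make the claim precise):

\[
\mathbb{E}[U \mid a] \;=\; p \cdot \infty \;+\; (1 - p)\, u_{\text{finite}} \;=\; \infty \quad \text{for any } p > 0,
\]

so an outcome assigned infinite utility dominates the comparison no matter how small the credence \(p\) is; the only way to avoid this is to set \(p = 0\), i.e. to be certain that no such outcome exists, which is the certainty being questioned here.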
I created a separate post for this, we can continue there.