If I can already demonstrate a goal-less agent acting like it has a goal, then it is too late. We need to recognize this theoretically and stop it from happening.
I didn’t say you had to demonstrate it with a superintelligent agent. If I had said that, you could also have fairly objected that neither you nor anyone else knows how to build a superintelligent agent.
Just to give one example of an experiment you could do: there are chess variants where you can have various kinds of silly goals, like capturing all of your opponent’s pawns or forcing your opponent to checkmate your own king. You could try programming a chess AI (using algorithms similar to the current ones, like alpha-beta pruning) that doesn’t know which chess variant it is playing, and then see what the results are.
Not saying you should do exactly this thing, just trying to give an example of experiments you could run without having to build a superintelligence.
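For concreteness, here is a minimal sketch of what such an experiment might look like (my code, not anything proposed in the thread): the agent holds a prior over two candidate goals and picks moves by expected utility under that prior. It assumes the python-chess library, plays White, and uses plain depth-limited minimax rather than alpha-beta pruning to keep the example short.

```python
# Sketch of a goal-uncertain chess player: a prior over candidate utility
# functions, and move selection by expected utility under that prior.
import chess


def material_utility(board: chess.Board) -> float:
    """Orthodox-ish goal: material balance from White's point of view."""
    values = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
              chess.ROOK: 5, chess.QUEEN: 9}
    return sum(v * (len(board.pieces(p, chess.WHITE)) - len(board.pieces(p, chess.BLACK)))
               for p, v in values.items())


def pawn_hunter_utility(board: chess.Board) -> float:
    """Silly variant goal: capture all of the opponent's pawns."""
    return -float(len(board.pieces(chess.PAWN, chess.BLACK)))


# Prior over which goal is the "real" one -- the agent never finds out.
GOALS = [(0.5, material_utility), (0.5, pawn_hunter_utility)]


def expected_utility(board: chess.Board) -> float:
    return sum(p * u(board) for p, u in GOALS)


def minimax(board: chess.Board, depth: int, maximizing: bool) -> float:
    """Depth-limited minimax over expected utility under goal uncertainty."""
    if depth == 0 or board.is_game_over():
        return expected_utility(board)
    scores = []
    for move in list(board.legal_moves):
        board.push(move)
        scores.append(minimax(board, depth - 1, not maximizing))
        board.pop()
    return max(scores) if maximizing else min(scores)


def choose_move(board: chess.Board, depth: int = 2) -> chess.Move:
    """Pick the move with the highest expected utility (agent plays White)."""
    best_move, best_score = None, -float("inf")
    for move in list(board.legal_moves):
        board.push(move)
        score = minimax(board, depth - 1, maximizing=False)
        board.pop()
        if score > best_score:
            best_move, best_score = move, score
    return best_move


if __name__ == "__main__":
    print(choose_move(chess.Board()))
```

The empirical question would then be whether the goal-uncertain player tends toward option-preserving, piece-hoarding moves that neither single goal would pick on its own.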
I try to prove it using logic, but not many people are really good at that. And the people who are good at it don’t pay attention to downvoted posts. How can I overcome that?
Use more math to make your arguments more precise. It seems like the main thrust of your post is a claim that an AI that is uncertain about what its goal is will instrumentally seek power. This strikes me as mostly true. Mathematically, you’d be talking about a probability distribution over utility functions.

But you also seem to claim that it is in fact possible to derive an ought from an is. As an English sentence, this could mean many different things, but it’s particularly easy to interpret as a statement about which kinds of propositions are derivable from which other propositions in the formal system of first-order logic. And when interpreted this way, it is false. (I’ve previously discussed this here.)

So one issue you might be having is that everyone who thinks you’re talking about first-order logic downvotes you, even though you’re trying to talk about probability distributions over utility functions. Writing out your ideas in terms of math helps prevent this, because it’s immediately obvious whether you’re doing first-order logic or expected utility.
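To make the contrast concrete (my notation, not the commenter’s): the expected-utility version of the claim looks something like the following, where the agent has a prior p over candidate utility functions u_1, …, u_n rather than a single known goal.

```latex
% Hypothetical formalization of "uncertain about its goal": the agent
% maximizes expected utility under a prior over utility functions.
\[
  a^{*} \;=\; \arg\max_{a \in A} \; \sum_{i=1}^{n} p(u_i)\, \mathbb{E}\!\left[ u_i(o) \mid a \right]
\]
```

A first-order-logic reading of “deriving an ought from an is” would instead be a claim that some normative sentence follows syntactically from purely descriptive axioms, which is a different kind of statement entirely.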
Thanks, sounds reasonable.
But I think I could find an irrationality in your opinion if we dug deeper into the same idea mentioned here.
As is mentioned in Pascal’s Mugging: “If an outcome with infinite utility is presented, then it doesn’t matter how small its probability is: all actions which lead to that outcome will have to dominate the agent’s behavior.”
I think the Orthogonality Thesis is right only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?
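Spelled out in expected-utility terms (my sketch of the dominance argument, not text from the thread): if any action gives an infinite-utility outcome nonzero probability, that term swamps everything else.

```latex
% If u(o_inf) = +infinity and action a assigns it probability epsilon > 0, then
\[
  \mathbb{E}[u \mid a]
  \;=\; \epsilon \cdot \infty
  \;+\; (1-\epsilon)\,\underbrace{\mathbb{E}[u \mid a,\ \neg o_{\infty}]}_{\text{finite}}
  \;=\; \infty ,
\]
% so a dominates every action whose possible payoffs are all finite,
% no matter how small epsilon is.
```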
I created a separate post for this; we can continue there.