Firstly, we already have AI designs that “want” to do things. Deep Blue “wanted” to win at chess. Various reinforcement learning agents “want” to win other games.
Intelligence that isn’t turned to doing anything is kind of useless. Suppose you have an AI that is supposedly very intelligent, but it sits there just outputting an endless string of 0s. What’s the point of that?
There are various things intelligence can be turned towards. One is the “see that thing there, maximize that”. Another option is prediction. Another is finding proofs.
An AI that wants things is one of the fundamental AI types. We are already building AIs that want things. They aren’t yet particularly smart, and so aren’t yet dangerous.
Imagine an AI trained to be a pure, perfect predictor, like some GPT-N. It predicts humans, and humans want things. If it is pushed somewhat out of distribution, it might be persuaded to predict an exceptionally smart and evil human. And it could think much faster. Or, if the predictor is really good at generalizing, it could predict the specific outputs of other superhuman AIs.
Mesa-optimization basically means we can’t reliably train for wanting X. If we train an AI to want X, we might get one that wants Y instead.
Firstly, we already have AI designs that “want” to do things. Deep Blue “wanted” to win at chess. Various reinforcement learning agents “want” to win other games.
“Wanting” in quotes isn’t the problem. Toasters “want” to make toast.
Intelligence that isn’t turned to doing anything is kind of useless. Suppose you have an AI that is supposedly very intelligent, but it sits there just outputting an endless string of 0s. What’s the point of that?
Doing something is not the same thing as doing-something-because-you-want-to. Toasters don’t want to make toast in the un-scare-quoted sense; they just do. Having goals, aims and drives is only meaningful if they can be swapped out for other ones (even if through training). It’s a counterfactual concept.
“Wanting” in quotes isn’t the problem. Toasters “want” to make toast.
A standard toaster has been designed to make toast. But there is no part of it that creates new plans for how to make toast better. An evolutionary search for aerial designs can come up with new shapes, better than anything a human could invent. Deep Blue can invent new chess moves no human could think of. A toaster doesn’t invent a new, better way of making toast.
A toaster is optimized by humans, but contains no optimizing.
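To make that contrast concrete, here is a minimal sketch in Python (the design objective is made up purely for illustration): the toaster-style procedure just executes a fixed plan, while even a crude evolutionary search contains a loop that generates and keeps designs nobody wrote down in advance.
```python
import random

def run_toaster(seconds=120):
    # A toaster-style device: a fixed procedure. It never searches for a better way to toast.
    return f"bread heated for {seconds} seconds"

def evolutionary_search(score, initial, generations=10_000, step=0.1):
    # A crude optimizer: propose random variations, keep whatever scores higher.
    # Run for long enough, it can land on designs its author never thought of.
    best = list(initial)
    for _ in range(generations):
        candidate = [x + random.uniform(-step, step) for x in best]
        if score(candidate) > score(best):
            best = candidate
    return best

if __name__ == "__main__":
    # Toy stand-in for a design objective (entirely made up).
    target = [0.3, -1.2, 2.5]
    design_score = lambda d: -sum((a - b) ** 2 for a, b in zip(d, target))
    print(run_toaster())                                       # always the same plan
    print(evolutionary_search(design_score, [0.0, 0.0, 0.0]))  # wanders toward the target
```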
Can you explain what an AI that plays good chess, but not because it wants to win, would be like?
Having goals, aims and drives is only meaningful if they can be swapped out for other ones (even if through training). It’s a counterfactual concept.
If you do decide to define goals that way (I am not convinced this is the best or only way to define goals), that still doesn’t mean humans know how to swap the goals in practice. It is just talking about some way to do it in principle.
But there is no part of it that creates new plans for how to make toast better. An evolutionary search for aerial designs can come up with new shapes, better than anything a human could invent. Deep Blue can invent new chess moves no human could think of. A toaster doesn’t invent a new, better way of making toast.
Ok...but what is your point? That any optimiser has goals?
Can you explain what an AI that plays good chess, but not because it wants to win, would be like?
Any current chess software with the possible exception of AlphaChess.
A hard-coded chess program wouldn’t be able to do something else just because it wanted to.
Ok...but what is your point? That any optimiser has goals?
Any optimizer needs to be optimizing for something.
A hard-coded chess program wouldn’t be able to do something else just because it wanted to.
The universe is deterministic. So this applies equally well to a human, AlphaChess and Deep Blue.
Would you agree that AIXI-tl is just as hard-coded as Deep Blue? And that, nonetheless, it forms clever plans to achieve its goals and is very dangerous?
Any optimizer needs to be optimizing for something.
OK, but you can’t infer that something is an optimiser from the fact that it’s good at something. You can infer that it’s optimising something if you can change the something.
The universe is deterministic.
Not a fact.
So this applies equally well to a human, AlphaChess and Deep Blue
Not everything has a replaceable utility function or goal module. That renders it false that every AI has a goal. That means that you can’t achieve general AI safety by considering goals alone.
Would you agree that AIXI-tl is just as hard-coded as Deep Blue? And that, nonetheless, it forms clever plans to achieve its goals and is very dangerous?
No, AIXIs don’t form plans.
OK, but you can’t infer that something is an optimiser from the fact that it’s good at something. You can infer that it’s optimising something if you can change the something.
The problem is that anything can be changed into anything, with enough changes. Consider a bunch of chess algorithms. On one extreme, a pure minimax search with a piece-count heuristic might be changed into a misère (play-to-lose) chess engine just by adding a single minus sign.
Deep Blue might be changed into a somewhat OK misère chess player with a single minus sign, and into a better misère player by also making a few changes to the opening book.
A neural-net-based approach might need almost all the parameters in the net changed, but if the net was produced by gradient descent, a small change in the training process could produce a misère chess player instead.
Going even further back, a small change to the project specifications could have caused the programmers to write different code.
There are some algorithms where 1 small change will change the goal. And there are algorithms where many pieces must be changed at once to make it competently optimize some other goal. And everything in between, with no hard thresholds.
There are algorithms where humans know how to swap goals. And algorithms where we don’t know how to swap goals.
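The single-sign point above is easy to see in a toy engine. A sketch (Python; a tiny take-a-coin-from-either-end game rather than chess, and exhaustive search rather than anything Deep Blue-like, so purely illustrative): the same minimax code plays to win or to lose depending on one sign in the evaluation.
```python
from functools import lru_cache

def best_move(coins, sign=+1):
    # Exhaustive minimax for a toy game: players alternate taking a coin from
    # either end of the row, and a captured coin is worth sign * its value.
    # sign=+1 plays to collect the most value; flipping it to sign=-1 turns the
    # very same search into a play-to-lose (misere-style) engine.
    @lru_cache(maxsize=None)
    def value(remaining):
        # Best achievable (my utility minus opponent's) for the player to move.
        if not remaining:
            return 0
        return max(sign * remaining[0] - value(remaining[1:]),
                   sign * remaining[-1] - value(remaining[:-1]))
    left = sign * coins[0] - value(coins[1:])
    right = sign * coins[-1] - value(coins[:-1])
    return ("take left", left) if left >= right else ("take right", right)

print(best_move((3, 9, 1, 2)))            # plays to end up with more value
print(best_move((3, 9, 1, 2), sign=-1))   # same search, now plays to end up with less
```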
The universe is deterministic.
Not a fact.
Quantum mechanics has a “branch both ways” that is sometimes mistaken for randomness. True randomness would be non-unitary and break a bunch of the maths. It isn’t totally inconceivable that the universe does it anyway, but it seems unlikely. Do you have any coherent idea of what the universe might possibly do that isn’t deterministic or random?
No, AIXIs don’t form plans.
Are you claiming that an AIXI won’t produce clever external behavior? If it had enough information, most of its hypotheses would contain reasonably good simulations of reality. It tests all possible sequences of actions in these simulations. If it needs to hack some robots and break in to where its reward button is, it will. If it needs to win a game of chess, it will.
Are you claiming AIXI won’t do this, that it will just sit there being dumb?
Or are you claiming that, however clever-seeming its external behavior, it doesn’t count as a plan for some reason?
There are algorithms where humans know how to swap goals. And algorithms where we don’t know how to swap goals.
Ok. Are you suggesting a better technical definition, or are you suggesting going back to the subjective approach?
Are you claiming that an AIXI won’t produce clever external behavior?
No it doesn’t.
If it needs to hack some robots and break in to where its reward button is, it will.
It doesn’t have one. An AIXI isn’t a learning system, and doesn’t have a reward function. An AIXI is an algorithm that tries to predict sequences of input data using programmes. That’s it. It doesn’t want or need.
Are you claiming AIXI won’t do this, that it will just sit there being dumb?
No, it will sit there figuring out the shortest code sequence that predicts its input. The only thing it can do.
Or are you claiming that, however clever-seeming its external behavior, it doesn’t count as a plan for some reason?
What external behaviour?
AIXI is the version that optimizes over the programs to maximize reward.
Solomonoff induction is the version that “just” produces predictions.
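For reference, the standard formulations (roughly Hutter’s notation, not anything specific to this thread), where U is a universal Turing machine, ℓ(p) is the length of program p, and o, r are observations and rewards:
\[
M(x_{t+1} \mid x_{1:t}) \;\propto\; \sum_{p \,:\, U(p) = x_{1:t} x_{t+1} *} 2^{-\ell(p)} \qquad \text{(Solomonoff induction: just predicts the next symbol)}
\]
\[
a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \bigl(r_t + \cdots + r_m\bigr) \sum_{q \,:\, U(q,\, a_{1:m}) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)} \qquad \text{(AIXI: picks actions to maximize reward)}
\]
Same prior over programs in both; the difference is the expectimax over actions wrapped around it.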
I would say something is optimizing if it is a computation and the simplest explanation of its behavior looks causally forward from its outputs.
If the explanation “Y is whatever it takes to make Z as big as possible” is simpler than “Y is f(X)”, then the computer is an optimizer. Of course, a toaster that doesn’t contain any chips isn’t even a computer. For example, “Deep Blue is taking whichever moves lead to a win” is a simpler explanation than a full code listing of Deep Blue.
I would say something is an optimizer if you have more knowledge of Z than of Y. Of course, this is “subjective” in that it is relative to your knowledge. But if you have such deep knowledge of the computation that you know all about Y, then you can skip the abstract concept of “optimizer” and just work out what will happen directly.
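A toy illustration of the “which description is simpler” test (a made-up thermostat example, not anything from the thread): the mechanistic description grows with every special case, while the forward-looking “whatever makes Z biggest” description stays one line.
```python
ACTIONS = ["heat_on", "heat_off", "hold"]

# Description 1: Y as an explicit function f(X). Imagine this extended to
# thousands of hand-written cases; the listing grows with the behaviour.
def output_mechanistic(temperature):
    if temperature < 19.5:
        return "heat_on"
    if temperature > 20.5:
        return "heat_off"
    return "hold"

# Description 2: "Y is whatever makes Z as big as possible", where Z is the
# predicted closeness to a 20 degree setpoint. One line of goal covers every
# case at once (the two agree except for ties at the boundaries).
def comfort(action, temperature):
    predicted = {"heat_on": temperature + 1.0,
                 "heat_off": temperature - 1.0,
                 "hold": temperature}[action]
    return -abs(predicted - 20.0)

def output_forward_looking(temperature):
    return max(ACTIONS, key=lambda a: comfort(a, temperature))
```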
Quantum mechanics has a “branch both ways” that is sometimes mistaken for randomness. True randomness would be non-unitary and break a bunch of the maths.
Which is an odd thing to say, since the maths used by Copenhagenists is identical to the maths used by many-worlders.
It isn’t totally inconceivable that the universe does it anyway, but it seems unlikely. Do you have any coherent idea of what the universe might possibly do
You are trying to appeal to the Yudkowsky version of Deutsch as incontrovertible fact, and it isn’t.
If the MWI is so good, why do half of all physicists reject it?
If Everett’s 1957 thesis solved everything, why have there been decades of subsequent work?
Bearing in mind that evolution under the Schrödinger equation cannot turn a coherent state into an incoherent one, what is the mechanism of decoherence?
Why is there a basis problem, and what solves it?
If the MWI is so good, why do half of all physicists reject it?
Because physicists aren’t trained in advanced rationality, program-length Occam’s razor, etc. Their professor taught Copenhagen. And many-worlds is badly named, misunderstood and counterintuitive. The argument from majority can keep everyone stuck at a bad equilibrium.
If Everett’s 1957 thesis solved everything, why have there been decades of subsequent work?
If the sentence “god doesn’t exist” solved theology, why is there so much subsequent work?
Bearing in mind that evolution under the Schrödinger equation cannot turn a coherent state into an incoherent one, what is the mechanism of decoherence?
I don’t think there is one. I think it likely that the universe started in a coherent state, and it still is coherent.
Note a subtlety. Suppose you have two systems, A and B, and they are entangled. As a whole, the system is coherent. But if you lose A somewhere and just look at B, then the maths describing B is that of an incoherent state. Incoherence = entanglement with something else. If a particle in an experiment becomes incoherent, it’s actually entangled somehow with a bit of the measuring apparatus. There is no mathematical difference between the two unless you track down which particular atoms in the measuring apparatus it is entangled with.
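The textbook calculation behind that subtlety (standard quantum information material, nothing new here): take a maximally entangled state of A and B and trace out A. The joint state is still pure, but the state of B alone has no off-diagonal terms left.
\[
|\psi\rangle_{AB} = \tfrac{1}{\sqrt{2}}\bigl(|0\rangle_A |0\rangle_B + |1\rangle_A |1\rangle_B\bigr),
\qquad
\rho_B = \operatorname{Tr}_A |\psi\rangle\langle\psi| = \tfrac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]
No collapse is applied anywhere in that calculation; B looks mixed simply because the system it is entangled with has been ignored.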
I don’t think there is one. I think it likely that the universe started in a coherent state, and it still is coherent
Then why don’t we see other universes, and why do we make only classical observations? Of course, these are the two problems with Everett’s original theory that prompted all the subsequent research. Coherent states continue to interact, so you need decoherence for causally separate, non-interacting worlds... and you need to explain the preponderance of the classical basis, the basis problem.
Incoherence = entanglement with something else.
No, rather the reverse. It’s when off diagonal elements are zero or negligible.
Decoherence, the disappearance of a coherent superposed state, seems to occur when a particle interacts with macroscopic apparatus or the environment. But that’s also the evidence for collapse. You can’t tell directly that you’re dealing with decoherent splitting, rather than collapse, because you can’t observe decoherent worlds.
Then why don’t we see other universes, and why do we make only classical observations?
What do you mean by this? What are you expecting to be able to do here?
Schrödinger’s cat. The cat is in a superposition of alive and dead. The scientist opens the box. The scientist is in a superposition of feeding the live cat and burying the dead cat.
The only way to detect a superposition is through interference. This requires the two superposed states to overlap their wavefunctions. In other words, it requires every last particle to go into the same position in both worlds. So it’s undetectable unless you can rearrange a whole cat to atomic precision.
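A rough way to put numbers on that (treating the two branches as approximately product states over the cat’s N particles, which is only a back-of-the-envelope assumption):
\[
\bigl|\langle \Psi_{\text{alive}} \mid \Psi_{\text{dead}} \rangle\bigr|
\;\approx\; \prod_{i=1}^{N} \bigl|\langle \phi_i^{\text{alive}} \mid \phi_i^{\text{dead}} \rangle\bigr|
\;\le\; \epsilon^{N}.
\]
With N of order $10^{25}$ and any per-particle mismatch $\epsilon < 1$, the interference term is zero for all practical purposes unless every particle is steered back into overlap.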
Coherent states continue to interact, so you need decoherence for causally separate, non-interacting worlds.
In practice, if two states are wildly different, the interaction term is small. With precise physics equipment, you can make this larger, making two states where a bacterium is in different positions and then getting those to interact. Basically, blobs of amplitude need to run into each other to interact. Quantum space is very spacious indeed, so the blobs usually go their own separate ways once they are separated. It’s very unlikely they run into each other at random, but a deliberate collision can be arranged.
No, rather the reverse. It’s when off diagonal elements are zero or negligible.
That is what the matrix looks like, yes.
But that’s also the evidence for collapse.
Interaction with the environment is a straightforward application of Schrödinger’s equation. Collapse is a new, unneeded hypothesis that also happens to break things like invariance of reference frame.
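For concreteness, the usual toy model of “interaction with the environment” (a von Neumann pre-measurement sketch, standard textbook material rather than a claim that it settles the interpretational argument): unitary evolution entangles the system with an environment record, and the system’s off-diagonal terms are suppressed exactly to the extent that the environment states distinguish the branches.
\[
\bigl(\alpha|0\rangle + \beta|1\rangle\bigr)|E\rangle
\;\xrightarrow{\;U\;}\;
\alpha|0\rangle|E_0\rangle + \beta|1\rangle|E_1\rangle,
\qquad
\rho_{\text{sys}} =
\begin{pmatrix}
|\alpha|^2 & \alpha\beta^{*}\langle E_1|E_0\rangle \\
\alpha^{*}\beta\,\langle E_0|E_1\rangle & |\beta|^2
\end{pmatrix}
\;\to\;
\begin{pmatrix}
|\alpha|^2 & 0 \\
0 & |\beta|^2
\end{pmatrix}
\text{ as } \langle E_0|E_1\rangle \to 0.
\]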
Show that coherence is simple but inadequate, and decoherence is adequate but not simple.
Schrödinger’s cat. The cat is in a superposition of alive and dead. The scientist opens the box. The scientist is in a superposition of feeding the live cat and burying the dead cat.
The two problems with this account are: 1) “alive” and “dead” are classical states, so a classical basis is assumed; and 2) the two states of the observer are assumed to be non-interacting and unaware of each other. But quantum mechanics itself gives no reason to suppose that will be the case. In both cases, it needs to be shown, and not just assumed, that normality (perceptions “as if” of a single classical world by all observers) is restored.
So it’s undetectable unless you can rearrange a whole cat to atomic precision.
So you can’t have coherent superpositions of macroscopic objects. So you need decoherence. And you need it to be simple, so that it is still a “slam dunk”.
Basically, blobs of amplitude need to run into each other to interact.
How narrow a quantum state is depends, like everything, on the choice of basis. What is sharply peaked in position space is spread out in frequency/momentum space.
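That trade-off is just the position/momentum Fourier relation. For a minimum-uncertainty Gaussian wavepacket, for example:
\[
\psi(x) \propto e^{-x^2 / 4\sigma_x^2}
\;\Longleftrightarrow\;
\tilde{\psi}(p) \propto e^{-p^2 / 4\sigma_p^2},
\qquad
\sigma_x \sigma_p = \frac{\hbar}{2}.
\]
Squeeze the packet in position and it spreads in momentum, and vice versa.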
Interaction with the environment is a straightforward application of Schrödinger’s equation.
No it isn’t. That’s why people are still publishing papers on it.