On the other hand, the agent knows a quined description of itself
I can’t remember if there was a specific problem that motivated this stipulation, so: is this necessary? E.g. humans do not have an exact representation of themselves (they have an exact existence, i.e. they ‘just are’, plus an inexact and partially false mental model of themselves), yet they can still sort of locate instantiations of themselves within models if you point them to roughly the right bit of the model. It feels like maybe a sufficiently advanced general intelligence should be able to recognise itself given some incomplete knowledge of itself, at least once it is looking in the right place, i.e. has located the thing it does not yet know is itself.
I guess the quining stipulation for UDT/the source-code stipulation for FAI is not meant to say that it is strictly necessary, but rather to guarantee that the intelligence is self-aware enough for us to make interesting statements about it? I.e. it’s not an axiom per se but rather a tentatively stipulated property that makes things easier to talk about, without any claim that it is actually necessary?
On the other hand, a proof of the necessity of something as complete as source code would not surprise me.
...it requires us to describe our utility function at the base level of reality...
(1) Fix the set of possible worlds (i.e. models) that have positive credence[1]. Specify VNM preferences over all the events/outcomes within those models that we wish to consider. Then, assuming the model we’re actually in[2] was in the set of hypotheses we fixed, we have a utility function over the actual model.
(2) Fix the set of models M that have positive credence, and choose any[3] language L that is powerful enough to describe them. Let F be a function which takes m in M and phi in the set of formulas of L as inputs, and outputs the interpretation of phi in m. E.g. if m is a Newtonian model, F(m, ‘Me’) will return some configuration of billiard balls, whereas if m is a quantum model, F(m, ‘Me’) will return some region of a universal wavefunction.
Now exhaustively specify preferences in the language, e.g. ‘Vanilla > Chocolate’. Then for each model (each possible base level of reality), the interpretation function F allows that model’s preferences to be generated. This can save up-front work, because we can avoid figuring out F(m, phi) until we come across a particular m, so we do not need to actually ‘compute’ every F(m, phi) in advance, whereas (1) requires specifying them all up front. It is also probably more realistic, in that it is more like the way humans (and possibly AGIs) would figure out preferences over things in the model: have a vague idea of the concepts in the language, then cash them out as and when necessary. (A rough code sketch of this contrast follows the footnotes below.)
[1] I say credence rather than measure, for I remain unconvinced that measure is a general enough concept for an FAI, or even humans, to make decisions with.
[2] A dubious and probably confused way to think about it, since a key point is that we’re not ‘actually in’ any one model; somewhat more accurately, we are every matching instantiation, and perhaps more accurately still, we’re the superposition of all those matching instantiations.
[3] It’s not obvious that we get the same results from every choice of language, i.e. that we can indeed choose the language arbitrarily.
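To make the contrast between (1) and (2) concrete, here is a minimal Python sketch. Everything in it is a made-up illustration (the toy ‘newtonian’/‘quantum’ models, the ‘Vanilla’/‘Chocolate’ formulas, the stub F); the only point is that (1) enumerates utilities over every model’s outcomes up front, while (2) states preferences once in the language L and only cashes them out via F when a particular m actually comes up.

```python
# Purely illustrative sketch of approaches (1) and (2) above.
# The toy models, formulas, and numeric utilities are all assumptions
# made up for this example; nothing is claimed about how a real F works.

# Approach (1): utilities specified up front, directly over outcomes in each model.
utilities_v1 = {
    ("newtonian", "eat_vanilla"): 1.0,
    ("newtonian", "eat_chocolate"): 0.0,
    ("quantum", "eat_vanilla"): 1.0,
    ("quantum", "eat_chocolate"): 0.0,
}

# Approach (2): preferences stated once in the language L...
preferences_in_L = [("Vanilla", "Chocolate")]  # 'Vanilla > Chocolate'

# ...plus an interpretation function F(m, phi), only evaluated when we
# actually come across a particular model m.
def F(m, phi):
    """Interpret formula phi inside model m (toy stand-in for the real F)."""
    if m == "newtonian":
        return f"billiard-ball configuration denoted by '{phi}'"
    if m == "quantum":
        return f"region of the wavefunction denoted by '{phi}'"
    raise ValueError(f"no interpretation of {phi!r} in model {m!r}")

def preferences_in_model(m):
    """Cash out the L-level preferences as preferences over m's own objects."""
    return [(F(m, better), F(m, worse)) for better, worse in preferences_in_L]

if __name__ == "__main__":
    # Approach (1): everything was already tabulated above.
    print("eager:", utilities_v1[("newtonian", "eat_vanilla")])
    # Approach (2): only now, on encountering a particular m, do we pay for F.
    for m in ["newtonian", "quantum"]:
        print("lazy:", m, preferences_in_model(m))
```

The laziness is the point of (2): F is only called for models we actually encounter, rather than being tabulated over all of M in advance.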
maybe a sufficiently advanced general intelligence should be able to recognize itself given some incomplete knowledge of itself
Yeah, that would be good, but we have no idea how to do it mathematically yet. Humans have intuitions that approximate that, but evolution probably didn’t give us a mechanism that’s correct in general; we’ll have to come up with the right math ourselves.
Let F be a function which takes m in M and phi in the set of formulas of L as inputs, and outputs the interpretation of phi in m
The big problem is defining F. If you squint the right way, you can view Paul’s idea as a way of asking the AI to figure out F.