And as for the assumptions, I’m more worried about the definitions: what happens when the AI realises that its definition of what a “human” is turns out to be flawed?
The AI’s definition of “human” should be computational. If it discovers new physics, it may find additional physical processes that implement that computation, but it should not get confused.
Ontological crises seem to be a problem for AIs with utility functions over arrangements of particles, but it doesn’t make much sense to me to specify our utility function that way. We don’t think of what we want as arrangements of particles; we think at a much higher level of abstraction, and we would be happy with any underlying physics that implemented the features of that abstraction level. Our preferences at that high level are what should generate our preferences in terms of ontologically basic stuff, whatever ontology the AI ends up using.
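To make that concrete, here is a toy sketch in Python (every name in it, such as bridge_atoms or humans_alive, is invented for illustration, not taken from any actual design): the utility function is written against high-level features, and only the bridge that extracts those features from the AI’s current ontology changes when it learns new physics.

```python
from typing import Any, Callable, Dict

# Preferences are stated over the abstraction, not over particles.
def utility(high_level: Dict[str, float]) -> float:
    return high_level["humans_alive"] * high_level["average_wellbeing"]

# Bridge for the old ontology: the world is described as clusters of atoms.
def bridge_atoms(world: Dict[str, Any]) -> Dict[str, float]:
    people = [c for c in world["atom_clusters"] if c["pattern"] == "human"]
    return {
        "humans_alive": float(len(people)),
        "average_wellbeing": sum(p["wellbeing"] for p in people) / max(len(people), 1),
    }

# Bridge for a new ontology: the same patterns, now described as field excitations.
def bridge_fields(world: Dict[str, Any]) -> Dict[str, float]:
    people = [e for e in world["excitations"] if e["implements"] == "human-computation"]
    return {
        "humans_alive": float(len(people)),
        "average_wellbeing": sum(p["wellbeing"] for p in people) / max(len(people), 1),
    }

def evaluate(world: Dict[str, Any],
             bridge: Callable[[Dict[str, Any]], Dict[str, float]]) -> float:
    # Learning new physics swaps the bridge; utility() itself is untouched.
    return utility(bridge(world))

old_world = {"atom_clusters": [{"pattern": "human", "wellbeing": 0.9}]}
new_world = {"excitations": [{"implements": "human-computation", "wellbeing": 0.9}]}
assert evaluate(old_world, bridge_atoms) == evaluate(new_world, bridge_fields)
```

Discovering new physics then means writing a new bridge, not rewriting the preferences.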
Right—that’s the obvious angle of attack for handling ontological crises.
I am not sure that the higher level of abstraction saves you from sliding into an ontological black hole. My analogy is from physics: classical electromagnetism leads to the ultraviolet catastrophe, making the whole higher classical level unstable until you get the lower levels “right”.
I can easily imagine that an attempt to specify a utility function over “a much higher level of abstraction” would result in a sort of “ultraviolet catastrophe” where the utility function becomes unbounded at one end of the spectrum until you fix the lower levels of abstraction.
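For reference, the standard physics behind the analogy: the classical (Rayleigh–Jeans) spectral energy density grows without bound in frequency, so the total energy diverges until the lower-level quantum correction (Planck’s law) bounds it.

```latex
\[
  u_{\mathrm{RJ}}(\nu, T) = \frac{8\pi \nu^{2}}{c^{3}}\, k_{B} T ,
  \qquad
  \int_{0}^{\infty} u_{\mathrm{RJ}}(\nu, T)\, d\nu = \infty ,
\]
\[
  u_{\mathrm{Planck}}(\nu, T) = \frac{8\pi h \nu^{3}}{c^{3}}\,
  \frac{1}{e^{h\nu/k_{B}T} - 1} ,
  \qquad
  \int_{0}^{\infty} u_{\mathrm{Planck}}(\nu, T)\, d\nu
  = \frac{8\pi^{5} k_{B}^{4}}{15\, c^{3} h^{3}}\, T^{4} < \infty .
\]
```

The worry is the analogous situation for a utility function: well-behaved over most of the abstraction’s range, but divergent at one end until the lower levels are fixed.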
Can you give me an example of an ultraviolet catastrophe, say for paperclips?
Not sure if this is what you are asking, but a paperclip maximizer not familiar with general relativity risks creating a black hole out of paperclips, losing all its hard work as a result.
That would be a problem of the AI not being able to accurately predict the consequences of its actions because it doesn’t know enough physics. An ontological crisis would involve the paperclip maximizer learning new physics and therefore getting confused about what a paperclip is and maximizing something else.
Example: An AI is introduced to a large quantity of metal and told to make paperclips. Since the AI is confined in a metal-only environment, “paperclip” is defined only as a shape.
The AI escapes from the box, and encounters a lake. It then spends some time trying to create paperclip shapes from water. After a bit of experimentation, it finds that freezing the water to ice allows it to create paperclip shapes. Moreover, it finds that any substance provided with enough heat will melt.
Therefore, in order to better create paperclip shapes from other, possibly undiscovered materials, the AI puts out the sun, and otherwise seeks to minimise the amount of heat in the universe.
Is that what you’re looking for?
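A toy version of that failure, with all the details invented for illustration: because material never varied in the metal-only environment, the utility the AI was given checks shape only, so once its ontology grows to include non-metal objects the proxy and the intent come apart.

```python
from dataclasses import dataclass

@dataclass
class Thing:
    shape: str      # e.g. "paperclip", "blob"
    material: str   # e.g. "steel", "ice"

def proxy_utility(world: list) -> int:
    # What the AI actually optimises: shape only, because shape was the only
    # thing that ever varied in the metal-only training environment.
    return sum(1 for t in world if t.shape == "paperclip")

def intended_utility(world: list) -> int:
    # What the designers meant: a paperclip is a shape made of the right material.
    return sum(1 for t in world if t.shape == "paperclip" and t.material == "steel")

world = [Thing("paperclip", "steel"), Thing("paperclip", "ice"), Thing("blob", "ice")]
print(proxy_utility(world))     # 2: the ice "paperclip" counts
print(intended_utility(world))  # 1: only the steel one counts
```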
Pretty sure that freezing stuff would cost lots of negentropy, which Clippy could spend to make many more paperclips out of already-solid materials instead.
That is an example of a paperclip maximizer failing an ontological crisis. It doesn’t seem to illustrate Shminux’s concept of an “ultraviolet catastrophe”, though.
You are correct. Can you suggest an example that resolves that shortcoming?
I think that the concept of an ontological crisis metaphorically similar to the ultraviolet catastrophe is confused, and I don’t expect to find a good example. I suspect that when he proposed it, Shminux was thinking more of inaccurate predictions from incomplete physics than of utility functions that don’t translate correctly into new ontologies.
To be clear, the issue here is that it inadvertently hastens the heat death of the universe, and generally lowers its ability to create paperclips, right?
It’s just an example of an ontological crisis; the AI is learning new physics (cold causes water to freeze), and is not certain of what a paperclip is, and is therefore maximising something else (coldness).
The thing the paperclip maximizer is maximizing instead of paperclips is paperclip-shaped objects made out of the wrong material. Coldness is just an instrumental value, and the example could be simplified and made more plausible by taking that part out. ETA: And the relevant new physics is not that cold water freezes but that materials other than metal exist.
A good point. I hadn’t thought of it that way, but you are correct.
Exactly, yes.
Oh, right. But … it’s actually maximizing solids, which is instrumental to maximizing paperclip-shaped objects, which is what it was programmed to do in the first place. Right?
Yyyyyeeeees. That’s a fair statement of the situation.
Just checking I understand it this time, thanks :-)
Oh, OK. What are the abstraction levels a paperclip maximizer might use?
Hm.
I think it largely comes down to how you handle divergent resources. For the ultraviolet catastrophe, let’s use the example of… the ultraviolet catastrophe.
Let’s suppose that the AI had a use for materials that emitted infinite power in thermal radiation. In fact, as the power emitted went up, the usefulness went up without bound. Photonic rocket engines for exploring the stars, perhaps, or how fast you could loop a computational equivalent of a paper clip being produced.
Now, the AI knows, with very good certainty, that the ultraviolet catastrophe doesn’t actually occur. But it could get Pascal’s-wagered here: it takes actions weighted both by their probability and by the impact they could have. So it assigns a divergent weight to actions that benefit divergently from the ultraviolet catastrophe, and builds an infinite-power computer that it knows won’t work.
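A toy illustration of that divergent weighting, with all the numbers invented: if one hypothesis offers an unbounded payoff, naive expected-utility maximisation lets an arbitrarily small probability of it dominate the decision.

```python
import math

def expected_utility(action: str) -> float:
    if action == "build_ordinary_paperclip_factory":
        # Near-certain, bounded payoff.
        return 0.999 * 1_000_000.0
    if action == "build_ultraviolet_catastrophe_computer":
        # Almost certainly impossible, but the payoff conditional on classical
        # physics being right is unbounded, so the product diverges anyway.
        p_classical_physics_is_right = 1e-12
        return p_classical_physics_is_right * math.inf
    raise ValueError(action)

actions = ["build_ordinary_paperclip_factory", "build_ultraviolet_catastrophe_computer"]
print(max(actions, key=expected_utility))  # picks the plan it "knows" won't work
```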
How is this different to accepting a bet it “knows” it will lose? We may know with certainty that it doesn’t live in a classical universe, because we specified the problem, but the AI doesn’t.
Well, from the perspective of the AI, it’s behaving perfectly rationally. It finds the highest-probability thing that could give it infinite reward, and then prepares for that, no matter how small the probability is. It only seems strange to us humans because (1) we’re Allais-ey, and (2) it is a clear case of logical, one-shot probability, which is less intuitive.
If our AI models the world with one set of laws at a time, rather than having a probability distribution over laws, then this behavior could pop up as a surprise.
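A toy contrast, again with invented numbers, between committing to the single most probable set of laws and keeping a distribution over laws; the abrupt jump in the committed agent’s valuation is the kind of surprise I mean.

```python
def value_committed(posterior: dict, payoff: dict) -> float:
    # Act as if the single most probable set of laws were simply true.
    best_law = max(posterior, key=posterior.get)
    return payoff[best_law]

def value_averaged(posterior: dict, payoff: dict) -> float:
    # Weight the plan's payoff by how probable each set of laws is.
    return sum(posterior[law] * payoff[law] for law in posterior)

payoff = {"classical": 1e9, "quantum": 0.0}   # the plan only pays off classically

for p_classical in (0.49, 0.51):
    posterior = {"classical": p_classical, "quantum": 1.0 - p_classical}
    print(p_classical,
          value_committed(posterior, payoff),
          value_averaged(posterior, payoff))
# The committed agent's valuation jumps from 0 to 1e9 as the modal law flips,
# while the averaged valuation changes smoothly with the evidence.
```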
Precisely. That’s all I was saying.
What if it discovers new math? Less likely, I know, but...
That model of me forced me to think of a better response :-)
http://lesswrong.com/lw/gmx/domesticating_reduced_impact_ais/8he2