Stuart, have you looked at AIs that don’t have utility functions?
I don’t think people want their leaders to follow deontological rules. It’s more like “I wish they would follow rule X whenever possible.” The last part is pretty important. “When possible” means “when it doesn’t lead to negative-utility outcomes.” Or rather, “if they just followed this rule they’d be more likely to end up in good outcomes vs bad outcomes.”
These are all ways of describing a heuristic, not a hard utility rule. No big surprise there, because humans are composed of heuristics, not hard unbreakable rules.
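To make that concrete, here is a toy sketch of a “rule X whenever possible” agent, which defaults to the rule and only overrides it when the rule’s expected outcome is bad enough (Python; all names and numbers are illustrative, not a proposed design):

```python
# Toy "follow rule X when possible" agent: default to the rule, fall back to
# utility comparison only when the rule's outcome looks disastrous.
def choose(actions, rule_choice, utility, disaster=-10.0):
    preferred = rule_choice(actions)       # what rule X says to do
    if utility(preferred) > disaster:      # "when possible"
        return preferred
    return max(actions, key=utility)       # fallback: plain maximisation

actions = ["keep promise", "break promise"]
rule_choice = lambda acts: "keep promise"  # deontological default
utility = {"keep promise": -50.0, "break promise": 1.0}.get
print(choose(actions, rule_choice, utility))  # rule fails the check -> "break promise"
```

The cutoff `disaster` is doing all the work here, which is the sense in which this is a heuristic rather than a hard rule.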
Stuart, have you looked at AIs that don’t have utility functions?
They tend not to be stable; and there are a few suggestions floating around. But this design might result in such an AI; it might have a utility function, but wouldn’t be a mindless maximiser.
Yes, well that is a tautology. What do you mean by stable? I assume you mean value-stable, which can be interpreted as maximizes-the-same-function-over-time. Something which does not behave as a utility maximizer therefore is pretty much by definition not “stable”. By technical definition, at least.
My point was more that this “instability” is in fact the desirable outcome: people wouldn’t want technical stability; they’d want, perhaps, a heuristic machine with sensible defaults and rational update procedures.
There are other ways of interpreting value stability; a satisficer is one example. But those don’t tend to be stable: http://lesswrong.com/lw/854/satisficers_want_to_become_maximisers/
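For concreteness, a toy version of the maximiser/satisficer contrast (a simplified sketch, not the setup used in the linked post):

```python
# Toy contrast: a maximiser always takes the best option; a satisficer stops
# at the first option that clears a "good enough" threshold.
def maximise(options, utility):
    return max(options, key=utility)

def satisfice(options, utility, threshold):
    for option in options:
        if utility(option) >= threshold:
            return option
    return max(options, key=utility)   # nothing is good enough: fall back

utility = {"a": 3, "b": 7, "c": 9}.get
print(maximise(["a", "b", "c"], utility))      # 'c'
print(satisfice(["a", "b", "c"], utility, 5))  # 'b' -- good enough
```

The instability claim in the linked post is about agents like `satisfice` having reasons to self-modify towards `maximise`; the sketch only shows the behavioural difference.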
That statement does not make sense. I hope if you read it with a fresh mind you can see why. “There are other ways of defining stable, but they are not stable.” Perhaps you need to taboo the word stable here?
And would those defaults and update procedures remain stable themselves?
No, and that’s the whole point! Stability is scary. Stability leads to Clippy. People wouldn’t want stable. They’d want sensible. Sensible updates its behavior based on new information.
“There are some agents that are defined to have constant value systems, where, nonetheless, the value system will drift in practice”.
Stability leads to Clippy.
There are many bad stable outcomes. And an unstable update system will eventually fall into one of them, because they’re attractor states. To avoid this, you need to define “sensible” in such a way that the agent never enters such states. You’re effectively promoting a different kind of goal stability: a zone of stability, rather than a single point. It’s not intrinsically a bad idea, but it’s not clear that it’s easier than finding a single ideal goal system. And it’s very underdefined at this point.
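A rough sketch of what a “zone of stability” could look like, assuming values can be summarised by a parameter and updates are only accepted inside a predefined safe region (specifying that region is exactly the underdefined part):

```python
# Rough "zone of stability" sketch: value updates are allowed, but only if the
# updated values stay inside a predefined safe region.
def update_values(current, proposed, in_safe_zone):
    return proposed if in_safe_zone(proposed) else current

# toy example: the value is a single "care about humans" weight in [0.5, 1.0]
in_safe_zone = lambda v: 0.5 <= v <= 1.0
print(update_values(0.9, 0.8, in_safe_zone))  # 0.8 -- accepted
print(update_values(0.9, 0.0, in_safe_zone))  # 0.9 -- rejected (attractor-state territory)
```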
“There are some agents that are defined to have constant value systems, where, nonetheless, the value system will drift in practice”.
Ok, we are now quite deep in a thread that started with me pointing out that a constant value system might be a bad thing! People want machines whose actions align with their own morality, and humans don’t have constant value systems (maybe this is where we disagree?).
There are many bad stable outcomes. And an unstable update system will eventually fall into one of them, because they’re attractor states.
Why don’t we see humans drifting into being sociopaths? E.g. starting as normal, well-adjusted human beings and then becoming sociopaths as they get older?
Why don’t we see humans drifting into being sociopaths? E.g. starting as normal, well-adjusted human beings and then becoming sociopaths as they get older?
That’s an interesting question, partially because we’d want to copy that and implement it in AI. A large part of it seems to be social pressure, and lack of power: people must respond to social pressure, because they don’t have the power to ignore it (a superintelligent AI would be very different, as would a superintelligent human). This is also connected with some evolutionary instincts, which cause us to behave in many ways as if we were in a tribal society with high costs to deviant behaviour—even if this is no longer the case.
The other main reason is evolution itself: very good at producing robustness, terrible at efficiency. If/when humans start self modifying freely, I’d start being worried about that tendency for them too...
Stuart, have you looked at AIs that don’t have utility functions?
Such AIs would not satisfy the axioms of VNM-rationality, meaning their preferences wouldn’t be structured intuitively, meaning… well, I’m not sure what, exactly, but since “intuitively” generally refers to human intuition, I think humanity probably wouldn’t like that.
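One standard illustration of how agents violating the VNM axioms can behave unintuitively is the money pump; a toy version, assuming cyclic preferences and a small trading fee:

```python
# Money-pump toy: an agent with cyclic preferences (A > B, B > C, C > A) will
# pay a small fee at every step of a trade loop and end up where it started.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means "prefers x to y"

item, fees_paid = "A", 0
for offered in ["C", "B", "A"] * 2:             # go around the loop twice
    if (offered, item) in prefers:              # strictly prefers the offer
        item, fees_paid = offered, fees_paid + 1

print(item, fees_paid)  # "A" 6 -- back to the start, six fees poorer
```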
Since human beings are not utility maximizers and intuition is based on comparison to our own reference class experience, I question your assumption that only VNM-rational agents would behave intuitively.
I’m not sure humans aren’t utility maximizers. They simply don’t maximize utility over worldstates. I do feel, however, that it’s plausible humans are utility maximizers over brainstates.
(Also, even if humans aren’t utility maximizers, that doesn’t mean they will find the behavior of other non-utility-maximizing agents intuitive. Humans often find the behavior of other humans extraordinarily unintuitive, for example, and these are identical brain designs we’re talking about, here. If we start considering larger regions in mindspace, there’s no guarantee that humans would like a non-utility-maximizing AI.)