steven0461: Who cares about the question of what the robot “actually wants”? Certainly not the robot. Humans care about the question of what they “actually want”, but that’s because they have additional structure that this robot lacks.
Wei_Dai: In other words, our “actual values” come from our being philosophers, not our being consequentialists.
That’s the right answer as far as I can tell. Humans do have a part that “actually wants” something—we can introspect on our own desires—and the thermostat analogy discards it. Yes, that means any good model of our desires must also be a model of our introspective abilities, which makes the problem much harder.
I mostly agree, though you can only really tell me we have the right answer once we can program it into a computer :) Human introspection is good at producing verbal behavior, but is less good at giving you a utility function on states of the universe. Part of the problem is that it’s not as if we have “a part of ourselves that does introspection” sitting like some kind of orb inside our skulls; breaking human cognition into parts like that is yet another abstraction with some free parameters to it.
Sure. Though learning from verbal descriptions of hypothetical behavior doesn’t seem much harder than learning from actual behavior—they’re both about equally far from “utility function on states of the universe” :-)
I hope so! IRL and CIRL are really nice frameworks for learning from general behavior, and as far as I can tell, learning from verbal behavior requires a simultaneous model of verbal and general behavior, with some extra parts that I don’t understand yet.
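To make the IRL/CIRL point concrete, here is a minimal sketch of the kind of inference those frameworks perform: Bayesian reward inference from observed actions in a toy chain-world, assuming a Boltzmann-rational demonstrator. The environment, the candidate reward functions, and the rationality model are all illustrative assumptions introduced here, not anything specified in the exchange, and the sketch deliberately omits the harder part discussed above: treating verbal reports as additional evidence about values.

```python
# A minimal sketch of IRL-style reward inference, assuming a toy 1-D
# chain-world and a Boltzmann-rational demonstrator. All specifics here
# (environment, hypotheses, temperature) are illustrative assumptions.
import numpy as np

N_STATES = 5          # states 0..4 arranged on a line
ACTIONS = [-1, +1]    # step left or step right
BETA = 2.0            # demonstrator "rationality" (Boltzmann temperature)

# Candidate hypotheses: the demonstrator's reward peaks at one of the states.
candidate_rewards = [np.eye(N_STATES)[g] for g in range(N_STATES)]

def q_values(reward, gamma=0.9, iters=100):
    """Tabular value iteration for the toy chain MDP."""
    V = np.zeros(N_STATES)
    Q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(iters):
        for s in range(N_STATES):
            for i, a in enumerate(ACTIONS):
                s2 = min(max(s + a, 0), N_STATES - 1)  # walls at the ends
                Q[s, i] = reward[s2] + gamma * V[s2]
        V = Q.max(axis=1)
    return Q

def action_likelihood(reward, s, a_idx):
    """P(action | state, reward) for a Boltzmann-rational demonstrator."""
    logits = BETA * q_values(reward)[s]
    probs = np.exp(logits - logits.max())
    return probs[a_idx] / probs.sum()

# Observed behavior: (state, chosen-action index); the demonstrator keeps
# stepping right.
demonstrations = [(1, 1), (2, 1), (3, 1)]

posterior = np.ones(len(candidate_rewards))  # uniform prior over hypotheses
for s, a_idx in demonstrations:
    for h, reward in enumerate(candidate_rewards):
        posterior[h] *= action_likelihood(reward, s, a_idx)
posterior /= posterior.sum()

# The posterior concentrates on rewards peaked toward the right end of the chain.
print("P(reward peaks at state g | demonstrations):", np.round(posterior, 3))
```

Note that the choice of rationality model (here, Boltzmann with a fixed temperature) is itself one of the free parameters the exchange worries about, and nothing in this sketch says how verbal behavior would enter as evidence.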
We’ve been over this; see the exchange quoted above.