I mostly agree, though you can only really tell me we have the right answer once we can program it into a computer :) Human introspection is good at producing verbal behavior, but is less good at giving you a utility function on states of the universe. Part of the problem is that it's not like we have "a part of ourselves that does introspection" like it's some kind of orb inside our skulls—breaking human cognition into parts like that is yet another abstraction that has some free parameters to it.
Sure. Though learning from verbal descriptions of hypothetical behavior doesn’t seem much harder than learning from actual behavior—they’re both about equally far from “utility function on states of the universe” :-)
I hope so! IRL and CIRL are really nice frameworks for learning from general behavior, and as far as I can tell, learning from verbal behavior requires a simultaneous model of verbal and general behavior, with some extra parts that I don’t understand yet.
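For concreteness, here's roughly what I mean by "learning a reward from behavior" in the IRL family — a toy sketch, not anything from a real IRL library. All the specifics are made up for illustration: a Boltzmann-rational "expert" that picks options with probability proportional to exp(reward), a reward that's linear in hand-coded features, and a maximum-likelihood fit to the observed choices:

```python
# Toy sketch of reward learning from demonstrated behavior, IRL-style.
# Illustrative assumptions: a Boltzmann-rational expert, linear reward in
# hand-coded features, and maximum-likelihood gradient ascent on choices.
import math
import random

random.seed(0)

# Four options, each described by a 2-d feature vector; the true (hidden)
# reward of an option is the dot product true_w . features.
features = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
true_w = (2.0, -1.0)

def choice_probs(w):
    """Boltzmann-rational choice distribution over the options."""
    scores = [math.exp(sum(wi * fi for wi, fi in zip(w, f))) for f in features]
    z = sum(scores)
    return [s / z for s in scores]

# Simulate 5000 demonstrations from the expert.
demos = random.choices(range(len(features)), weights=choice_probs(true_w), k=5000)

# Empirical feature expectation of the demonstrations.
obs = [sum(features[c][d] for c in demos) / len(demos) for d in range(2)]

# Gradient ascent on the (concave) log-likelihood: the gradient is the
# observed feature mean minus the model's expected feature mean.
w = [0.0, 0.0]
lr = 0.5
for _ in range(3000):
    probs = choice_probs(w)
    expected = [sum(p * f[d] for p, f in zip(probs, features)) for d in range(2)]
    w = [wi + lr * (o - e) for wi, o, e in zip(w, obs, expected)]

def ranking(weights):
    """Options sorted from highest to lowest inferred reward."""
    return sorted(range(len(features)),
                  key=lambda i: -sum(wi * fi for wi, fi in zip(weights, features[i])))

print("recovered weights:", [round(x, 2) for x in w])
print("same preference order as truth:", ranking(w) == ranking(true_w))
```

Note that this sketch only ever consumes actual choices — which is exactly where verbal behavior gets awkward: it's not obvious how to feed a sentence like "I would prefer X in situation Y" into the demonstration set without first having a model of how verbal reports relate to the behavior they describe.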