steven0461: Who cares about the question of what the robot “actually wants”? Certainly not the robot. Humans care about the question of what they “actually want”, but that’s because they have additional structure that this robot lacks.
Wei_Dai: In other words, our “actual values” come from our being philosophers, not our being consequentialists.
That’s the right answer as far as I can tell. Humans do have a part that “actually wants” something—we can introspect on our own desires—and the thermostat analogy discards it. Yes, that means any good model of our desires must also be a model of our introspective abilities, which makes the problem much harder.
I mostly agree, though you can only really tell me we have the right answer once we can program it into a computer :) Human introspection is good at producing verbal behavior, but is less good at giving you a utility function on states of the universe. Part of the problem is that it’s not as if we have “a part of ourselves that does introspection” sitting like some kind of orb inside our skulls; breaking human cognition into parts like that is yet another abstraction with some free parameters to it.
Sure. Though learning from verbal descriptions of hypothetical behavior doesn’t seem much harder than learning from actual behavior—they’re both about equally far from “utility function on states of the universe” :-)
I hope so! IRL and CIRL are really nice frameworks for learning from general behavior, and as far as I can tell, learning from verbal behavior requires a simultaneous model of verbal and general behavior, with some extra parts that I don’t understand yet.
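To make the IRL/CIRL point concrete, here is a minimal sketch of the kind of inference those frameworks perform: Bayesian reward inference from observed actions in a toy chain-world, assuming a Boltzmann-rational demonstrator. The environment, the candidate reward functions, and the rationality model are all illustrative assumptions introduced here, not anything specified in the exchange, and the sketch deliberately omits the harder part discussed above: treating verbal reports as additional evidence about values.

```python
# A minimal sketch of IRL-style reward inference, assuming a toy 1-D
# chain-world and a Boltzmann-rational demonstrator. All specifics here
# (environment, hypotheses, temperature) are illustrative assumptions.
import numpy as np

N_STATES = 5          # states 0..4 arranged on a line
ACTIONS = [-1, +1]    # step left or step right
BETA = 2.0            # demonstrator "rationality" (Boltzmann temperature)

# Candidate hypotheses: the demonstrator's reward peaks at one of the states.
candidate_rewards = [np.eye(N_STATES)[g] for g in range(N_STATES)]

def q_values(reward, gamma=0.9, iters=100):
    """Tabular value iteration for the toy chain MDP."""
    V = np.zeros(N_STATES)
    Q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(iters):
        for s in range(N_STATES):
            for i, a in enumerate(ACTIONS):
                s2 = min(max(s + a, 0), N_STATES - 1)  # walls at the ends
                Q[s, i] = reward[s2] + gamma * V[s2]
        V = Q.max(axis=1)
    return Q

def action_likelihood(reward, s, a_idx):
    """P(action | state, reward) for a Boltzmann-rational demonstrator."""
    logits = BETA * q_values(reward)[s]
    probs = np.exp(logits - logits.max())
    return probs[a_idx] / probs.sum()

# Observed behavior: (state, chosen-action index); the demonstrator keeps
# stepping right.
demonstrations = [(1, 1), (2, 1), (3, 1)]

posterior = np.ones(len(candidate_rewards))  # uniform prior over hypotheses
for s, a_idx in demonstrations:
    for h, reward in enumerate(candidate_rewards):
        posterior[h] *= action_likelihood(reward, s, a_idx)
posterior /= posterior.sum()

# The posterior concentrates on rewards peaked toward the right end of the chain.
print("P(reward peaks at state g | demonstrations):", np.round(posterior, 3))
```

Note that the choice of rationality model (here, Boltzmann with a fixed temperature) is itself one of the free parameters the exchange worries about, and nothing in this sketch says how verbal behavior would enter as evidence.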
We’ve been over this; see the exchange quoted above.