That is not an assumption; it is an implication of using the concept “tree” to make predictions.
I would disagree in spirit—an AI can happily find a referent for the “tree” token that depends on context in a way that works like a word with multiple possible definitions.
Picking an architecture which matches the structure of our universe closely enough to perform well with limited data is a key problem.
I hope this is where we can start agreeing. Because the problem isn’t just finding something that performs well according to a known scoring rule. We don’t quite know how to implement the notion “this method for learning human values performs well” on a computer without basically already referring to some notion of human values for “performs well.”
We either need to ground “performs well” in some theory of humans as approximate agents that doesn’t need to know about their values, or we need some theory that avoids the chicken-and-egg problem altogether by simultaneously learning human models and the standards to judge them by.
I hope this is where we can start agreeing. Because the problem isn’t just finding something that performs well according to a known scoring rule. We don’t quite know how to implement the notion “this method for learning human values performs well” on a computer without basically already referring to some notion of human values for “performs well.”
To clarify, when I said “performs well”, I did not mean “learns human values well”, nor did I have any sort of scoring rule in mind. I intended to mean that the algorithm learns patterns which are actually present in the world—much like earlier when I talked about “the human-labelling-algorithm ‘working correctly’”.
Probably not the best choice of words on my part; sorry for causing a tangent.
I would disagree in spirit—an AI can happily find a referent for the “tree” token that depends on context in a way that works like a word with multiple possible definitions.
I’m sure it could, but I am claiming that such a thing would have worse predictive power. Roughly speaking: if there’s one notion of tree that includes saplings, and another that includes logs, then the model misses the ability to learn facts about saplings by examining logs. Conversely, if it doesn’t miss those sorts of things, then it isn’t actually behaving like a word with multiple possible referents. (I don’t actually think it’s that simple—the referent of “tree” is more than just a comparison class—but it hopefully suffices to make the point.)
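The transfer argument above can be made concrete with a toy sketch (my own construction, not anything from this exchange): a store that attaches learned facts to concepts, where instance kinds map to concepts. With one shared referent for “tree”, a fact learned from logs becomes available for saplings; with split referents, it does not. All names here (`ConceptStore`, `bind`, `learn`) are hypothetical illustration, not a proposed architecture.

```python
from collections import defaultdict

class ConceptStore:
    """Toy model: facts attach to concepts; kinds map to concepts."""
    def __init__(self):
        self.facts = defaultdict(set)      # concept -> set of facts
        self.referents = defaultdict(set)  # instance kind -> set of concepts

    def bind(self, kind, concept):
        self.referents[kind].add(concept)

    def learn(self, kind, fact):
        # A fact observed about some kind is stored on every concept
        # that kind falls under.
        for concept in self.referents[kind]:
            self.facts[concept].add(fact)

    def known_facts(self, kind):
        # Everything inferable about this kind via its concepts.
        return set().union(*(self.facts[c] for c in self.referents[kind]))

# One shared referent: saplings and logs are both instances of "tree",
# so a fact learned by examining logs transfers to saplings.
shared = ConceptStore()
shared.bind("sapling", "tree")
shared.bind("log", "tree")
shared.learn("log", "contains growth rings")
print(shared.known_facts("sapling"))  # the log fact transfers

# Split referents: two disjoint "tree" concepts, so nothing learned
# from logs carries over to saplings.
split = ConceptStore()
split.bind("sapling", "tree_living")
split.bind("log", "tree_timber")
split.learn("log", "contains growth rings")
print(split.known_facts("sapling"))  # no transfer
```

The point this is meant to surface is the one in the comment above: if the two-referent model never misses such transfers, it must be sharing structure between its referents, i.e. it is not really behaving like a word with multiple independent meanings.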
To clarify, when I said “performs well”, I did not mean “learns human values well”, nor did I have any sort of scoring rule in mind. I intended to mean that the algorithm learns patterns which are actually present in the world—much like earlier when I talked about “the human-labelling-algorithm ‘working correctly’”.
Ah well. I’ll probably argue with you more about this elsewhere, then :)