By “pessimistic about the NAH”, do you mean “does not believe the NAH”, or “pessimistic that the fact that the AGI will have the same abstractions we have is a valuable clue for how to align the AGI”?
I mean “does not believe the NAH”, i.e., does not think that if you fine-tune GPT-6 to predict “in this scenario, would this action be perceived as a betrayal by a human?”, the LM would get it right essentially every time 3 random humans would agree on it.
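For concreteness, here's a minimal sketch (in Python, with hypothetical names) of the test I have in mind: restrict evaluation to scenarios where all 3 random human annotators give the same answer, and ask how often the fine-tuned model matches that unanimous label.

```python
from typing import List

def nah_agreement_accuracy(
    model_preds: List[bool],         # fine-tuned LM's betrayal judgments, one per scenario
    human_labels: List[List[bool]],  # 3 independent human judgments per scenario
) -> float:
    """Model accuracy restricted to scenarios with unanimous human agreement."""
    hits, total = 0, 0
    for pred, labels in zip(model_preds, human_labels):
        if len(set(labels)) == 1:    # all 3 humans agree on this scenario
            total += 1
            hits += pred == labels[0]
    return hits / total if total else float("nan")
```

Believing the NAH (in this operationalization) means expecting this number to be essentially 1.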
Then I cannot answer your question because I’m not pessimistic about the NAH.