Not Relevant comments on Best arguments against the Natural Abstractions Hypothesis applying to human values?

Not Relevant 7 May 2022 17:32 UTC
1 point
I mean “does not believe the NAH”, i.e., does not think that if you fine-tune GPT-6 to predict “in this scenario, would this action be perceived as a betrayal by a human?”, the LM would get it right essentially every time 3 random humans would agree on the answer.
- RHollerith 8 May 2022 20:21 UTC
  2 points
  Then I cannot answer your question because I’m not pessimistic about the NAH.