I mean “does not believe the NAH”, i.e., does not think that if you fine-tune GPT-6 to predict “in this scenario, would this action be perceived as a betrayal by a human?”, the LM would get it right essentially every time 3 random humans would agree on it.
Then I cannot answer your question because I’m not pessimistic about the NAH.