By “pessimistic about the NAH”, do you mean “does not believe the NAH”, or “pessimistic that the fact that the AGI will have the same abstractions we have is a valuable clue for how to align the AGI”?
I mean “does not believe the NAH”, i.e., does not think that if you fine-tune GPT-6 to predict “in this scenario, would this action be perceived as a betrayal by a human?”, the LM would get it right essentially every time 3 random humans would agree on it.
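For concreteness, here's a minimal sketch (in Python, with hypothetical names) of the test I have in mind: restrict evaluation to scenarios where all 3 random human annotators give the same answer, and ask how often the fine-tuned model matches that unanimous label.

```python
from typing import List

def nah_agreement_accuracy(
    model_preds: List[bool],         # fine-tuned LM's betrayal judgments, one per scenario
    human_labels: List[List[bool]],  # 3 independent human judgments per scenario
) -> float:
    """Model accuracy restricted to scenarios with unanimous human agreement."""
    hits, total = 0, 0
    for pred, labels in zip(model_preds, human_labels):
        if len(set(labels)) == 1:    # all 3 humans agree on this scenario
            total += 1
            hits += pred == labels[0]
    return hits / total if total else float("nan")
```

Believing the NAH (in this operationalization) means expecting this number to be essentially 1.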
Then I cannot answer your question because I’m not pessimistic about the NAH.