Therefore rational beliefs are contagious, among honest folk who believe each other to be honest. And it’s why a claim that your beliefs are not contagious—that you believe for private reasons which are not transmissible—is so suspicious. If your beliefs are entangled with reality, they should be contagious among honest folk.
I don’t get this inference. It seems like the belief itself is the evidence: you entangle your friend with the object of your belief just by telling them your belief, regardless of whether you can explain your reasons. (Private beliefs seem suspicious to me on other grounds.)
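One way to put that point in Bayesian terms (my framing, not anything from the quoted passage): suppose your friend treats your report of a belief as data. If your beliefs are entangled with reality in the sense that

$$P(\text{you assert } X \mid X) > P(\text{you assert } X \mid \neg X),$$

then by Bayes' rule your friend's posterior satisfies $P(X \mid \text{you assert } X) > P(X)$, whether or not you can pass along the reasons behind the belief. That is the sense in which the belief itself carries the evidence.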
I don’t quite understand the distinction you’re drawing here.
In both cases the AI was never trying to pursue happiness. In both cases it was pursuing something else, shmappiness, that correlated strongly with causing happiness in the training environment but not in the deployment environment. In both cases strength matters for making this disastrous, since a stronger AI will find more disastrous ways of pursuing shmappiness. It’s just that it is pursuing different varieties of shmappiness in the different cases.
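To make the training/deployment divergence concrete, here is a minimal toy sketch (my own illustration, with made-up state features, not anything from the original comment): an agent that optimises a proxy feature which tracks the intended value only in the training environment.

```python
import random

def intended_reward(state):
    # What we actually care about ("happiness"): the state's true value.
    return state["true_value"]

def proxy_reward(state):
    # What the agent learned to pursue ("shmappiness"): a feature that
    # happened to track true_value during training.
    return state["proxy_feature"]

def sample_state(correlated):
    true_value = random.random()
    # In training the proxy feature tracks the true value; in deployment it doesn't.
    proxy_feature = true_value if correlated else random.random()
    return {"true_value": true_value, "proxy_feature": proxy_feature}

random.seed(0)

for env_name, correlated in [("training", True), ("deployment", False)]:
    states = [sample_state(correlated) for _ in range(1000)]
    # The agent searches for whatever scores highest on the proxy;
    # searching over more states is a crude stand-in for "strength".
    chosen = max(states, key=proxy_reward)
    print(f"{env_name}: proxy={proxy_reward(chosen):.3f}, "
          f"intended={intended_reward(chosen):.3f}")
```

In the training case, maximising the proxy also produces a high intended score; in the deployment case, the proxy-optimal state scores no better than chance on the intended objective.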
I don’t have a view on whether “goal misgeneralisation” as a term is optimal for this kind of thing.