Assumption 1: Most of us are not saints.
Assumption 2: AI safety is a public good.[1]
[..simple standard incentives..]
Implication: The AI safety researcher, eventually finding that he is unlikely to be individually pivotal on either side, may quite ‘rationally’[2] switch to ‘standard’ AI work.[3]
So: A rather simple explanation seems to suffice to make sense of the basic big-picture pattern you describe.
That doesn’t mean the inner tension you point out isn’t interesting. But, contrary to what I had the impression the post was meant to suggest, I don’t think very deep psychological factors are needed to explain the general ‘AI safety becomes AI instead’ tendency.
- ^
Or: unaligned/unloving/whatever AGI is a public bad.
- ^
I mean: individually ‘rational’ once we factor in another trait, Assumption 1b: the unfathomable scale of potential aggregate disutility from AI gone wrong bottoms out in a bounded ‘negative’ individual utility, in terms of the emotional value non-saint Joe places on it. So a 0.1 permille probability of saving the universe may, individually and rationally, be dominated by mundane stuff like having a still somewhat cool and well-paying job or something.
- ^
The switch may be psychologically even easier if the employer started out as actually well-intentioned and may now still have a bit of an ambiguous flair.
This is provably wrong. This route will never offer any test of consciousness:
Suppose for a second that xAI in 2027, a very large LLM, stuns you by uttering C, where C = more profound musings about your and her own consciousness than you’ve ever even imagined!
For a given set of random variable draws R used in the randomized output generation of xAI’s utterance, S the xAI structure you’ve designed (transformer neuron arrangements or so), T the training you have given it:
What is P(C | {xAI conscious, R, S, T})? It’s 100%.
What is P(C | {xAI not conscious, R, S, T})? It’s of course also 100%. Schneider’s claims you refer to don’t change that. You know you can readily track what each element within xAI is mathematically doing and how the bits propagate, and, if you examined it in enough detail, you’d find exactly the output you observe, without resorting to any concept of consciousness or whatever.
As the probability of what you observe is exactly the same with or without consciousness in the machine, there’s no way to infer from xAI’s uttering whether it’s conscious or not.
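To spell out that last step with Bayes’ rule, here is a short sketch. The shorthand is mine, not from the post: K stands for "xAI conscious", and p = P(K | R, S, T) is whatever prior you held before hearing C.

```latex
\begin{align*}
P(K \mid C, R, S, T)
  &= \frac{P(C \mid K, R, S, T)\, P(K \mid R, S, T)}
          {P(C \mid K, R, S, T)\, P(K \mid R, S, T)
           + P(C \mid \neg K, R, S, T)\, P(\neg K \mid R, S, T)} \\
  &= \frac{1 \cdot p}{1 \cdot p + 1 \cdot (1 - p)} \\
  &= p
\end{align*}
```

The posterior equals the prior for any p, so observing C shifts your credence in xAI’s consciousness by exactly nothing.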
Combining this with the fact that, as you write, biological essentialism seems odd too, does of course create a rather unbearable tension that many may still be ignoring. When we embrace this tension, some see it raising illusionism-type questions, however strange those may feel (and if I dare guess, illusionist-type thinking may already be, or may grow to be, more popular than the biological essentialism you point out, although on that point I’m merely speculating).