I’m curious about your thoughts on this notion of perennial philosophy and the convergence of beliefs. One interpretation that I have of perennial philosophy is purely empirical: imagine that we have two “belief systems”. We could define a belief system as a set of statements about the way the world works plus valuations of world states (i.e. statements like “if X then Y could happen” and “Z is good to have”). You can probably formalize it some other way, but I think this is a reasonable starter pack to keep it simple. (You can also imagine formalizing it further by using numbers or lattices for values and probabilities, and some well-defined FSM to model parts of the world.) We could say that the two belief systems have converged if they share a lot of features, by which I mean that, for some definition of a feature, the feature is present in both. We can define a feature in many ways, but for our simple thought experiment it can effectively be a state, or a relation between states, in the two world views. More formally, we could imagine that a feature is a function of states and their values/causal relations whose output remains unchanged under a mapping from one system to the other (i.e. there is some notion of this mapping being like an isomorphism on the projection of the set via the function). For example, in one belief system you might have some sort of “god” character that is somehow the cause of many things. The function here could be “(int(god is cause of x1) + int(god is cause of x2) + …) / num_objects”. If we map common objects (this spoon) to themselves in the other system (still the spoon) and god to god, we will see that in both systems the function representing how causal god is remains close to 1, and so we may say that both systems have a notion of a “god”, and therefore that there has been some degree of convergence on “having a god” between the two systems.
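To make the “how causal is god” function concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: a belief system is reduced to a set of (cause, effect) pairs, the names `causal_reach` and `feature_preserved` are made up, and the `tolerance` threshold for “unchanged” is arbitrary.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a belief system is just a set of (cause, effect) claims.
Claims = Set[Tuple[str, str]]


def causal_reach(claims: Claims, entity: str, objects: List[str]) -> float:
    """(int(entity is cause of x1) + int(entity is cause of x2) + ...) / num_objects."""
    if not objects:
        return 0.0
    return sum(int((entity, obj) in claims) for obj in objects) / len(objects)


def feature_preserved(
    claims_a: Claims,
    claims_b: Claims,
    entity_a: str,
    objects_a: List[str],
    mapping: Dict[str, str],
    tolerance: float = 0.1,  # arbitrary threshold for "remains unchanged"
) -> bool:
    """Check whether the causal-reach feature survives the mapping from A to B."""
    entity_b = mapping[entity_a]
    objects_b = [mapping[obj] for obj in objects_a]
    reach_a = causal_reach(claims_a, entity_a, objects_a)
    reach_b = causal_reach(claims_b, entity_b, objects_b)
    return abs(reach_a - reach_b) <= tolerance


objects = ["spoon", "rain", "harvest"]

# Hypothetical systems: in A and B one entity causes everything; in C causes are spread out.
system_a: Claims = {("god", "spoon"), ("god", "rain"), ("god", "harvest")}
system_b: Claims = {("deity", "spoon"), ("deity", "rain"), ("deity", "harvest")}
system_c: Claims = {("spirit", "rain"), ("farmer", "harvest"), ("smith", "spoon")}

# Common objects map to themselves; "god" maps to its proposed counterpart.
map_ab = {"god": "deity", "spoon": "spoon", "rain": "rain", "harvest": "harvest"}
map_ac = {"god": "spirit", "spoon": "spoon", "rain": "rain", "harvest": "harvest"}

shared_ab = feature_preserved(system_a, system_b, "god", objects, map_ab)
shared_ac = feature_preserved(system_a, system_c, "god", objects, map_ac)
print(shared_ab, shared_ac)
```

Under these toy assumptions A and B share the feature and A and C do not, but note that the verdict depends entirely on which mapping you choose to feed in.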
So now with all this formal BS out of the way (which I had to do, because it highlights what is missing), the question is clear: under some reasonable such definition of what convergence means, how do you decide whether two religions have converged? The vibe I get from the perennial philosophy believers that I have spoken to thus far is that “you have to go through the journey to understand”, and generally it appears to be a sort of dispositional convergence, at least at face value. That said, I do not observe people of very different religions, who claim to have converged, living together for a long time, so it is not verifiable whether their dispositions are truly something that we could call converged. Of course, it may also be possible to find mappings under which two belief systems appear to have converged (or not) when the opposite is the more honest assessment.
Obviously no one is going to come out here and create a mathematical definition and just “be right” (I don’t think that’s even a fair thing to consider possible), but I do not particularly like making such assertions totally “on vibes”. Often people will say that they are “spiritual” and that “spirituality” helped them overcome some psychological challenge or who knows what, but what does “spiritual” mean here? Often it’s associated with some belief system that we would, as laymen, call religious or spiritual (i.e. in the enumerable list of Christianity and its sub-branches, Buddhism and its sub-branches, etc.), but who is to say that the truer cause of the change of psyche was not just some part of the phenomenon that person experienced, which merely happened to be delivered by the spiritual system present at that time and place? It seems compelling to me to want to decouple these “core truths” from the religions that hold them so as to have them presentable in a more neutral way, since in the alternative world where you must “go through the journey” of spirituality via some specific religion, you cannot know beforehand that you won’t be effectively brainwashed, and you cannot even know afterwards either… you can only get faint hints of it during the process.
So this is not to say that anyone is getting brainwashed, or that anything is good or bad, or that anything should be embraced or not. I’m just saying that from an outside perspective, it is not verifiable whether religions actually converge without diving into this stuff. However, it is also not verifiable whether diving in is actually good, and it’s not verifiable whether it will even be verifiable afterwards. Maybe I’m stumbling into some core metaphysical whirlwind of “you cannot know anything”, but I do truly believe that a more systematic exposition of how we should interpret spirituality, trapped priors, convergence, and the like is possible and would enable more productive discussion.
PS I think you’ve touched on something tangential in the statement that you should do this with trusted people. That, however, is trying to bootstrap a resistance to manipulative misappropriation of spirituality, whereas what I’m saying is that I would also like more of a logical bootstrapping of the whole notion of spirituality and of ideas like “convergence”, so that one can leave the conversation with solid conclusions, knowing their limitations, and with a higher level of actionability.
PPS: I feel like treating a belief system like “rationality” as a machine/tool (something which has a certain reach and certain limitations, and which behaves as expected in most situations but might have some bugs) is a good way to go. This makes it easier to decouple rationality from, say, spiritual traditions. At each point in time and space you can basically decide, on common sense, which of these two machines/tools is best to apply. Each tool can hopefully be shown to be good for some cases, and thus most decision making happens at the routing level: which tool to use. If you understand the tool from a third-person point of view, there is less of a tendency to rely on it in the wrong cases purely out of dogma.
That’s great! Activation/representational steering is definitely important, but I wonder if it is being applied right now to improve safety. I’ve read only a little bit of the literature, so maybe I’ll just find out later :P
The fact that refusal steering is possible definitely opens up the possibility of gradient-based optimization attacks, or may help explain why some attacks work. Maybe you can use this to build a jailbreak detector of some kind? I do think it’s important to push to get techniques usable in the real world, though I also understand that science is not so linear. Where and how do you think DM’s research could get more real-world grounding? (Or do you think that it’s all well and good as it stands?)