Thus, AI safety did not end up serving a “reality tether” function for us, or at least not sufficiently.
Because AI safety lacks short feedback loops, it seems obvious to me that discussing AI safety in a rationality training camp would pull participants away from reality (and, ironically, rationality). I predict that any training camp attempting to mix rationality with AI safety would fall into the same trap.
A rationality camp is a cool idea. An AI safety camp is a cool idea. But a rationality camp + AI safety camp is like mixing oxygen with hydrogen.
I mean… “are you making progress on how to understand what intelligence is, or other basic foundational issues in thinking about AI” does have somewhat accessible feedback loops sometimes, and did seem to me to feed back in on the rationality curriculum in useful ways.
I suspect that if we can keep our motives pure (can avoid Goodharting on power/control/persuasion, or on “appearance of progress” of various other sorts), AI alignment research and rationality research are a great combination. One is thinking about how to build aligned intelligence in a machine, the other is thinking about how to build aligned intelligence in humans and groups of humans. There are strong analogies in the subject matter that are great to geek out about and take inspiration from, and somewhat different tests/checks you can run on each. IMO Eliezer did some great thinking on both human rationality and the AI alignment problem, and on my best guess each was partially causal of the other for him.
One is thinking about how to build aligned intelligence in a machine, the other is thinking about how to build aligned intelligence in humans and groups of humans.
Is this true, though? Teaching rationality improves people’s capabilities but shouldn’t necessarily align them. People are not AIs, but even so, their morality need not converge under reflection.
And even if the argument is “people are already aligned with people”, you are still working on capabilities when dealing with people and on alignment when dealing with AIs.
Teaching rationality looks more similar to AI capabilities research than AI alignment research to me.
I love this question. Mostly because your model seems pretty natural and clear, and yet I disagree with it.
To me it looks more like AI alignment research, in that one is often trying to align internal processes with e.g. truth-seeking, so that a person ends up doing reasoning instead of rationalization. Or, on the group level, so that people can work together to form accurate maps and build good things, instead of working to trick each other into giving control to particular parties, assigning credit or blame to particular parties, believing that a given plan will work and so allowing that plan to move forward for reasons that’re more political than epistemic, etc.
That is, humans in practice seem to me to be partly a coalition of different subprocesses that by default waste effort bamboozling one another, or pursuing “lost purposes” without propagating the updates all the way, or whatnot. Human groups even more so.
I separately sort of think that in practice, increasing a person’s ability to see and reason and care (vs rationalizing and blaming-to-distract-themselves and so on) probably helps with ethical conduct, although I agree this is not at all obvious, and I have not made any persuasive arguments for it and do not claim it as “public knowledge.”
Ah, I see your point now, and it makes sense. If I had to summarize it (and reword it in a way that appeals to my intuition), I’d say that the choice of seeking the truth is not just about “this helps me,” but about “this is what I want/ought to do/choose”. Not just about capabilities. I don’t think I disagree at this point, although perhaps I should think about it more.
I suspected my question would be met with something at least a bit removed, inference-wise, from where I was starting: my model seemed like the most natural one, so I expected someone who routinely thinks about this topic to have updated away from it rather than never having considered it.
Regarding the last paragraph: I already believed your line “increasing a person’s ability to see and reason and care (vs rationalizing and blaming-to-distract-themselves and so on) probably helps with ethical conduct.” It didn’t seem to bear on the argument here because, under my previous model, it looks like you are getting alignment for free by improving capabilities; under yours, it looks like your truth-alignment efforts somehow spill over to other values, which is still getting something for free, presumably due to how humans are built.
Also… now that I think about it, what Harry was doing with Draco in HPMOR looks a lot like aligning rather than improving capabilities, and there were good spill-over effects (which were almost the whole point in that case perhaps).