If ontology and indexicality are the two biggest problems with aligning a highly capable AGI (long-horizon consequentialist agent), another possible path forward is to create philosophically competent tool-like AI assistants to help solve these problems. And a potential source of optimism about alignment difficulty is that AI assistants (such as the ones OpenAI plans to build to do alignment research) might be philosophically competent by default (e.g., because the LLMs they are based on will have learned to do philosophical reasoning from their training data).
I personally think it’s risky to rely on automated philosophical reasoning without first understanding the nature of philosophy and reasoning (i.e., without having solved metaphilosophy), and I have some reason to think that philosophical reasoning might be especially hard for ML to learn. But I also think there’s some substantial (>10%) chance that we could just get lucky, with AIs turning out to be philosophically competent, or at least I don’t know how to rule this out. (In other words, I don’t see how to reach Eliezer’s level of p(doom) through this line of argument.)
Have you thought about these questions? And do you have any general views on plans like OpenAI’s to use AI to help solve AI alignment?
I think the use of AI tools could have similar results to human cognitive enhancement, which I expect to be basically helpful. Because they’re trained on humans, they’ll have more trouble with problems whose solutions depend on something like “bigger brain size” than with those helped by “faster thought” or “reducing entropic error rates / wisdom of the crowds”. One can in general expect more success on this sort of thing by having an idea of what problem is even being solved. A lot of what happens in philosophy departments isn’t best explained by “solving the problem” (which is under-defined anyway) and could instead be explained by motives like “building connections”, “getting funding”, or “being on the good side of powerful political coalitions”. So the psychology/sociology of philosophy seems like a useful approach for understanding what is even being done when humans say they’re trying to solve philosophy problems.