Imagine that one day we or our descendants build or become superintelligent super-competent philosophers who, after exhaustively investigating moral philosophy for millions of years, decide that some moral theory or utility function is definitely right.
But what is the reason to think that we or our descendants would have a better chance of finding this kind of “definitely right” moral theory or utility function than other AIs or their descendants?
In some sense, the point of OP is that the difference between “us” and “not-us” here might be more nebulous than we usually believe, and that a more equal treatment is called for.
Otherwise, one might also argue (in a symmetric fashion) that we would destroy moral option value by preventing other entities who might have a better chance of building or becoming “superintelligent super-competent philosophers” from having a shot at that...
Humans have a history of making philosophical progress. We lack similar empirical evidence for AIs. I’ll reevaluate my position if that changes, with the caveat that I want some reassurance that the AI is doing correct philosophy, not just optimizing to persuade me or humans in general (which I’m afraid will be the default).
So far AI capabilities seem more tilted towards technological progress than philosophical progress (compared to humans). See also AI doing philosophy = AI generating hands? for more reasons to worry about this. Under these circumstances it seems very easy to permanently mess up the trajectory of philosophical progress, for example by locking in one’s current conception of what’s right, or inventing new technology capable of corrupting everyone’s values without knowing how to defend against that.
What the right morality is may be partly or wholly subjective (I’m not sure), in which case AIs will end up converging on different moral conclusions than ours, independently of philosophical competence, and from our perspective the right thing to do would be to follow our own conclusions.
But I don’t know to what extent productive philosophical work at the top level of competence is compatible with safety concerns at all. It’s not an accident that people using base models report good progress in joint human-AI philosophical brainstorming, whereas people using tamed models seem to say that those models are not creative enough and don’t think in sufficiently non-standard ways.
It might be a fundamental problem that has nothing to do with human-AI differences. For example, Nietzsche is an important radical philosopher, and we need biological or artificial philosophers performing not just at that level but at a higher level than that if we want them to properly address fundamental problems; yet Nietzsche is not “safe” in any way, shape, or form.
Thanks, that’s very informative.
Hybrid philosophical discourse produced by human-AI collaboration can be very good. For example, I feel that Janus has been doing very strong work in this sense with base models (so, not with the RLHF’d, Constitutional, or otherwise “lesioned” and “mode-collapsed” models we mostly use these days).
But, indeed, this does not tell us much about what AIs would do on their own.