I imagine most labs wouldn’t commit to [we only get to run this training process if Eliezer thinks it’s good for global safety]
Who? Science has never worked by means of deferring to a designated authority figure. I agree, of course, that we want people to do things that make the world less rather than more likely to be destroyed. But if you have a case that a given course of action is good or bad, you should expect to be able to argue that case to knowledgeable people who have never heard of this Eliza person, whoever she is.
I remember reading a few good blog posts about this topic by a guest author on Robin Hanson’s blog back in ’aught-seven.
This was just an example of a process I expect labs wouldn’t commit to, not (necessarily!) a suggestion.
The key criterion isn’t even appropriate levels of understanding, but rather appropriate levels of caution—and of sufficient respect for what we don’t know. The criterion [...if aysja thinks it’s good for global safety] may well be about as good as [...if Eliezer thinks it’s good for global safety].
It’s much less about [This person knows] than about [This person knows that no-one knows, and has integrated this knowledge into their decision-making].
Importantly, a cautious person telling an incautious person “you really need to be cautious here” is not going to make the incautious person cautious (perhaps slightly more cautious than their baseline—but it won’t change the way they think).
A few other thoughts:
Scientific intuitions will tend towards doing whatever uncovers information efficiently. If an experiment uncovers some highly significant novel unknown that no-one was expecting, that’s wonderful from a scientific point of view.
This is primarily about risk, not about science. Here the novel unknown that no-one was expecting may not lead to a load of interesting future work, since we all might be dead.
We shouldn’t expect the intuitions or practices of science to robustly point the right way here.
There is no rule that says the world must play fair and ensure that it gives us compelling evidence that a certain path forward will get us killed, before we take the path that gets us killed. The only evidence available may be abstract and indirect, gesturing at unknown unknowns.
The situation in ML is unprecedented, in that organizations are building extremely powerful systems that no-one understands. The “experts” [those who understand the systems best] are not experts [those who understand the systems well]. There’s no guarantee that anyone has the understanding to make the necessary case in concrete terms.
If you have a not-fully-concrete case for a certain course of action, experts are divided on that course of action, and huge economic incentives point in the other direction, you shouldn’t be shocked when somewhat knowledgeable people with huge economic incentives follow those economic incentives.
The purpose of committing to follow the outcome of an external process is precisely that it may commit you to actions that you wouldn’t otherwise take. A commitment to consult with x, hear a case from y, etc. is essentially empty (if you wouldn’t otherwise seek this information, why should anyone assume you’ll be listening? If you’d seek it without the commitment, what did the commitment change?).
To the extent that decision-makers are likely to be overconfident, a commitment to defer to a system less prone to overconfidence can be helpful. This Dario quote (full context here) doesn’t exactly suggest there’s no danger of overconfidence:
“I mean one way to think about it is like the responsible scaling plan doesn’t slow you down except where it’s absolutely necessary. It only slows you down where it’s like there’s a critical danger in this specific place, with this specific type of model, therefore you need to slow down.”
Earlier there’s:
”...and as we go up the scale we may actually get to the point where you have to very affirmatively show the safety of the model. Where you have to say yes, like you know, I’m able to look inside this model, you know with an x-ray, with interpretability techniques, and say ’yep, I’m sure that this model is not going to engage in this dangerous behaviour because, you know, there isn’t any circuitry for doing this, or there’s this reliable suppression circuitry...”
But this doesn’t address the possibility of being wrong about how early it was necessary to affirmatively show safety.
Nor does it give me much confidence that “affirmatively show the safety of the model” won’t in practice mean something like “show that the model seems safe according to our state-of-the-art interpretability tools”.
Compare that to the confidence I’d have if the commitment were to meet the bar where e.g. Wei Dai agrees that you’ve “affirmatively shown the safety of the model”. (And, again, most of this comes down to Wei Dai being appropriately cautious and cognizant of the limits of our knowledge.)