It’s easy to imagine AIXI-like Bayesian EU maximizers that are powerful optimizers but incapable of solving philosophical problems like consciousness, decision theory, and foundations of mathematics, which seem to be necessary in order to build an FAI. It’s possible that that’s wrong, that one can’t actually get to “not very superintelligent AIs” unless they possess the same level of philosophical ability that humans have, but it certainly doesn’t seem safe to assume this.
Such systems, hemmed in and restrained, could certainly work on better AI designs, and predict human philosophical judgments. Predicting human philosophical judgments accurately and reporting those predictions is close enough.
Nick considered and discarded before settling on “AI control”.
“Control problem.”
It seems like he’d want to run at least some of the more novel or potentially controversial ideas in his book by a wider audience, before committing them permanently to print.
He circulates them to reviewers, in wider circles as the book becomes more developed. And blogging half-finished ideas on the internet is exactly what one shouldn’t do if one is worried about committing controversial ideas to print.
And blogging half-finished ideas on the internet is exactly what one shouldn’t do if one is worried about committing controversial ideas to print.
In case this is why you don’t tend to talk about your ideas in public either, except in terse (and sometimes cryptic) comments or in fully polished papers, I wanted to note that I’ve never had cause to regret blogging (or posting to mailing lists) any of my half-finished ideas. As long as your signal-to-noise ratio is fairly high, people will remember the stuff you get right and forget the stuff you get wrong. The problem I see with committing ideas to print (as in physical books) is that books don’t come with comments attached pointing out all the parts that are wrong or questionable.
Such systems, hemmed in and restrained, could certainly work on better AI designs, and predict human philosophical judgments. Predicting human philosophical judgments accurately and reporting those predictions is close enough.
If such a system is powerful enough to predict human philosophical judgments using its general intelligence, without specifically having been programmed with a correct solution for metaphilosophy, it seems very likely that it would already be strongly superintelligent in many other fields, and hence highly dangerous.
(Since you seem to state this confidently but don’t give much detail, I wonder if you’ve discussed the idea elsewhere at greater length. For example I’m assuming that you’d ask the AI to answer questions like “What would Eliezer conclude about second-order logic after thinking undisturbed about it for 100 years?” but maybe you have something else in mind?)
He circulates them to reviewers, in wider circles as the book becomes more developed. And blogging half-finished ideas on the internet is exactly what one shouldn’t do if one is worried about committing controversial ideas to print.
I guess I actually meant “potentially wrong” rather than “controversial”, and I was suggesting that he blog about them after privately circulating them to reviewers, but before publishing in print.
For example I’m assuming that you’d ask the AI to answer questions like “What would Eliezer conclude about second-order logic after thinking undisturbed about it for 100 years?” but maybe you have something else in mind?)
The thought is to use much more bite-sized and tractable questions to work with less individually capable systems (with shorter time horizons, etc.), like: “find a machine-checkable proof of this lemma” or “I am going to read one of these 10 papers, selected at random, to try to shed light on my problem; score each with the predicted rating I will give for the paper’s usefulness after reading it.” I discussed this in a presentation at the FHI (focused on WBE, where the issue of unbalanced abilities relative to humans does not apply), and the concepts will be discussed in Nick’s book.
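(To make the “bite-sized and tractable” framing above concrete, here is a minimal sketch of how such bounded queries might be represented. The field names, limits, and task instances are my own illustrative assumptions, not anything from the FHI presentation or the book.)

```python
from dataclasses import dataclass

@dataclass
class BoundedTask:
    """A narrow query for a limited system: one well-defined output,
    a hard cap on thinking time, and a machine- or human-checkable answer."""
    description: str        # what is being asked
    max_think_seconds: int  # short time horizon per unit
    output_schema: str      # the only form an answer may take
    checkable_by: str       # how the answer gets verified

# Illustrative instances of the two examples in the comment above.
lemma_task = BoundedTask(
    description="Find a machine-checkable proof of lemma L",
    max_think_seconds=600,
    output_schema="proof term accepted by the proof checker",
    checkable_by="automated proof checker",
)

scoring_task = BoundedTask(
    description="For each of these 10 papers, predict the usefulness rating "
                "I will give after reading the one I randomly select",
    max_think_seconds=120,
    output_schema="list of 10 numeric scores",
    checkable_by="comparison against the rating actually given",
)
```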
Based on the two examples you give, which seem to suggest a workflow with a substantial portion still being done by humans (perhaps even the majority of the work in the case of the more philosophical parts of the problem), I don’t see how you’d arrive at this earlier conclusion:
If 10-50 humans can solve AI safety (and build AGI!) in less than 50 years, then 100-500 not very superhuman AIs at 1200x speedup should be able to do so in less than a month
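(For reference, a quick check of the arithmetic behind the quoted claim, taking its numbers at face value and assuming the work parallelizes across the AI copies:)

```python
# 100-500 AIs running for 1 month at a 1200x speedup; lower bounds used here.
ais, speedup, months = 100, 1200, 1
subjective_person_years = ais * speedup * months / 12   # 10,000

# Work assumed sufficient in the quote: 10-50 humans for 50 years.
humans, years = 10, 50
required_person_years = humans * years                  # 500 (2,500 at the upper bound)

print(subjective_person_years >= required_person_years)  # True, with a wide margin
```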
Do you have any materials from the FHI presentation, or any other writeup you can share, that might shed more light? If not, I guess I can wait for the book...
It’s hard to discuss your specific proposal without understanding it in more detail, but in general I worry that the kind of AI you suggest would be much better at helping to improve AI capability than at helping to solve Friendliness, since solving technical problems is likely to be more of a strength for such an AI than predicting human philosophical judgments. And unless humanity develops much better coordination abilities than it has now (so that everyone can agree, or be forced, to refrain from trying to develop strongly superintelligent AIs until the Friendliness problem is solved), such an AI isn’t likely to ultimately contribute to a positive outcome.
Yes, the range of follow-up examples there was a bit too narrow; I was starting from the other end and working back. Smaller operations could be chained, parallelized (with limited thinking time and capacity per unit), used to check on each other in tandem with random human monitoring and processing, and otherwise leveraged to minimize the human bottleneck element.
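(A minimal sketch of what “chained, parallelized, cross-checked, with random human monitoring” could look like as a pipeline. The two-unit agreement rule and the 5% audit rate are illustrative assumptions, not part of the proposal above.)

```python
import random

def run_pipeline(tasks, workers, human_review, audit_rate=0.05):
    """Chain small tasks through limited workers, have two independent
    units answer each one, and route disagreements plus a random sample
    of agreements to a human reviewer."""
    results = []
    for task in tasks:
        unit_a, unit_b = random.sample(workers, 2)
        answer_a, answer_b = unit_a(task), unit_b(task)
        if answer_a != answer_b or random.random() < audit_rate:
            results.append(human_review(task, answer_a, answer_b))
        else:
            results.append(answer_a)   # cross-checked, not human-reviewed
    return results
```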
solve Friendliness, since solving technical problems is likely to be more of a strength for such an AI than predicting human philosophical judgments,
A strong skew of abilities away from those directly useful for Friendliness development makes things worse, but leaves a lot of options. Solving technical problems can let you work to, e.g.:
Create AIs with ability distributions directed more towards “philosophical” problems
Create AIs with simple sensory utility functions that are easier to ‘domesticate’ (short time horizons, satiability, dependency on irreplaceable cryptographic rewards that only the human creators can provide, etc)
Solve the technical problems of making a working brain emulation model
Create software to better detect and block unintended behavior.
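(A toy sketch of the “domesticity” features in the second item above: a utility function with a short time horizon, a hard satiation cap, and dependence on reward tokens that only the human creators can produce. The specific mechanism, HMAC-signed tokens, is my illustrative assumption, not a proposal from the presentation or the book.)

```python
import hmac, hashlib, time

# Secret held offline by the human creators; the AI cannot forge rewards without it.
OPERATOR_KEY = b"held offline by the human operators"

def token_is_valid(episode_id: str, token: bytes) -> bool:
    """A reward token is only creatable by whoever holds OPERATOR_KEY."""
    expected = hmac.new(OPERATOR_KEY, episode_id.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(expected, token)

def domesticated_utility(episode_id: str, token: bytes, start_time: float,
                         horizon_seconds: float = 3600.0, cap: float = 1.0) -> float:
    """Zero after the short horizon, zero without a valid operator token,
    and never more than the satiation cap."""
    if time.time() - start_time > horizon_seconds:   # short time horizon
        return 0.0
    if not token_is_valid(episode_id, token):        # irreplaceable reward dependency
        return 0.0
    return cap                                       # satiable: capped utility
```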
coordination
Yes, that’s the biggest challenge for such bootstrapping approaches, which depends on the speedup in safety development one gets out of early models, the degree of international peace and cooperation, and so forth.
Smaller operations could be chained, parallelized (with limited thinking time and capacity per unit), used to check on each other in tandem with random human monitoring and processing, and otherwise leveraged to minimize the human bottleneck element.
This strikes me as quite risky: the amount of human monitoring has to be really minimal in order to solve a 50-year problem in 1 month, and earlier experiences with slower and less capable AIs seem unlikely to adequately prepare the human designers to come up with fully robust control schemes, especially if you are talking about a time scale of months. Can you say a bit more about the conditions under which you envision this proposal making a positive impact? It seems to me like it might be a very narrow range of conditions. For example, if the degree of international peace and cooperation is very high, then a better alternative may be an international agreement to develop WBE tech while delaying AGI, or an international team to take as much time as needed to build FAI while delaying other forms of AGI.
I tend to think that such high degrees of global coordination are implausible, and therefore put most of my hope in scenarios where some group manages to obtain a large tech lead over the rest of the world and is thereby granted a measure of strategic initiative in choosing how best to navigate the intelligence explosion. Your proposal might be useful in such a scenario, if other seemingly safer alternatives (like going for WBE, or having genetically enhanced humans build FAI with minimal AGI assistance) are out of reach due to time or resource constraints. It’s still unclear to me why you called your point “strategy-swallowing” though, or what that phrase means exactly. Can you please explain?
I certainly didn’t say that would be risk-free, but it interacts with other drag factors on very high estimates of risk. In the full-length discussion of it, I pair it with discussion of historical lags in tech development between leader and follower in technological arms races (longer than one month) and factors related to corporate and international espionage, raise the possibility of global coordination (or at least coordination between the leader and the next closest follower), and so on.
It also interacts with technical achievements in producing ‘domesticity’ short of exact unity of will.
It’s still unclear to me why you called your point “strategy-swallowing” though, or what that phrase means exactly.
When strategy A can, to a large extent, capture the impacts of strategy B.
I certainly didn’t say that would be risk-free, but it interacts with other drag factors on very high estimates of risk.
If you’re making the point as part of an argument against “either Eliezer’s FAI plan succeeds, or the world dies”, then OK, that makes sense. ETA: But it seems like it would be very easy to take “if humans can do it, then not very superintelligent AIs can” out of context, so I’d suggest some other way of making this point.
When strategy A can, to a large extent, capture the impacts of strategy B.
Sorry, I’m still not getting it. What does “impacts of strategy” mean here?