I agree that there is a lot of room for more and better academic work on this topic to reduce existential risk (including through other channels, like more academic research into AI safety strategies, influence on other actors such as large corporations and governments, etc.), but as I said at the minicamp, I think the assumptions of this model systematically lead to overestimates of the effectiveness of this channel (EDIT: and would lead to overestimates of other strategies as well, including the “FAI team in a basement” strategy, as I mention in my comment below).
One of the primary reasons for concern about AI risk is the likelihood of tradeoffs between safety and speed of development. Commercial or military competition makes it plausible that quite extensive tradeoffs along these lines will be made, so that reckless (or self-deceived) projects are more likely to succeed first than more cautious ones. So the “random selection” assumption disproportionately favors safety.
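To make that concrete, here is a toy race simulation of my own (not the model under discussion; the project count, the safety-conscious share, and the speed penalty are all assumed numbers chosen only for illustration). It shows how even a modest speed penalty for cautious projects pushes their chance of finishing first below their share of the field, which is what the “random selection” assumption would give:

```python
import random

# Toy illustration with assumed parameters (not the model under discussion):
# N_PROJECTS competing projects, a fraction SAFE_SHARE of which are
# safety-conscious and accept a development-speed penalty.
N_PROJECTS = 20
SAFE_SHARE = 0.3       # assumed share of safety-conscious projects
SPEED_PENALTY = 1.5    # assumed: cautious projects take ~50% longer on average
TRIALS = 100_000

def cautious_project_wins() -> bool:
    """Simulate one race; return True if a cautious project finishes first."""
    finish = []
    for i in range(N_PROJECTS):
        cautious = i < N_PROJECTS * SAFE_SHARE
        time = random.expovariate(1.0) * (SPEED_PENALTY if cautious else 1.0)
        finish.append((time, cautious))
    return min(finish)[1]

wins = sum(cautious_project_wins() for _ in range(TRIALS))
print(f"P(first finisher is cautious) under the race model: {wins / TRIALS:.2f}")
print(f"P under the 'random selection' assumption:          {SAFE_SHARE:.2f}")
```

With these made-up numbers, cautious projects finish first only about 22% of the time despite being 30% of the field, and the gap widens as the speed penalty grows.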
The assumption that safety-conscious researchers always succeed in making any AI they produce safe is also fairly heroic and a substantial upward bias. There may be some cheap and simple safety measures that any safety-conscious AI project can take without significant sacrifice, but we shouldn’t assign high probability to the problem being that easy. Also, if it turns out safety is so easy, why wouldn’t any group sophisticated enough to build AI start to take such precautions once it became apparent they were making real progress?
As folks discussed at the time this idea was first presented, if ‘concern for safety’ means halting a project with high risk to pursue a lower-risk design, then unless almost all researchers are affected, this just leads to a modest expected delay until someone unconcerned succeeds.
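A back-of-the-envelope way to see the “modest expected delay” point (my framing, under a deliberately simple assumed model): if project completions arrive roughly as a Poisson process at a rate proportional to the number of active projects, then removing the fraction q of projects that halt for safety reasons stretches the expected time to the first reckless success by a factor of 1/(1 - q):

```python
# Back-of-the-envelope sketch under an assumed Poisson-race model: if completions
# arrive at a rate proportional to the number of active projects, removing the
# fraction q that halt for safety reasons multiplies the expected waiting time
# until the first reckless success by 1 / (1 - q).
for q in (0.50, 0.90, 0.99):
    print(f"fraction halting = {q:.2f} -> expected-delay multiplier = {1 / (1 - q):.0f}x")
```

Even with 90% of researchers standing down, the expected time to a reckless success grows only tenfold; the delay blows up only as the halting fraction approaches one, which is the “unless almost all researchers are affected” caveat above.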
What makes the SIAI team that will be assembled any different?
I think many of the same assumptions also lead to overestimates of the success odds of an SIAI team in creating safe AI. In general, some features that I would think conduce to safety and could differ across scenarios include:
Internal institutions and social epistemology of a project that make it possible to slow down, or even double back, upon discovering a powerful but overly risky design, rather than automatically barreling ahead because of social inertia, or releasing the data so that others do the same
The relative role of different inputs in designing AI, like researchers of different ability levels, abundant computing hardware, neuroscience data, etc., with some patterns of input favoring a higher understanding by designers of the likely behavior of their systems
Dispersion of project success, i.e. how long a period, after finding the basis of a design, one can expect before other projects reach the same point; the history of nuclear weapons suggests that this can be modestly large under some development scenarios (nukes were developed by the first five powers in 1945, 1949, 1952, 1960, and 1964), although near-simultaneous development is also common in science and technology
The type of AI technology: whole brain emulation looks like it could be relatively less difficult to control initially by solving social coordination problems, without developing new technology, while de novo AGI architectures may vary hugely in the difficulty of specifying decision algorithms with the needed precision
Some shifts along these dimensions do seem plausible given sufficient resources and priority for safety (and suggest, to me, that there is a large spectrum of safety investments to be made beyond simply caring about safety).
Another factor to consider is the permeability of the team: how likely they are to leak information to the outside world.
However, if a team is completely impermeable, it becomes hard for external entities to assess the project on the other factors above.
Does SIAI have procedures/structures in place to shift funding between the internal team and more promising external teams if they happen to arise?
Most potential funding exists in the donor cloud, which can reallocate resources easily enough; SIAI does not have large reserves or an endowment that would be encumbered by the nonprofit status. Ensuring that the donor cloud is sophisticated and well-informed contributes to that flexibility, but I’m not sure what other procedures you were thinking about. Formal criteria to identify more promising outside work to recommend?
I think that might help. In this matter it all seems to be about trust.
People doing outside work have to trust that SIAI will look at their work and may be supportive. Without formal guidelines, they might suspect that their work will be judged subjectively and negatively, due to a potential conflict of interest over funding.
SIAI also needs to be trusted not to leak information from other projects as it evaluates them; having a formal, vetted, well-known evaluation team might help with that.
The donor cloud needs to trust SIAI to look at the work and make a good decision about it, not one based just on monkey instincts. Formal criteria might help instill that trust.
SIAI doesn’t need all this now, as there aren’t any projects that need evaluating. However, it is something to think about for the future.
I don’t think the SIAI has much experience writing code or programming machine learning applications.
Superficially, that makes them less likely to know what they are doing, and more likely to make mistakes and screw up.
Eliezer’s FAI team currently consists of 2 people: himself and Marcello Herreshoff. Whatever its probability of success, most of it would seem to come from actually recruiting enough high-powered folk for a team. Certainly he thinks so; thus his focus on Overcoming Bias, and then the rationality book, as tools to recruit a credible team.
Sure, ceteris paribus, although coding errors seem less likely than architectural screwups to result in catastrophic harm rather than the AI not working.