Great post. Reminded me a bit of this one. As a concrete tool for discussing AI risk scenarios, I wonder if it doesn’t have too many parameters? Like you have to specify the map (at least locally), how far we can see, what the impact of this or that research will be… I’m not saying I don’t see the value of this approach; I’m just wondering where to start when trying to apply this tool. More of a methodological question.
> Similarly, AlphaGo will be closer to AlphaZero than to Quicksort, and different iterations of hypothetical IDA would be closer to each other than two random systems.
Maybe I’m reading too much into the mention of Quicksort here, but I deduce that the abstract space is one of algorithms/programs, not AIs. I don’t think it breaks anything you say, but that’s potentially different—I don’t know anyone who would call Quicksort an AI.
> If we currently have some AI system x, we can ask which systems are reachable from x—i.e., which other systems are we capable of constructing now that we have x.
What are the constraints on “construction”? Because by definition, if those other systems are algorithms/programs, we can build them. It might be easier or faster to build them using x, but it doesn’t make construction possible where it was impossible before. I guess what you’re aiming at is something like “x reaches y” as a form of “x is stronger than y”, and also “if y is stronger, it should be considered when thinking about x”?
> In general, we could (and perhaps should) consider each of the currently-active AI systems individually. However, this might make it difficult or impossible to identify which negative effects are due to which system. We will, therefore, assume that at any given time, there is only a single system that matters for our considerations (either because there is a single system that matters more than all others taken together, or because we can isolate the effects of individual systems, or because we are viewing all systems jointly as a single super-system). This allows us to translate the 0-10 IH scale into colours and visualize the space of AI systems as a “risk map” (Figure 2).
This part confused me. Are you just saying that to have a color map (a map from the space of AIs to R), you need to integrate all systems into one, or consider all systems but one as irrelevant?
> We will assume that we always have sufficient information about what is going on in the world to determine the IH of the current system. Then there will be some systems that are sufficiently similar to the current one (or simple enough to reason about) so that we also have accurate information about their IH. Some other systems might be more opaque, so we will only have fuzzy predictions about their IH. For others yet, we might have no information at all. We can visualize this as decreasing the visibility in parts of the image, indicating that our information is less likely to correspond to reality (Figure 3).
> As a concrete tool for discussing AI risk scenarios, I wonder if it doesn’t have too many parameters? Like you have to specify the map (at least locally), how far we can see, what the impact of this or that research will be…
I agree that the model does have quite a few parameters. I think you can get some value out of it already by being aware of what the different parameters are and, in case of a disagreement, identifying the one you disagree about the most.
> If we currently have some AI system x, we can ask which systems are reachable from x—i.e., which other systems are we capable of constructing now that we have x.
> What are the constraints on “construction”? Because by definition, if those other systems are algorithms/programs, we can build them. It might be easier or faster to build them using x, but it doesn’t make construction possible where it was impossible before. I guess what you’re aiming at is something like “x reaches y” as a form of “x is stronger than y”, and also “if y is stronger, it should be considered when thinking about x”?
I agree that if we can build x, and then build y (with the help of x), then we can in particular build y. So having/not having x doesn’t make a difference to what is possible. I implicitly assumed some kind of constraint on the construction, like “what can we build in half a year”, “what can we build for $1M”, or, more abstractly, “how far can we get using X units of effort”.
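To make the budget-relative reading a bit more explicit, here is one possible formalization (my notation, just a sketch, not something from the post): write $$c(x, y)$$ for the effort needed to construct y when we already have x, and $$E$$ for the available budget (time, money, or abstract units of effort). Then

$$\mathrm{reach}_E(x) = \{\, y \mid c(x, y) \le E \,\}.$$

On this reading, “y is reachable from x” is always relative to a budget, and having x matters not because it changes what is buildable in principle, but because it can lower $$c(\cdot, y)$$ for some systems y.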
> [About the colour map:] This part confused me. Are you just saying that to have a color map (a map from the space of AIs to R), you need to integrate all systems into one, or consider all systems but one as irrelevant?
The goal was more to say that in general, it seems hard to reason about the effects of individual AI systems. But if we make some of the extra assumptions, it will make more sense to treat harmfulness as a function from AIs to $$\mathbb R$$.
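To spell the distinction out (again just a sketch, with notation of my own rather than the post’s): without the extra assumptions, harmfulness depends on the whole set of currently-active systems; with them, the 0-10 IH scale becomes a map

$$h : \mathcal{S} \to [0, 10],$$

where $$\mathcal{S}$$ is the space of AI systems, and the risk map of Figure 2 is simply a colouring of $$\mathcal{S}$$ according to $$h$$.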
[About the constraints on “construction”:] A potentially relevant idea is oracles in complexity theory.
[About the colour map:] This seems like a great visualization.