I agree that a distillation of a complex problem statement to a simple technical problem represents real understanding and progress, and is valuable thereby. But I don’t think your summary of the first half of the AI safety problem is one of these.
The central difficulty that stops this from being a “mere” engineering problem is that we don’t know what “safe” is to mean in practice; that is, we don’t understand in detail *what properties we would desire a solution to satisfy*. From an engineering perspective, that marks the difference between a hard problem, and a confused (and, usually, confusing) problem.
When people were first trying to build an airplane, they could write down a simple property that would characterize a solution to the problem they were solving: (thing) is heavier than air yet manages to stay out of contact with the ground for, let's say, at least minutes at a time. Of course this was never the be-all and end-all of what they were trying to accomplish, but it was the central hard problem, a solution to which they expected to be able to build on incrementally in the unknown direction of Progress.
I can say the same for, say, the “intelligence” part of the AI safety problem. Using Eliezer Yudkowsky’s optimization framework, I think I have a decent idea of what properties I would want a system to have when I say I want to build an “intelligence”. That understanding may or may not be the final word on the topic for all time, but at least it is a distillation that can function as a “mere” engineering problem, one whose solution I can recognize as such and which we can then improve on.
But for the “safe” part of the problem, I don’t have a good idea about what properties I want the system to achieve at all. I have a lot of complex intuitions on the problem, including simple-ish ideas that seem to be an important part of it and some insight into what is definitely *not* what I want, but I can’t distill this down to a technical requirement that I could push towards. If you were to just hand me a candidate safe AI on a platter, I don’t think I could recognize it for what it is; I could definitely reject *some* failed attempts, but I could not tell whether your candidate solution is actually correct or whether it has a flaw I have not seen yet. Unless your solution comes with a mighty lecture series explaining exactly why your solution is what I actually want, it will not count as a “solution”. This makes the “safe” part of your summary, in my mind, neither really substantive understanding nor a technical engineering problem yet.
I parse you as pointing to the clarification of a vague problem like “flight” or “safety” or “heat” into an incrementally more precise concept or problem statement. I agree this type of clarification is ultra important and represents real progress in solving a problem, and I agree that my post absolutely did not do this. But I was actually shooting for something quite different.
I was shooting for a problem statement that (1) causes people to work on the problem, and (2) causes them to work on the right part of the problem. I claim it is possible to formulate such a problem statement without doing any clarification in the sense that you pointed at, and additionally that it is useful to do so because (1) distilled problem statements can cause additional progress to be made on a problem, and (2) clarification is super hard, so we definitely shouldn’t hold off on causing additional work to happen until clarification happens, since additional work could be a key ingredient in getting to key clarifications.
To many newcomers to the AI safety space, the problem feels vast and amorphous, and it seems to take a long time before newcomers have confidence that they know what exactly other people in the space are actually trying to accomplish. During this phase, I’ve noticed that people are mostly not willing to work directly on the problem, because of the suspicion that they have completely misunderstood where the core of the problem actually is. This is why distillation is valuable even absent clarification.