Thanks for this post!
>adversarial examples
Consider two strategies for developing software: top-down and bottom-up. Each strategy has its own risks. The risk of the bottom-up approach is that you build components that don’t end up being useful in the final design. The risk of the top-down approach is that you make assumptions about the sort of components it’s possible to build that don’t end up being true. Because each approach has advantages and disadvantages, I think it makes sense for people to take both approaches when planning a large, important software project such as an FAI.
OpenAI is working on solving the problem of adversarial examples, which is great. But I also think it’s useful for people to ask: suppose we had a solution to the problem of adversarial examples. Would it then be possible to build an FAI? If so, how? Answering this question tells us things like: How useful is working on adversarial examples, relative to other bottom-up AI safety work? Are there other problems such that, if we solved them and combined those solutions with our solution to adversarial examples, we’d be able to build an FAI?
BTW, I would guess that top-down work is a comparative advantage of competitions like this one, since bottom-up work is the kind of thing that academic journals will publish.
>treacherous turns
Suppose we have an algorithm that’s known to be a behavior-executor but is otherwise a black box. If it’s truly an accurate imitation of human values, meaning the box agrees exactly with us about what kinds of worlds are valuable, then maybe you’d still get something like a treacherous turn: once an AI using this box as its values gets powerful enough, it grabs hold of civilization’s steering wheel and doesn’t let go. But if the AI’s values really were determined by this hypothetical perfect black box, that wouldn’t be a problem, because it would steer us exactly where we wanted to go anyway.
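To make “an AI using this box as its values” a little more concrete, here is a minimal Python sketch. Everything in it (`value_oracle`, `predict_outcome`, `candidate_actions`) is a hypothetical placeholder, not anything from the post; the point is only that the agent’s steering is entirely determined by the black box’s scores.

```python
# Toy sketch of an agent whose values are a black-box scoring function.
# All names here are hypothetical placeholders for illustration.
from typing import Callable, Iterable, TypeVar

Action = TypeVar("Action")
World = TypeVar("World")

def choose_action(
    candidate_actions: Iterable[Action],
    predict_outcome: Callable[[Action], World],
    value_oracle: Callable[[World], float],
) -> Action:
    """Pick the action whose predicted outcome the black box rates highest."""
    return max(candidate_actions, key=lambda a: value_oracle(predict_outcome(a)))
```

If the oracle really does agree with us about which worlds are valuable, then however firmly this loop grabs the steering wheel, it only ever steers toward worlds we endorse; the worry bites when the oracle’s scores diverge from what we actually want.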
>nonsentient uploads
This seems to point to a deleted post. Did you mean this post? Or did you mean to refer to your conversation with Wei Dai in the thread you linked?
>Artificial Mysterious Intelligence
I largely agree with that post, but there’s a subtle distinction I want to make.
When you develop software top-down, you start with a vague idea of what you want to build, and then you break it down into several components. For each component, you have a vague idea of how it will work, hopefully at least a little less vague than your idea about how the thing as a whole will work. The process proceeds from gathering requirements, to writing a spec, to architecture diagrams, to pseudocode, until finally you have working code. Software architecture is a bit like philosophy in that respect: you work with ideas that start out vague and only gradually become precise.
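To illustrate what that early stage looks like, here is a minimal Python skeleton of the kind a top-down design might produce; the component names are invented for the example, and every body is deliberately a stub.

```python
# Early top-down skeleton: the decomposition is decided, but each component
# is still a stub carrying only a one-line contract. All names are invented.

class WorldModel:
    def update(self, observation) -> None:
        raise NotImplementedError("spec: fold the observation into the model")

class ValueEstimator:
    def score(self, predicted_world) -> float:
        raise NotImplementedError("spec: rate how desirable this world is")

class Planner:
    def plan(self, world_model: WorldModel, values: ValueEstimator):
        raise NotImplementedError("spec: search for a plan the estimator rates highly")
```

Every class above is unspecified by construction; that’s the normal state of an early design, not a defect of it.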
To a software architect who’s early in the design stage, “this component has not been fully specified” is not a very interesting objection. At this point, none of the components have been fully specified! More interesting objections:
- “Based on the sketch you’ve provided, it doesn’t seem like this software would actually do what we need if it were developed”
- “I don’t see how this component could be made secure”
- “I don’t think we can build this component” / “This component seems at least as difficult to create as the project as a whole”
Note that the last two objections do not necessarily mean the design as a whole is dead on arrival. If the software architect knows what they’re doing, they are describing a subcomponent because they think it’s at least a little easier than the entire project. In this case, there’s a difference of opinion which indicates that it could be useful to discuss that subcomponent more.
If FAI is rated as impossibly difficult, and we’re able to reduce it to subcomponents rated at most as extremely difficult, that seems like progress. If we’re able to reduce it to components that we think might become available before AGI as a whole becomes available, that creates the possibility of FAI through differential technological development.
Software development is a poor metaphor for AI alignment. It reminds me of Ron Jeffries’ sad attempt to write a Sudoku solver by using “extreme programming” without knowing about backtracking search. He kinda blunders around for a few weeks and then stops. Another nice mockery is this, which (unbelievably) comes from an actual AI design by someone.
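For reference, the “backtracking search” idea he was missing is standard and fits in a few lines; this is just a generic Python illustration, not anything from Jeffries’ series:

```python
# Plain backtracking Sudoku solver, for illustration. `grid` is a 9x9 list of
# lists with 0 marking an empty cell; the grid is solved in place.

def valid(grid, r, c, v):
    """True if placing v at (r, c) breaks no row, column, or 3x3 box."""
    if any(grid[r][j] == v for j in range(9)):
        return False
    if any(grid[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != v
               for i in range(br, br + 3) for j in range(bc, bc + 3))

def solve(grid):
    """Fill the first empty cell with each legal digit and recurse; undo on failure."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for v in range(1, 10):
                    if valid(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0  # backtrack
                return False
    return True  # no empty cells left: solved
```

The essential move is the recurse-then-undo step, which is exactly the kind of insight that incremental refactoring doesn’t stumble onto by itself.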
A better metaphor for AI alignment is fundamental science, where we’re supposed to understand things step by step. Your post takes several such steps at the start. But then you spend many pages sketching a software design, which even after careful reading didn’t advance my understanding of AI alignment in any way. That’s the criterion I use.
Maybe we should explain to contestants that they should try to advance the frontier of understanding by one inch, not solve all of AI alignment in one go. The latter seems to lead people in unproductive directions.
It may be worth mentioning that the “someone” who produced the “actual AI design” is a known crank. (Whose handle must not be mentioned, for ancient legend says that if you speak his name then he will appear and be a big nuisance.)
>Software development is a poor metaphor for AI alignment.
I think I disagree, but let’s ignore the metaphor aspect and focus on the model. The same causal model can also be communicated using science & engineering as a metaphor. If you want to know what scientific insights to work towards to create some breakthrough technology, it’s valuable to periodically put on your engineer hat. Without it, you’ll do basic research that could end up leading anywhere. In search terms, an engineer hat offers an improved heuristic. If your scientist hat allows you to forward chain, your engineer hat allows you to backward chain.
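To spell out the forward/backward-chaining contrast, here is a toy Python sketch over a made-up rule base (the rules, facts, and goal are invented for the example, and there’s no cycle handling):

```python
# Toy contrast between forward and backward chaining over simple Horn rules.
# Each rule is (set_of_premises, conclusion); facts, rules, and goal are invented.

RULES = [({"A"}, "B"), ({"B", "C"}, "D"), ({"D"}, "GOAL")]
FACTS = {"A", "C"}

def forward_chain(facts, rules):
    """Scientist hat: derive everything derivable, wherever it leads."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

def backward_chain(goal, facts, rules):
    """Engineer hat: start from the goal and only explore what might support it."""
    if goal in facts:
        return True
    return any(
        all(backward_chain(p, facts, rules) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )

print(forward_chain(FACTS, RULES))           # {'A', 'B', 'C', 'D', 'GOAL'}
print(backward_chain("GOAL", FACTS, RULES))  # True
```

Forward chaining derives whatever follows from what you have; backward chaining only explores premises that could support the goal, which is the improved heuristic the engineer hat buys you.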
I’d argue the engineer hat is critical for effective differential technological development.
When I saw the title, I thought, ‘But we want to decompose problems in FAI theory to isolate questions we can answer. This suggests heavy use of black boxes.’ I wondered if perhaps he was trying to help people who were getting everything wrong (in which case I think a positive suggestion has more chance of helping than telling people what to avoid). I was pleased to see that the post actually addresses a more intelligent perspective, and has much less to do with your point or mine.