These questions seem to cut up the conceptual space in an opinionated, and I think wrong, way.
What is the predicted architecture of the learning algorithm(s) used by AGI?
What even is an architecture? What's learning? What are learning algorithms, and how do they have architecture? What sort of architecture matters? I know it's trivially easy to ask questions like this; but I think that insofar as question 1 has meaning, it's making assumptions that are probably wrong.
What are the most likely bad outcomes of this learning architecture?
We care about all this because of the outcomes, but that doesn’t mean the outcomes themselves are a deep or handy way of understanding what’s wrong with an AGI.
What are the control proposals for minimizing these bad outcomes?
"Minimizing bad outcomes" makes it sound like badness is a basically continuous variable, which is a controversial assumption. "Control proposals", if it means what it sounds like, is assuming too much; how do you know a good alignment strategy looks like control rather than something else?
What is the predicted timeline for the development of AGI?
Are you saying it's necessary to have an answer to this in order to have an approach to alignment? Why would that be? (You write that these are questions "...an AGI safety research agenda would need to answer correctly in order to be successful...".)
Hi Tekhne—this post introduces each of the five questions I will put forward and analyze in this sequence. I will be posting one a day for the next week or so. I think I will answer all of your questions in the coming posts.
I doubt that carving up the space in this—or any—way would be totally uncontroversial (there are lots of value judgments necessary to do such a thing), but I think this concern only serves to demonstrate that this framework is not self-justifying (i.e., there is still lots of clarifying work to be done for each of these questions). I agree with this—that's why I am devoting a post to each of them!
In order to minimize AGI-induced existential threats, I claim that we need to understand (i.e., anticipate; predict) AGI well enough (Q1) to determine what these threats are (Q2). We then need to figure out ways to mitigate these threats (Q3) and ways to make sure these proposals are actually implemented (Q4). How quickly we need to answer Q1–Q4 will be determined by how soon we expect AGI to be developed (Q5). I appreciate your skepticism, but I would counter that this actually seems like a fairly natural and parsimonious way to get from point A (where we are now) to point B (minimizing AGI-induced existential threats). That's why I claim that an AGI safety research agenda would need to answer these questions correctly in order to be successful.
Ultimately, I can only encourage you to wait for the rest of the sequence to be published before passing a conclusive judgment!