I personally suspect that it will basically come down to solving the stability under self-modification problem… This may already be the part of the problem that people in the know think is difficult
Yes, it is already the part we suspect will be difficult. The other part may or may not be difficult once we solve that part.
is the above discussion just me starting to suspect how hard friendliness is?
No, you still haven’t started to suspect that.
Also, moving this post to the Discussion section.
I disagree with this particular use of moderation power. That’s what not-promoting is for.
It’s not particularly on-topic for the main site.
That’s what downvoting is for. (And, perhaps, commenting with “I downvoted this because I consider it off-topic for the main site, but I will remove my downvote if you move it to the Discussion section.”)
Would it be fair to say that my last two posts were similarly off-topic (they were both descriptions of widgets that would be used for AI boxing)? I have a very imprecise conception of what is and what is not on-topic for the main site.
Would it be fair to say that my last two posts were similarly off-topic (they were both descriptions of widgets that would be used for AI boxing)?
In my opinion, yes, as it’s not about development or application of rationality, and free discussion of transhumanist topics will damage the rationality site. But I think it’s fine for the discussion area.
It wasn’t the topicness so much as the degree to which the post seemed written in the form of… natter, maybe, would be the word to describe it? It read like Discussion, and not like a main LW post.
I agree, but is the writing style your real objection? If I were an underrated FAI researcher moderating a website, I’d be annoyed if I saw a main-page post purporting to be humble but still missing the cutting edge of usefulness by several levels of skill. I’d move it to the Discussion page by way of emphasizing how far the poster still had to go to be as skilled as I am.
A tad too much cynicism, there. You can’t think of any other reason to move it to the Discussion page, under those circumstances?
Sure: a third reason might be that leaving low-level posts on friendliness on the top page encourages people to underestimate how difficult friendliness is, which might lead them to trivialize the whole project, donate less to SIAI, and/or try to build their own non-provably friendly AI.
The interesting question is why you would remark on writing style if the third reason were your true objection. Maybe you’ve got some other objection altogether—it’s not that important to me; I like your blog and you can move stuff around on it if you feel like it.
A long time ago you described what you perceived as the difficulties for FAI:
1. Solving the technical problems required to maintain a well-specified abstract invariant in a self-modifying goal system. (Interestingly, this problem is relatively straightforward from a theoretical standpoint.)
2. Choosing something nice to do with the AI. This is about midway in theoretical hairiness between problems 1 and 3.
3. Designing a framework for an abstract invariant that doesn’t automatically wipe out the human species. This is the hard part.
I know that was a long time ago, but people here still link to it, presumably because they don’t know of any more up-to-date statement with similar content. Hopefully you can see why I was confused about which part of this problem was supposed to be hard. I now see that I probably misinterpreted it, but the examples that come directly afterwards reaffirm my incorrect interpretation.
So would it be fair to say that figuring out how to build a good paperclipper, as opposed to a process that does something we don’t understand, already requires solving the hard part?
Either my views changed since that time, or what I was trying to communicate by it was that 3 was the most inscrutable part of the problem to people who try to tackle it, rather than that it was the blocker problem. 1 is the blocker problem, I think. I probably realize that now to a greater degree than I did at that time, and probably also made more progress on 3 relative to 1, but I don’t know how much my opinions actually changed (there are some well-known biases about that).
So would it be fair to say that figuring out how to build a good paperclipper, as opposed to a process that does something we don’t understand, already requires solving the hard part?
Current estimate says yes. There would still be an inscrutable problem to solve too, but I don’t think it would have quite the same impenetrability about it.
Would it be fair to say that even developing a formalism which is capable of precisely expressing the idea that something is a good paperclipper is significantly beyond current techniques, and that substantial progress on this problem probably represents substantial progress towards FAI?
Yes.
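To make the gap concrete, here is a deliberately naive Python sketch of what “utility = number of paperclips” looks like once someone tries to write it down. It is only an illustration of the kind of formalism under discussion, not anyone’s proposed design; the names WorldModel, count_paperclips, and predict are hypothetical placeholders, and the point is that all of the difficulty hides inside them.

```python
# A deliberately naive sketch of "utility = number of paperclips".
# Illustration only: WorldModel, count_paperclips, and predict are
# hypothetical placeholders; nobody knows how to define them so that they
# refer to paperclips in the real world rather than to features of the
# agent's own representation.

from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class WorldModel:
    """Stand-in for whatever representation the agent uses for 'the world'."""
    objects: List[Any] = field(default_factory=list)


def count_paperclips(world: WorldModel) -> int:
    # Placeholder: deciding what counts as "a paperclip" in an arbitrary
    # representation is precisely the unformalized step.
    return sum(1 for obj in world.objects if getattr(obj, "is_paperclip", False))


def utility(world: WorldModel) -> float:
    return float(count_paperclips(world))


def best_action(actions: List[Any], predict: Callable[[Any], WorldModel]) -> Any:
    # Utility maximization over a predictive model that is itself assumed.
    return max(actions, key=lambda action: utility(predict(action)))
```

Everything that would make this a “good paperclipper” in the intended sense (that the counted objects are real paperclips, that the predictive model tracks actual consequences, that the maximization survives the agent rewriting its own code) lives outside the part that is easy to write down.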
I’m fine with this use of moderation. I considered posting this in Discussion, but the editors’ discretion is better than mine, and otherwise people would have to look at the post while they waited for me to move it at someone’s suggestion. But nothing about Discussion appears anywhere on the main page, so I was confused about what had happened.
Yes, it is already the part we suspect will be difficult. The other part may or may not be difficult once we solve that part.
I’m surprised by this—I can imagine where I might start on the “stable under self-modification” problem, but I have a very hard time thinking where I might start on the “actually specifying the supergoal” problem.
To talk about “stable under self-modification”, you need a notion of what it is that needs to be stable: the kind of data that specifies a decision problem. Once we have that notion, it could turn out to be relatively straightforward to extract its instance from human minds (but probably not). On the other hand, as long as we don’t have that notion, there is little point in attacking the problem of extracting the human decision problem.
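As a rough illustration of “the kind of data that specifies a decision problem”, and of what it would mean to hold that data stable across self-modification, here is a minimal Python sketch. The names DecisionProblem, Agent, and proves_preserves_goal are hypothetical placeholders of my own, and proves_preserves_goal stands in for exactly the part that is unsolved, not a claim about how it would work.

```python
# Minimal sketch, for illustration only, of (a) data that might specify a
# decision problem and (b) a self-modification rule that tries to keep that
# data invariant. proves_preserves_goal is a stand-in for the unsolved part:
# verifying that a rewritten agent still optimizes the same decision problem.

from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class DecisionProblem:
    utility: Callable[[Any], float]          # preferences over outcomes
    world_model: Callable[[Any, Any], Any]   # (state, action) -> predicted outcome
    actions: Tuple[Any, ...]                 # available actions


def proves_preserves_goal(current: "Agent", candidate_source: str) -> bool:
    """Stand-in for a verification step: would the rewritten agent still be
    optimizing the same DecisionProblem? Returning False by default reflects
    that we do not know how to implement this check."""
    return False


@dataclass
class Agent:
    problem: DecisionProblem
    source: str  # the agent's own code, which it may propose to rewrite

    def act(self, state: Any) -> Any:
        # Ordinary decision-making: pick the action with the best predicted outcome.
        return max(
            self.problem.actions,
            key=lambda a: self.problem.utility(self.problem.world_model(state, a)),
        )

    def consider_rewrite(self, candidate_source: str) -> "Agent":
        # Only adopt a rewrite that verifiably preserves the decision problem.
        if proves_preserves_goal(self, candidate_source):
            return Agent(problem=self.problem, source=candidate_source)
        return self
```

The sketch is only meant to make the ordering in the comment above visible: until something like DecisionProblem is pinned down, there is nothing definite either to extract from human minds or to hold invariant under rewrites.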