The only disagreement I’m seeing in the comments is on smaller points, not larger ones. I wonder what that means. It feels like “absence of evidence is evidence of absence” to me.
1: It takes longer than a few hours to properly disagree with a post like this.
2: I’m not sure the comments here are an appropriate venue for debating such a disagreement.
I personally have a number of significant, specific disagreements with the post, primarily relating to the predictability and expected outcomes of inner misalignments and the most appropriate way of thinking about agency and value fragility. I’ve linked some comments I’ve made on those topics, but I think a better way to debate these sorts of questions is via a top level post specifically focusing on one area of disagreement.
1: Yeah, I guess that’s true. And comments about smaller points are quicker to write up, which explains why we see a bunch of those comments earlier on. But my intuition is that within 24-48 hours those sorts of meatier objections would usually surface.
2: Regardless of whether that is true, I would expect some people to find the OP an appropriate place to debate.
One datapoint:
- Overall I don’t think the structure of the text makes it easy to express larger disagreements. Many points state obviously true observations, many other points are expressing the same problem in different words, some points are false, and sometimes whether a point actually bites depends on highly speculative assumptions.
- For example: if that counts as a disagreement, in my view what makes multiple of these points “lethal” is a hidden assumption that there is a fundamental discontinuity between some categories of systems (e.g. weak: won’t kill you, won’t help you with alignment | strong: would help you with alignment, but will kill you by default) and that there isn’t anything interesting or helpful in between (e.g. “moderately strong” systems). I don’t think this is true or inevitable.
- I’ll probably try to write and post a longer, top-level post about this (working title: Hope is in continuity).
- I think an attempt to discuss this in comments would be largely pointless. A short comment would run into the problem of people misunderstanding what I mean, and a long comment would be too long.
I think discontinuity is true, but it’s not actually required for EY’s argument. Thus, asserting continuity isn’t sufficient as a response.
You specifically need it to be the case that you get useful capabilities earlier than dangerous ones. If the curves are continuous and danger comes at a different time than pivotalness, but danger comes before pivotalness, then you’re plausibly in a worse situation rather than a better one.
So there needs to be some pivotal act that is pre-dangerous but also post-useful. I think the best way to argue for this is just to name one or more examples. Not necessarily examples where you have an ironclad proof that the curves will work out correctly; just examples that you do in fact believe are reasonably likely to work out. Then we can talk about whether there’s a disagreement about the example’s usefulness, or about its dangerousness, or both.
(Elaborating on “I think discontinuity is true”: I don’t think AGI is just GPT-7 or Bigger AlphaGo; I don’t think the cognitive machinery involved in modeling physical environments, generating and testing scientific hypotheses to build an edifice of theory, etc. is a proper or improper subset of the machinery current systems exhibit; and I don’t think the missing skills are a huge grab bag of unrelated local heuristics such that accumulating them will be gradual and non-lumpy.)
The actual post is now here—as expected, it’s more post-length than a comment.
You are dealing with a potentially very biased sample of people, I wouldn’t conclude that