I think this post is incredibly useful as a concrete example of the challenges of seemingly benign powerful AI, and makes a compelling case for serious AI safety research being a prerequisite to any safe further AI development. I strongly dislike part 9, as painting the Predict-o-matic as consciously influencing others’ personalities at the expense of short-term prediction error seems contradictory to the point of the rest of the story. I suspect I would dislike part 9 significantly less if it were framed in terms of a strategy to maximize predictive accuracy.
More specifically, I really enjoy the focus on the complexity of “optimization” on a gears level: I think that it’s a useful departure from high abstraction levels, as the question of what predictive accuracy means, and the strategy an AI would use to pursue it, is highly influenced by the approach taken. I think a more rigorous analysis of whether different AI approaches are susceptible to “undercutting” as a safety feature would be an extremely valuable piece. My suspicion is that even the engineer’s perspective here is significantly under-specified, lacking the details necessary to determine whether this vulnerability exists.
I also think that part 9 detracts from the piece in two main ways. First, by painting the Predict-o-matic as conscious, it implies a significantly more advanced AI than necessary to exhibit this effect. Second, because the AI admits to sacrificing predictive accuracy in favor of some abstract value-add, it seems that, according to the engineer, pretty much any naive strategy would outcompete the current one. That means the type of threat is also distorted: the main worry should be AI OPTIMIZING for predictive accuracy, not pursuing its own goals. That’s bad sci-fi or very advanced GAI, not a prediction-optimizer.
I would support the deletion or aggressive editing of part 9 in this and future similar pieces: I’m not sure what it adds. ETA: I think whether or not this post should be updated depends on whether you think the harms of part 9 outweigh the benefit of the previous parts. It’s plausible to me that the benefits of a clearly framed story that’s relevant to AI safety are enormous, but it’s also plausible that the costs of creating a false sense of security are larger.
I share a feeling that part 9 is somehow bad, and I think your points are fair.
I don’t think I’m the target audience for this story so I’m not leaving a full review, but +1 to this. Part 9 seems to be trying to display another possible failure mode (specifically inner misalignment), but it severely undercuts the core message from the rest of the post: that a predictive accuracy optimizer is dangerous even if that’s all it optimizes for.
I do think an analogous story which focused specifically on inner optimization would be great, but mixing it in here dilutes the main message.
I don’t see why it should necessarily undercut the core message of the post, since inner optimizers are still in some sense about the consequences of a pure predictive accuracy optimizer (but in the selection sense, not the control sense). But I agree that it wasn’t sufficiently well done. It didn’t feel like a natural next complication, the way everything else did.
I wouldn’t say that inner optimizers are about the consequences of pure predictive accuracy optimization; the two are orthogonal. An inner optimizer can pop up in optimizers which optimize for things besides predictive accuracy, and predictive accuracy optimization can be done in ways which don’t give rise to inner optimizers. Contrast that to the other failure modes discussed in the post, which are inherently about predictive accuracy—e.g. the assassination markets problem.
OK, yeah, that’s fair.
I found it narratively quite exciting to have a section from the POV of the Predict-o-matic.
I agree with this.
I agree that it’s narratively exciting; I worry that it makes the story counterproductive in its current form (i.e. computer people thinking “computers don’t think like that, so this is irrelevant”).
Either it’s a broadly accurate portrayal of its reasons for action or it isn’t. Just because people find hard sci-fi weird doesn’t mean you should make it into a fantasy. Don’t dilute art for people who don’t get it.
I’m a bit confused: I thought that this was what I was trying to say. I don’t think this is a broadly accurate portrayal of reasons for action as discussed elsewhere in the story; see great-grandparent for why. Separately, I think it’s a really bad idea to be implicitly tying harm done by AI (hard sci-fi) to a prerequisite of anthropomorphized consciousness (fantasy). Maybe we agree, and are miscommunicating?
Yeah, my bad, I didn’t read your initial review properly (I saw John’s comment in Recent Discussion and made some fast inferences about what you originally said). Sorry about that! Thx for the review :)