My opinion, also going into the newsletter:
Like Matthew, I’m excited to see more work on transparency and adversarial training for inner alignment. I’m somewhat skeptical of the value of work that plans to decompose future models into a “world model”, “search” and “objective”: I would guess that there are many ways to achieve intelligent cognition that don’t easily factor into any of these concepts. It seems fine to study a system composed of a world model, search and objective in order to gain conceptual insight; I’m more worried about proposing it as an actual plan.
The point about decompositions is a pretty minor portion of this post; is there a reason you think that part is more worthwhile to focus on for the newsletter?
That’s… a fair point. It does make up a substantial portion of the transparency section, which seems like the “solutions” part of this post, but it isn’t the entire post.
Matthew’s certainly right that I tend to reply to things I disagree with, though I usually try to avoid disagreeing with details. I’m not sure that I only disagree with details here, but I can’t clearly articulate what feels off to me. I’ll delete the opinion altogether; I’m not going to put an unclear opinion in the newsletter.
I’m not Rohin, but I think there’s a tendency to reply to things you disagree with rather than things you agree with. That would explain my emphasis anyway.