Thanks for posting this. I am still a bit fuzzy on what exactly the Superalignment plan is, or if there even is a firm plan at this stage. Hope we can learn more soon.
I think they had a reasonably detailed (but unfortunately unrealistic) plan for aligning superintelligence before Ilya became a co-lead of the Superalignment team. That plan had been published in multiple installments.
The early July text https://openai.com/blog/introducing-superalignment was the last of those installments; most of its technical content was pre-Ilya (as far as I could tell), but it also introduced Ilya as a co-lead.
But the problem with most such alignment plans, including this one, has always been that they didn't have much chance of working for a self-improving superintelligent AI or ecosystem of AIs, that is, at exactly the point when we start really needing them to work.
I think Ilya understood this very well, and he started to revise plans and to work in new directions accordingly. We were seeing various bits of his thinking on this in his interviews: in addition to what he said here, one other motif he kept returning to in recent months was that it is desirable for superintelligent AIs to think of themselves as something like parents, and of us as something like their children, so one of the questions is what we should do to achieve that.
But I don't know whether he would want to publish details going forward (successful AI safety research is capability research; there is no way to separate the two, and the overall situation might be getting too close to the endgame). He will certainly share something, but the core novel technical work will increasingly be produced via intellectual collaboration with cutting-edge (pre-public-release, in-house) AI systems, and they would probably want to introduce at least a delay before sharing anything as sensitive as this.
Jan Leike is the head of Superalignment. He blogged about a version of Eliezer's CEV.