My recent posts
Over at Medium, I’m continuing to write about AI control; here’s a roundup from the last month.
Many of these seem like interesting things to discuss here; would it be better to post each one here as a link post when I write it?
#Strategy
Prosaic AI control argues that AI control research should first consider the case where AI involves no “unknown unknowns.”
Handling destructive technology tries to explain the upside of AI control, if we live in a universe where we eventually need to build a singleton anyway.
Hard-core subproblems explains a concept I find helpful for organizing research.
#Building blocks of ALBA
Security amplification and reliability amplification are complements to capability amplification. Ensembling for reliability is now implemented in ALBA on GitHub (a minimal sketch of the idea appears after this list).
Meta-execution is my current leading contender for security and capability amplification. It’s totally unclear how well it can work (some relevant speculation).
Thoughts on reward engineering discusses a bunch of prosaic but important issues when designing reward functions.
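To give a flavor of ensembling for reliability: the sketch below is not the ALBA implementation, just a toy majority-vote ensemble under an assumed independent-failure model; the names `unreliable_agent` and `ensemble_answer` are hypothetical, chosen for illustration.

```python
import random
from collections import Counter

def unreliable_agent(question, epsilon=0.1):
    """Toy stand-in for a mostly-reliable agent: answers correctly
    except with probability epsilon, when it fails arbitrarily."""
    if random.random() < epsilon:
        return "wrong answer"
    return "right answer"

def ensemble_answer(question, n_copies=3):
    """Ensembling for reliability: query several independent copies of
    the agent and return the majority answer. If each copy fails
    independently with probability eps, a 3-way vote fails only when at
    least two copies fail, i.e. with probability roughly 3 * eps**2."""
    answers = [unreliable_agent(question) for _ in range(n_copies)]
    majority, _ = Counter(answers).most_common(1)[0]
    return majority

if __name__ == "__main__":
    trials = 10_000
    failures = sum(ensemble_answer("2 + 2 = ?") == "wrong answer"
                   for _ in range(trials))
    print(f"ensemble failure rate: {failures / trials:.4f} (single copy: ~0.1)")
```

The real reliability-amplification problem is subtler than this toy, since on open-ended tasks the copies’ outputs can’t simply be compared by equality; read this only as the simplest instance of the voting idea.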
#Terminology and concepts
Clarifying the distinction between safety, control and alignment.
Benignity may be a useful invariant when designing aligned AI.