My recent posts
Over at Medium, I’m continuing to write about AI control; here’s a roundup from the last month.
Many of these seem like interesting things to discuss here; would it be better to post each one here as a link post when I write it?
#Strategy
Prosaic AI control argues that AI control research should first consider the case where AI involves no “unknown unknowns.”
Handling destructive technology tries to explain the upside of AI control, if we live in a universe where we eventually need to build a singleton anyway.
Hard-core subproblems explains a concept I find helpful for organizing research.
#Building blocks of ALBA
Security amplification and reliability amplification are complements to capability amplification. Ensembling for reliability is now implemented in ALBA on GitHub (a minimal sketch of the idea appears after this list).
Meta-execution is my current leading contender for security and capability amplification. It’s totally unclear how well it can work (some relevant speculation).
Thoughts on reward engineering discusses a bunch of prosaic but important issues when designing reward functions.
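To give a flavor of ensembling for reliability: the sketch below is not the ALBA implementation, just a toy majority-vote ensemble under an assumed independent-failure model; the names `unreliable_agent` and `ensemble_answer` are hypothetical, chosen for illustration.

```python
import random
from collections import Counter

def unreliable_agent(question, epsilon=0.1):
    """Toy stand-in for a mostly-reliable agent: answers correctly
    except with probability epsilon, when it fails arbitrarily."""
    if random.random() < epsilon:
        return "wrong answer"
    return "right answer"

def ensemble_answer(question, n_copies=3):
    """Ensembling for reliability: query several independent copies of
    the agent and return the majority answer. If each copy fails
    independently with probability eps, a 3-way vote fails only when at
    least two copies fail, i.e. with probability roughly 3 * eps**2."""
    answers = [unreliable_agent(question) for _ in range(n_copies)]
    majority, _ = Counter(answers).most_common(1)[0]
    return majority

if __name__ == "__main__":
    trials = 10_000
    failures = sum(ensemble_answer("2 + 2 = ?") == "wrong answer"
                   for _ in range(trials))
    print(f"ensemble failure rate: {failures / trials:.4f} (single copy: ~0.1)")
```

The real reliability-amplification problem is subtler than this toy, since on open-ended tasks the copies’ outputs can’t simply be compared by equality; read this only as the simplest instance of the voting idea.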
#Terminology and concepts
Clarifying the distinction between safety, control and alignment.
Benignity may be a useful invariant when designing aligned AI.