I’m still pretty confused by “You get what you measure” being framed as a distinct threat model from power-seeking AI (rather than as another sub-threat model)
I also consider catastrophic versions of “you get what you measure” to be a subset/framing/whatever of “misaligned power-seeking.” I think misaligned power-seeking is the main way the problem is locked in.
To a lesser extent, “you get what you measure” may also be an obstacle to using AI systems to help us navigate complex challenges without quick feedback, like improving governance. But I don’t think that’s an x-risk in itself; it’s more like a missed opportunity to do better. It’s in the same category as e.g. failures of the education system, though it’s plausibly better-leveraged if you have EA attitudes about AI being extremely important/leveraged. (ETA: I also view AI coordination, and differential capability progress, in a similar way.)