Curated!
(“Curated”, a term which here means “This just got emailed to 30,000 people, of whom typically half open the email and it gets shown at the top of the frontpage to anyone who hasn’t read it for ~1 week.”)
This is a thoughtful and detailed attempt to think through the entire alignment problem, making slightly different conceptual distinctions and tradeoffs, and thus reaching somewhat different conclusions, and that’s very worthwhile! I want to reward people doing and publishing serious intellectual labor like this that otherwise mostly wouldn’t get done.
I like the notion of ‘avoiding’ and ‘handling’ the alignment problem as distinct from ‘solving’ it, and generally trying to talk about the same subject without definitionally building in the assumption that the agent will need to have identical values to us (which is especially worthwhile given how confused I am about my own values!). I’m amused that you consider your definition here ‘devious’.
One critique I’ll make is that it took me a while to pick up that you weren’t talking about building maximally-intelligent systems, merely superintelligent ones (i.e. there’s a whole range of how much more intelligent than us a machine can be, and for a substantial part of this I believe you’re focusing on the lower end). I read you as focusing on the level of superintelligence that solves tons of major problems that have plagued humanity since its inception and has tons of obvious benefits (e.g. ending disease, amazing videogames, superintelligent life advice, etc.) but not crazily higher than that (e.g. perhaps uploading everyone into ems and redesigning the human mind). It seems to me like your choice to focus on dynamics at this level of intelligence, while potentially highly worthwhile, rests on a bunch of empirical beliefs about how the development of AI will play out that are pretty absent from this more abstract, philosophical treatise.
I have many more thoughts on and disagreements with this and related works, and I hope to write a more thorough response sometime. But still, I’m really glad to have read it. Thank you!