Thesis: Everything is alignment-constrained, nothing is capabilities-constrained.
Examples:
“Whenever you hear a headline that a medication kills cancer cells in a petri dish, remember that so does a gun.” Healthcare is probably one of the biggest constraints on humanity, but the hard part is coming up with an intervention that precisely targets the thing you want to treat, often, I think, because it is hard to know exactly what that thing is.
Housing is also obviously a huge constraint, mainly due to NIMBYism. But the idea that NIMBYism comes from people using their housing as an investment seems kind of like a cope: if it were, you’d expect the backlash against cheap new housing to mainly be about falling investment values. The vibe I get instead is that people are mainly upset about crime, smells, unruly children in schools, etc., that is, about bad people moving in. Basically, high housing prices function as a substitute for police, immigration rules, and teacher authority. Those in turn are compromised less because we don’t know how to, e.g., arm people or discipline children, and more because we aren’t confident enough about the targeting (an alignment problem), and because we hope that bad people could be reformed if we could just solve what’s wrong with them (again an alignment problem, because that requires defining what’s wrong with them).
Education is expensive and doesn’t work very well: a major constraint on society. Yet those who get educated are given exams assessing whether they’ve picked up the material, and they perform reasonably well. A substantial part of the issue seems to be that they get educated in the wrong things, an alignment problem.
American GDP is the highest it has ever been, yet American elections are devolving into a choice between scammers. It’s not even a question of ignorance, since it’s pretty well known that they’re scammy (consider also that patriotism is at an all-time low).
Exercise: Think about some tough problem, then ask what capabilities you would need to solve it, and whether you even understand the problem well enough to pick relevant capabilities.
(Certifications and regulations promise to solve this, but they face the same problem: they don’t know what requirements to set, which is itself an alignment problem.)
An interesting framing! I agree with it.

As another example: in principle, one could have a web server use an LLM connected to a database to serve arbitrary requests, without writing any code. It would even work… until someone convinced the model to rewrite the database to their whims! (A second problem is that a normal site should be focused on something, in line with the famous “if you can explain anything, your knowledge is zero.”)
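To make the failure mode concrete, here is a minimal sketch of that architecture in Python. Everything in it is an assumption for illustration: the `call_llm` helper is hypothetical (any real LLM client would slot in there), and the SQLite database `site.db` is made up. The point is that the capabilities side is trivial, while nothing in the code keeps the model pointed at the site owner’s intent:

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# Made-up database for the sketch.
DB = sqlite3.connect("site.db", check_same_thread=False)

def call_llm(prompt: str) -> str:
    # Hypothetical helper: wire in any real LLM client here.
    raise NotImplementedError

class LLMBackend(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request_text = self.rfile.read(length).decode()
        # Capabilities side, trivially easy: ask the model for SQL.
        sql = call_llm(
            "You are the entire backend of a website. Translate this user "
            "request into one SQL statement for the site's database:\n"
            + request_text
        )
        # Alignment side: nothing here distinguishes a harmless SELECT from
        # "DROP TABLE users". If the request talks the model into writing
        # destructive SQL, it runs.
        rows = DB.execute(sql).fetchall()
        DB.commit()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(repr(rows).encode())

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), LLMBackend).serve_forever()
```

The whole “web server” fits in thirty lines; the hard, unsolved part lives entirely in the comment above `DB.execute`.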
Reading this made me think that the framing “Everything is alignment-constrained, nothing is capabilities-constrained.” is a rathering and that a more natural/joint-carving framing is:
To the extent that you can get capabilities by your own means (rather than hoping for reality to give you access to a new pool of some resource or whatever), you get them by getting various things to align so that they produce those capabilities.
Partially disagree, but only partially.

I think the big thing that makes multi-alignment disproportionately hard, in a way that doesn’t apply to aligning an AI to a single person, is the lack of a ground truth, combined with severe value conflicts being common enough that alignment is probably conceptually impossible. The big reason our society stays stable is precisely that people depend on each other for their lives, and one of the long-term effects of AI is that at least a few people will no longer depend on others for long, healthy lives. This predicts that our society will increasingly stop mattering to powerful actors, who will set up their own nations, à la seasteading.

More below:

https://www.lesswrong.com/posts/dHNKtQ3vTBxTfTPxu/what-is-the-alignment-problem#KmqfavwugWe62CzcF

Or this quote by me:
I basically agree with this. One of the more important effects of AI very deep into takeoff is that we will start realizing how much human alignment relied on people being dependent on each other and on society, which is why societal coercion like laws and police mostly works. AI more or less breaks that dependence, and there is no reason to assume that many people wouldn’t be paperclippers relative to each other if they didn’t need society.
To be clear, I still expect some level of cooperation, due to the existence of very altruistic people. But the reduction in positive-sum trades between different values, combined with the fact that many of our value systems only tolerate other value systems in contexts where we need other people, will make our future surprisingly dark compared to what people usually think, due to “most humans being paperclippers relative to each other [in the supposed reflective limit]”.