It doesn’t seem very surprising to me that a serious problem has already been addressed to the point where both 1) it’s very hard to make any further progress on the problem and 2) the remaining cost of not fully solving it can be lived with.
The obvious thing that’s missing is the intermediate stance of “this is probably a big, pervasive problem, and we should at least try to fix it by the obvious means before giving up.”
It seems to me that people like political scientists, business leaders, and economists have been attacking the problem for a while, so it doesn’t seem that likely there’s a lot of low-hanging fruit to be found by “obvious means”. I have somewhat more hope that the situation with AI alignment is different enough from what people thought about in the past (e.g., many of the people involved are at least partly motivated by altruism, unlike the kinds of people described in Moral Mazes) that you can make progress on credit assignment as applied to AI alignment, but you still seem too optimistic.
What are a couple clear examples of people trying to fix the problem locally in an integrated way, rather than just talking about the problem or trying to fix it at scale using corrupt power structures for enforcement?
It seems to me like the nearest thing to a direct attempt was the Quakers. As far as I understand, while they at least tried to coordinate around high-integrity discourse, they put very little work into explicitly modeling the problem of adversarial behavior or developing robust mechanisms for healing or routing around damage to shared information processing.
I’d have much more hope about existing AI alignment efforts if it seemed like what we’ve learned so far had been integrated into the coordination methods of AI safety orgs, and if technical development were more focused on current alignment problems.