Robin Hanson recently wrote about two dynamics that can emerge when individuals within an organization work as a group to reach decisions. These are the “outcome game” and the “consensus game.”
In the outcome game, individuals aim to be seen as advocating for decisions that are later proven correct. In the consensus game, by contrast, individuals advocate for whichever decisions are most immediately popular within the organization. When most participants play the consensus game, the quality of decision-making suffers.
The incentive structure within an organization influences which game people play. When feedback on decisions is immediate and substantial, individuals are more likely to engage in the outcome game. Hanson argues that capitalism’s key strength is its ability to make outcome games more relevant.
However, if an organization is insulated from the consequences of its decisions or feedback is delayed, playing the consensus game becomes the best strategy for gaining resources and influence.
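As a rough illustration of why delayed feedback tilts play toward the consensus game, here is a minimal toy payoff comparison. It is not from Hanson’s post, and the reward and discount numbers are made-up assumptions: being proven right yields one large reward only after the feedback delay, while agreeing with the room yields a small social reward every round in the meantime.

```python
# Toy payoff comparison (illustrative only; all numbers are assumptions, not from Hanson's post).
# "Outcome" play: one large reward for being proven right, received only after the feedback delay.
# "Consensus" play: a small social reward every round until the feedback finally arrives.

def discounted(reward, rounds_from_now, discount=0.9):
    """Present value of a reward received some rounds in the future."""
    return reward * discount ** rounds_from_now

def payoffs(feedback_delay, outcome_reward=10.0, consensus_reward=1.0):
    outcome = discounted(outcome_reward, feedback_delay)
    consensus = sum(discounted(consensus_reward, t) for t in range(feedback_delay + 1))
    return outcome, consensus

for delay in (1, 5, 20, 50):
    o, c = payoffs(delay)
    winner = "outcome" if o > c else "consensus"
    print(f"feedback delay {delay:2d}: outcome {o:5.2f} vs consensus {c:5.2f} -> {winner} game pays more")
```

With quick feedback the outcome game dominates; as the delay grows, the steady trickle of social credit overtakes the heavily discounted reward for being right, which is the insulation effect described above.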
This dynamic is particularly relevant to the field of (existential) AI Safety, which needs to develop risk-mitigation strategies before AGI is built. Currently, we have zero concrete feedback about which strategies can effectively align complex systems of equal or greater intelligence to humans.
As a result, it is unsurprising that most alignment efforts avoid tackling seemingly intractable problems. The incentive structures in the field encourage individuals to play the consensus game instead.
Currently, we have zero concrete feedback about which strategies can effectively align complex systems of equal or greater intelligence to humans.
Actually, I now suspect this is to a significant extent disinformation. You can tell when ideas make sense if you think hard about them. There’s plenty of feedback that’s not already being taken advantage of, at the level of “abstract, high-level philosophy of mind”, about the questions of alignment.
That’s not really “concrete” feedback though, right? In the outcome game/consensus game dynamic Stephen’s talking about, it seems hard to play an outcome game with that kind of feedback.
I’m not sure what “concrete” is supposed to mean; for the one or two senses I immediately imagine, no, I would say the feedback is indeed concrete. In terms of consensus/outcome, no, I think the feedback is actually concrete. There is a difficulty, which is that there’s a much smaller set of people to whom the outcomes are visible.
As an analogy/example: feedback in higher math. It’s “nonconcrete” in that it’s “just verbal arguments” (and translating those into something much more objective, like a computer proof, is a big, separate undertaking). And there’s a much smaller set of people who can tell what statements are true in the domain. There might even be a bunch more people who have opinions, and can say vaguely related things that other non-experts can’t distinguish from expert statements, and who therefore form an apparent consensus that’s wrong and ungrounded. But one shouldn’t conclude from those facts that math is less real, or less truth-tracking, or less available for communities to learn about directly.
Thanks for linking this post. I think it has a nice harmony with Prestige vs Dominance status games.
I agree that this is a dynamic that is strongly shaping AI Safety, but I would specify that it’s inherited from the non-profit space in general: EA originated with the claim that it could do outcome-focused altruism, but... there’s still a lot of room for improvement, and I’m not even sure we’re improving.
The underlying dynamics and feedback loops are working against us, and I don’t see evidence that core EA funders/orgs are doing more than paying lip service to this problem.