I think this post would benefit from being more explicit about its target. The problem it describes concerns AGI labs and their employees on one hand, and anyone trying to build a solution to Alignment/AI Safety on the other.
By narrowing the scope to the labs, we can better evaluate the proposed solutions (for example, improving decision making will require influencing the decision makers at those labs), make them more focused (to the point of being lab-specific, analyzing each lab's particular pressures), and think of new solutions (inoculating ourselves and other decision makers in AI against claims coming out of those labs by applying a strong dose of healthy skepticism).
By narrowing the scope to people working on AI Safety whose status or monetary support relies on giving impressions of progress, we arrive at different solutions (for instance, explicitly rewarding honesty, truthfulness, and clarity over hype and storytelling). A general recommendation I'd make is to have some kind of review that checks for "Wizard of Oz'ing", flagging the behavior and suggesting corrections. Currently I'd say the diversity of LessWrong and its truth-seeking norms are doing quite well at this, so posting here publicly is a great way to keep this behavior in check. That highlights the importance of this place and of upholding those norms.