thenoviceoof comments on Who determines whether an alignment proposal is the definitive alignment solution?

thenoviceoof 6 Oct 2023 2:02 UTC
4 points
3
Thoughts on the different sub-questions, from someone who doesn’t work professionally in AI safety:
- “Who is responsible?” Legally, no one has this responsibility (in the way that, say, the FDA is legally responsible for evaluating drugs). Hopefully in the near future the UK AI task force will be competent and have the jurisdiction and mandate to do so, at least if you’re in the UK, and even more hopefully more countries will have similar organizations (or an international organization will exist).
- Alternative “responsible” take: I’m sure if you managed to get the attention of OpenAI / DeepMind / Anthropic safety teams with an actual alignment plan and it held up to cursory examination, they would consider it their personal responsibility to humanity to evaluate it more rigorously. In other words, it might be good to define what you mean by responsibility (are we trying to find a trusted arbiter? Find people that are competent to do the evaluation? Find a way to assign blame if things go wrong? Ideally these would all be the same person/organization, but it’s not guaranteed).
- “Is LessWrong the platform for [evaluating alignment proposals]?” In the future, I sure hope not. If LW is still the best place to do evaluations when alignment is plausibly solvable, then… things are not going well. A negative/do-not-launch evaluation means nothing without the power to do something about it, and LessWrong is just an open collaborative blogging platform and has very little actual power.
- That said, LessWrong (or the Alignment Forum) is probably the best current discussion place for alignment evaluation ideas.
- “Is there a specialized communication network[?]” I’ve never heard of such a network, unless you include simple gossip. Of course, the PhDs might be hiding one from all non-PhDs, but it seems unlikely.
- “… demonstrate the solution in a real-world setting...” It needs to be said: please do not run potentially dangerous AIs *before* the review step.
- “… have it peer reviewed.” If we shouldn’t share evaluation details due to capability concerns (I reflexively agree, but haven’t thought too deeply about it), this makes LessWrong a worse platform for evaluations, since it’s completely open, both for access and to new entrants.
- MiguelDev 6 Oct 2023 2:44 UTC
  1 point
  0
  Unfortunately, I’m not based in the UK. However, the UK government’s prioritization of the alignment problem is commendable, and I hope their efforts continue to yield positive results.
  (are we trying to find a trusted arbiter? Find people that are competent to do the evaluation? Find a way to assign blame if things go wrong? Ideally these would all be the same person/organization, but it’s not guaranteed).
  I understand your point, but it seems that we need a specific organization or team designed for such operations. Why did I pose the question initially? I’ve developed a prototype for a shutdown mechanism, which involves a potentially hazardous step. This prototype requires assessment by a reliable and skilled team. From my observations of discussions on LW, it appears there’s a “clash of agendas” that takes precedence over the principle of “preserving life on earth.” Consequently, this might not be the right platform to share anything of a hazardous nature.
  Thank you for taking the time to respond to my inquiry.