I feel a bit confused about your comment: I agree with each individual claim, but I feel like perhaps you meant to imply something beyond just the individual claims. (Which I either don’t understand or perhaps disagree with.)
Are you saying something like: “Yeah, I think that while this plan would work in theory, I expect it to be hopeless in practice (or unnecessary, because the homework wasn’t hard in the first place)”?
If yes, then I agree—but I feel that of the two questions, “would the plan work in theory” is the much less interesting one. (For example, suppose that OpenAI could in theory use AI to solve alignment in 2 years. Then this won’t really matter unless they can refrain from using that same AI to build misaligned superintelligence in 1.5 years. Or suppose the world could solve AI alignment if the US government instituted a 2-year moratorium on AI research—then this won’t really matter unless the US government actually does that.)
I just think these are important concepts to distinguish: it’s useful to notice the extent to which problems could be solved by a moderate amount of coordination, and which asks could suffice for safety.
I wasn’t particularly trying to make a broader claim, just trying to highlight something that seemed important.
My overall guess is that it’s about 50% likely that people pay costs equivalent to 2 years of delay for existential safety reasons. (Though I’m uncertain overall, and this is possible to influence.) Thus, ensuring that the plan for spending that budget is as good as possible looks quite valuable, and not hopeless overall.
By analogy, note that Google bears substantial costs to improve security (e.g., running 10% slower).
I think that if we could ensure the implementation of our best safety plans, at a cost of just a few years of delay, we’d be in a much better position.