Dakara comments on If we solve alignment, do we die anyway?

Dakara 19 Nov 2024 18:48 UTC
1 point
0
That’s pretty interesting. I do think that if the iterative alignment strategy ends up working, then this will probably end up working too (if for no other reason than that this seems much easier).

I have some concerns left about the iterative alignment strategy in general, so I will try to write them down below.

EDIT: On second thought, I might create a separate question for it (and link it here), for the benefit of everyone who is concerned about the same (or similar) things that I am.
- Seth Herd 19 Nov 2024 19:06 UTC
  3 points
  1
  That would be great. Do reference scalable oversight to show you’ve done some due diligence before asking to have it explained. If you do that, I think it would generate some good discussion.
  - Dakara 19 Nov 2024 19:26 UTC
    1 point
    0
    Sure, I might as well ask my question directly about scalable oversight, since it seems like a leading strategy of iterative alignment anyways. I do have one preliminary question (which probably isn’t worth including in that post, given that it doesn’t ask about a specific issue or threat model, but rather about people’s expectations).
    
    I take it that this strategy relies on evaluation being easier than coming up with research? Do you expect this to be the case?