If dropping competitiveness, what counts as a solution? Is “imitate a human, but run it fast” fair game? We could try to hash out the details in something along those lines, and I think that’s worthwhile, but I don’t think it’s a top priority and I don’t think the difficulties will end up being that similar. I think it may be productive to relax the competitiveness requirement (e.g. to allow solutions that definitely have at most a polynomial slowdown), but probably not a good idea to eliminate it altogether.
If dropping competitiveness, what counts as a solution?
I’m not sure, but mainly because I’m not sure what counts as a solution to your problem. If we had a specification of that, couldn’t we just remove the parts that deal with competitiveness?
Is “imitate a human, but run it fast” fair game?
I guess not, because a human imitation might have selfish goals and not be intent-aligned with the user?
We could try to hash out the details in something along those lines, and I think that’s worthwhile, but I don’t think it’s a top priority and I don’t think the difficulties will end up being that similar.
What about my suggestion of hashing out the details of how to implement IDA/DEBATE using Opt and then seeing if we can decide whether or not it’s aligned?