I think I disagree with lots of things in this post, sometimes in ways that partly cancel each other out.
1. Parts of generalizing correctly involve outer alignment, i.e. building objective functions that have “something to say” about how humans want the AI to generalize.
2. Relatedly, outer alignment research is not done, and RLHF/P is not the be-all and end-all.
3. I think we should be aiming to build AI CEOs (or, more generally, working on safety technology with an eye towards how it could be used in AGI that skillfully navigates the real world). Yes, the reality of the game we’re playing with gung-ho orgs is more complicated, but sometimes, if you don’t build it, someone else really will.
4. Getting AI systems to perform simpler behaviors safely also looks like capabilities research. When you say “this will likely require improving sample efficiency,” a warning light should flash. This isn’t a fatal problem; some amount of advancing capabilities is just a cost of doing business. There is safety research that doesn’t advance capabilities, but that subset comes with significant restrictions (little connection to ML being the big one). Rather than trying to never advance AI capabilities, we should acknowledge that cost in advance and make plans that account for it.
(A very quick response):
Agree with (1) and (2).
I am ambivalent RE (3) and the replaceability arguments.
RE (4): I largely agree, but I think the norm should be “let’s try to do less ambitious stuff properly” rather than “let’s try to do the most ambitious stuff we can, and then try to figure out how to do it as safely as possible as a secondary objective”.