It still seems like we mainly agree, but might be having a communication gap.
In the Client example from your most recent comment, the approach is bad because The Client is far less likely to be able to accurately verify a line-and-color verification plan than to verify whether a concrete design is what she was envisioning. She already has a great verification strategy available—making or eyeballing drawings, proposing concrete changes, and iterating—and she and The Expert are simply failing to use it.
In technical AI alignment, we unfortunately don’t have any equivalent to “just eyeballing things.” Bad solutions can seem intuitively compelling, and qualitative objections to proposed alignment schemes won’t satisfy profit-oriented businesses eager to cash in on new AI systems. We also can’t “just have the AI do it,” for the same reason—how would we validate any solutions it came up with? Surely “just have the AI do it” isn’t the right answer to “what if the AI can’t prove its technical alignment solution is correct?”
My contention is that there may already be facets of AI alignment work that can be successfully outsourced to AI, precisely because we are already able to adequately validate them. For example, I can have ChatGPT come up with and critique ELK solutions. If the ELK contest were still running, I could then submit those solutions, and they would receive the same level of validation that human-proposed solutions achieve. That is why it’s possible to outsource the generation of new potential ELK solutions both to humans and to AI. If that field is bottlenecked by the need to brainstorm and critique solutions, and if ChatGPT can do that work faster and better than a human, then we can outsource that specific form of labor to it.
But in areas where we don’t have any meaningful verification strategies, we can’t outsource, either to humans or to AI. We might have trouble even explaining what the problem is, or motivating capable people to work on it (like how we’ve failed at, or never even tried, recruiting Terry Tao to alignment work, because he loves prime numbers so very much and isn’t that interested in money or Silicon Valley status). Omni-capable AI alignment researchers will have to come up with those verification strategies, validate each other’s work, and then, hopefully, make their validation tools legible enough that less-expert people can follow the proof as well, until everybody is satisfied.
Ah, I see what you’re saying now.