There are ways of coming to believe that an AI is aligned that fall between “hoping it is aligned” and “having a formal proof that it is aligned”. For example, we might find sufficiently strong selection theorems, which tell us that certain types of optima tend to be selected, even if we can’t prove theorems with certainty. We also might find a working ELK strategy that gives us interpretability.
These might not be good strategies, but the statement “Therefore no AI built by current methods can be aligned” seems far too strong.