I don’t think my view has changed too much (I don’t work in the area so don’t pay as much attention or think about it as often as I might like).
The main updates have been:
At the time of that interview I think it was already public that Interval Bound Propagation was competitive with other verification methods for perturbation robustness, but I wasn’t aware of that and definitely hadn’t reflected on it. I think this makes other verification schemes seem somewhat less impressive, and makes it less likely that they are addressing the hard parts of the problem we ultimately need to solve. I haven’t really talked about this with researchers in the area, so I’m not sure if it’s the right conclusion.
Since then my vague sense is that robustness research has continued to make progress but that there aren’t promising new ideas in verification. This isn’t a huge update, since not many people are working on verification with an eye towards alignment applications (and the rate of progress has always been slow, close to zero), but it’s still a somewhat negative update.
Thank you for your answer, and good luck with the Alignment Research Center.