I’m impressed enough by Gaziv et al.’s “adversarial examples that work on humans” that, rather than pausing to carefully read the paper, I’m going to speculate on how it could be a platform for building things :-)
The specific thing that jumped to mind is Davidad’s current request for proposals, which is looking to build up formal languages within which to deploy “imprecise probability” formalisms, such that AI system outputs could come with proofs about safely hitting human-expressible goals stated in these languages, like “solve global warming” while still “avoiding extinction, genocide, poverty, or other dystopian side effects”.
I don’t know that there is necessarily a lot of overlap in the vocabularies of these two efforts… yet? But the pictures in my head suggest that the math might not actually be that different. There’s going to be a lot of “lines and boundaries and paths in a high dimensional space” mixed with fuzzing operations, to try to regularize things until the math itself starts to better match our intuitions around meaning and safety and so on.