I’m curious what Chris’s best guess (or anyone else’s) is about where to place AlphaGo Zero on that diagram.
Without the ability to poke around at AlphaGo—and a lot of time to invest in doing so—I can only engage in wild speculation. It seems like it must have abstractions that human Go players don’t have or anticipate. This is true of even vanilla vision models before you invest lots of time in understanding them (I’ve learned more than I ever needed to about useful features for distinguishing dog species from ImageNet models).
But I’d hope the abstractions are in a regime where, with effort, humans can understand them. This is what I expect the downward slope towards “alien abstractions” to look like: we’ll see abstractions that are extremely useful if you can internalize them, but that take more and more effort to understand.
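For concreteness, here is a minimal sketch of what this kind of poking around can look like for a vanilla vision model, assuming a pretrained torchvision ResNet-50 and a forward hook on one intermediate layer. The model, layer, and image file are arbitrary choices for illustration, not anything specific to the work discussed above:

```python
# Sketch: register a forward hook on an intermediate layer of a pretrained
# ImageNet model and see which channels respond most strongly to an image.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

activations = {}

def hook(module, inputs, output):
    # Stash the layer's output so it can be inspected after the forward pass.
    activations["layer3"] = output.detach()

model.layer3.register_forward_hook(hook)

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "dog.jpg" is a placeholder input image.
img = preprocess(Image.open("dog.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    model(img)

# Mean activation per channel: a crude first look at which learned features
# this image excites. Real interpretability work (feature visualization,
# attribution, circuit analysis) goes much further than this.
acts = activations["layer3"]                  # shape: [1, C, H, W]
channel_means = acts.mean(dim=(0, 2, 3))
top = torch.topk(channel_means, k=10)
print("most active channels:", top.indices.tolist())
```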
Is there an implicit assumption here that RL agents are generally more dangerous than models that are trained with (un)supervised learning?
Yes, I believe that RL agents have a much wider range of accident concerns than supervised / unsupervised models.
Later the OP contrasts microscopes with oracles, so perhaps Chris interprets a microscope as a model that is smaller, or otherwise restricted in some way, such that we know it’s safe?
Gurkenglas provided a very eloquent description that matches why I believe this. I’ll continue discussion of this in that thread. :)
Is there anything that prevents RL agents from being used as microscopes, though? Presumably you can still inspect the model an agent has learned without using it as an agent (after it’s been trained). Or am I missing something?
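For concreteness, the kind of offline inspection being presumed here might look like the following sketch, assuming a trained policy network saved as a PyTorch checkpoint. The architecture and file name are hypothetical placeholders, not any particular agent’s actual format:

```python
# Sketch: load a trained RL policy's weights and probe its internal
# activations on a batch of observations, without ever stepping an
# environment or letting it act.
import torch
import torch.nn as nn

class Policy(nn.Module):
    # Hypothetical small MLP policy, stands in for whatever was trained.
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.action_head = nn.Linear(hidden, n_actions)

    def forward(self, obs):
        return self.action_head(self.features(obs))

policy = Policy(obs_dim=64, n_actions=8)
policy.load_state_dict(torch.load("trained_policy.pt"))  # hypothetical checkpoint
policy.eval()

# Capture the learned feature layer's output with a forward hook.
captured = {}
policy.features.register_forward_hook(
    lambda module, inp, out: captured.setdefault("features", out.detach())
)

# Probe with a batch of observations (random here; in practice you would use
# logged trajectories or hand-constructed positions).
obs_batch = torch.randn(32, 64)
with torch.no_grad():
    policy(obs_batch)

print(captured["features"].shape)  # e.g. torch.Size([32, 256])
# From here the usual interpretability toolkit (probing, feature
# visualization, clustering) applies; no environment interaction is needed.
```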