Vanessa Kosoy comments on Principles of Privacy for Alignment Research

Vanessa Kosoy 29 Jul 2022 7:45 UTC
LW: 7 AF: 5

Our work doesn’t necessarily need wide memetic spread to be found by the people who know what to look for. E.g. people playing through the alignment game tree are a lot more likely to realize that ontology identification, grain-of-truth, value drift, etc, are key questions to ask, whereas ML researchers just pushing toward AGI are a lot less likely to ask those questions.

That’s a valid argument, but I can also imagine groups that (i) in a world where alignment research is obscure, proceed to create unaligned AGI, but (ii) in a world where alignment research is famous, use this research when building their AGI. Maybe any such group would be operationally inadequate anyway, but I’m not sure. More generally, it’s possible that in a world where alignment research is a well-known, respectable field of study, more people take AI risk seriously.

...I do expect there to be at least some steps which need a fairly large alignment community doing “normal” (i.e. paradigmatic) incremental research. For instance, on some paths we need lots of people doing incremental interpretability/ontology research to link up lots of concepts to their representations in a trained system. On the other hand, not all of the foundations need to be very widespread—e.g. in the case of incremental interpretability/ontology research, it’s mostly the interpretability tools which need memetic reach, not e.g. theory around grain-of-truth or value drift.

I think I have a somewhat different model of the alignment knowledge tree. From my perspective, the research I’m doing is already paradigmatic. I have a solid-enough paradigm, inside which there are many open problems, and what we need is a bunch of people chipping away at these open problems. Admittedly, the size of this “bunch” is still closer to 10 people than to 1000 people, but (i) it’s possible that the open problems will keep multiplying hydra-style, as often happens in math, and (ii) memetic fitness would help get the very best 10 people to do it.

It’s also likely that there will be a “phase II” where the nature of the necessary research becomes very different (e.g. it might involve combining the new theory with neuroscience, experimental ML research, or hardware engineering). A successful transition to this phase might require getting a lot of new people on board, which would also be a lot easier given memetic fitness.