I’m genuinely not sure how useful this would be, so I think we should try to pin down what high-value information you’d actually be hoping to learn.
The way I imagine this being useful is for near- to medium-term AI alignment work on language models. In that case, having a lot of highly ethical text lying around might be good data to learn from. But if the AI is clever, it might not need specially labeled examples that spell out the ethical implications; it might be able to learn about humans by observing more complicated situations.
Also, I’m personally skeptical that fine-tuning only on object-level ethics-relevant text is what we need to work on in the near term. At the very least, I’m interested in trying to learn and apply human “meta-preferences”—our preferences about how we would want an observer to think of our preferences, what we wish we were like, how we go about experiencing moral change and growth, times we’ve felt misunderstood, that sort of thing.
That said, I’m saying this in spite of people actively working on this sort of thing at places like the Allen Institute for AI and Redwood Research. So other people’s opinions definitely matter here; it’s not the average opinion that counts, it’s the opinion of whoever’s most excited.
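To make the kind of object-level fine-tuning I’m skeptical of a bit more concrete, here’s a minimal sketch of what it might look like in practice: continuing to train a small causal language model on a corpus of ethics-relevant text. The base model (gpt2), the file name ethics_corpus.txt, and all the hyperparameters are illustrative assumptions on my part, not something from any of the projects mentioned above.

```python
# Minimal sketch (assumptions noted above): fine-tune a small causal LM on a
# hypothetical corpus of ethics-relevant passages, one passage per line.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # assumed base model, purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical dataset: plain text file with one ethics-relevant passage per line.
dataset = load_dataset("text", data_files={"train": "ethics_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="ethics-ft",
        num_train_epochs=1,
        per_device_train_batch_size=4,
    ),
    train_dataset=tokenized,
    # Standard language-modeling collator: pads batches and copies inputs to labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The point of the sketch is just that this approach treats ethics as one more text distribution to imitate, which is exactly why I suspect it misses the meta-preference side of the problem.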