There’s a related dynamic that came up in a convo I just had.
Alice: My current work is exploring if we can solve value loading using reward learning.
Bob: Woah, isn’t that obviously doomed? Didn’t Rohin write a whole sequence on this?
Alice: Well, I don’t want to solve the whole problem at arbitrary difficulty. I just want to know whether we can build something that gets the basics right in distributions that a present-day human can understand. For example, I reckon we may be able to teach an AI what murder is today, even if we can’t teach it what murder is post-singularity.
Bob: I see. That’s more reasonable. However, there’s a school of thought that I suspect you may be part of, called “slow takeoff”, on which over the course of maybe 10 years the world becomes increasingly reliant on ML systems whose internals we don’t understand and whose actions we don’t understand (we just see the metrics go up). That world is already out of distribution for what a present-day human can understand.
Alice: That’s true. I guess I’m interested to know how far we can push it and still build a system that helps us perform pivotal acts and doesn’t destroy everything.
(The above is my paraphrase. Both of them said things somewhat different from this, and I also mixed in some things I said myself.)
The success story you have in mind determines what problem you’re trying to solve, and to some extent which capability regime you’re thinking about.