I want to mention that a supposedly impossible problem was pretty close to being solved by Anthropic, if not solved outright, and, very critically, neither Eliezer nor anyone at MIRI noticed that an AI alignment problem they had claimed was basically impossible had turned out to be solvable.
“It won’t understand language until it’s already superintelligent” stands out to me because it was considered an impossible problem, yet ordinary capabilities research just solved it outright, with no acknowledgement that something ‘impossible’ had occurred.
You can quibble over the word ‘impossible’, but it was generally accepted that the first big insurmountable barrier was that there is simply no good way to encode concepts like ‘happiness’ in their full semantic richness without having already built an ASI, at which point it no longer cares.
And in case one is tempted to say “well, you still can’t meaningfully align AI systems by defining the things we want in terms of high-level philosophical paraphrases,” I remind you that constitutional AI exists, which does exactly that.
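For a concrete sense of what “aligning by high-level paraphrase” looks like, here is a minimal sketch of a constitutional-AI-style critique-and-revision loop. The `generate` function and the principle texts are illustrative placeholders of my own, not Anthropic's actual API or constitution; treat this as a rough sketch of the general technique under those assumptions, not their implementation.

```python
# Sketch of a constitutional-AI-style critique/revision loop.
# `generate` is a hypothetical stand-in for any language-model completion call,
# and the principles below are illustrative, not Anthropic's real constitution.

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that encourage illegal or dangerous activity.",
]

def generate(prompt: str) -> str:
    """Placeholder for a call to a language model (e.g. an API client)."""
    raise NotImplementedError

def constitutional_revision(user_prompt: str) -> str:
    # Draft an initial answer, then have the model critique and rewrite it
    # against each natural-language principle in turn.
    draft = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique the following response according to this principle.\n"
            f"Principle: {principle}\nResponse: {draft}"
        )
        draft = generate(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nOriginal response: {draft}"
        )
    return draft
```

The point is simply that the "things we want" are stated as plain-language principles and the model itself interprets them, which is exactly the move that was supposed to be unavailable before superintelligence.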
Three tweets illustrate it pretty well:
https://twitter.com/jd_pressman/status/1709355851457479036
https://twitter.com/jd_pressman/status/1709358430128152658
https://twitter.com/jd_pressman/status/1709362209024033210