I want to mention that a supposedly impossible problem was pretty close to being solved by Anthropic, if not solved outright, and, very critically, neither Eliezer nor anyone at MIRI noticed that an AI alignment problem they had claimed was basically impossible had turned out to be solvable.
“It won’t understand language until it’s already superintelligent” stands out to me because it was considered an impossible problem, yet ordinary capabilities research just solved it outright, with no acknowledgement that something ‘impossible’ had occurred.
You can quibble over the word ‘impossible’, but it was generally accepted that the first big insurmountable barrier was that there is simply no good way to encode concepts like ‘happiness’ in their full semantic richness without having already built an ASI, at which point it no longer cares.
And in case one is tempted to say “well, you still can’t meaningfully align AI systems by defining the things we want in terms of high-level philosophical paraphrases,” I remind you that constitutional AI exists, which does exactly that.
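For a concrete sense of what “aligning by high-level paraphrase” looks like, here is a minimal sketch of a constitutional-AI-style critique-and-revision loop. The `generate` function and the principle texts are illustrative placeholders of my own, not Anthropic's actual API or constitution; treat this as a rough sketch of the general technique under those assumptions, not their implementation.

```python
# Sketch of a constitutional-AI-style critique/revision loop.
# `generate` is a hypothetical stand-in for any language-model completion call,
# and the principles below are illustrative, not Anthropic's real constitution.

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that encourage illegal or dangerous activity.",
]

def generate(prompt: str) -> str:
    """Placeholder for a call to a language model (e.g. an API client)."""
    raise NotImplementedError

def constitutional_revision(user_prompt: str) -> str:
    # Draft an initial answer, then have the model critique and rewrite it
    # against each natural-language principle in turn.
    draft = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique the following response according to this principle.\n"
            f"Principle: {principle}\nResponse: {draft}"
        )
        draft = generate(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nOriginal response: {draft}"
        )
    return draft
```

The point is simply that the "things we want" are stated as plain-language principles and the model itself interprets them, which is exactly the move that was supposed to be unavailable before superintelligence.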
Three tweets illustrate it pretty well:
https://twitter.com/jd_pressman/status/1709355851457479036
https://twitter.com/jd_pressman/status/1709358430128152658
https://twitter.com/jd_pressman/status/1709362209024033210