Just to self-promote, the idea that alignment means precisely matching a specific utility function is one of the main things I wrote Reducing Goodhart to address. The point that this is incompatible with humans being physical systems should be reached in the first two posts.
Though I’ve gotten some good feedback since writing it and am going to do some rewriting (including explaining that “reducing” has multiple meanings), so if you find it hard to muddle through, check back in a month.