we cannot align AIs to human values until we know approximately what human values actually are, and we cannot know that until we solve the human alignment problem
Fairly sure this isn’t true, and isn’t how things are going to go. Instead, we’re going to explain what it means for a thing to have values, then make a system that investigates us and figures out what our values are, pursuing them to the extent that it understands them, by whatever means seem best to it at the time (discussed a bit by Russell under ‘value learning’ and ‘inverse reinforcement learning’). How load-bearing is this assumption to your strategy?
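To make that concrete, here’s a toy sketch of the loop I mean (my own illustration, not Russell’s formalism; the actions, candidate value tables, and numbers are all made up): the system keeps a posterior over hypotheses about what we value, updates it by watching our choices, and acts on its current best estimate rather than waiting for certainty.

```python
# Toy value-learning loop: illustrative only, every name and number is made up.
import math

# Hypothetical candidate value functions the human might have.
CANDIDATE_VALUES = {
    "comfort": {"build_park": 0.6, "build_factory": 0.2, "do_nothing": 0.4},
    "wealth":  {"build_park": 0.1, "build_factory": 0.9, "do_nothing": 0.0},
    "nature":  {"build_park": 0.9, "build_factory": -0.5, "do_nothing": 0.3},
}
ACTIONS = ["build_park", "build_factory", "do_nothing"]

def update_posterior(posterior, observed_choice, temperature=1.0):
    """Bayesian update assuming the human is Boltzmann-rational: they are likelier
    to pick actions they value more, so their choices are evidence about their values."""
    unnormalized = {}
    for hypothesis, values in CANDIDATE_VALUES.items():
        z = sum(math.exp(values[a] / temperature) for a in ACTIONS)
        likelihood = math.exp(values[observed_choice] / temperature) / z
        unnormalized[hypothesis] = posterior[hypothesis] * likelihood
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

def best_action(posterior):
    """Pursue the values only to the extent they are understood: maximize
    expected value under the current posterior over hypotheses."""
    return max(ACTIONS, key=lambda a: sum(p * CANDIDATE_VALUES[h][a]
                                          for h, p in posterior.items()))

posterior = {h: 1 / len(CANDIDATE_VALUES) for h in CANDIDATE_VALUES}
posterior = update_posterior(posterior, observed_choice="build_park")
print(posterior)               # belief shifts toward the "nature" hypothesis
print(best_action(posterior))  # and the agent acts on that updated belief
```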
And I’d anticipate that human peace is way harder than AGI-supported peace, for various reasons (cognitive opacity, and intelligence having superlinear returns, i.e., human myopia having superlinear costs); the two problems probably aren’t continuous. I work on this sort of thing, systems for human peace, and I don’t exaggerate its importance: AGI will mostly not need our systems.
I wasn’t planning to use any particularly serious AI in building it, beyond the coordination market framework and maybe a language model that lets people argue with a fictional but realistic sparring partner.
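For a sense of how thin that piece is, here’s a rough sketch of the sparring-partner loop (everything here is a placeholder I made up; generate_reply stands in for whatever language model gets plugged in, and PERSONA would become its system prompt).

```python
# Sketch of the sparring-partner piece only; PERSONA, generate_reply, and spar
# are hypothetical placeholders, not part of any existing framework.
PERSONA = (
    "You are a thoughtful interlocutor who sincerely holds the opposite position "
    "to the user on the proposal under discussion. Argue in good faith."
)

def generate_reply(persona: str, transcript: list) -> str:
    """Stand-in for the language model; a real system would pass `persona`
    as the system prompt and the transcript as context."""
    last_user_msg = transcript[-1][1]
    return f"You said {last_user_msg!r}. What's the strongest case against that?"

def spar(user_turns):
    """Alternate user messages with generated counterarguments, keeping a transcript."""
    transcript = []
    for user_msg in user_turns:
        transcript.append(("user", user_msg))
        transcript.append(("partner", generate_reply(PERSONA, transcript)))
    return transcript

for role, text in spar(["The proposal will obviously pass.", "Fine, maybe not obviously."]):
    print(f"{role}: {text}")
```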
It’s going to turn out that having an LLM ask “but are you sure that’s correct?” 7 times increases the reflective consistency of people’s votes, and I am going to lmao.
You say this like it’s a devastating putdown, or invalidates my idea. I think you’re right that this is what would happen, but I think hooking that up to a state-of-the-art language model could cure schizophrenic delusions, mass delusions like 9/11 trutherism, and all sorts of amazing things like that. Maybe not right away (schizophrenics are not generally known for engaging with reality diligently and aggressively). But over time, this is a capability I predict would emerge from the system I am trying to build. Why should anyone believe that? I don’t know.
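The mechanism being joked about is small enough to write down, though: a sketch with made-up names, where ‘reflective consistency’ just means how often a vote survives another challenge.

```python
# Sketch of the "ask 'but are you sure that's correct?' N times" experiment;
# challenge_vote and reflective_consistency are names invented for illustration.
from dataclasses import dataclass, field

CHALLENGE = "But are you sure that's correct?"

@dataclass
class VoteRecord:
    initial: str
    revisions: list = field(default_factory=list)

def challenge_vote(get_vote, ask_user, rounds: int = 7) -> VoteRecord:
    """get_vote() returns the user's current vote; ask_user(prompt) shows the
    challenge and returns the (possibly revised) vote."""
    record = VoteRecord(initial=get_vote())
    for _ in range(rounds):
        record.revisions.append(ask_user(CHALLENGE))
    return record

def reflective_consistency(record: VoteRecord) -> float:
    """Fraction of challenges after which the vote matched the round before."""
    votes = [record.initial] + record.revisions
    unchanged = sum(1 for a, b in zip(votes, votes[1:]) if a == b)
    return unchanged / len(record.revisions)

# Canned answers standing in for a real voter and a real language-model prompt:
answers = iter(["yes", "no", "no", "no", "no", "no", "no"])
record = challenge_vote(lambda: "yes", lambda prompt: next(answers))
print(reflective_consistency(record))  # 6/7: exactly one challenge moved the vote
```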
No, not laughing at you, laughing at the absurdity of the human critter, the fact that rubberducking works and so on.