To be clear: The diamond maximizer problem is about getting specific intended content into the AI’s goals (“diamonds” as opposed to some random physical structure it’s maximizing), not just about building a stable maximizer.
Thanks for the clarification!
If you relax the “specific intended content” constraint, and allow for maximizing any random physical structure, as long as it’s always the same physical structure in the real world and not just some internal metric that has historically correlated with the amount of that structure that existed in the real world, does that make the problem any easier / is there a known solution? My vague impression was that the answer was still “no, that’s also not a thing we know how to do”.
I expect it makes it easier, but I don’t think it’s solved.