I am disagreeing with the underlying assumption that it’s worthwhile to create simulacra of the sort that satisfy point 2. I expect an AI reasoning about its successor not to simulate it with perfect fidelity—instead, it’s much more practical to use approximations, which makes the reasoning process meaningfully different from actually instantiating the successor.
I expect agentic simulacra to arise even without anyone intentionally simulating them: agents are just generally useful for solving prediction problems, and over millions of predictions (as would be expected of a product on the order of ChatGPT, or its future successors) it’s probable that agentic simulacra will emerge. Even if these agents are only approximations, predicting the behavior of approximated agents could still end up satisfying their preferences in the real world (as described in the Hubinger post).
The problem I’m interested in is how you ensure that all subsequent agentic simulacra (whether they arise intentionally or otherwise) are safe, which seems difficult to verify formally due to the Löbian Obstacle.
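To spell out why the Löbian Obstacle bites here (this is my gloss, not something argued in the post I’m replying to): Löb’s theorem says that for a sufficiently strong theory $T$, if $T \vdash \Box_T \varphi \rightarrow \varphi$, then $T \vdash \varphi$. So a system reasoning in $T$ cannot establish the blanket soundness claim “whatever an equally strong simulacrum/successor concludes is safe really is safe” without thereby having to prove each such conclusion outright—and that blanket claim is exactly what a formal guarantee over all subsequent agentic simulacra would seem to require.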