You use “utility function” in a weird way when you say it has “a utility function U” at the start. Standard usage has it that, by definition, the utility function is the thing that the AI wants to maximize, so the utility function in this case is only “U − R”. To reduce confusion, you could perhaps relabel your initial “utility function” as “objective function” or “goal function” (since you already use “O”). Along similar lines:
> AI consists functionally of the utility function U, the penalty function R and a probability estimating module P, and whatever practical components (including knowledge bases) it needs to run its calculations.
That isn’t exactly true. A better description might be that we’re assuming it consists of a utility function U = G − R(P), where G is the goal function, R the penalty function, and P the output of a probability-estimation system that is specified within the utility function itself. This is basically what you describe in the Relative Scaling section, but prior to that section it’s not clear that the AI wouldn’t want to modify the probability module.
In fact, if you don’t put P inside the utility function, presumably the AI modifies its probability module by indirect means so that it outputs P(a) = 0 for all a, and then proceeds to optimize the universe. It’s also not clear what purpose the “can’t directly modify components” assumption serves, since we seem to be assuming a potentially godlike AI that should have no difficulty modifying its components indirectly if it wishes.
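To make the distinction concrete, here is a toy sketch of what I mean by putting P inside the utility function; every name and number below is my own invention for illustration, not something from your post.

```python
# Toy sketch, all names and numbers invented for illustration.
# The agent scores actions with U = G - R(P), where P is the fixed,
# designer-supplied probability module baked into U itself.

def fixed_P(action):
    """Our module's estimate that `action` is high-impact."""
    return {"do_nothing": 0.0, "modest_plan": 0.001, "optimise_universe": 0.99}[action]

def G(action):
    """Goal function: how well the action serves the stated goal."""
    return {"do_nothing": 0.0, "modest_plan": 5.0, "optimise_universe": 100.0}[action]

def R(p):
    """Penalty as a function of the probability estimate, scaled up heavily."""
    return 1000.0 * p

def U(action, prob_module=fixed_P):
    return G(action) - R(prob_module(action))

actions = ["do_nothing", "modest_plan", "optimise_universe"]

# With P baked into U, hacking its own beliefs gains the agent nothing:
print(max(actions, key=U))  # -> "modest_plan"

# Whereas if U instead read from a module the agent could (indirectly)
# rewrite, it would just drive that module's output to zero everywhere:
def hacked_P(action):
    return 0.0

print(max(actions, key=lambda a: U(a, hacked_P)))  # -> "optimise_universe"
```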
I realize this is all pretty nitpicky, but I think this suggestion would make the post clearer and the assumptions about what exactly the AI can’t touch more explicit. Note also that since I shove P explicitly inside the utility function, it doesn’t really matter whether the AI can figure out how to unravel chaos: the probability estimate it uses for comparing worlds comes from P, which is what we gave it, and which can’t figure out how to unravel chaos. (Well, I’m pretty sure.)
Other than that, I think this should work, and it’s actually the sort of thing I had in mind when talking to you about satisficers/maximisers with a cutoff (with the cost of changes being the other necessary component to keep it from doing too much).
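For what it’s worth, here is an equally rough sketch of that cutoff-plus-change-cost idea, again with made-up names and numbers rather than anything from your framework:

```python
# Rough sketch of a maximiser with a cutoff plus a cost on changes;
# CUTOFF, change_cost, etc. are invented names, not from the post.

CUTOFF = 10.0  # goal value beyond which extra optimisation earns nothing

def expected_goal(action):
    return {"do_nothing": 0.0, "modest_plan": 8.0, "grand_plan": 500.0}[action]

def change_cost(action):
    """How much the action disturbs the world (the 'cost of changes')."""
    return {"do_nothing": 0.0, "modest_plan": 1.0, "grand_plan": 400.0}[action]

def score(action):
    # Capping the goal term removes the incentive to overshoot; the change
    # cost then makes drastic plans strictly worse than modest ones.
    return min(expected_goal(action), CUTOFF) - change_cost(action)

print(max(["do_nothing", "modest_plan", "grand_plan"], key=score))  # -> "modest_plan"
```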