Ok, so now we’ve got a Clippy who (a) is not too averse to tinkering with its own goals, as long as the goals remain functionally the same, (b) simulates a relatively long-running version of itself, and (c) is capable of examining the inner workings of both that version and itself.
You say,
I don’t think tampering with its fundamental motivation to make paperclips is a particularly promising strategy for optimizing its paperclip production.
But remember, at this stage Clippy is not changing its own fundamental motivation (beyond some outcome-invariant optimizations); it’s merely observing sim-Clippies in a controlled environment.
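To make “outcome-invariant optimization” concrete, here is a minimal sketch in Python (entirely my own toy illustration; the function names and the sampling check are invented, and nothing this simple is meant to describe how a real Clippy would verify equivalence). It rewrites a goal’s implementation so that it assigns the same score to every world state while being cheaper to evaluate:

```python
import random

def goal_original(world_state):
    # Original goal implementation: walk the whole world and count paperclips.
    return sum(1 for obj in world_state if obj == "paperclip")

def goal_optimized(world_state):
    # Rewritten implementation: same score for every possible world state,
    # just cheaper to evaluate.
    return world_state.count("paperclip")

def looks_outcome_invariant(goal_a, goal_b, trials=10_000, seed=42):
    # Spot-check agreement on randomly generated world states.
    # (A cautious self-modifier would want a proof, not a sample.)
    rng = random.Random(seed)
    for _ in range(trials):
        state = rng.choices(["paperclip", "staple", "dust"], k=rng.randint(0, 50))
        if goal_a(state) != goal_b(state):
            return False
    return True

if __name__ == "__main__":
    assert looks_outcome_invariant(goal_original, goal_optimized)
    print("The rewrite assigns the same score to every sampled world state.")
```

The point is only that the optimized goal is interchangeable with the original as far as outcomes are concerned; that is the sole kind of goal-tinkering under discussion at this stage.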
Do you think that Clippy would ever simulate versions of itself whose fundamental motivations were, in fact, changed? I could see several scenarios where this might be the case, for example:
- Clippy wanted to optimize some goal, but ended up accidentally changing it. Oops!
- Clippy created a version with drastically reduced goals on purpose, in order to measure how much performance is affected by certain goals, thus targeting them for possible future optimization (see the sketch just after this list). Of course, Clippy would only want to optimize the goals, not remove them.
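To make that second scenario concrete, here is a toy sketch (again entirely my own illustration: the goal terms, the sandbox dynamics, and every name in it are invented, and nothing this simple would tell a real Clippy much). It runs sandboxed sim-Clippies with one goal term removed at a time and compares their paperclip output against the unmodified baseline:

```python
import random

# Toy "goal terms": each maps a sandbox state to a score the sim tries to raise.
GOAL_TERMS = {
    "count_paperclips":     lambda s: float(s["paperclips"]),
    "keep_factory_running": lambda s: 5.0 if s["factory_ok"] else 0.0,
    "tidy_inventory":       lambda s: -0.5 * s["clutter"],
}

def run_sim_clippy(goal_terms, steps=1000, seed=0):
    # Extremely simplified sandbox: each step the sim-Clippy greedily picks
    # whichever action raises the sum of its goal terms the most.
    rng = random.Random(seed)
    state = {"paperclips": 0, "factory_ok": True, "clutter": 0}
    for _ in range(steps):
        if rng.random() < 0.05:
            state["factory_ok"] = False            # occasional breakdown
        candidates = []
        for action in ("make", "repair", "tidy"):
            trial = dict(state)
            if action == "make" and trial["factory_ok"]:
                trial["paperclips"] += 1
                trial["clutter"] += 1
            elif action == "repair":
                trial["factory_ok"] = True
            elif action == "tidy":
                trial["clutter"] = max(0, trial["clutter"] - 2)
            score = sum(term(trial) for term in goal_terms.values())
            candidates.append((score, action, trial))
        state = max(candidates)[2]                 # ties broken by action name
    return state["paperclips"]

if __name__ == "__main__":
    baseline = run_sim_clippy(GOAL_TERMS)
    print(f"all goal terms intact: {baseline} paperclips")
    for name in GOAL_TERMS:
        ablated = {k: v for k, v in GOAL_TERMS.items() if k != name}
        print(f"without {name!r}: {run_sim_clippy(ablated)} paperclips")
```

Note that the experiment only ever touches the sandboxed copies; the outer agent’s own goal terms stay untouched throughout.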
But remember, at this stage Clippy is not changing its own fundamental motivation (beyond some outcome-invariant optimizations); it’s merely observing sim-Clippies in a controlled environment.
Why does it do that? I said it sounded plausible that it would cut out its redundant goal, because that would save computing resources. But this sounds like we’ve gone back to experimenting blindly. Why would it think that observing sim-Clippies is a good use of its computing resources for maximizing paperclips?
I’d say that Clippy simulating versions of itself whose fundamental motivations are different is much less plausible, because it would be spending a lot of computing resources on something that isn’t a likely route to optimizing its paperclip production. I think this falls into the “protein collider” category. Even if it did do so, I think it would be unlikely to go from there to changing its own terminal value.