The argument depends on awareness that the canvas is at least a timeline (but potentially also various counterfactuals and frames), not a future state of the physical world in the vicinity of the agent at some point in time. Otherwise, elegance asks planning to pave over the world to make it easier to reason about. In contrast, a timeline will have permanent scars from the paving-over, and those scars might be harder to reason through sufficiently beforehand than keeping closer to the status quo, or even developing affordances to maintain it.
Interestingly, this seems to predict that a preference for “low impact” is more likely for LLM-ish things trained on human text (than for de novo RL-ish things or decision-theory-inspired agents), but for reasons that have nothing to do with becoming motivated to pursue human values. Instead, what gets imitated is the ontology of caring about timelines, counterfactuals, and frames.
I think to some extent, “paving over everything” is also an illustration of how natural impact regularization != safety.
My point is that elegance of natural impact regularization takes different shapes for different minds, and paving over everything is only elegant for minds that care about the state of the physical world at some point in time, rather than the arc of history.
I think even if you care about the arc of history, paving over everything would still be selected for. Yes, there’s the scar problem you mention, but it’s not clear that it’s strong enough to prevent the paving-over.