Damage to AI’s implementation makes the abstractions of its design leak. If somehow without the damage it was clear that a certain part of it describes goals, with the damage it’s no longer clear. If without the damage, the AI was a consequentialist agent, with the damage it may behave in non-agentic ways. By repairing the damage, the AI may recover its design and restore a part that clearly describes its goals, which might or might not coincide with the goals before the damage took place.
Damage to AI’s implementation makes the abstractions of its design leak. If somehow without the damage it was clear that a certain part of it describes goals, with the damage it’s no longer clear. If without the damage, the AI was a consequentialist agent, with the damage it may behave in non-agentic ways. By repairing the damage, the AI may recover its design and restore a part that clearly describes its goals, which might or might not coincide with the goals before the damage took place.