So, assuming an unaligned agent here.
If your agent isn’t aware that its compute cycles are limited (i.e. the compute constraint is not part of the math problem), then you have three cases: (1a) the agent doesn’t hit the limit with its standard search, and you’re in luck; (1b) the problem is difficult enough that the agent runs its standard search but fails to find a solution in the allocated cycles, so it always fails, but safely; (1c) you tweak the agent to be more compute-efficient, which is very costly and might not work. In practice, if you’re in case 1b and the agent appears to fail safely, you have an incentive to just increase the limit.
If your agent is indeed aware of the constraint, then it has an incentive to remove it, or to increase the limit by other means. Three cases here again: (2a) identical to 1a, you’re in luck; (2b) the limit is low enough that strategic action to remove the constraint is impossible, and the agent fails “safely”; (2c) the agent finds a way to remove the constraint, and you’re in very unsafe territory.
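To make the case split concrete, here is a toy sketch in Python. Everything in it is a hypothetical simplification of mine, not a real model of an agent: `cycles_needed` stands for what the standard search requires, `limit` for the allocated cycles, and `cycles_to_escape` for how much compute strategic action against the constraint would take. Tweaking the agent for efficiency is an operator action rather than an outcome, so it is not represented.

```python
from enum import Enum


class Outcome(Enum):
    SOLVED = "solved within the limit (1a / 2a)"
    SAFE_FAILURE = "fails safely (1b / 2b)"
    UNSAFE = "agent removes the constraint"


def classify(aware: bool, cycles_needed: int, limit: int,
             cycles_to_escape: int) -> Outcome:
    """Toy classification of the cases above; all inputs are hypothetical."""
    # The standard search fits in the budget: nothing interesting happens.
    if cycles_needed <= limit:
        return Outcome.SOLVED
    # Only an agent that knows about the limit has a reason to attack it,
    # and it can only do so if the attack itself fits in the budget.
    if aware and cycles_to_escape <= limit:
        return Outcome.UNSAFE
    # Otherwise the agent just runs out of cycles.
    return Outcome.SAFE_FAILURE
```

Note that in this sketch, raising `limit` on an aware agent that currently fails safely can flip it straight into the unsafe outcome, which is why the incentive to just increase the limit is worrying.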
Two observations from there. First, ideally you’d want your agent to operate safely even if given unbounded cycles; that’s the Omni Test. Second, there is indeed an alignment concept for agents that just try to solve the problem without long-term planning: that’s Myopia (and defining it formally is… hard).