Episodic learning algorithms will still penalize this behavior if it appears on the training distribution, so it seems reasonable to call this an inner alignment problem.
Ah, I agree (edited my comment above accordingly).
Episodic learning algorithms will still penalize this behavior if it appears on the training distribution, so it seems reasonable to call this an inner alignment problem.
Ah, I agree (edited my comment above accordingly).