Ulisse Mini comments on ELK prize results

Ulisse Mini 11 Dec 2022 18:43 UTC
Random thought: perhaps you could carefully engineer gradient starvation in order to “avoid generalizing” and so defeat the “Discrete modes of prediction” example. You’d only need to delay the unwanted generalization until the AI is capable of reflection; after that, the AI can solve the successor AI problem.

In general: hack our way towards getting value-preserving reflectivity before values drift from “Diamonds” → “What’s labeled as a diamond by humans”. (In the ELK setting, replace these with “Telling the truth” and “What the human thinks is true”, respectively.)