Invent new mathematical structures (e.g., new operations on known objects, or new abstract algebraic structures defined by their axioms) and ask the LLM to reason about them and prove theorems about them (theorems that weren’t too hard for you, or someone else, to prove first).
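For concreteness, here's a minimal sketch of such an item in Lean (assuming Mathlib is available for the `ring` tactic): a made-up binary operation on the integers, with a couple of laws to state and prove from the definition alone.

```lean
import Mathlib  -- assumed available; only the `ring` tactic is really needed

-- An "invented" binary operation on the integers: diamond a b := a + b + a * b.
-- A test item hands the model a definition like this (ideally a less
-- familiar-looking one) and asks it to find and prove the operation's laws.
def diamond (a b : Int) : Int := a + b + a * b

-- Commutativity of the invented operation.
theorem diamond_comm (a b : Int) : diamond a b = diamond b a := by
  unfold diamond
  ring

-- Associativity: a slightly less obvious law of the same structure.
theorem diamond_assoc (a b c : Int) :
    diamond (diamond a b) c = diamond a (diamond b c) := by
  unfold diamond
  ring
```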
Yup, that’s also my current best guess for what this sort of test should look like.
Choose some really obscure math discipline D, one that we’re pretty sure has few practical applications (i.e., one the AI won’t convergently learn from background data about the world).
Curate the AI’s dataset to only include information up to some point in time T1.
Guide the AI step-by-step (as in, via chain-of-thought prompting or its equivalent) through replicating all discoveries made in D between T1 and the present T2.
Pick the variables such that the inferential gap between D(T1) and D(T2) is large (it can’t be cleared in a single non-superintelligent logical leap), but the gaps between individual insights are tiny. This would ensure that our AI can only reach D(T2) if it’s able to reuse its own insights (i.e., build novel abstractions, store them in the context window/short-term memory, and fluidly reuse them when needed), while not putting onerous demands on how good each individual insight must be. See also.
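To make the shape of that setup concrete, here's a rough Python sketch of the evaluation loop, purely illustrative: the corpus, the model interface (`model.train`, `model.ask`), the document dates, and the grader below are all hypothetical stand-ins, and the grading step is the genuinely hard part that this sketch hand-waves.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class InsightStep:
    """One small discovery in D made between T1 and T2, in chronological order."""
    prompt: str     # how the question is posed to the model
    reference: str  # the human-discovered answer, used only for grading


def matches_reference(answer: str, reference: str) -> bool:
    """Placeholder grader; in practice this needs expert review or a proof checker."""
    return reference.strip().lower() in answer.strip().lower()


def run_rederivation_test(model, corpus, steps: list[InsightStep], t1: date) -> float:
    """Guide a model through re-deriving D(T2) from D(T1), one small step at a time.

    `model` and `corpus` are hypothetical stand-ins for whatever training /
    inference stack is actually used; the point is only the structure of the test.
    """
    # 1. Curate: the model only ever sees documents dated before the cutoff T1.
    model.train([doc for doc in corpus if doc.date < t1])

    # 2. Guide step by step, carrying the model's earlier derivations forward in
    #    context, so it has the chance to build on its own abstractions.
    context: list[str] = []
    solved = 0
    for step in steps:
        answer = model.ask(prompt=step.prompt, context=context)
        if matches_reference(answer, step.reference):
            solved += 1
        context.append(answer)  # reuse of prior insights is the thing being tested

    # 3. Score: what fraction of the T1 -> T2 chain the model could rebuild.
    return solved / len(steps)
```

The scoring is crude on purpose; the part that matters is that every answer goes back into the context, so a failure to build on earlier insights shows up directly in the final fraction.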
I should probably write up a post about it, and maybe pitch this project to the LTFF or something.