(Your reply is in response to a comment I deleted, because I thought it was basically a duplicate of this one, but I’d be happy if you’d leave your reply up, so we can continue the conversation.)
See here, starting from “consider a scheme like the following”. In short: should be possible, but seems non-trivially difficult.
That seems like a high bar to me for testing for any fluid intelligence, though, and the vast majority of humans would do about as badly or worse (but possibly because of far worse crystallized intelligence). Similarly, in your post, “No scientific breakthroughs, no economy-upturning startup pitches, certainly no mind-hacking memes.”
I would say to look at it based on the definitions and existing tests of fluid intelligence. These are about finding patterns and relationships between unfamiliar objects and any rules governing them, applying those rules and/or inference rules to the identified patterns and relationships, and doing so more or less efficiently. More fluid intelligence means noticing patterns earlier and taking more useful steps and fewer useless ones.
Some ideas for questions:
Invent new games or puzzles, and ask it to achieve certain things from a given state.
Invent new mathematical structures (e.g. new operations on known objects, or new abstract algebraic structures based on their axioms) and ask the LLM to reason about them and prove theorems (that weren’t too hard to prove yourself or for someone else to prove). (A toy illustration of such a prompt follows after this list.)
Ask it to do hardness proofs (like NP-hardness proofs), either between two new problems, or just with one problem (e.g. ChatGPT proved a novel problem was NP-hard here).
Maybe other new discrete math problems.
EDIT: New IMO and Putnam problems.
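To make the mathematical-structures idea concrete, here is a toy illustration of the kind of prompt I have in mind (made up for this comment; a real test item would need to be something genuinely unfamiliar rather than a standard exercise like this):

```latex
% Toy illustration only: define a new binary operation \star on the integers by
\[
  a \star b \;=\; a + b - ab,
\]
% and then ask the LLM to:
%  (1) find the identity element of \star and prove that \star is associative,
%  (2) determine exactly which integers have a \star-inverse, with proof.
```

The point is that the model has to extract the relevant patterns from the definition itself rather than retrieve a memorized proof.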
My impression is that there are few cross-applicable techniques in these areas, and the ones that exist often don’t get you very far toward solving problems. To do NP-hardness proofs, you need to identify patterns and relationships between two problems. The idea of using “gadgets” is way too general and hides all of the hard work, which is finding the right gadget to use and how to use it. EDIT: For IMO and Putnam problems, there are some common tools, too, but if simple pattern matching with those were all it took, math undergrads would generally be good at them, and they’re not, so it probably does take considerable fluid intelligence.
I guess one possibility is that an LLM can try a huge number of steps and combinations of steps before generating the next token, possibly looking ahead multiple steps internally before picking one. Maybe it could solve hard problems this way without fluid intelligence.
Invent new mathematical structures (e.g. new operations on known objects, or new abstract algebraic structures based on their axioms) and ask the LLM to reason about them and prove theorems (that weren’t too hard to prove yourself or for someone else to prove).
Yup, that’s also my current best guess for what this sort of test must look like.
Choose some really obscure math discipline D, one that we’re pretty sure lacks practical applications (i.e., won’t be convergently learned from background data about the world).
Curate the AI’s dataset to only include information up to some point in time T1.
Guide the AI step-by-step (as in, via chain-of-thought prompting or its equivalent) through replicating all discoveries made in D between T1 and the present T2.
Pick the variables such that the inferential gap between D(T1) and D(T2) is large (can’t be cleared by a non-superintelligent logical leap), but the gaps between individual insights are tiny. This would ensure that our AI would only be able to reach D(T2) if it’s able to re-use its insights (i.e., build novel abstractions, store them in the context window/short-term memory, fluidly re-use them when needed), while not putting onerous demands on how good each individual insight must be. See also.
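If it helps to picture the mechanics, here is a minimal sketch of how the evaluation loop could be run (purely illustrative: the `model.generate` call, the `grade` function, and the structure of `insights` are hypothetical placeholders, not an existing harness or dataset):

```python
# Illustrative sketch only; `model`, `grade`, and `insights` are hypothetical placeholders.

# `insights` is the ordered chain of discoveries made in D between T1 and T2,
# arranged so each step is only a small inferential gap from the previous ones.
insights = [
    {"prompt": "Given the definitions above, what does ... suggest about ...?",
     "target": "statement of the first post-T1 result"},
    # ... one entry per incremental discovery, up to D(T2)
]

def run_eval(model, insights, grade):
    """Walk a model (trained only on data up to T1) step-by-step toward D(T2)."""
    context = []   # accumulated insights: the model's short-term memory for this test
    reached = 0
    for step in insights:
        prompt = "\n".join(context + [step["prompt"]])
        answer = model.generate(prompt)
        # `grade` is a human (or carefully validated) check that the answer
        # actually contains the target insight, not just related boilerplate.
        if not grade(answer, step["target"]):
            break  # the chain stops at the first insight the model can't reproduce
        context.append(answer)  # feed the insight back in so later steps can re-use it
        reached += 1
    return reached / len(insights)  # fraction of the T1 -> T2 gap that was crossed
```

Scoring the fraction of the chain the model gets through, rather than pass/fail on reaching D(T2), keeps the demands on any individual insight small, per the point above.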
I should probably write up a post about it, and maybe pitch this project to the LTFF or something.