Somewhat relatedly: I’m interested in how well LLMs can solve tasks in parallel. This seems very important to me.[1]
The “I’ve thought about this for 2 minutes” version is: hand an LLM two multiple-choice questions with four answer options each. Encode each possible pair of answers (one per question) as a single token, so that there are 16 possible answer tokens, of which exactly one corresponds to the correct answer to both questions. A correct answer then means the model has solved both tasks in one forward pass.
(One can of course vary the number of answer options and questions. I can see some difficulties in implementing this idea properly, but would nevertheless be excited if someone took a shot at it.)
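As a rough illustration, here is a minimal sketch of how such a combined-answer evaluation could be set up. The questions, the prompt format, and the `model.next_token_logprobs` call are hypothetical placeholders; the key constraint is that each combined answer label maps to a single token, so the model has to produce both answers in one forward pass.

```python
import itertools

# Two hypothetical multiple-choice questions, four options each.
questions = [
    {"q": "What is the capital of Australia?",
     "options": ["Sydney", "Canberra", "Melbourne", "Perth"],
     "answer": 1},   # index of the correct option
    {"q": "Which of these planets is the largest?",
     "options": ["Earth", "Venus", "Jupiter", "Saturn"],
     "answer": 2},
]

letters = "ABCD"

# The 16 combined labels, e.g. "BC" = option B for Q1 and option C for Q2.
combined_labels = ["".join(p) for p in itertools.product(letters, repeat=len(questions))]
correct_label = "".join(letters[q["answer"]] for q in questions)

prompt = ("Answer both questions with a single two-letter code: the first letter "
          "answers Q1 and the second letter answers Q2.\n\n")
for i, q in enumerate(questions, start=1):
    prompt += f"Q{i}: {q['q']}\n"
    for letter, option in zip(letters, q["options"]):
        prompt += f"  {letter}) {option}\n"
    prompt += "\n"
prompt += "Combined answer:"

# For the "one forward pass" requirement, each combined label should be a
# single token in the model's vocabulary; scoring then reduces to comparing
# the logprobs the model assigns to the 16 candidate tokens, e.g.
# (hypothetical API call):
#   logprobs = model.next_token_logprobs(prompt)
#   prediction = max(combined_labels, key=lambda label: logprobs[label])
#   correct = (prediction == correct_label)

print(prompt)
print("Correct combined answer:", correct_label)
```

Part of the implementation difficulty mentioned above is making sure the combined labels really are single tokens under the model’s tokenizer; one would want to check this explicitly (or pick labels known to tokenize as single tokens) before trusting the scores.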
Two quick reasons:
- For serial computation, the number of layers gives a very rough indication of the strength of one forward pass, but it’s harder to have intuitions for parallel computation.
- For scheming, the model could reason in parallel about “should I still stay undercover”, “what should I do if I stay undercover”, and “what should I do if it’s time to attack”, then use only one serial step to decide on its action.
I would generally expect solving tasks in parallel to be fundamentally hard in one forward pass for pretty much all current SOTA architectures (notably Transformers and modern RNNs like Mamba). See e.g. this comment of mine, as well as other related work: https://twitter.com/bohang_zhang/status/1664695084875501579, https://twitter.com/bohang_zhang/status/1664695108447399937 (video presentation), Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks, and RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval.
There might be more such results I’m currently forgetting about, but they should be relatively easy to find by e.g. following citation trails (to and from the above references) with Google Scholar (or by looking at my recent comments / short forms).
I am also very interested in e.g. how one could operationalize the number of hops of out-of-context reasoning required for various types of scheming, especially one-forward-pass scheming, and especially in the context of automated AI safety R&D.