For example, if you wanted to predict model behavior in general right now, you’d probably just want to get really good at understanding webtext, practice the next-token prediction game, etc.
Another candidate eval is to demand predictability given activation edits (e.g., zero-ablating certain heads, patching in activations from other prompts, performing activation additions, and so on). Webtext statistics won’t be sufficient there, because those edits push the model into internal states that ordinary text never produces.
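To make the three kinds of activation edit concrete, here is a minimal sketch in pure Python. It assumes a toy linear "model" (a few random "heads" that each write into a residual vector, followed by an unembedding matrix), not a real transformer. Every name here (`make_toy_model`, `forward`, the `edits` dictionary) is hypothetical and chosen only for illustration.

```python
import random

def matvec(m, v):
    # Multiply matrix m (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def make_toy_model(n_heads=2, d=4, seed=0):
    # Hypothetical toy: each "head" is a random d x d linear map.
    rng = random.Random(seed)
    def rand_matrix():
        return [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
    return {"heads": [rand_matrix() for _ in range(n_heads)],
            "unembed": rand_matrix()}

def forward(model, x, edits=None):
    # edits maps head index -> function(head_output) -> edited head_output.
    edits = edits or {}
    resid = list(x)
    cache = []
    for i, w in enumerate(model["heads"]):
        out = matvec(w, x)          # every head reads the raw input
        if i in edits:
            out = edits[i](out)     # intervene on this head's activation
        cache.append(out)
        resid = add(resid, out)     # heads write into the residual vector
    return matvec(model["unembed"], resid), cache

model = make_toy_model()
x_a = [1.0, 0.0, -1.0, 0.5]   # stand-in for "prompt A"
x_b = [0.0, 2.0, 0.5, -1.0]   # stand-in for "prompt B"

clean_logits, cache_a = forward(model, x_a)
_, cache_b = forward(model, x_b)

# 1. Zero-ablation: replace head 0's output with zeros.
ablated_logits, ablated_cache = forward(
    model, x_a, {0: lambda out: [0.0] * len(out)})

# 2. Activation patching: run prompt A, but use head 1's output from prompt B.
patched_logits, patched_cache = forward(
    model, x_a, {1: lambda out: cache_b[1]})

# 3. Activation addition: add a fixed steering vector to head 0's output.
steer = [0.5, -0.5, 0.0, 1.0]
steered_logits, _ = forward(model, x_a, {0: lambda out: add(out, steer)})
```

In this toy setting the effect of each edit can be worked out by hand, since everything is linear. The point of the eval is that in a real network it cannot, so predicting `ablated_logits`, `patched_logits`, or `steered_logits` would require understanding the internals rather than the training distribution.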