Ben Amitay comments on Internal Interfaces Are a High-Priority Interpretability Target

Ben Amitay 30 Dec 2022 20:50 UTC
LW: 2 AF: 1
Last point: if we rename the “world model” to “long-term memory”, we may notice the possibility that much of what you think of as shard-work may be programs stored in memory and executed either by a general program-executor or by a bunch of modules that each specialize in specific sorts of programs, functioning like modern CPUs/interpreters (hopefully with the programs stored in an organised way that preserves modularity). What will be in the general memory and what will be in the weights themselves is non-obvious, and we may want to intervene at this point too (not sure in which direction).
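
To make the memory-versus-weights distinction concrete, here is a minimal toy sketch. It is only an analogy, and every name in it (`OPS`, `execute`, `memory`, the two-instruction "language") is a hypothetical assumption of mine, not something from the post. The fixed executor stands in for whatever lives in the weights, while the stored programs stand in for shard-like behaviours kept in long-term memory.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy instruction set; all names here are illustrative assumptions.
Instruction = Tuple[str, int]
Program = List[Instruction]

# The fixed "executor" side: analogous to what would live in the weights.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda acc, x: acc + x,
    "mul": lambda acc, x: acc * x,
}


def execute(program: Program, acc: int = 0) -> int:
    """General-purpose executor: it knows nothing about any particular program."""
    for op, arg in program:
        acc = OPS[op](acc, arg)
    return acc


# The "long-term memory" side: behaviours stored as data, one entry per program.
memory: Dict[str, Program] = {
    "double_then_inc": [("mul", 2), ("add", 1)],
    "inc_then_double": [("add", 1), ("mul", 2)],
}

# Intervening on memory changes one behaviour without touching the executor
# or the other stored program.
memory["double_then_inc"] = [("mul", 2), ("add", 10)]
```

In this toy, editing an entry of `memory` is a local intervention on a single behaviour, whereas editing `OPS` or `execute` would change how every stored program runs. Which of the two is the better place to intervene is exactly the part I am unsure about.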