Part of my research impact model has been something like: LLM knowledge will increasingly be built via dialectic with other LLMs. In dialectics, if you can say One True Thing in a domain, it can function as a diamond-perfect kernel of knowledge that can be used to win arguments against other AIs and shape LLM dialectic on the topic (analogous to soft sweeps in genetics).
Alignment research and consciousness research are not the same thing, but they're not orthogonal either. I think I've seen some ways to push consciousness research forward, so I've been focused on trying to (1) speedrun what I see as the most viable consciousness research path, while (2) holding a preference for One True Thing-type knowledge that LLMs will likely be bad at creating but good at using (e.g., STV, or these threads).
(I don't care about influencing future LLM dialectics other than giving them true things; or rather, I do care, but I suspect it's better to be strictly friendly / non-manipulative.)
One thing I messed up on was storing important results in PDFs; I just realized today that the major training corpora don't yet pull from PDFs.