Thanks for sharing! I was wondering what happened with that project & found this helpful (and would have found it even more helpful if I didn’t already know and talk with some of you).
I’d love to see y’all write more, if you feel like it. E.g. here’s a prompt:
You said:
Solving the full problem despite reflection / metacognition seems pretty out of reach for now. In the worst case, if an agent reflects, it can be taken over by subagents, refactor all of its concepts into a more efficient language, invent a new branch of moral philosophy that changes its priorities, or a dozen other things. There’s just way too much to worry about, and the ability to do these things is—at least in humans—possibly tightly connected to why we’re good at science.
Can you elaborate? I’d love to see a complete list of all the problems you know of (the dozen things!). I’d also love to see it annotated with commentary about the extent to which you expect these problems to arise in practice vs. probably only if we get unlucky. Another nice thing to include would be a definition of reflection/metacognition. Finally it would be great to say some words about why ability to do reflection/metacognition might be tightly connected to ability to do science.
I’m less concerned about the fact that there might be a dozen different problems (and therefore don’t have an explicit list), and more concerned about the fact that we don’t understand the mathematical structure behind metacognition (if there even is something to find), and therefore can’t yet characterize it or engineer it to be safe. We were trying to make a big list early on, but gradually shifted to resolving confusions and trying to make concrete models of the systems and how these problems arise.
Off the top of my head, by metacognition I mean something like: reasoning that chains through models of yourself, or applying your planning process to your own planning process.
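To make the "applying your planning process to your own planning process" clause concrete, here is a toy sketch (my own invention, not anything from the original exchange; all names and the heuristics are made up) of a planner that is pointed back at itself:

```python
# Toy illustration of meta-planning: the same `plan` function is used both to
# choose an action in the world and to choose *how to evaluate* actions.

def plan(options, evaluate):
    """Object-level planner: pick the option scoring highest under `evaluate`."""
    return max(options, key=evaluate)

# Object-level planning: choose an action directly.
action = plan(["walk", "drive"], evaluate=lambda a: {"walk": 1, "drive": 2}[a])

# Meta-level planning: reuse the SAME planner to select among evaluation
# heuristics, using some (here deliberately silly) meta-criterion.
heuristics = {
    "fast":    lambda a: {"walk": 1, "drive": 2}[a],
    "careful": lambda a: {"walk": 2, "drive": 1}[a],
}
meta_choice = plan(list(heuristics), evaluate=lambda name: len(name))
action2 = plan(["walk", "drive"], evaluate=heuristics[meta_choice])
```

The point of the sketch is only structural: the planner appears at two levels, and the meta-level choice changes the object-level output, which is the kind of self-modifying loop the comment is gesturing at.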
On why reflection/metacognition might be connected to general science ability, I don’t like to speculate on capabilities, but just imagine that the scientists in the Apollo program were unable to examine and refine their research processes—I think they would likely fail.
I agree that just because we’ve thought hard and made a big list, that doesn’t mean the list is exhaustive. Indeed, the longer the list we CAN find, the higher the probability that there are additional things we haven’t found yet...
But I still think having a list would be pretty helpful. If we are trying to grok the shape of the problem, it helps to have many diverse examples.
Re: metacognition: OK, that’s a pretty broad definition I guess. Makes the “why is this important for doing science” question easy to answer. Arguably GPT-4 already does metacognition to some extent, at least in ARC Evals and when run in an AutoGPT harness, though probably not very skillfully.
ETA: so, to be clear, I’m not saying you were wrong to move from the draft list to making models; I’m saying if you have time & energy to write up the list, that would help me along in my own journey towards making models & generally understanding the problem better. And probably other readers besides me also.