I think we currently do not have good gears-level models of many of the important questions in AI/cognition/alignment, and I think the way to get there is by treating it as a software/physicalist/engineering problem, not by presupposing a higher-level agentic/psychological/functionalist framing from the start.
Here are two ways that a high-level model can be wrong:
It isn’t detailed enough, but once you learn the details they add up to basically the same picture. E.g. Newtonian physics, or the ideal gas law. When you get a more detailed model, you learn more about which edge cases will break the old one, but the old model basically still works, and is valuable for working out the more detailed one (see the sketch after this list).
It’s built out of confused concepts. E.g. free will, consciousness (probably), many ways of thinking about personal identity, the four humors model. We’re basically better off without this kind of model and should start from scratch.
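To make the first kind concrete with the ideal gas example (a standard textbook comparison, not anything from your post): the ideal gas law says

$$PV = nRT,$$

while the van der Waals refinement says

$$\left(P + \frac{an^2}{V^2}\right)(V - nb) = nRT,$$

where $a$ and $b$ correct for intermolecular attraction and finite molecular volume. The refinement mostly tells you which edge cases (high pressure, low temperature) break the simple law, and it reduces back to $PV = nRT$ as $a, b \to 0$, so the old picture survives as a limiting case rather than being discarded.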
It sounds like you’re saying high-level agency-as-outcome-directed is wrong in the second way? If so, I disagree: it looks much more like the first way. I don’t think I understand your beliefs well enough to argue about this; maybe there’s something I should read?
I have a discomfort that I want to try to gesture at:
Are you ultimately wanting to build a piece of software that solves a problem so difficult that it needs to modify itself? My impression from the post is that you are thinking about this level of capability in a distant way, and mostly focusing on much earlier and easier regimes. I think it’s probably very easy to work on legible low-level capabilities without making any progress on the regime that matters.
To me it looks important for researchers to keep this ultimate goal constantly in mind, because there are many pathways that lead off-track. Does it look different to you?
Ultimately, this is a governance problem, not a technical problem. The choice to pursue illegible capabilities is a political one.
I think this is a bad place to rely on governance, given the fuzziness of this boundary and the huge incentive toward capability over legibility. Am I right in thinking that you’re making a large-ish gamble here on the way the tech tree shakes out (such that it’s easy to see a legible-illegible boundary, and the legible approaches are competitive-ish) and also on the way governance shakes out (such that governments decide that e.g. assigning detailed blame for failures is extremely important and worth delaying capabilities)?
I’m glad you’re doing ambitious things, and I’m generally a fan of trying to understand problems from scratch in the hope that they dissolve or become easier to solve.
Literally compute and manpower. I can’t afford the kind of cluster needed to even begin a pretraining research agenda, or to hire a new research team to work on this. I’m less bottlenecked on the theoretical side at the moment, because I first need to run into a lot of bottlenecks from actual grounded experiments.
Why would this be a project that requires large-scale experiments? It looks like something a random PhD student with two GPUs could maybe make progress on. It might even be a good problem to set up a prize for?