I’d argue that the relevant subset of predictive training practically rules out the development of that sort of implementation [...]
Yeah, for sure. A training procedure that results in an idealized predictor isn’t going to result in an agenty thing, because it doesn’t move the system’s design towards agency on a step-by-step basis; and a training procedure that’s going to result in an agenty thing is going to involve some unknown elements that specifically allow the system the freedom to productively roam.
I think we pretty much agree on the mechanistic details of all of that!
Another more recent path is observing the effect that conditions have on activations and dynamically applying the activation diffs to steer behavior
— yep, I was about to mention that. @TurnTrout’s own activation-engineering agenda seems highly relevant here.
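(For concreteness, here’s a minimal sketch of what that kind of activation-diff steering can look like, in the spirit of activation-addition-style methods. It assumes GPT-2 through the Hugging Face transformers API, and the layer index, contrast prompts, and scaling coefficient are arbitrary illustrative choices rather than anything from this exchange.)

```python
# Illustrative only: contrast two conditions, take the activation diff at one
# layer, and add that diff back in during generation to steer behavior.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

LAYER = 6    # which block's residual stream to intervene on (arbitrary choice)
COEFF = 4.0  # how hard to push along the diff direction (arbitrary choice)

def residual_after_layer(prompt: str) -> torch.Tensor:
    """Return the hidden states produced by block LAYER for the given prompt."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden_states = model(ids, output_hidden_states=True).hidden_states
    return hidden_states[LAYER + 1]  # shape: (1, seq_len, d_model)

# Observe the effect the two conditions have on activations: the diff of their
# last-token activations is the steering vector.
steering_vec = (residual_after_layer("Love") - residual_after_layer("Hate"))[:, -1, :]

# Dynamically apply the diff every time block LAYER runs.
def add_steering(module, inputs, output):
    hidden = output[0] + COEFF * steering_vec.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
ids = tokenizer("I think you are", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20, do_sample=False)
handle.remove()
print(tokenizer.decode(out[0]))
```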
I agree that they’re focused on inducing agentiness for usefulness reasons, but I’d argue the easiest and most effective way to get to useful agentiness actually routes through this kind of approach.
But I still disagree with that. I think what we’re discussing requires approaching the problem with a mindset entirely foreign to the mainstream one. Consider how many words it took us to get to this point in the conversation, despite the fact that, as it turns out, we basically agree on everything. The inferential distance between the standard frameworks in which AI researchers think, and here, is pretty vast.
Moreover, it’s in an active process of growing larger. For example, the very idea of viewing ML models as “just stochastic parrots” is being furiously pushed against in favour of a more agenty view. In comparison, the approach we’re discussing wants to move in the opposite direction, to de-personify ML models to the extent that even the animalistic connotation of “a parrot” is removed.
The system we’re discussing won’t even be an “AI” in the sense the term is usually understood. It would be an incredibly advanced forecasting tool. Even the closest analogue, the “simulators” framework, still carries some air of agentiness.
And the research directions that get us from here to an idealized-predictor system look very different from the directions that go from here to an agenty AGI. They focus much more on building interfaces for interacting with the extant systems, such as the activation-engineering agenda. They don’t put much emphasis on things like:
Experimenting with better ways to train foundational models, with the idea of making models as close to a “done product” as they can be out-of-the-box.
Making the foundational models easier to converse with/making their output stream (text) also their input stream. This approach pretty clearly wants to make AIs into agents that figure out what you want, then do it; not a forecasting tool you need to build an advanced interface on top of in order to properly use (a toy sketch of that contrast follows this list).
RLHF-style stuff that bakes agency into the model, rather than accepting the need to cleverly prompt-engineer it for specific applications.
Thinking in terms like “an alignment researcher” — note the agency-laden framing — as opposed to “a pragmascope” or “a system for the context-independent inference of latent variables” or something.
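(To make that contrast concrete, here’s a toy sketch of the two usage patterns. It’s purely illustrative: the “model” is a stand-in character-level bigram counter, and none of the names come from the actual systems being discussed.)

```python
# Toy illustration: the same predictive model used as (1) an out-of-the-box
# agent whose output stream is fed back into its input stream, versus (2) a
# forecasting tool queried through an interface that keeps decisions outside it.
from collections import Counter, defaultdict

class ToyPredictor:
    """Stand-in predictive model: P(next char | previous char) from a corpus."""
    def __init__(self, corpus: str):
        self.counts = defaultdict(Counter)
        for prev, nxt in zip(corpus, corpus[1:]):
            self.counts[prev][nxt] += 1

    def distribution(self, context: str) -> dict[str, float]:
        counts = self.counts[context[-1]] if context else Counter()
        total = sum(counts.values()) or 1
        return {ch: n / total for ch, n in counts.items()}

model = ToyPredictor("abcabcabd")

# Pattern 1: agent-style wrapper. The model's own output is appended to its
# input stream and treated as the system's action.
def agent_step(history: str) -> str:
    dist = model.distribution(history)
    action = max(dist, key=dist.get)
    return history + action

# Pattern 2: forecasting-tool-style interface. The caller asks conditional
# questions; all decision-making stays outside the model.
def forecast(condition: str, candidate: str) -> float:
    return model.distribution(condition).get(candidate, 0.0)

print(agent_step("ab"))      # "abc"
print(forecast("ab", "d"))   # 0.333...
```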
I expect that if the mainstream AI researchers do make strides in the direction you’re envisioning, they’ll only do it by coincidence. Then probably they won’t even realize what they’ve stumbled upon, do some RLHF on it, be dissatisfied with the result, and keep trying to make it have agency out of the box. (That’s basically what already happened with GPT-4, to @janus’ dismay.)
And eventually they’ll figure out how.
Which, even if you don’t think it’s the easiest path to AGI, is clearly a tractable problem, inasmuch as evolution managed it. I’m sure the world-class engineers at the major AI labs will manage it as well.
That said, you’re making some high-quality novel predictions here, and I’ll keep them in mind when analyzing AI advancements going forward.
I think what we’re discussing requires approaching the problem with a mindset entirely foreign to the mainstream one. Consider how many words it took us to get to this point in the conversation, despite the fact that, as it turns out, we basically agree on everything. The inferential distance between the standard frameworks in which AI researchers think, and here, is pretty vast.
True!
I expect that if the mainstream AI researchers do make strides in the direction you’re envisioning, they’ll only do it by coincidence. Then probably they won’t even realize what they’ve stumbled upon, do some RLHF on it, be dissatisfied with the result, and keep trying to make it have agency out of the box. (That’s basically what already happened with GPT-4, to @janus’ dismay.)
Yup—this is part of the reason why I’m optimistic, oddly enough. Before GPT-likes became dominant in language models, there was a lot of flailing that often pointed in more agenty-by-default directions. That flailing then found GPT because it was easily accessible and strong.
Now, the set of architectural pieces subject to similar flailing is much smaller, and I’m guessing it will only take one more round of benchmarks at scale from a major lab before the flailing shrinks dramatically further.
In other words, I think the necessary work to make this path take off is small and the benefits will be greedily visible. I suspect one well-positioned researcher could probably swing it.
That said, you’re making some high-quality novel predictions here, and I’ll keep them in mind when analyzing AI advancements going forward.
Thanks, and thanks for engaging!
Come to think of it, I’ve got a chunk of mana lying around for subsidy. Maybe I’ll see if I can come up with some decent resolution criteria for a market.