Theoretical Computer Science MSc student at the University of [Redacted] in the United Kingdom.
I’m an aspiring alignment theorist; my research vibes are descriptive formal theories of intelligent systems (and their safety properties) with a bias towards constructive theories.
I think it’s important that our theories of intelligent systems remain rooted in the characteristics of real world intelligent systems; we cannot develop adequate theory from the null string as input.
DragonGod
My contention is that I don’t think the preconditions hold.
Agents don’t fail to be VNM-coherent by having incoherent preferences given the axioms of VNM; they fail to be VNM-coherent by violating the axioms themselves.
Completeness is wrong for humans, and with incomplete preferences you can be non-exploitable even without admitting a single fixed utility function over world states.
I’m not at all convinced that “strong agents pursuing a coherent goal” is a viable form for generally capable systems that operate in the real world, and the assumption that it is hasn’t been sufficiently motivated.
What are the best arguments that expected utility maximisers are adequate (descriptive if not mechanistic) models of powerful AI systems?
[I want to address them in my piece arguing the contrary position.]
The solution is IMO just to consider the number of computations performed per generated token as some function of the model size, and once we’ve identified a suitable asymptotic order on the function, we can say intelligent things like “the smallest network capable of solving a problem in complexity class C of size N is X”.
Or if our asymptotic bounds are not tight enough:
“No economically feasible LLM can solve problems in complexity class C of size >= N”.
(Where economically feasible may be something defined by aggregate global economic resources or similar, depending on how tight you want the bound to be.)
Regardless, we can still obtain meaningful impossibility results.
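The feasibility bound above can be sketched numerically. The sketch below is my own back-of-envelope, assuming the standard rough estimate of ~2 FLOPs per parameter per generated token for a dense transformer; the budget figure is hypothetical, not a claim about actual global resources.

```python
# Rough sketch (illustrative numbers): given a FLOP budget, how many tokens
# can a model of a given size generate? Assumes ~2 FLOPs per parameter per
# generated token (standard back-of-envelope for dense transformer inference).

def flops_per_token(n_params: int) -> int:
    """Approximate forward-pass FLOPs to generate one token."""
    return 2 * n_params

def tokens_affordable(n_params: int, budget_flops: float) -> float:
    """How many generated tokens a given FLOP budget buys at this model size."""
    return budget_flops / flops_per_token(n_params)

# Example: a 1-trillion-parameter model against a hypothetical 1e25-FLOP budget.
print(tokens_affordable(10**12, 1e25))  # about 5e12 tokens
```

Once the per-token cost is pinned down as a function of model size, "no economically feasible LLM can solve instances of size >= N" becomes a concrete inequality between the budget and the token count the working-out would require.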
- Apr 30, 2023, 4:13 AM; 2 points — comment on “LLMs and computation complexity”
Very big caveat: the LLM doesn’t actually perform O(1) computations per generated token.
The number of computational steps performed per generated token scales with network size: https://www.lesswrong.com/posts/XNBZPbxyYhmoqD87F/llms-and-computation-complexity?commentId=QWEwFcMLFQ678y5Jp
Strongly upvoted.
Short but powerful.
TL;DR: LLMs perform O(1) computational steps per generated token, regardless of which token is being generated.
The LLM sees every token in its context window when generating the next token, so it can compute problems in O(n^2) [where n is the context window size].
LLMs can get around the computational limitations by “showing their working”, simulating a mechanical computer (one without backtracking, so not Turing-complete) in their context window.
This only works if the context window is large enough to contain the workings for the entire algorithm.
Thus LLMs can perform matrix multiplication when showing their working, but not when asked to produce the product directly without showing workings.
Important fundamental limitation on the current paradigm.
We can now say with certainty that there are tasks GPT will never be able to solve, no matter how far it’s scaled up (e.g. beating Stockfish at chess, because chess is combinatorial and the LLM can’t search the game tree to any depth).
This is a very powerful argument.
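The matrix-multiplication point can be made quantitative. This is my own rough sketch, not from the original post: the naive product of two n×n matrices needs n^3 multiply-adds, so a "show your working" transcript must contain on that order of steps, and the context window caps the largest instance; `tokens_per_step` is an assumed parameter.

```python
# Sketch of the context-window bound on chain-of-thought matrix multiplication.
# Naive n x n matrix product needs n^3 multiply-adds; if each written-out step
# costs roughly `tokens_per_step` tokens, the context window caps the largest
# instance solvable while "showing working".

def steps_naive_matmul(n: int) -> int:
    return n ** 3  # one multiply-add per (i, j, k) triple

def max_n_within_context(context_tokens: int, tokens_per_step: int = 10) -> int:
    n = 1
    while steps_naive_matmul(n + 1) * tokens_per_step <= context_tokens:
        n += 1
    return n

# E.g. an 8k-token window at an assumed ~10 tokens per written step:
print(max_n_within_context(8192))  # → 9
```

The exact constants are guesses, but the shape of the argument survives any choice of them: the solvable instance size grows only as the cube root of the context length.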
A reason I mood affiliate with shard theory so much is that like...
I’ll have some contention with the orthodox ontology for technical AI safety and be struggling to adequately communicate it, and then I’ll later listen to a post/podcast/talk by Quintin Pope or Alex Turner (or someone else trying to distill shard theory) and see the exact same contention expressed more eloquently and with more justification.
One example: I had independently concluded that “finding an objective function that is existentially safe when optimised by an arbitrarily powerful optimisation process is probably the wrong way to think about a solution to the alignment problem”.
And then today I discovered that Alex Turner advances a similar contention in “Inner and outer alignment decompose one hard problem into two extremely hard problems”.
Shard theory also seems to nicely encapsulate my intuition that we shouldn’t think of powerful AI systems as optimisation processes with a system-wide objective that they consistently pursue.
Or just the general intuition that our theories of intelligent systems should adequately describe the generally intelligent systems we actually have access to, and that theories that don’t even aspire to do that are ill-motivated.
Admittedly, I don’t think I can adequately communicate shard theory to a disbeliever, so on reflection I have some scepticism that I properly understand it.
That said, the vibes are right.
“All you need is to delay doom by one more year per year and then you’re in business” — Paul Christiano.
Took this to drafts for a few days with the intention of refining it and polishing the ontology behind the post.
I ended up not doing that as much, because the improvements I was making to the underlying ontology felt better presented as a standalone post, so I mostly factored them out of this one.
I’m not satisfied with this post as is, but there’s some kernel of insight here that I think is valuable, and I’d want to be able to refer to the basic thrust of this post/some arguments made in it elsewhere.
I may make further edits to it in future.
Consequentialism is in the Stars not Ourselves
It should be noted, however, that while inner alignment is a robustness problem, the occurrence of unintended mesa-optimization is not. If the base optimizer’s objective is not a perfect measure of the human’s goals, then preventing mesa-optimizers from arising at all might be the preferred outcome. In such a case, it might be desirable to create a system that is strongly optimized for the base objective within some limited domain without that system engaging in open-ended optimization in new environments.(11) One possible way to accomplish this might be to use strong optimization at the level of the base optimizer during training to prevent strong optimization at the level of the mesa-optimizer.(11)
I don’t really follow this paragraph, especially the bolded part.
Why would mesa-optimisation arising when not intended not be an issue for robustness? (The mesa-optimiser could generalise capably out of distribution but pursue the wrong goal.)
The rest of the post also doesn’t defend that claim; it feels more like defending a claim like:
The non-occurrence of mesa-optimisation is not a robustness problem.
Is this a correct representation of corrigible alignment:
The mesa-optimizer (MO) has a proxy of the base objective that it’s optimising for.
As more information about the base objective is received, MO updates the proxy.
With sufficient information, the proxy may converge to a proper representation of the base objective.
Example: a model-free RL algorithm whose policy is argmax over actions with respect to its state-action value function
The base objective is the reward signal
The value function serves as a proxy for the base objective.
The value function is updated as future reward signals are received, gradually refining the proxy to better align with the base objective.
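The RL example above can be made concrete. This is my own toy construction, not from the paper: a tabular Q-learner in a single-state bandit setting, where the Q-table plays the role of the proxy and the reward signal plays the role of the base objective.

```python
# Toy illustration of the corrigible-alignment reading above: a tabular
# Q-learner's value estimate acts as a proxy for the reward signal (the base
# objective) and is refined toward it as more reward is observed.
import random

random.seed(0)
true_reward = {0: 0.0, 1: 1.0}   # base objective: action 1 is better
q = {0: 0.0, 1: 0.0}             # proxy: state-action values, initially uninformed
alpha = 0.1                      # learning rate

for _ in range(1000):
    a = random.choice([0, 1])    # explore uniformly
    r = true_reward[a]           # reward signal from the base objective
    q[a] += alpha * (r - q[a])   # update the proxy toward the signal

# With sufficient information the proxy converges:
# argmax over q now matches the base objective's preferred action.
print(max(q, key=q.get))  # → 1
```

Early in training the argmax policy is determined by an uninformed proxy; only as reward information accumulates does optimising the proxy coincide with optimising the base objective, which mirrors the convergence story in the bullet points above.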
Sounds good, will do!
March 22nd is when my first exam starts.
It finishes June 2nd.
Is it possible for me to delay my start a bit?
I’m gestating on this post. I suspect part of my original framing was confused, so I’ll just let the ideas ferment some more.
Yeah for humans in particular, I think the statement is not true of solely biological evolution.
But also, I’m not sure you’re looking at it on the right level. Any animal presumably does many bits’ worth of selection in a given day, but the durable/macro-scale effects are better explained by evolutionary forces acting on the population than by the actions of individual animals within their lifetimes.
Or maybe this is just a confused way to think/talk about it.
I could change that. I was thinking of work done in terms of bits of selection.
Though I don’t think that statement is true of humans unless you also include cultural memetic evolution (which I think you should).
If you define your utility function over histories, then every behaviour is maximising some expected utility function, no?
Even behaviour that is money-pumped?
I mean you can’t money pump any preference over histories anyway without time travel.
The Dutch book arguments apply when your utility function is defined over your current state with respect to some resource?
I feel like once you define utility function over histories, you lose the force of the coherence arguments?
What would it look like to not behave as if maximising an expected utility function, for a utility function defined over histories?
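The triviality worry here can be made concrete. This is my own construction: for any fixed behaviour sequence, there is a utility function over histories that the behaviour maximises, namely the indicator function of the realised history.

```python
# For ANY fixed behaviour sequence, build a utility function over histories
# that the behaviour maximises: the indicator of the realised history.
# This is why coherence arguments seem to lose their force over histories.

def rationalising_utility(realised_history):
    realised = tuple(realised_history)
    def u(history):
        return 1.0 if tuple(history) == realised else 0.0
    return u

# Even an apparently money-pumped action sequence maximises its own
# indicator utility over histories:
pumped = ["trade A for B", "trade B for C", "trade C for A, paying $1"]
u = rationalising_utility(pumped)
print(u(pumped), u(["hold A"]))  # → 1.0 0.0
```

Under this utility function the "pumped" agent is a perfect expected-utility maximiser, which suggests coherence arguments only bite when the utility function is constrained to something like resources or world-states, not arbitrary history-indexed functions.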