DragonGod

Karma: 2,486

Theoretical Computer Science MSc student at the University of [Redacted] in the United Kingdom.

I’m an aspiring alignment theorist; my research vibes are descriptive formal theories of intelligent systems (and their safety properties) with a bias towards constructive theories.

I think it’s important that our theories of intelligent systems remain rooted in the characteristics of real world intelligent systems; we cannot develop adequate theory from the null string as input.

DragonGod Jan 28, 2025, 4:19 PM
2 points
0
in reply to: Vanessa Kosoy’s comment on: Announcement: Learning Theory Online Course
I’m currently working through Naive Set Theory (alongside another text). I’ll take this as a recommendation to work through the other textbooks later.
My maths level is insufficient for the course I’d guess.
I would appreciate it if videos of the meetings could be recorded. Or maybe I should just stick around and hope this will be run again next year.

DragonGod Jan 17, 2025, 2:24 PM
10 points
0
on: We probably won’t just play status games with each other after AGI
When I saw the beginning/title I thought the post would be a refutation of the material scarcity thesis; I found myself disappointed it is not.

DragonGod Jan 2, 2025, 10:23 AM
6 points
0
in reply to: DragonGod’s comment on: DragonGod’s Shortform
There is a not insignificant sense of guilt, of betraying my 2023 self and the ambitions I had before.

And I don’t want to just end up doing irrelevant TCS research that only a few researchers in a niche field will ever care about.

It’s not high impact research.

And it’s mostly just settling. I get the sense that I enjoy theoretical research, I don’t currently feel poised to contribute to the AI safety problem, and I seem to have an unusually good (at least it appears so to my limited understanding) opportunity to pursue a boring TCS PhD in some niche field that few people care about.

I don’t think I’ll be miserable pursuing the boring TCS PhD or not enjoy it, or anything of the sort. It’s just not directly contributing to what I wanted to contribute to. It’s somewhat sad and it’s undignified (but it’s less undignified than the path I thought I was on at various points in the last 15 months).

DragonGod Jan 2, 2025, 10:10 AM
7 points
0
on: DragonGod’s Shortform
I still want to work on technical AI safety eventually.
I feel like I’m much further off the path to being directly useful in 2025 than I felt in 2023.
And taking a detour to do a TCS PhD that isn’t directly pertinent to AI safety (current plan) feels like not contributing.

Cope is that becoming a strong TCS researcher will make me better poised to contribute to the problem, but short timelines could make this path less viable.
[Though there’s nothing saying I can’t try to work on AI on the side even if it isn’t the focus of my PhD.]

DragonGod Dec 6, 2024, 12:51 PM
12 points
3
on: (The) Lightcone is nothing without its people: LW + Lighthaven’s big fundraiser
I think LW is a valuable intellectual hub and community.

Haven’t been an active participant recently, but it’s still a service I occasionally find myself relying on explicitly, and I prefer the world where it continues to exist.

[I donated $20. Am unemployed and this is a nontrivial fraction of my disposable income.]

DragonGod Nov 25, 2024, 1:43 PM
4 points
0
in reply to: Joel Burget’s comment on: DeepSeek beats o1 on math and ties on coding; will release weights
o1’s reasoning trace also does this for different languages (IIRC I’ve seen Chinese and Japanese and other languages I don’t recognise/recall), usually for an entire paragraph rather than a single word, but when I translated them they seemed to make sense in context.

DragonGod Oct 16, 2024, 4:14 PM
6 points
4
in reply to: TsviBT’s comment on: Change My Mind: Thirders in “Sleeping Beauty” are Just Doing Epistemology Wrong
This is not a rhetorical question:) What do you mean by “probability” here?
Yeah, since posting this question:
I have updated towards thinking that it’s in a sense not obvious/not clear what exactly “probability” is supposed to be interpreted as here.
And once you pin down an unambiguous interpretation of probability the problem dissolves.
I had a firm notion in mind for what I thought probability meant. But Rafael Harth’s answer really made me unconfident that the notion I had in mind was the right notion of probability for the question.

DragonGod Oct 16, 2024, 4:04 PM
2 points
0
in reply to: Ape in the coat’s comment on: Change My Mind: Thirders in “Sleeping Beauty” are Just Doing Epistemology Wrong
I have not read all of them!

DragonGod Oct 16, 2024, 1:21 PM
2 points
0
in reply to: DragonGod’s comment on: Change My Mind: Thirders in “Sleeping Beauty” are Just Doing Epistemology Wrong
My current position now is basically:
Actually, I’m less confident and now unsure.

Harth’s framing was presented as an argument re: the canonical Sleeping Beauty problem.

And the question I need to answer is: “should I accept Harth’s frame?”

I am at least convinced that it is genuinely a question about how we define probability.

There is still a disconnect though.

While I agree with the frequentist answer, it’s not clear to me how to backpropagate this in a Bayesian framework.

Suppose I treat myself as identical to all other agents in the reference class.

I know that my reference class will do better if we answer “tails” when asked about the outcome of the coin toss.

But it’s not obvious to me that there is anything to update from when trying to do a Bayesian probability calculation.

To me, there being many more observers in the tails world doesn’t seem to alter any of these probabilities:
- P(waking up)
- P(being asked questions)
- P(...)
By stipulation my observational evidence is the same in both cases.

And I am not compelled by assuming I should be randomly sampled from all observers.

That there are many more versions of me in this other world does not by itself seem to raise the probability of me witnessing the observational evidence, since by stipulation all versions of me witness the same evidence.

DragonGod Oct 16, 2024, 12:49 PM
7 points
0
in reply to: Rafael Harth’s comment on: Change My Mind: Thirders in “Sleeping Beauty” are Just Doing Epistemology Wrong
I’m curious how your conception of probability accounts for logical uncertainty?

DragonGod Oct 16, 2024, 12:44 PM
15 points
0
in reply to: Rafael Harth’s comment on: Change My Mind: Thirders in “Sleeping Beauty” are Just Doing Epistemology Wrong
So in this case, I agree that if this experiment is repeated many times and every Sleeping Beauty version created answers tails, the reference class of Sleeping Beauty agents will have many more correct answers than if the experiment is repeated many times and every Sleeping Beauty created answers heads.

I think there’s something tangible here and I should reflect on it.

I separately think though that if the actual outcome of each coin flip was recorded, there would be a roughly equal distribution between heads and tails.

And when I was thinking through the question before it was always about trying to answer a question regarding the actual outcome of the coin flip and not what strategy maximises monetary payoffs under even bets.
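
The two counts being contrasted here can be checked with a quick simulation. This is only a sketch, assuming the standard protocol (a fair coin, one awakening on heads, two on tails); the function name `simulate`, the seed, and the trial count are illustrative and not from the thread.

```python
import random


def simulate(trials: int, seed: int = 0) -> tuple[float, float]:
    """Return (fraction of flips landing tails, fraction of awakenings in tails runs)."""
    rng = random.Random(seed)
    tails_flips = 0        # counted once per experiment
    awakenings = 0         # counted once per awakening
    tails_awakenings = 0   # awakenings that occur in a tails experiment
    for _ in range(trials):
        tails = rng.random() < 0.5
        wakes = 2 if tails else 1  # assumed protocol: tails means two awakenings
        tails_flips += tails
        awakenings += wakes
        if tails:
            tails_awakenings += wakes
    return tails_flips / trials, tails_awakenings / awakenings


per_flip, per_awakening = simulate(100_000)
print(f"fraction of flips that land tails:     {per_flip:.3f}")
print(f"fraction of awakenings in a tails run: {per_awakening:.3f}")
```

Under these assumptions the first number should come out near 1/2 and the second near 2/3: recording coin flips and recording answers given on awakening are counting different things.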

While I do think that betting odds aren’t convincing re: actual probabilities, because you can just have asymmetric payoffs on equally probable, mutually exclusive and jointly exhaustive events, the “reference class of agents being asked this question” seems like a more robust rebuttal.
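
The point about asymmetric payoffs can be made concrete with a small sketch. The stakes below are made up for illustration (a correct tails bet pays double, as if it were settled twice), and `expected_profit` is a hypothetical helper, not something from the thread.

```python
def expected_profit(p_win: float, win: float, lose: float) -> float:
    """Expected profit of a bet that wins `win` with probability p_win, else loses `lose`."""
    return p_win * win - (1 - p_win) * lose


p = 0.5  # fair coin: heads and tails are equally probable

# Assumed stakes: betting tails wins 2 or loses 1; betting heads wins 1 or loses 2.
bet_tails = expected_profit(p, win=2.0, lose=1.0)
bet_heads = expected_profit(p, win=1.0, lose=2.0)
print(bet_tails, bet_heads)
```

With these stakes, betting tails has positive expected profit and betting heads negative, even though both outcomes have probability 1/2. The preferred bet reflects the payoff structure, not a shift in the probability.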

I want to take some time to think on this.

Strong upvoted because this argument actually/genuinely makes me think I might be wrong here.

Much less confident now, and mostly confused.

DragonGod Oct 16, 2024, 11:49 AM
3 points
1
in reply to: Charlie Steiner’s comment on: Change My Mind: Thirders in “Sleeping Beauty” are Just Doing Epistemology Wrong
I mean I am not convinced by the claim that Bob is wrong.

Bob’s prior probability is 50%. Bob sees no new evidence to update this prior so the probability remains at 50%.

I don’t favour an objective notion of probabilities. From my OP:
2. Bayesian Reasoning
- Probability is a property of the map (agent’s beliefs), not the territory (environment).
- For an observation O to be evidence for a hypothesis H, P(O|H) must be > P(O|¬H).
- The wake-up event is equally likely under both Heads and Tails scenarios, thus provides no new information to update priors.
- The original 50/50 probability should remain unchanged after waking up.
So I am unconvinced by your thought experiments? Observing nothing new, I think the observer’s priors should remain unchanged.
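
Written out, the update described in the quoted list looks as follows. This is a sketch under that list's own premise, namely that the wake-up observation $O$ is certain under both outcomes, so $P(O \mid \text{Heads}) = P(O \mid \text{Tails}) = 1$, with prior $P(\text{Heads}) = \tfrac{1}{2}$:

$$P(\text{Heads} \mid O) = \frac{P(O \mid \text{Heads})\,P(\text{Heads})}{P(O \mid \text{Heads})\,P(\text{Heads}) + P(O \mid \text{Tails})\,P(\text{Tails})} = \frac{1 \cdot \tfrac{1}{2}}{1 \cdot \tfrac{1}{2} + 1 \cdot \tfrac{1}{2}} = \tfrac{1}{2}$$

The posterior equals the prior only because the two likelihoods are equal; the conclusion depends entirely on that premise.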

I feel like I’m not getting the distinction you’re trying to draw out with your analogy.

DragonGod Oct 16, 2024, 11:44 AM
2 points
−3
in reply to: Gurkenglas’s comment on: Change My Mind: Thirders in “Sleeping Beauty” are Just Doing Epistemology Wrong
I mean I think the “gamble her money” interpretation is just a different question. It doesn’t feel to me like a different notion of what probability means, but just betting on a fair coin but with asymmetric payoffs.

The second question feels closer to an accurate interpretation of what probability actually means.

DragonGod Jan 20, 2024, 10:46 PM
LW: 2 AF: 1
0
AF
on: Uncertainty in all its flavours
i.e. if each forecaster $w \in W$ has an first-order belief $f (w) \in B (S)$ , and $w \in B (S)$ is your second-order belief about which forecaster is correct, then $(w ⊳_{W S} f) \in B (S)$ should be your first-order belief about the election.
I think there might be a typo here. Did you instead mean to write: “ $w \in B (W)$ ” for the second order beliefs about the forecasters?

DragonGod Jul 26, 2023, 3:50 PM
5 points
1
in reply to: quetzal_rainbow’s comment on: Order Matters for Deceptive Alignment
The claim is that given the presence of differential adversarial examples, the optimisation process would adjust the parameters of the model such that its optimisation target is the base goal.

DragonGod Jul 25, 2023, 9:16 PM
2 points
in reply to: Dalcy’s comment on: DragonGod’s Shortform
That was it, thanks!

DragonGod Jul 25, 2023, 8:30 PM
2 points
on: DragonGod’s Shortform
Probably sometime last year, I posted on Twitter something like: “agent values are defined on agent world models” (or similar) with a link to a LessWrong post (I think the author was John Wentworth).
I’m now looking for that LessWrong post.
My Twitter account is private and search is broken for private accounts, so I haven’t been able to track down the tweet. If anyone has a guess as to which post I may have been referring to, please do send it my way.

DragonGod Jul 24, 2023, 8:08 AM
9 points
on: DragonGod’s Shortform
Most of the catastrophic risk from AI still lies in superhuman agentic systems.
Current frontier systems are not that (and IMO not poised to become that in the very immediate future).
I think AI risk advocates should be clear that they’re not saying GPT-5/Claude Next is an existential threat to humanity.

[Unless they actually believe that. But if they don’t, I’m a bit concerned that their message is being rounded up to that, and when such systems don’t reveal themselves to be catastrophically dangerous, it might erode their credibility.]

DragonGod Jul 22, 2023, 12:46 PM
6 points
on: DragonGod’s Shortform
Immigration is such a tight constraint for me.

My next career steps after I’m done with my TCS Masters are primarily bottlenecked by “what allows me to remain in the UK” and then “keeps me on track to contribute to technical AI safety research”.

What I would like to do for the next 1-2 years (“independent research”/“further upskilling to get into a top ML PhD program”) is not all that viable a path given my visa constraints.

Above all, I want to avoid wasting N more years by taking a detour through software engineering again so I can get Visa sponsorship.

[I’m not conscientious enough to pursue AI safety research/ML upskilling while managing a full time job.]

Might just try and see if I can pursue a TCS PhD at my current university and do TCS research that I think would be valuable for theoretical AI safety research.

The main detriment of that is I’d have to spend N more years in <city> and I was really hoping to come down to London.

Advice very, very welcome.

[Not sure who to tag.]

DragonGod Jul 20, 2023, 3:36 PM
2 points
0
on: Hedonic Loops and Taming RL
Specifically, the experiments by Morrison and Berridge demonstrated that by intervening on the hypothalamic valuation circuits, it is possible to adjust policies zero-shot such that the animal has never experienced a previously repulsive stimulus as pleasurable.
I find this a bit confusing as worded, is something missing?
