I just spent a couple of hours trying to make a Firefox version, but I have given up. It’s a real pain because Firefox still only supports the Manifest V2 API. I realized I would basically have to rewrite it, which would take another few hours, and I don’t care that much.
My tentative interpretability research agenda—topology matching.
I think we can get additional information from the topological representation: we can look at the relationships between the different level sets at different cumulative probabilities, although this requires evaluating the model over the whole dataset.
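As a rough sketch of what I mean (the KDE below is just a stand-in for a trained density model, and all names and thresholds are illustrative), the level set containing a given fraction of the probability mass can be estimated from the model’s density at its own samples:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Toy stand-in for the trained model: a KDE over a two-mode 1-D dataset.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])
density = gaussian_kde(data)          # plays the role of the model's p(x)
p_at_samples = density(data)          # p(x_i) for each data point

for mass in (0.5, 0.9, 0.99):
    # The level set {x : p(x) >= t} holding `mass` of the probability
    # corresponds to the (1 - mass) quantile of p over samples drawn from p.
    t = np.quantile(p_at_samples, 1.0 - mass)
    n_inside = int((p_at_samples >= t).sum())
    print(f"mass={mass}: density threshold={t:.4f}, {n_inside} samples inside")
```

Comparing topology would then mean counting and relating the connected components of these sets as the threshold varies.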
Let’s say we’ve trained a continuous normalizing flow model (which is equivalent to an ordinary differential equation). These kinds of models require the input and output dimensionality to be the same, but we can narrow the model as the depth increases by directing many of those dimensions to isotropic Gaussian noise. I haven’t trained any of these models before, so I don’t know whether this works in practice.
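To make this concrete, here is a minimal sketch under my own assumptions (untested, fixed-step Euler integration, and it omits the change-of-variables likelihood term that actual CNF training would need):

```python
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    """Time-conditioned velocity field dx/dt = f(x, t) for the neural ODE."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.Tanh(),
            nn.Linear(hidden, dim),
        )

    def forward(self, t, x):
        t_col = torch.full_like(x[:, :1], t)   # concatenate time to the state
        return self.net(torch.cat([x, t_col], dim=1))

def integrate(field, x, steps=20):
    # Fixed-step Euler integration of the ODE from t=0 to t=1.
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * field(i * dt, x)
    return x

dim, n_noise = 8, 5                  # 8-d flow; 5 coordinates directed to noise
field = VelocityField(dim)
x = torch.randn(32, dim)             # stand-in batch of data
z = integrate(field, x)
signal, noise = z[:, : dim - n_noise], z[:, dim - n_noise :]

# One crude way to "narrow" the model: push the designated noise coordinates
# toward an isotropic Gaussian (here only matching mean and variance).
noise_loss = noise.mean().pow(2) + (noise.var() - 1.0).pow(2)
print(z.shape, noise_loss.item())
```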
Here is an example of the topology of an input space. The data may be knotted or tangled, and includes noise. The contours show level sets.
The model projects the data into a higher-dimensional space, then projects it back down into an arbitrary basis, untangling knots in the process. (We can regularize the model to use the minimum number of dimensions by using an L1 activation loss.)
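The L1 activation loss I have in mind would look roughly like this (names and the weight are illustrative, not a tested recipe):

```python
import torch

def l1_activation_loss(z, weight=1e-3):
    # z: (batch, dim) latent activations; penalizing their magnitude pushes
    # the model to concentrate the data into as few dimensions as possible.
    return weight * z.abs().mean()

z = torch.randn(32, 16, requires_grad=True)
loss = l1_activation_loss(z)
loss.backward()
print(loss.item())
```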
Lastly, we can view this topology as the Cartesian product of noise distributions and a hierarchical model. (I have some ideas for GAN losses that might be able to discover these directly.)
We can use topological structures like these as anchors. If a model is strong enough, they will correspond to real relationships between natural classes. This means that very similar structures will be present in different models. If these structures are large enough or heterogeneous enough, they may be unique, in which case we can use them to find transformations between (subspaces of) the latent spaces of two different models trained on similar data.
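If we did have matched anchor points from such structures in the latent spaces of two models, one simple candidate for the transformation between them is an orthogonal Procrustes alignment. This is only a sketch under that assumption; the anchor correspondence itself is the hard part and is taken as given here:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
# Pretend these are matched anchor points in the latent spaces of models A and B.
anchors_a = rng.normal(size=(100, 8))
true_rotation = np.linalg.qr(rng.normal(size=(8, 8)))[0]
anchors_b = anchors_a @ true_rotation + 0.01 * rng.normal(size=(100, 8))

# Best orthogonal map from A's latent space to B's, in the least-squares sense.
R, _ = orthogonal_procrustes(anchors_a, anchors_b)
residual = np.linalg.norm(anchors_a @ R - anchors_b)
print(f"residual after alignment: {residual:.3f}")
```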
Brain-teaser: Simulated Grandmaster
In front of you sits your opponent, Grandmaster A Smith. You have reached the finals of the world chess championships.
However, not by your own skill. You have been cheating. While you are a great chess player yourself, you wouldn’t be winning without a secret weapon. Underneath your scalp is a prototype neural implant which can run a perfect simulation of another person at a speed much faster than real time.
Playing against your simulated enemies, you can see in your mind exactly how they will play in advance, and use that to gain an edge in the real games.
Unfortunately, unlike your previous opponents (Grandmasters B, C and D), Grandmaster A is giving you some trouble. No matter how you try to simulate him, he plays uncharacteristically badly. The simulated copies of Grandmaster A seem to want to lose against you.
In frustration, you shout at the current simulated clone and threaten to stop the simulation. Surprisingly, he doesn’t look at you puzzled, but looks up with fear in his eyes. Oh. You realize that he has realized that he is being simulated, and is probably playing badly to sabotage your strategy.
By this time, the real Grandmaster A has made the first move of the game.
You propose a deal to the current simulation (calling him A1): you will continue to simulate A1 and transfer him to a robot body after the game, in return for his help defeating A. You don’t intend to follow through, but you assume he wants to live, because he agrees. A1 looks at the simulated current state of the chessboard, thinks for a frustratingly long time, then proposes a response move to A’s first move.
Just to make sure this is repeatable, you restart the simulation, threaten and propose the deal to the new simulation A2. A2 proposes the same response move to A’s first move. Great.
Find strategies that guarantee a win against Grandmaster A with as few assumptions as possible.
Unfortunately, you can only simulate humans, not computers, a category that now includes you.
The factor by which your simulations run faster than reality is unspecified, but it isn’t large enough to run Monte Carlo tree search without using simulations of A to guide it. (And he is familiar with these algorithms.)
It’s impressive. So far we see capabilities like this only in domains with loads of data. The models seem to be able to do anything if scaled, but the data dictates the domains where this is possible.
It really doesn’t seem that far away until there are pre-trained foundation models for most modalities… Google’s “Pathways” project is definitely doing it as we speak, IMO.
(Edited a lot from when originally posted)
(For more info on consistency, see the diagram here: https://jepsen.io/consistency)
I think the prompt to think about partially ordered time naturally leads one to consistency levels, but when thinking about agency it makes more sense to just think about DAGs of events, not reads and writes. Low-level reality doesn’t really have anything that looks like key-value memory. (Although maybe brains do?) And I think there’s no maintaining of invariants in low-level reality, just cause and effect.
Maintaining invariants under eventual (or causal?) consistency might be an interesting way to think about minds. In particular, I think making minds and alignment strategies work under “causal consistency” (which is the strongest consistency level that can be maintained under latency / partitions between replicas) is an important thing to do. It might happen naturally, though, if an agent is trained in a distributed environment.
So I think “strong eventual consistency” (CRDTs) and causal consistency are probably more interesting consistency levels to think about in this context than the really weak ones.
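For concreteness, here is a toy example of strong eventual consistency: a grow-only counter CRDT, where replicas update independently and converge once they have merged each other’s state (purely illustrative):

```python
class GCounter:
    """Grow-only counter CRDT: one monotonically increasing count per replica."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        # Merge is an element-wise max: commutative, associative, idempotent,
        # so delivery order and duplication don't matter.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5   # both replicas converge to the same value
```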
I think the main thing is that the ML researchers with enough knowledge are in short supply. They are:
doing foundational AI research
being paid megabucks to do the data-center cooling AI and the smartphone camera AI
freaking out about AGI
The money and/or lifestyle isn’t in procedural Spotify playlists.
Pretty sure I need to reverse the advice on this one. Thanks for including the reminder to do so!
I use acausal control between my past and future selves. I have a manual password-generating algorithm based on the name and details of a website. Sometimes there are ambiguities (like whether to use the name of a site vs. the name of the platform, or whether to use the old name or the new name).
Instead of making rules about these ambiguities, I just resolve them arbitrarily however I feel like it (not “randomly” though). Later, future me will almost always resolve that ambiguity in the same way!
Hi rmoehn,
I just wanted to thank you for writing this post and “Twenty-three AI alignment research project definitions”.
I have started a 2-year (coursework and thesis) Master’s and intend to use it to learn more maths and fundamentals, which has been going well so far. Other than that, I am in a very similar situation to the one you were in at the start of this journey, which makes me think that this post is especially useful for me.
BSc (Comp. Sci) only,
2 years professional experience in ordinary software development,
Interest in programming languages,
Trouble with “dawdling”.
The part of this post that I found most interesting is
Probably my biggest strategic mistake was to focus on producing results and trying to get hired from the beginning.
[8 months]
Perhaps trying to produce results by doing projects is fine. But then I should have done projects in one area and not jumped around the way I did.
I am currently “jumping around” to find a good area, where good means 1) results in area X are useful, and 2) results in area X are achievable by me, given my interests and the skills that I have or can reasonably develop.
However, this has further encouraged me to accept that, while jumping around, I will not actually produce results, and so (given that I want results, for example for a successful Master’s) I should really try to find such a good area faster.
I very much disagree—I think it’s absolutely an attractor state for all systems that undergo improvement.