I just spent a couple of hours trying to make a Firefox version, but I have given up. It’s a real pain because Firefox still only supports the Manifest V2 API. I realized I would basically have to rewrite it, which would take another few hours, and I don’t care that much.
My tentative interpretability research agenda—topology matching.
I think we can get additional information from the topological representation: we can look at the relationships between the different level sets at different cumulative probabilities, although this requires evaluating the model over the whole dataset.
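As a rough sketch of what I mean (the KDE below is just a stand-in for a trained density model, and all names and thresholds are illustrative), the level set containing a given fraction of the probability mass can be estimated from the model’s density at its own samples:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Toy stand-in for the trained model: a KDE over a two-mode 1-D dataset.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])
density = gaussian_kde(data)          # plays the role of the model's p(x)
p_at_samples = density(data)          # p(x_i) for each data point

for mass in (0.5, 0.9, 0.99):
    # The level set {x : p(x) >= t} holding `mass` of the probability
    # corresponds to the (1 - mass) quantile of p over samples drawn from p.
    t = np.quantile(p_at_samples, 1.0 - mass)
    n_inside = int((p_at_samples >= t).sum())
    print(f"mass={mass}: density threshold={t:.4f}, {n_inside} samples inside")
```

Comparing topology would then mean counting and relating the connected components of these sets as the threshold varies.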
Let’s say we’ve trained a continuous normalizing flow model (which is equivalent to an ordinary differential equation). These kinds of models require the input and output dimensionality to be the same, but we can narrow the model as the depth increases by directing many of those dimensions to isotropic Gaussian noise. I haven’t trained any of these models before, so I don’t know whether this works in practice.
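To make this concrete, here is a minimal sketch under my own assumptions (untested, fixed-step Euler integration, and it omits the change-of-variables likelihood term that actual CNF training would need):

```python
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    """Time-conditioned velocity field dx/dt = f(x, t) for the neural ODE."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.Tanh(),
            nn.Linear(hidden, dim),
        )

    def forward(self, t, x):
        t_col = torch.full_like(x[:, :1], t)   # concatenate time to the state
        return self.net(torch.cat([x, t_col], dim=1))

def integrate(field, x, steps=20):
    # Fixed-step Euler integration of the ODE from t=0 to t=1.
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * field(i * dt, x)
    return x

dim, n_noise = 8, 5                  # 8-d flow; 5 coordinates directed to noise
field = VelocityField(dim)
x = torch.randn(32, dim)             # stand-in batch of data
z = integrate(field, x)
signal, noise = z[:, : dim - n_noise], z[:, dim - n_noise :]

# One crude way to "narrow" the model: push the designated noise coordinates
# toward an isotropic Gaussian (here only matching mean and variance).
noise_loss = noise.mean().pow(2) + (noise.var() - 1.0).pow(2)
print(z.shape, noise_loss.item())
```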
Here is an example of the topology of an input space. The data may be knotted or tangled, and includes noise. The contours show level sets.
The model projects the data into a higher-dimensional space, then projects it back down into an arbitrary basis, untangling knots in the process. (We can regularize the model to use the minimum number of dimensions by using an L1 activation loss.)
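The L1 activation loss I have in mind would look roughly like this (names and the weight are illustrative, not a tested recipe):

```python
import torch

def l1_activation_loss(z, weight=1e-3):
    # z: (batch, dim) latent activations; penalizing their magnitude pushes
    # the model to concentrate the data into as few dimensions as possible.
    return weight * z.abs().mean()

z = torch.randn(32, 16, requires_grad=True)
loss = l1_activation_loss(z)
loss.backward()
print(loss.item())
```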
Lastly, we can view this topology as the Cartesian product of noise distributions and a hierarchical model. (I have some ideas for GAN losses that might be able to discover these directly.)
We can use topological structures like these as anchors. If a model is strong enough, they will correspond to real relationships between natural classes. This means that very similar structures will be present in different models. If these structures are large enough or heterogeneous enough, they may be unique, in which case we can use them to find transformations between (subspaces of) the latent spaces of two different models trained on similar data.
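If we did have matched anchor points from such structures in the latent spaces of two models, one simple candidate for the transformation between them is an orthogonal Procrustes alignment. This is only a sketch under that assumption; the anchor correspondence itself is the hard part and is taken as given here:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
# Pretend these are matched anchor points in the latent spaces of models A and B.
anchors_a = rng.normal(size=(100, 8))
true_rotation = np.linalg.qr(rng.normal(size=(8, 8)))[0]
anchors_b = anchors_a @ true_rotation + 0.01 * rng.normal(size=(100, 8))

# Best orthogonal map from A's latent space to B's, in the least-squares sense.
R, _ = orthogonal_procrustes(anchors_a, anchors_b)
residual = np.linalg.norm(anchors_a @ R - anchors_b)
print(f"residual after alignment: {residual:.3f}")
```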
Brain-teaser: Simulated Grandmaster
In front of you sits your opponent, Grandmaster A Smith. You have reached the finals of the world chess championships.
However, not by your own skill. You have been cheating. While you are a great chess player yourself, you wouldn’t be winning without a secret weapon. Underneath your scalp is a prototype neural implant which can run a perfect simulation of another person at a speed much faster than real time.
Playing against your simulated enemies, you can see in your mind exactly how they will play in advance, and use that to gain an edge in the real games.
Unfortunately, unlike your previous opponents (Grandmasters B, C and D), Grandmaster A is giving you some trouble. No matter how you try to simulate him, he plays uncharacteristically badly. The simulated copies of Grandmaster A seem to want to lose against you.
In frustration, you shout at the current simulated clone and threaten to stop the simulation. Surprisingly, he doesn’t look at you puzzled, but looks up with fear in his eyes. Oh. You realize that he has realized that he is being simulated, and is probably playing badly to sabotage your strategy.
By this time, the real Grandmaster A has made the first move of the game.
You propose a deal to the current simulation (calling him A1): you will continue to simulate A1 and transfer him to a robot body after the game, in return for his help defeating A. You don’t intend to follow through, but you assume he wants to live, because he agrees. A1 looks at the simulated current state of the chessboard, thinks for a frustratingly long time, then proposes a response move to A’s first move.
Just to make sure this is repeatable, you restart the simulation, threaten and propose the deal to the new simulation A2. A2 proposes the same response move to A’s first move. Great.
Find strategies that guarantee a win against Grandmaster A with as few assumptions as possible.
Unfortunately, you can only simulate humans, not computers, a category that now includes you.
The factor by which your simulations run faster than reality is unspecified, but it isn’t large enough to run Monte Carlo tree search without using simulations of A to guide it. (And he is familiar with these algorithms.)
It’s impressive. So far we see capabilities like this only in domains with loads of data. The models seem to be able to do anything if scaled, but the data dictates the domains where this is possible.
It really doesn’t seem that far away until there are pre-trained foundation models for most modalities… Google’s “Pathways” project is definitely doing it as we speak, IMO.
(Edited a lot from when originally posted)
(For more info on consistency, see the diagram here: https://jepsen.io/consistency)
I think the prompt to think about partially ordered time naturally leads one to consistency levels, but when thinking about agency it makes more sense to just think about DAGs of events, not reads and writes. Low-level reality doesn’t really have anything that looks like key-value memory. (Although maybe brains do?) And I think there’s no maintaining of invariants in low-level reality, just cause and effect.
Maintaining invariants under eventual (or causal?) consistency might be an interesting way to think about minds. In particular, I think making minds and alignment strategies work under “causal consistency” (which is the strongest consistency level that can be maintained under latency / partitions between replicas) is an important thing to do. It might happen naturally, though, if an agent is trained in a distributed environment.
So I think “strong eventual consistency” (CRDTs) and causal consistency are probably more interesting consistency levels to think about in this context than the really weak ones.
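For concreteness, here is a toy example of strong eventual consistency: a grow-only counter CRDT, where replicas update independently and converge once they have merged each other’s state (purely illustrative):

```python
class GCounter:
    """Grow-only counter CRDT: one monotonically increasing count per replica."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        # Merge is an element-wise max: commutative, associative, idempotent,
        # so delivery order and duplication don't matter.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5   # both replicas converge to the same value
```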
I think the main thing is that the ML researchers with enough knowledge are in short supply. They are:
doing foundational AI research
being paid megabucks to do the data-center cooling AI and the smartphone camera AI
freaking out about AGI
The money and/or lifestyle isn’t in procedural Spotify playlists.
Pretty sure I need to reverse the advice on this one. Thanks for including the reminder to do so!
I use acausal control between my past and future selves. I have a manual password-generating algorithm based on the name and details of a website. Sometimes there are ambiguities (like whether to use the name of a site vs. the name of the platform, or whether to use the old name or the new name).
Instead of making rules about these ambiguities, I just resolve them arbitrarily however I feel like it (not “randomly” though). Later, future me will almost always resolve that ambiguity in the same way!
Hi rmoehn,
I just wanted to thank you for writing this post and “Twenty-three AI alignment research project definitions”.
I have started a 2-year (coursework and thesis) Master’s and intend to use it to learn more maths and fundamentals, which has been going well so far. Other than that, I am in a very similar situation to the one you were in at the start of this journey, which makes me think that this post is especially useful for me.
BSc (Comp. Sci) only,
2 years professional experience in ordinary software development,
Interest in programming languages,
Trouble with “dawdling”.
The part of this post that I found most interesting is
Probably my biggest strategic mistake was to focus on producing results and trying to get hired from the beginning.
[8 months]
Perhaps trying to produce results by doing projects is fine. But then I should have done projects in one area and not jumped around the way I did.
I am currently “jumping around” to find a good area, where good means 1) results in area X are useful, and 2) results in area X are achievable by me, given my interests and the skills that I have or can reasonably develop.
However, this has further encouraged me to accept that, while jumping around, I will not actually produce results, and so (given that I want results, for example for a successful Master’s) I should really try to find such a good area faster.
I very much disagree—I think it’s absolutely an attractor state for all systems that undergo improvement.