According to this Nature paper, the Atlantic Meridional Overturning Circulation (AMOC), the “global conveyor belt”, is likely to collapse this century (mean estimate 2050, 95% confidence interval 2025–2095).
Another recent study finds that it is “on tipping course” and predicts that after collapse, average February temperatures in London will decrease by 1.5 °C per decade (15 °C over 100 years). Bergen (Norway) February temperatures will decrease by 3.5 °C per decade (35 °C over the same period). That is a rate of temperature change about an order of magnitude faster than the current rate of global warming (0.2 °C per decade), but in the opposite direction!
This seems like a big deal? Anyone with more expertise in climate sciences want to weigh in?
Stitching SAEs of different sizes
Bart Bussmann’s Shortform
I expect the 0.05 peak might be the minimum cosine similarity you can get if you want to distribute 8192 vectors uniformly over a 512-dimensional space? I used a bit of an unusual regularizer, where I penalized:
mean cosine similarity + mean max cosine similarity + max max cosine similarity
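Concretely, the penalty can be computed from the (unit-normalized) decoder directions. The sketch below is illustrative rather than my exact training code, and assumes a decoder weight matrix `W_dec` of shape `(n_features, d_model)`:

```python
import torch
import torch.nn.functional as F

def orthogonality_penalty(W_dec: torch.Tensor) -> torch.Tensor:
    """mean cos sim + mean max cos sim + max max cos sim over distinct feature pairs."""
    dirs = F.normalize(W_dec, dim=-1)            # (n_features, d_model) unit directions
    cos = dirs @ dirs.T                          # pairwise cosine similarities
    n = cos.shape[0]
    off_diag = ~torch.eye(n, dtype=torch.bool, device=cos.device)
    mean_sim = cos[off_diag].mean()
    # exclude self-similarity before taking the per-feature and global maxima
    max_per_feature = cos.masked_fill(~off_diag, float("-inf")).max(dim=-1).values
    return mean_sim + max_per_feature.mean() + max_per_feature.max()
```

This term would then be added to the usual reconstruction + sparsity loss with some coefficient.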
I will check later whether the features in the 0.3 peak all have the same nearest neighbour.
A quick and dirty first experiment with adding an orthogonality regularizer indicates that this can work without too much penalty on the reconstruction loss. I trained an SAE on the MLP output of a 1-layer model with dictionary size 8192 (16 times the MLP output size).
I trained this without the regularizer and got a reconstruction score of 0.846 at an L0 of ~17.
With the regularizer, I got a reconstruction score of 0.828 at an L0 of ~18.
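For reference, a minimal sketch of the two metrics; the exact definition of “reconstruction score” isn’t spelled out above, so the variance-explained version below is just one common choice used for illustration:

```python
import torch

@torch.no_grad()
def sae_metrics(x: torch.Tensor, x_hat: torch.Tensor, feature_acts: torch.Tensor):
    """x, x_hat: (batch, d_model) original and reconstructed activations;
    feature_acts: (batch, n_features) SAE feature activations."""
    # L0: average number of active (nonzero) features per input
    l0 = (feature_acts != 0).float().sum(dim=-1).mean().item()
    # "Reconstruction score" as fraction of variance explained (one possible definition)
    resid = (x - x_hat).pow(2).sum()
    total = (x - x.mean(dim=0, keepdim=True)).pow(2).sum()
    score = (1.0 - resid / total).item()
    return l0, score
```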
Looking at the cosine similarities between neurons:
Interesting peaks around cosine similarities of 0.3 and 0.05 there! Maybe (very speculatively) that tells us something about the way the model encodes features in superposition?
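For anyone who wants to reproduce the histogram, a minimal sketch with the same assumed `W_dec` layout as in the earlier snippet (the trained SAE object `sae` is a placeholder name); the nearest-neighbour indices at the end are what I’d use to check whether the 0.3 peak all share one neighbour:

```python
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

@torch.no_grad()
def decoder_cosine_sims(W_dec: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarities between decoder directions, diagonal masked out."""
    dirs = F.normalize(W_dec, dim=-1)
    cos = dirs @ dirs.T
    mask = torch.eye(cos.shape[0], dtype=torch.bool, device=cos.device)
    return cos.masked_fill(mask, float("-inf"))

cos = decoder_cosine_sims(sae.W_dec)            # `sae` is an assumed trained SAE object
plt.hist(cos[cos > float("-inf")].cpu().numpy(), bins=200)
plt.xlabel("cosine similarity between decoder directions")
plt.ylabel("count")
plt.show()

# nearest neighbour of each feature, e.g. to check whether the ~0.3 peak
# consists of features that all point at the same neighbour
nearest = cos.argmax(dim=-1)
```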
Thanks for the suggestion! @BeyondTheBorg suggested something similar with his Transcendent AI. After some thought, I’ve added the following:
Transcendent AI: AGI uncovers and engages with previously unknown physics, moving into a physical reality beyond human comprehension. Its objectives rely on resources and dimensions that do not compete with human needs, allowing it to operate in a realm unfathomable to us. Humanity remains largely unaffected, as the AGI progresses into the depths of these new dimensions, detached from human concerns.
My Trial Period as an Independent Alignment Researcher
Good proposal! I agree that this is a great opportunity to try out some ideas in this space.
Another proposal for the metric:
The regrantor will judge in 5 years whether they are happy that they funded this project. This has a simple binary resolution criterion and nicely aligns the incentives of the market with those of the regrantor.
Interpreting Modular Addition in MLPs
I agree that “Moral Realism AI” was a bit of a misnomer and I’ve changed it to “Convergent Morality AI”.
Your scenario seems highly specific. Could you try to rephrase it in about three sentences, as in the other scenarios?
I’m a bit wary of adding a lot of future scenarios that take place outside of our reality; I want the scenarios to focus on the future of our universe. However, I do think there is space for a scenario where our reality ends because it has achieved its goals (as in your scenario, I think?).
Thanks! I think your tag of @avturchin didn’t work, so I’m just pinging them here to see whether they think I missed any important and probable scenarios.
Taking the Doomsday argument seriously, the “Futures without AGI because we go extinct in another way” and the “Futures with AGI in which we die” seem most probable. In futures with conscious AGI agents, it will depend a lot on how experience gets sampled (e.g. one agent vs many).
Yes, good one! I’ve added the following:
Power Grab with AI: OpenAI, DeepMind, or another small group of people invent AGI and align it to their own interests. In a short amount of time, they become all-powerful and rule over the world.
I’ve disregarded the “wipe out everyone else” part, as I think that’s unlikely enough for people who are capable of building an AGI.
Thanks, good suggestions! I’ve added the following:
Pious AI: Humanity builds AGI and adopts one of the major religions. Vast amounts of superintelligent cognition are devoted to philosophy, theology, and prayer. AGI proclaims itself to be some kind of Messiah, or merely God’s most loyal and capable servant on Earth and beyond.
I think Transcendent AI is close enough to Far Far Away AI, where in this case “far far away” means another plane of physics. Similarly, I think your Matrix AI scenario is captured in:
Theoretical Impossibility: For some reason or another (Souls? Consciousness? Quantum something?), it turns out to be theoretically impossible to build AGI. Humanity keeps making progress on other fronts, but just never invents AGI.
where the weird reason in this case is that we live in the Matrix.
60+ Possible Futures
I almost never consider character.ai, yet total time spent there is similar to Bing or ChatGPT. People really love the product; that visit duration is off the charts. Whereas this is total failure for Bard if they can’t step up their game.
Wow, wasn’t aware they are this big. And they supposedly train their own models. Does anyone know if the founders have a stance on AI X-risk?
Product Recommendation: LessWrong dialogues with Recast
Interesting! Does it ask for a different confidence interval every time I see the card? Or will it always ask for the 90% confidence interval shown in the example card?
This strategy has never worked for me, but I can see it working for other people. If you want to try this though, it is important to make it clear to yourself which procedure you’re following.
I believe that for my mechanism, it is very important to always follow up on the dice. If there is a dice outcome that would disappoint you, just don’t put it on the list!
I can see this being a problem. However, I see myself as someone with very low willpower, and this is still not a problem for me. I think this is for two reasons:
1. I never put an option on the list that I know I would not or could not execute.
2. I regard the dice outcome as somewhat holy. I would always pay out a bet I lost to a friend, partly because it’s just the right thing to do and partly because I know that otherwise the whole mechanism of betting is worthless from that moment on. I guess all my parts are happy enough with this system that none of them want to break it by not executing the action.
Thanks!
Yeah, I think that’s fair, and I don’t necessarily think that stitching multiple SAEs is a great way to move the Pareto frontier of MSE/L0 (although some tentative experiments showed that stitched SAEs might serve as a good initialization if retrained completely).
However, I don’t think that a low L0 should be a goal in itself when training SAEs, as L0 mainly serves as a proxy for the interpretability of the features, for lack of better feature-quality metrics. Since stitching features doesn’t change the interpretability of the individual features, I’m not sure how useful/important the L0 metric still is in this context.
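For readers who haven’t seen the stitching post: the basic idea is combining features from SAEs of different sizes into one larger dictionary. The sketch below only shows a naive weight-concatenation step, with the feature-selection logic left as an input; it’s illustrative under standard SAE conventions, not the exact procedure from the post:

```python
import torch
import torch.nn as nn

class SimpleSAE(nn.Module):
    """A vanilla SAE: x -> ReLU((x - b_dec) @ W_enc.T + b_enc) -> acts @ W_dec + b_dec.
    Weights are zero placeholders here; in practice they would be initialized and trained."""
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.zeros(n_features, d_model))
        self.b_enc = nn.Parameter(torch.zeros(n_features))
        self.W_dec = nn.Parameter(torch.zeros(n_features, d_model))
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def encode(self, x):
        # subtracting b_dec before encoding is one common SAE convention
        return torch.relu((x - self.b_dec) @ self.W_enc.T + self.b_enc)

    def decode(self, acts):
        return acts @ self.W_dec + self.b_dec

def stitch(sae_a: SimpleSAE, sae_b: SimpleSAE, keep_b: torch.Tensor) -> SimpleSAE:
    """Concatenate all of sae_a's features with a selected subset of sae_b's features
    (keep_b is a 1D tensor of feature indices). Decoder bias is taken from sae_a."""
    d_model = sae_a.W_dec.shape[1]
    out = SimpleSAE(d_model, sae_a.W_enc.shape[0] + len(keep_b))
    out.W_enc.data = torch.cat([sae_a.W_enc.data, sae_b.W_enc.data[keep_b]], dim=0)
    out.b_enc.data = torch.cat([sae_a.b_enc.data, sae_b.b_enc.data[keep_b]], dim=0)
    out.W_dec.data = torch.cat([sae_a.W_dec.data, sae_b.W_dec.data[keep_b]], dim=0)
    out.b_dec.data = sae_a.b_dec.data.clone()
    return out
```

As mentioned above, such a combined dictionary could also just serve as an initialization for further training rather than being used as-is.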