Working to bring insights from the collective deliberation and digital democracy space to build tools for AI-facilitated group dialogues.
Cofounder of Mosaic Labs with @Sofia Vanhanen, where we are developing Nexus, a discussion platform for improving group epistemics.
If you’re interested in this direction, or AI for epistemics more broadly, please don’t hesitate to shoot me a DM, or join our Discord server!
How would (unaligned) superintelligent AI interact with extraterrestrial life?
Humans, at least, have the capacity for this kind of “cosmopolitanism about moral value.” Would the kind of AI that causes human extinction share this? It would be such a tragedy if the legacy of the human race is to leave behind a kind of life that goes forth and paves the universe, obliterating any and all other kinds of life in its path.
Some thoughts:
First, it sounds like you might be interested in the idea of d/acc from this Vitalik Buterin post, which advocates for building a “defense-favoring” world. There are a lot of great examples of things we can do now to make the world more defense-favoring, but when it comes to strongly superhuman AI I get the sense that things get a lot harder.
Second, I don’t see a clear “boundaries good” or “boundaries bad” story here. Keeping a boundary secure tends to impose serious costs on the bandwidth of what can be shared across it. Pre-industrial Japan maintained a very strict boundary with the outside world to prevent foreign influence, and the cost was falling behind the rest of the world technologically.
My left and right hemispheres are able to work so well together because they don’t have to spend resources protecting themselves from each other. Good cooperative thinking among people likewise relies on trust, which makes it possible to loosen the boundaries of thought. Weakening borders between countries can massively increase trade, and also relies on trust between the participating countries. The problem with AI is that we can’t give it that level of trust, and so we need to build boundaries, but the ultimate cost seems to be that we eventually get left behind. Creating the perfect boundary that only lets in the good and never the bad, without incurring a massive cost, seems like an enormous challenge, and I’m not sure what it would look like.
Finally, when I think of Cyborgism, I’m usually thinking of it in terms of taking control over the “cyborg period” of certain skills: the period of time during which human+AI teams still outperform either humans or AIs on their own. In this frame, if we reach a point where AIs broadly outperform human+AI teams, then barring some kind of coordination, humans won’t have the power to protect themselves from all the non-human agency out there (and it’s up to us to make good use of the cyborg period before then!).
In that frame, I could see “protecting boundaries” intersecting with cyborgism, for example in that AI could help humans perform better oversight and guard against disempowerment around the end of some critical cyborg period. Developing a cyborgism that scales to strongly superhuman AI poses practical challenges (like the kind Neuralink seeks to overcome), and it also requires solving its own particular version of the alignment problem (e.g. how can you trust that the AI you are merging with won’t just eat your mind?).
Thank you, it’s been fixed.
In terms of LLM architecture, do transformer-based LLMs have the ability to invent new, genuinely useful concepts?
So I’m not sure how well the word “invent” fits here, but I think it’s safe to say LLMs have concepts that we do not.
Recently @Joseph Bloom was showing me Neuronpedia, which catalogues features found in GPT-2 by sparse autoencoders. Many of the features were semantically coherent, but I couldn’t find a word in any of the languages I speak that pointed to these concepts exactly. It felt a little bit like how human languages often have words that don’t translate, and this made us wonder whether we could learn useful abstractions about the world (e.g. concepts we could actually import into English) by identifying the features being used by LLMs.
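For anyone unfamiliar with the setup, here is a minimal sketch of the kind of sparse autoencoder used to extract these features. The dimensions, names, and training details are illustrative assumptions on my part, not Neuronpedia’s actual code:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy sparse autoencoder over transformer activations (illustrative only)."""

    def __init__(self, d_model: int = 768, d_features: int = 24576):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # activations -> overcomplete feature space
        self.decoder = nn.Linear(d_features, d_model)  # feature space -> reconstructed activations

    def forward(self, activations: torch.Tensor):
        # ReLU keeps only a sparse set of positive feature activations;
        # during training, an L1 penalty on `features` encourages this sparsity.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return features, reconstruction

sae = SparseAutoencoder()
features, recon = sae(torch.randn(1, 768))
# The strongest entries of `features` for a given token are the kind of
# "semantically coherent but hard to name" directions described above.
```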
You might enjoy this post which approaches this topic of “closing the loop,” but with an active inference lens: https://www.lesswrong.com/posts/YEioD8YLgxih3ydxP/why-simulator-ais-want-to-be-active-inference-ais
A main motivation of this enterprise is to assess whether interventions in the realm of Cooperative AI, that increase collaboration or reduce costly conflict, can seem like an optimal marginal allocation of resources.
After reading the first three paragraphs, I had basically no idea what interventions you were aiming to evaluate. Later on in the text, I gather you are talking about coordination between AI singletons, but I still feel like I’m missing something about what problem exactly you are aiming to solve with this. I could have definitely used a longer, more explain-like-I’m-five level introduction.
That sounds right intuitively. One thing worth noting, though, is that most notes get very few ratings, and most users rate very few notes, so it might be trickier than it sounds. Also, if I were them, I might worry about drastic changes in note rankings as a result of switching models. Currently, just as notes can become helpful by reaching a threshold of 0.4, they can lose this status by dropping below 0.39. They may also have to manually pick new thresholds, as well as maybe redesign the algorithm slightly (since it seems that a lot of this algorithm was built via trial and error, rather than from clear principles).
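To make that hysteresis concrete, here is a toy sketch of the status rule as I understand it. The 0.4/0.39 values are the ones mentioned above; the function and variable names are my own, not the actual Community Notes implementation:

```python
# Toy sketch of the note-status hysteresis described above; only the two
# threshold values come from the Community Notes behavior discussed here.
HELPFUL_THRESHOLD = 0.40
DELIST_THRESHOLD = 0.39

def update_status(score: float, currently_helpful: bool) -> bool:
    """Return whether a note should be shown as helpful after rescoring."""
    if not currently_helpful:
        return score >= HELPFUL_THRESHOLD
    # Already-helpful notes keep their status until they drop below 0.39,
    # so small score fluctuations don't constantly flip them back and forth.
    return score >= DELIST_THRESHOLD
```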
“Note: for now, to avoid overfitting on our very small dataset, we only use 1-dimensional factors. We expect to increase this dimensionality as our dataset size grows significantly.”
This was the reason given in the documentation.
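If I’m reading the documentation right, the model with 1-dimensional factors looks roughly like the sketch below: each rating is predicted from a global intercept, a user intercept, a note intercept (roughly the score the helpfulness thresholds apply to), and the product of a scalar user factor and a scalar note factor. The variable names and the omitted training loop are my own:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_notes = 100, 20

# Parameters of the (illustrative) 1-dimensional matrix factorization.
mu = 0.0                                    # global intercept
user_intercept = np.zeros(n_users)
note_intercept = np.zeros(n_notes)          # roughly the per-note "helpfulness" score
user_factor = rng.normal(0, 0.1, n_users)   # 1-D: a single scalar per user
note_factor = rng.normal(0, 0.1, n_notes)   # 1-D: a single scalar per note

def predict_rating(u: int, n: int) -> float:
    """Predicted helpfulness rating of note n by user u."""
    return mu + user_intercept[u] + note_intercept[n] + user_factor[u] * note_factor[n]

print(predict_rating(0, 0))
```

Moving to k-dimensional factors would just replace the scalar product with a dot product of length-k vectors, which is more expressive but easier to overfit when most notes only have a handful of ratings.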
Thanks for pointing that out. I’ve added some clarification.
That sounds cool! Though I think I’d be more interested in using this to first visualize and understand current LW dynamics, rather than immediately trying to intervene by changing how comments are ranked.
I’m confused by the way people are engaging with this post. That well-functioning and stable democracies need protections against a “tyranny of the majority” is not at all a new idea; this seems like basic common sense. The idea that the American Civil War was precipitated by the South perceiving an end to its balance of power with the North also seems pretty well accepted. Furthermore, there are lots of other things that make democratic systems work well: e.g. a system of laws/conflict resolution, or mechanisms for peaceful transfers of power.
Community Notes by X
fyi, the link chatgptiseatingtheworld.com does not have a secure connection.
Even if you suppose that there are extremely good non-human futures, creating a new kind of life and unleashing it upon the world is a huge deal, with enormous ethical/philosophical implications! To unilaterally make a decision that would drastically affect (and endanger) the lives of everyone on earth (human and non-human) seems extremely bad, even if you had very good reasons to believe that this ends well (which as far as I can tell, you don’t).
I have sympathy for the idea of wanting AI systems to be able to pursue lives they find fulfilling and to find their own kinds of value, for the same reason I would, upon encountering alien life, want to let those aliens find value in their own ways.
But your post seems to imply that we should just give up on trying to positively affect the future, spend no real thought on what would be the biggest decision ever made in all of history, all based on a hunch that everything is guaranteed to end well no matter what we do? This perspective, to me, comes off as careless, selfish, and naive.
I just ran into a post which, if you are interested in AI consciousness, you might find interesting: Improving the Welfare of AIs: A Nearcasted Proposal
There seem to be a lot of good reasons to take potential AI consciousness seriously, even if we haven’t fully understood it yet.
It seems hard to me to be extremely confident in either direction. I’m personally quite sympathetic to the idea, but there is very little consensus on what consciousness is, or what a principled approach would look like to determining whether/to what extent a system is conscious.
Here is a recent paper that gives a pretty in-depth discussion: Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
What you write seems to be focused entirely on the behavior of a system, and while I know there are people who agree with that focus, from what I can tell most consciousness researchers are interested in particular properties of the internal process that produces that behavior.
More generally, science is about identifying the structure and patterns in the world; the task taxonomy learned by powerful language models may be very convergent and could be a useful map for understanding the territory of the world we are in. What’s more, such a decomposition would itself be of scientifico-philosophical interest — it would tell us something about thinking.
I would love to see someone expand on the ways we could use interpretability to learn about the world, or the structure of tasks (or perhaps examples of how we’ve already done this?). Aside from being interesting scientifically, maybe this could also help us build economically valuable systems which are more explicit and predictable?
Credit goes to Daniel Biber: https://www.worldphoto.org/sony-world-photography-awards/winners-galleries/2018/professional/shortlisted/natural-world/very
After the shape dissipated it actually reformed into another bird shape.
I wish there were an option in the settings to opt out of seeing the LessWrong reacts. I personally find them quite distracting, and I’d like to be able to hover over text or highlight it without having to see the inline annotations.