Independent AI safety researcher
NicholasKees
This anti-China attitude also seems less concerned with internal threats to democracy. If super-human AI becomes a part of the US military-industrial complex, even if we assume they succeed at controlling it, I find it unlikely that the US can still be described as a democracy.
It’s not hard to criticize the “default” strategy of AI being used to enforce US hegemony; what seems hard is defining a real alternative path for AI governance that can last and that achieves the goal of preventing dangerous arms races long-term. The “tool AI” world you describe still needs some answer to rising tensions between the US and China, and that answer needs to be good enough not just for people concerned about safety, but also for the nationalist forces which are likely to drive US foreign policy.
then we can all go home, right?
Doesn’t this just shift what we worry about? If control of roughly human-level and slightly superhuman systems is easy, that still leaves:
Human institutions using AI to centralize power
Conflict between human-controlled AI systems
Going out with a whimper scenarios (or other multi-agent problems)
Not understanding the reasoning of vastly superhuman AI (even with CoT)
What feels underexplored to me is: If we can control roughly human-level AI systems, what do we DO with them?
I’ve noticed that a lot of LW comments these days will start by thanking the author, or expressing enthusiasm or support before getting into the substance. I have the feeling that this didn’t use to be the case as much. Is that just me?
can it maintain its own boundary over time, in the face of environmental disruption? Some agents are much better at this than others.
I really wish there was more attention paid to this idea of robustness to environmental disruption. It also comes up in discussions of optimization more generally (not just agents). This robustness seems to me like the most risk-relevant part of all this, and seems like it might be more important than the idea of a boundary. Maybe maintaining a boundary is a particularly good way for a process to protect itself from disruption, but I notice some doubt that this idea is most directly getting at what is dangerous about intelligent/optimizing systems, whereas robustness to environmental disruption feels like it has the potential to get at something broader that could unify both agent based risk narratives and non-agent based risk narratives.
Thanks!
Replying in order: Currently completely random, yes. We experimented with a more intelligent “daemon manager,” but it was hard to make one which didn’t have a strong universal preference for some daemons over others (and the hacks we came up with to try to counteract this favoritism became increasingly convoluted). It would be great to find an elegant solution to this.
Good point! Thanks for letting people know.
I’ve also had that problem, and whenever I look through the suggestions I often feel like there were many good questions/comments that got pruned away. The reason to focus on surprise was mainly to avoid the repetitiveness caused by mode collapse, where the daemon gets “stuck” giving the same canned responses. This is a crude instrument though, since as you say, just because a response isn’t surprising, doesn’t mean it isn’t useful.
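To make the “surprise” idea concrete, here is a minimal sketch of one way it could be scored. This is an assumption for illustration, not a description of how the app actually does it: it treats surprise as the mean negative log-probability per token, and the names `mean_surprisal` and `rank_by_surprise` are hypothetical.

```python
from typing import Dict, List, Sequence


def mean_surprisal(token_logprobs: Sequence[float]) -> float:
    """Average negative log-probability per token (in nats).

    Higher values mean the text was less expected by the scoring model.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def rank_by_surprise(candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Order candidate responses from most to least surprising.

    `candidates` maps each response text to the per-token log-probabilities
    a language model assigned to it (an assumed input format).
    """
    return sorted(
        candidates,
        key=lambda text: mean_surprisal(candidates[text]),
        reverse=True,
    )
```

A filter like this only pushes away from canned responses; it says nothing about usefulness, which is exactly the limitation described above.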
A note to anyone having trouble with their API key:
The API costs money, and you have to give them payment information in order to be able to use it. Furthermore, there are also apparently tiers which determine the rate limits on various models (https://platform.openai.com/docs/guides/rate-limits/usage-tiers).
The default chat model we’re using is gpt-4o, but it seems like you don’t get access to this model until you hit “tier 1,” which happens when you have spent at least $5 on API requests. If you haven’t used the API before, and think this might be your issue, you can try using gpt-3.5-turbo, which is definitely available at the “free tier,” though without giving them any payment information you will still run into an issue, as this model also costs money. You can also log into your account and go here to buy at least $5 in OpenAI API credits: https://platform.openai.com/settings/organization/billing/overview
Finally, if you are working at an organization which is providing you API credits, you need to make sure to set that organization as your default organization here: https://platform.openai.com/settings/profile?tab=api-keys If you don’t want to do this, in the Pantheon settings you can also provide an organization ID, which you should be able to find here: https://platform.openai.com/settings/organization/general
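If you want to check this outside the app, here is a rough sketch of how you could see which models your key can access. It assumes the standard OpenAI REST endpoint for listing models and the `OpenAI-Organization` request header; the function names and the fallback order are just illustrative, and this is not how Pantheon itself does it.

```python
import json
import urllib.request
from typing import Dict, Iterable, List, Optional

MODELS_URL = "https://api.openai.com/v1/models"


def build_headers(api_key: str, organization_id: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, adding the organization ID only if one is given."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if organization_id:
        headers["OpenAI-Organization"] = organization_id
    return headers


def pick_model(
    available: Iterable[str],
    preferred: Iterable[str] = ("gpt-4o", "gpt-3.5-turbo"),
) -> Optional[str]:
    """Return the first preferred model the key can access, or None."""
    available_set = set(available)
    for model in preferred:
        if model in available_set:
            return model
    return None


def list_models(api_key: str, organization_id: Optional[str] = None) -> List[str]:
    """Fetch the IDs of the models visible to this key (makes a network call)."""
    request = urllib.request.Request(
        MODELS_URL, headers=build_headers(api_key, organization_id)
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return [model["id"] for model in payload["data"]]
```

Calling `pick_model(list_models(your_key))` should tell you whether gpt-4o is visible to your key; if it returns `None`, the billing or organization settings above are the likely cause.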
Sorry to anyone who has found this confusing. Please don’t hesitate to reach out if you continue to have trouble.
Daimons are lesser divinities or spirits, often personifications of abstract concepts, beings of the same nature as both mortals and deities, similar to ghosts, chthonic heroes, spirit guides, forces of nature, or the deities themselves.
It’s a nod to ancient Greek mythology: https://en.wikipedia.org/wiki/Daimon
a daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user.
Also nodding to its use as a term for certain kinds of computer programs: https://en.wikipedia.org/wiki/Daemon_(computing)
Hey Alexander! They should appear fairly soon after you’ve written at least 2 thoughts. The app will also let you know when a daemon is currently developing a response. Maybe there is an issue with your API key? There should be some kind of error message indicating why no daemons are appearing. Please DM me if that isn’t the case and we’ll look into what’s going wrong for you.
We are! There’s a bunch of features we’d like to add, and for the most part we expect to be moving on to other projects (so no promises on when we’ll get to it), but we do absolutely want to add support for other models.
There is a field called forensic linguistics where detectives use someone’s “linguistic fingerprint” to determine the author of a document (famously instrumental in catching Ted Kaczynski by analyzing his manifesto). It seems like text is often used to predict things like gender, socioeconomic background, and education level.
If LLMs are superhuman at this kind of work, I wonder whether anyone is developing AI tools to automate this. Maybe the demand is not very strong, but I could imagine, for example, that an authoritarian regime might have a lot of incentive to de-anonymize people. While a company like OpenAI seems likely to have an incentive to hide how much the LLM actually knows about the user, I’m curious where anyone would have a strong incentive to make full use of superhuman linguistic analysis.
I wish there were an option in the settings to opt out of seeing the LessWrong reacts. I personally find them quite distracting, and I’d like to be able to hover over text or highlight it without having to see the inline annotations.
How would (unaligned) superintelligent AI interact with extraterrestrial life?
Humans, at least, have the capacity for this kind of “cosmopolitanism about moral value.” Would the kind of AI that causes human extinction share this? It would be such a tragedy if the legacy of the human race is to leave behind a kind of life that goes forth and paves the universe, obliterating any and all other kinds of life in its path.
Some thoughts:
First, it sounds like you might be interested in the idea of d/acc from this Vitalik Buterin post, which advocates for building a “defense favoring” world. There are a lot of great examples of things we can do now to make the world more defense favoring, but when it comes to strongly superhuman AI I get the sense that things get a lot harder.
Second, there doesn’t seem like a clear “boundaries good” or “boundaries bad” story to me. Keeping a boundary secure tends to impose some serious costs on the bandwidth of what can be shared across it. Pre-industrial Japan maintained a very strict boundary with the outside world to prevent foreign influence, and the cost was falling behind the rest of the world technologically.
My left and right hemispheres are able to work so well together because they don’t have to spend resources protecting themselves from each other. Good cooperative thinking among people also relies on trust making it possible to loosen boundaries of thought. Weakening borders between countries can massively increase trade, and also relies on trust between the participant countries. The problem with AI is that we can’t give it that level of trust, and so we need to build boundaries, but the ultimate cost seems to be that we eventually get left behind. Creating the perfect boundary that only lets in the good and never the bad, and doesn’t incur a massive cost, seems like a really massive challenge and I’m not sure what that would look like.
Finally, when I think of Cyborgism, I’m usually thinking of it in terms of taking control over the “cyborg period” of certain skills, or the period of time where human+AI teams still outperform either humans or AIs on their own. In this frame, if we reach a point where AIs broadly outperform human+AI teams, then barring some kind of coordination, humans won’t have the power to protect themselves from all the non-human agency out there (and it’s up to us to make good use of the cyborg period before then!).
In that frame, I could see “protecting boundaries” intersecting with cyborgism, for example in that AI could help humans perform better oversight and guard against disempowerment around the end of some critical cyborg period. Developing a cyborgism that scales to strongly superhuman AI has both practical challenges (like the kind Neuralink seeks to overcome), as well as requiring you to solve its own particular version of the alignment problem (e.g. how can you trust the AI you are merging with won’t just eat your mind).
Thank you, it’s been fixed.
In terms of LLM architecture, do transformer-based LLMs have the ability to invent new, genuinely useful concepts?
So I’m not sure how well the word “invent” fits here, but I think it’s safe to say LLMs have concepts that we do not.
Recently @Joseph Bloom was showing me Neuronpedia which catalogues features found in GPT-2 by sparse autoencoders, and there were many features which were semantically coherent, but I couldn’t find a word in any of the languages I spoke that could point to these concepts exactly. It felt a little bit like how human languages often have words that don’t translate, and this made us wonder whether we could learn useful abstractions about the world (e.g. that we actually import into English) by identifying the features being used by LLMs.
You might enjoy this post which approaches this topic of “closing the loop,” but with an active inference lens: https://www.lesswrong.com/posts/YEioD8YLgxih3ydxP/why-simulator-ais-want-to-be-active-inference-ais
A main motivation of this enterprise is to assess whether interventions in the realm of Cooperative AI, that increase collaboration or reduce costly conflict, can seem like an optimal marginal allocation of resources.
After reading the first three paragraphs, I had basically no idea what interventions you were aiming to evaluate. Later on in the text, I gather you are talking about coordination between AI singletons, but I still feel like I’m missing something about what problem exactly you are aiming to solve with this. I could have definitely used a longer, more explain-like-I’m-five level introduction.
The cap per trader per market on PredictIt is $850