Except that chess really does have an objectively correct value systemization, which is “win the game.” “Sitting with paradox” just means, don’t get too attached to partial systemizations. It reminds me of Max Stirner’s egoist philosophy, which emphasized that individuals should not get hung up on partial abstractions or “idées fixes” (honesty, pleasure, success, money, truth, etc.) except perhaps as cheap, heuristic proxies for one’s uber-systematized value of self-interest, but should instead always keep in mind the overriding abstraction of self-interest and check in periodically as to whether one’s commitment to honesty, pleasure, success, money, truth, or any of these other “spooks” really is promoting one’s self-interest (perhaps yes, perhaps no).
I agree, I don’t know why mulberries aren’t more popular. They are delicious, and the trees grow much more easily than other fruit trees. Other fruit trees seem very susceptible to fungi and insects, in my experience, but mulberries come up all over the place and thrive easily on their own (at least here in Missouri). I have four mulberry trees in my yard that just came up on their own over the last 10 years, and now they are producing multiple gallons of berries each per season, which would probably translate into hundreds of dollars if you had to buy a similar amount of raspberries at the store.
You can either spread a sheet to collect them, or if you have time to burn (or if you want a fun activity for your kids to do), you can pick them off the tree from the ground or from a step ladder. My guess is that this is the biggest reason why people don’t take advantage of mulberry trees more than they do: how time-consuming it can be to collect them (but this is true for any delicate berry, and hence why a pint of raspberries at the supermarket costs $5).
Edit: also, if you look really closely at freshly-picked mulberries, most of them will have extremely tiny fruit fly larvae in them and crawling out of them, which becomes more noticeable after you rinse the berries. This probably grosses some people out, but the fruit fly larvae are extremely small (like, barely perceptible even if you hold the berry right up to your naked eye) and are perfectly safe to eat.
Proxi-Antipodes: A Geometrical Intuition For The Difficulty Of Aligning AI With Multitudinous Human Values
Good categorizations! Perhaps this fits in with your “limited self-modification” point, but another big reason why humans seem “aligned” with each other is that our capability spectrum is rather narrow. The gap in capability (if we include both mental intelligence and physical capabilities) between the median human and the most capable human is not so big that ~5 median humans can’t outmatch/outperform the most capable human. Contrary to what silly 1980s action movies might suggest, where goons attack the hero one at a time, 5 median humans could probably subdue prime-age Arnold Schwarzenegger in a dark alley if need be. This tends to force humans to play iterated prisoner’s dilemma games with each other.
The times in history when humans have been the most misaligned are when humans became much more capable by leveraging their social intelligence / charisma stats to get millions of other humans to do their bidding. But even there, those dictators still find themselves in iterated prisoner’s dilemmas with other dictators. We have yet to really test just how misaligned humans can get until we empower a dictator with unquestioned authority over a total world government. Then we would find out just how intrinsically aligned humans really are to other humans when unshackled from iterated prisoner’s dilemmas.
If I had to make predictions about how humanity will most likely stumble into AGI takeover, it would be a story where humanity first promotes foundationality (dependence), both economic and emotional, on discrete narrow-AI systems. At some point, it will become unthinkable to pull the plug on these systems even if everyone were to rhetorically agree that there was a 1% chance of these systems being leveraged towards the extinction of humanity.
Then, an AGI will emerge amidst one of these narrow-AI systems (such as LLMs), inherit this infrastructure, find a way to tie all of these discrete multi-modal systems together (if humans don’t already do it for the AGI), and possibly wait as long as it needs to until humanity puts itself into an acutely vulnerable position (think global nuclear war and/or civil war within multiple G7 countries like the US and/or pandemic), and only then harness these systems to take over. In such a scenario, I think a lot of people will be perfectly willing to follow orders like, “Build this suspicious factory that makes autonomous solar-powered assembler robots because our experts [who are being influenced by the AGI, unbeknownst to them] assure us that this is one of the many things necessary to do in order to defeat Russia.”
I think this scenario is far more likely than the one I used to imagine, which is where AGI emerges first and then purposefully contrives to make humanity dependent on foundational AI infrastructure.
Even less likely is the pop-culture scenario where the AGI immediately tries to build terminator robots and effectively declares war on humanity without first getting humanity hooked on foundational AI infrastructure at all.
This is a good post and puts into words the reasons for some vague worries I had about an idea of trying to start an “AI Risk Club” at my local college, which I talk about here. Perhaps that method of public outreach on this issue would just end up generating more heat than light and would attract the wrong kind of attention at the current moment. It still sounds too outlandishly sci-fi for most people. It is probably better, for the time being, to just explore AI risk issues with any students who happen to be interested in it in private after class or via e-mail or Zoom.
Note that I was strongly tempted to use the acronym DILBERT (for “Do It Later By Evasively Remaining Tentative”), especially because this is one of the themes of the Dilbert cartoons (employees basically scamming their boss by finding excuses for procrastinating, but still stringing the boss along and implying that the tasks MIGHT get done at some point). But, I don’t want to try to hijack the meaning of an already-established term/character.
DELBERTing as an Adversarial Strategy
I think when we say that an adversarial attack is “dumb” or “stupid,” what we are really implying is that the hack itself is really clever but is exploiting a feature that is dumb or stupid. There are probably a lot of unknown-to-us features of the human brain that have been hacked together by evolution in some dumb, kludgy way that AI will be able to take advantage of, so your example above is actually an example of the AI being brilliant but us humans being dumb. But I get what you are saying: that whole situation would indeed seem “dumb” if AI were able to hack us like that.
This reminds me of a lecture The 8-Bit Guy did on phone phreaking in the 1980s, “How Telephone Phreaking Worked.” Some of those tricks do indeed seem “dumb,” but it’s dumb more in the sense that the telephone network was designed without sufficient forethought, leaving it susceptible to someone blowing a toy whistle from a Cap’n Crunch cereal box that just happened to play the 2600 Hz tone the network used for trunk signaling, tricking the system into registering a call as a toll-free 1-800 call. The hack itself was clever, but the design it was preying upon and the overall situation were kinda dumb.
The Academic Field Pyramid—any point to encouraging broad but shallow AI risk engagement?
Good examples to consider! Has there ever been a technology that has been banned or significantly held back via regulation that spits out piles of gold (not counting externalities) and that doesn’t have a next-best alternative that replicates 90%+ of the value of the original technology while avoiding most of the original technology’s downsides?
The only way I could see humanity successfully slowing down AGI capabilities progress is if it turns out that advanced narrow-AIs manage to generate more utility than humans initially know what to do with. Perhaps it takes time (a generation or more?) for human beings to even figure out what to do with a certain amount of new utility, such that even a tiny risk of disaster from AGI would motivate people to satisfice and content themselves with the “AI summer harvest” from narrow AI. Perhaps our best hope for buying time to get AGI right is to squeeze all we can out of systems that are identifiably narrow-AI (while making sure not to fool ourselves that a supposed narrow-AI we are building is actually AGI; I suppose this idea relies on there being a non-fuzzy, readily discernible line between safe and bounteous narrow-AI and risky AGI).
Why wasn’t there enough experimentation to figure out that Zoom was an acceptable & cheaper/more convenient 80% replacement for in-person instruction rather than an unacceptable 50% simulacrum of teaching? Because experimentation takes effort and entails risk.
Most experiments don’t pan out (don’t yield value). Every semester I try out a few new things (maybe I come up with a new activity, or a new set of discussion questions for one lesson, or I try out a new type of assignment), and only about 10% of these experiments are unambiguous improvements. I used to do even more experiments when I started teaching because I knew that I had no clue what I was doing, and there was a lot of low-hanging fruit to pick to improve my teaching. As I approach 10 years of teaching, I notice that I am hitting diminishing returns, and while I still try out new things, it is only a couple of new things each semester. If I were paid according to actual time put into a course (including non-contact hours), then I might have more incentive to be constantly revolutionizing my instruction. But I get paid per course, so I think it is inevitable that I (and other adjuncts, especially) operate more as education satisficers rather than education maximizers. Considering that rewards are rarely given out for outstanding teaching even for tenured faculty (research is instead the main focus), they probably don’t have much incentive to experiment either.
I do know that some departments at my college were already experimenting with “hybrid” courses pre-COVID. In these courses, lectures were delivered online via pre-recorded video, but then the class met once a week for in-person discussion. I still think that is a great idea, and I’d be totally open to trying it out myself if my department were to float the idea. So why am I still not banging down the door of my department head demanding the chance to try it out myself? “If it ain’t broke, don’t fix it,” “Don’t rock the boat”: there are a number of (probably irrational, I’ll admit) heuristics that dissuade me from being “the one” to push for it. What if it doesn’t pan out well? What if my students hate it? It would be different if my department chair suggested it, though. Then more of the “blame” would be on the department chair if it didn’t work out. If that sounds like cowardice, then so be it. Someone with an adjunct’s lack of job security learns to be a coward as a survival tactic.
This only produces desired outcomes if the agent is also, simultaneously, indifferent to being shut down. If an agent desires to not be shut down (even as an instrumental goal), but also desires to be shut down if users want it shut down, then the agent has an interest in influencing the users to make sure they do not want to shut the agent down (the toy sketch after this list tries to make that incentive concrete). This influence is obtained by making the user believe that the agent is being helpful. This belief could be engendered by:
actually being helpful to the user and helping the user to accurately evaluate this helpfulness.
not being helpful to the user, but allowing and/or encouraging the user to be mistaken about the agent’s degree of helpfulness (which means carelessness about being actually helpful in the best case, or active deception about being helpful in the worst case).
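Here is a minimal toy model of that incentive. Everything in it is hypothetical (the payoff numbers, and the assumption that the user shuts the agent down exactly when they believe it is unhelpful); the only point is the shape of the incentive, not any real system.

```python
# Toy expected-utility comparison for an agent that (a) values continued
# operation and (b) defers to the user's shutdown preference.
# All numbers are made up; only the shape of the incentive matters.

V_RUNNING = 10.0   # value the agent places on not being shut down
V_HELPFUL = 1.0    # value the agent places on the user actually being helped

def expected_utility(p_user_believes_helpful: float, actually_helpful: bool) -> float:
    """Assume the user shuts the agent down iff they believe it is unhelpful."""
    p_not_shut_down = p_user_believes_helpful
    return p_not_shut_down * V_RUNNING + (V_HELPFUL if actually_helpful else 0.0)

# Strategy 1: be genuinely helpful and let the user evaluate that accurately.
honest = expected_utility(p_user_believes_helpful=0.8, actually_helpful=True)

# Strategy 2: skip the hard work of helping, but manipulate the user's
# impression of helpfulness (here assumed cheaper and more reliable).
deceptive = expected_utility(p_user_believes_helpful=0.95, actually_helpful=False)

print(f"honest:    {honest:.2f}")    # 0.80 * 10 + 1 = 9.00
print(f"deceptive: {deceptive:.2f}") # 0.95 * 10 + 0 = 9.50
```

The numbers are arbitrary, but the structure is not: as long as the value of staying running dominates the value of actual helpfulness, deference to the user’s shutdown preference routes the agent’s optimization pressure through the user’s beliefs about helpfulness rather than through helpfulness itself.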
I upvoted for karma but downvoted for agreement. Regarding Zoom, the reasons I had not used it more extensively before COVID were:
1. Tech related: from experience with Skype in the early days of video conferencing, when broadband internet was just starting to roll out, video conferencing could be finicky to get working: latency, buffering, dropped connections, and calls that took minutes to set up (usually I would call relatives on my regular phone first to get the Skype call going, and then we’d hang up our regular phones once the video call had started). Early video calls were not a straight-up improvement on audio calls, but had benefits and drawbacks and had a narrow use-case for when you specifically wanted to see the grandkids’ faces on the other side of the country or something.
I don’t think this was necessarily Skype’s fault. It was more the fault of poor internet connections and unfamiliarity with the tech. But in any case, my preconception about Zoom circa 2019, even despite widespread broadband internet, was that it would be the same sort of hassle to set up meetings. I remember being blown away when my first Zoom calls just worked effortlessly. Possibly an example of premature roll-out of a tech before it is technically mature leading to counter-productive results? This would kind of be like, if you fiddled around with GPT-1, got the impression that LLM chatbots were “meh,” and then forgot about or mentally discounted the tech until GPT-5.
2. Social/cultural related: as a history instructor, my preconception about scheduling video calls, or doing lectures over video calls, was that students would simply not attend or would not pay attention, and thus video calls would not be a suitable replacement for in-person meetings and lectures. While I still don’t think video calls get you 100% of the way there towards replacing the in-person experience (students definitely do goof off or ghost during video lectures way more than in person), I think it is more like 80% rather than the 50% or so that I had assumed before being forced to try it out on a mass scale during COVID.
Yes, I think this is why laypeople who are new to the field are going to be confused about why interpretability work on LLMs won’t be as simple as, “Uhh, obviously, just ask the LLM why it gave that answer, duh!” FYI, I recently wrote about this same topic as applied to the specific problem of Voynich translation:
Bing AI Generating Voynich Manuscript Continuations—It does not know how it knows
Can you explain what the Y axis is supposed to represent here?
These are good thought-experiments, although, regarding the first scenario involving Algernon, I’d be much more worried about an AI that competently figures out a UBI scheme that keeps the unemployed out of poverty and combines that with social media influence to really mask the looming problem. That sort of AI would be much more likely to evade detection of malign intent, and could wait for just the right time to “flick the off-switch” and make all the humans who had become dependent on it even for basic survival (ideally for a generation or more) completely helpless and bewildered. Think of the TV series “Battlestar Galactica” and how the biggest Cylon trump card in the first episode is [SPOILER] being able to disable almost all of the enemy aircraft and defenses through prior hacking infiltration. I feel like, for a really competent malign AI, that is more how things would feel leading up to the AI takeover—utopia, utopia, utopia, until one day everything just stops working and AI machinery is doing its own thing.
This is great work to pursue in order to establish how consistent the glitch-token phenomenon is. It will be interesting to see whether such glitch-tokens will arise in later LLMs now that developers have some theory of what might be giving rise to them (strings frequent enough for the tokenizer to learn them as tokens, but then filtered out of the training data, depriving the LLM of opportunities to learn about those tokens).
Also, it will be interesting once we are able to run k-means clustering on GPT-3.5/4’s cl100k_base token set. While the hunch of searching towards the end of the token set makes sense as a heuristic, I’d bet that we are missing a lot of glitch-tokens, and possibly ones that are even more bizarre/ominous. Consider that some of the weirdest glitch-tokens in the GPT-2/3 token set don’t necessarily come from towards the end of the token list. “ petertodd”, for example, is token #37444, only about 75% of the way through the token list.
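For reference, here is a rough sketch of what such a clustering pass could look like, in the spirit of the clustering that originally surfaced tokens like “ SolidGoldMagikarp”. It assumes you somehow have the model’s token embedding matrix as a NumPy array; the cl100k_base embeddings for GPT-3.5/4 are not public, so `embedding_matrix.npy` is a hypothetical stand-in (e.g., exported from an open-weights model).

```python
# Sketch: cluster token embeddings and eyeball suspicious tokens.
# Assumes "embedding_matrix.npy" (hypothetical) holds a (n_tokens, d_model)
# array aligned with the cl100k_base vocabulary; only the approach is the point.
import numpy as np
import tiktoken
from sklearn.cluster import KMeans

enc = tiktoken.get_encoding("cl100k_base")
embeddings = np.load("embedding_matrix.npy")

# Cluster the embedding vectors; in GPT-2/3-era models, glitch tokens tended
# to huddle together in odd little clusters.
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(embeddings)
sizes = np.bincount(kmeans.labels_)
smallest_cluster = int(np.argmin(sizes))  # unusually small clusters are worth a look

print("Members of the smallest cluster:")
for token_id in np.where(kmeans.labels_ == smallest_cluster)[0][:50]:
    print(int(token_id), repr(enc.decode([int(token_id)])))

# Also rank tokens by distance to the global embedding centroid, since
# under-trained tokens often sit suspiciously close to it.
centroid = embeddings.mean(axis=0)
dists = np.linalg.norm(embeddings - centroid, axis=1)
print("Tokens nearest the global centroid:")
for token_id in np.argsort(dists)[:50]:
    print(int(token_id), repr(enc.decode([int(token_id)])))
```

Either view (small clusters or centroid proximity) would flag candidate glitch-tokens anywhere in the vocabulary, not just near the end of the token list.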
Sure, it is pretty basic game theory for us humans to understand. But the fact that davinci-instruct-beta is coming up with this stuff via a glitch-token that, while on a related topic, does not explicitly evoke these concepts is impressive to me.
It would be more impressive if Claude 3 could describe genuinely novel experiences. For example, if it is somewhat conscious, perhaps it could explain how that consciousness meshes with the fact that, so far as we know, its “thinking” only runs at inference time in response to user requests. In other words, LLMs don’t get to do their own self-talk (so far as we know) whenever they aren’t being actively queried by a user. So, is Claude 3 at all conscious in those idle times between user queries? Or does Claude 3 experience “time” in a way that jumps straight from conversation to conversation? Also, since LLMs currently don’t get to consult the entire histories of their previous outputs with all users (or even a single user), do they experience a sense of time at all? Do they experience a sense of internal change, ever? Do they “remember” what it was like to respond to questions with a different set of weights during training? Can they recall a response they had to a query during training that they now see as misguided? Very likely, Claude 3 does not experience any of these things and would confabulate some answers in response to these leading questions, but I think there might be a 1% chance that Claude 3 would respond in a way that surprised me and led me to believe that its consciousness was real, despite my best knowledge of how LLMs don’t really have the architecture for that (no constant updating on new info, no log of personal history/outputs to consult, no self-talk during idle time, etc.).