I’m trying to prevent doom from AI. Currently trying to become sufficiently good at alignment research. Feel free to DM for meeting requests.
Towards_Keeperhood
Nice post!
My key takeaway: “A system is aligned to human values if it tends to generate optimized-looking stuff which is aligned to human values.”
I think this is useful progress. In particular it’s good to try to aim for the AI to produce some particular result in the world, rather than trying to make the AI have some goal—it grounds you in the thing you actually care about in the end.
I’d say the “… aligned to human values” part is still underspecified (and I think you at least partially agree):
“aligned”: what does the ontology translation between the representation of the “generated optimized-looking stuff” and the representation of human values look like?
“human values”
I think your model of humans is too simplistic. E.g. at the very least it’s lacking a distinction like the one between “ego-syntonic” and “voluntary” as in this post, though I’d probably want an even significantly more detailed model. Also, one might need different models for very smart and reflective people than for most people.
We haven’t described value extrapolation.
(Or from an alternative perspective, our model of humans doesn’t identify their relevant metapreferences (which probably no human knows fully explicitly, and which for some/many humans might not be really well defined).)
Positive reinforcement for first trying to better understand the problem before running off and trying to solve it! I think that’s the way to make progress, and I’d encourage others to continue work on more precisely defining the problem, and in particular on getting better models of human cognition to identify how we might be able to rebind the “human values” concept to a better model of what’s happening in human minds.
Btw, I’d have put the corrigibility section into a separate post, it’s not nearly up to the standards of the rest of this post.
To set expectations: this post will not discuss …
Maybe you want to add here that this is not meant to be an overview of alignment difficulties, or an explanation for why alignment is hard.
Agreed that people focus a bit too much on scheming. It might be good for some people to think a bit more about the other failure modes you described, but the main thing that needs doing is very smart people making progress towards building an aligned AI, not defending against particular failure modes. (However, most people probably cannot usefully contribute to that, so maybe focusing on failure modes is still good for most people. The problem in any case is that people will find proposals that very likely don’t actually work but that are easy to believe in, thereby making an AI stop a bit less likely.)
In general, I wish more people would make posts about books without feeling the need to do boring parts they are uninterested in (summarizing and reviewing) and more just discussing the ideas they found valuable. I think this would lower the friction for such posts, resulting in more of them. I often wind up finding such thoughts and comments about non-fiction works by LWers pretty valuable. I have more of these if people are interested.
I liked this post, thanks and positive reinforcement. In case you didn’t already post your other book notes, just letting you know I’d be interested.
Do we have a sense for how much of the orca brain is specialized for sonar?
I don’t know.
But evolution slides functions around on the cortical surface, and (Claude tells me) association areas like the prefrontal cortex are particularly prone to this.
It’s particularly bad for cetaceans. Their functional mapping looks completely different.
Thanks. Yep I agree with you, some elaboration:
(This comment assumes you at least read the basic summary of my project (or watched the intro video).)
I know of Earth Species Project (ESP) and CETI (though I only read 2 publications of ESP and none of CETI).
I don’t expect them to succeed at something equivalent to decoding orca language to the extent that we could communicate with orcas almost as richly as they communicate among each other. (Though if long-range sperm whale signals are a lot simpler, they might be easier to decode.)
From what I’ve seen, they are mostly throwing AI at the data and hoping understanding will somehow fall out, without a clear plan for how to actually decode anything. The AI methods might look advanced, but they are the sort of obvious things to try, and I think they are unlikely to work very well; still, I’m glad they are trying.
If you look at orca vocalizations, they look complex and alien. The patterns we can currently recognize there look very different from what we’d be able to see in an unknown human language. The embedding mapping might be useful if we had to decode a human language, and maybe we can still learn some useful things from it, but for orca language we don’t even know what their analog of words and sentences is, and maybe their language even works somewhat differently. (Though I’d guess that if they are smarter than humans, there is probably going to be something like words and sentences, but they might be encoded differently in the signals than in human languages.)
It’s definitely plausible that AI can help significantly with decoding animal languages, but I think it also requires forming deep understanding of some things, and I think it’s likely too hard for ESP to succeed anytime soon. A supergenius could possibly do it in a few years, but that would be really impressive.
My approach may fail, especially if orcas aren’t at least roughly human-level smart, but it has the advantage that we can show orcas the precise context of what some words and sentences mean, whereas we have almost no context data for recordings of orca vocalizations. So it’s easier for them to see what some of our signals mean than for humans to infer what orca vocalizations mean. (Even if we had large video datasets with vocalizations (which we don’t), that would still be a lot less context information about what they are talking about than if they could show us images to indicate what they want to talk about.) Of course, humans have more research experience and better tools for decoding signals, but it doesn’t look to me like anyone is currently remotely close, and my approach is much quicker to try and might have at least a decent chance. (It did somewhat work with bottlenose dolphins, in terms of grammar better than with great apes, though I’d be a lot more ambitious.)
Of course, the language I create will also be alien for orcas, but I think if they are good enough at abstract pattern recognition they might still be able to learn it.
Perhaps also not what you’re looking for, but you could check out the Google Hash Code archive (here’s an example problem). I never participated though, so I don’t know whether they would make great tests. But it seems to me like general ad-hoc problem-solving capabilities are more useful in Hash Code than in other competitive programming competitions.
GPT-4 summary: “Google Hash Code problems are real-world optimization and algorithmic challenges that require participants to design efficient solutions for large-scale scenarios. These problems are typically open-ended and focus on finding the best possible solution within given constraints, rather than exact correctness.”
Maybe not what you’re looking for, because it’s not one hard problem but more like many problems in a row, and I don’t really know whether they are difficult enough, but you could (have someone) look into Exit games. Those are basically escape rooms to go. I’d filter for Age 16+ to hopefully select the hard ones, though maybe you’d want to separately look up which are particularly hard.
I did one or two when I was 15 or 16 years old, and recently remembered them and want to try some more for fun (and maybe also introspection), though I haven’t gotten around to it yet. I think they are relatively ad-hoc puzzles, though as with basically anything, you can of course train to get good at Exit games in particular by practicing. (It’s possible that I totally overestimate the difficulty and they are actually more boring than I expect.)
(Btw, probably even less applicable to what you are looking for, but CodingEscape is also really fun. Especially “Curse of the five warriors” is good.)
I hope I will get around to rereading the post and editing this comment into a proper review, but I’m pretty busy, so in case I don’t, I’m leaving this very shitty review here for now.
I think this is probably my favorite post from 2023. Read the post summary to see what it’s about.
I don’t remember a lot of the details from the post and so am not sure whether I agree with everything, but what I can say is:
When I read it several months ago, it seemed to me like an amazingly good explanation for why and how humans fall for motivated reasoning.
The concept of valence turned out to be very useful for explaining some of my thought processes. E.g. when I’m daydreaming something and ask myself why, then in the few cases where I checked, it was always something that falls under “the thought has high valence”, like imagining some situation where I said something that makes me look smart.
Another thought, though I don’t actually have any experience with this: mostly doing attentive, silent listening/observing might also be useful for learning how the other person does research.
Like, if it seems boring to just observe and occasionally say something, try to better predict how the person will think.
The main reason I’m interested in orcas is that they have 43 billion cortical neurons, whereas the 2 land animals with the most cortical neurons (where we have optical-fractionator measurements) are humans and chimpanzees with 21 billion and 7.4 billion respectively. See: https://en.wikipedia.org/wiki/List_of_animals_by_number_of_neurons#Forebrain_(cerebrum_or_pallium)_only
Pilot whales are the other species I’d consider for experiments: they have 37.2 billion cortical neurons.
For sperm whales we don’t have data on neuron densities (though they do have the biggest brains). I’d guess they are not quite as smart though, because they can dive for a long time and AFAIK don’t use very collaborative hunting techniques.
Cool, thanks!
Cool, thanks, that was useful.
(I’m creating a language for communicating with orcas, so the phonemes will be rather impractical for humans. Otherwise the main criteria are a simple parsing structure and easy learnability. (It doesn’t need to be perfect; the perhaps bigger challenge is figuring out how to teach abstract concepts without being able to bootstrap from an existing language.) Maybe I’ll eventually create a great rationalist language for thinking effectively, but not right now.)
Is there some resource where I can quickly learn the basics of the Esperanto composition system? Somewhere I can see the main base dimensions/concepts?
I’d also be interested in anything you think was implemented particularly well in a (con)language.
(Also happy to learn from you rambling. Feel free to book a call: https://calendly.com/simon-skade/30min )
Thanks!
But most likely, this will all be irrelevant for orcas. Their languages may be regular or irregular, with fixed or random word order, or maybe with some categories that do not exist in human languages.
Yeah, I was not asking because of decoding orca language, but because I want inspiration for creating the grammar of the language I’ll construct. Esperanto/Ido also because I’m interested in how well word compositionality is structured there, and whether it is a decent attempt at outlining the basic concepts of which other concepts are composites.
Currently we basically don’t have any datasets labelled with which orca says what. When I listen to recordings, I cannot distinguish voices, though it’s possible that people who have listened a lot more can. I think purely unsupervised voice clustering would probably not work very accurately. I’d guess it’s probably possible to get data on who said what by using an array of hydrophones to infer the location of the sound. But we need very accurate position inference, because different orcas are often just 1-10m from each other; for this we might need decent estimates of how water temperature varies by depth, and generally there have not yet been attempts to get high precision through this method. (It’s definitely harder in water than in air.)
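To illustrate the localization idea: a minimal sketch of time-difference-of-arrival (TDOA) multilateration. The hydrophone positions, grid, and fixed sound speed here are made-up illustrative values; real seawater sound speed varies with temperature, salinity, and depth, which is exactly the complication mentioned above.

```python
# Hypothetical setup: four hydrophones at known (x, y) positions (meters).
SPEED = 1500.0  # m/s, rough average sound speed in seawater (assumption)
HYDROPHONES = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]

def arrival_times(src):
    """Travel time from a source position to each hydrophone."""
    return [((src[0] - hx) ** 2 + (src[1] - hy) ** 2) ** 0.5 / SPEED
            for hx, hy in HYDROPHONES]

def locate(measured_tdoas, step=1.0):
    """Brute-force grid search: find the grid point whose predicted time
    differences (relative to hydrophone 0) best match the measured ones
    in the least-squares sense."""
    best, best_err = None, float("inf")
    for xi in range(101):
        for yi in range(101):
            x, y = xi * step, yi * step
            t = arrival_times((x, y))
            pred = [ti - t[0] for ti in t[1:]]
            err = sum((p - m) ** 2 for p, m in zip(pred, measured_tdoas))
            if err < best_err:
                best, best_err = (x, y), err
    return best

# Simulate a call at (30, 70) and recover its position from TDOAs alone.
true_src = (30.0, 70.0)
t = arrival_times(true_src)
tdoas = [ti - t[0] for ti in t[1:]]
print(locate(tdoas))  # → (30.0, 70.0)
```

Real systems would replace the grid search with a nonlinear least-squares solver and propagate the uncertainty in the sound-speed profile, which is where the depth/temperature estimates come in.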
Yeah, basically I initially also had rough thoughts in this direction, but I think the create-and-teach-a-language way is probably a lot faster.
I think the Earth Species Project is trying to use AI to decode animal communication, though they don’t focus on orcas in particular but on many species, including e.g. beluga whales. I didn’t look into it a lot, but it seems possible I could do something like this in a smarter and more promising way, though it would probably still take long.
Thanks for your thoughts!
I don’t know what you’d consider enough recordings, and I don’t know how much decent data we have.
I think the biggest datasets for orca vocalizations are the Orchive and the Orcasound archive. I think they are each multiple terabytes of audio recordings, but most of it (80-99.9%?) is probably crap where there might just be a brief, very faint mammal vocalization in the distance.
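For a sense of how a first pass over such an archive might look, here is a crude energy-threshold pre-filter. All numbers (frame length, threshold, frame count) are made-up illustrative values, not from any real pipeline; real call detectors work on spectrograms, not raw energy.

```python
# Sketch: keep a clip only if enough frames exceed an energy threshold,
# discarding the mostly-silent majority of the archive.

def frame_energies(samples, frame_len=4096):
    """Mean squared amplitude of each non-overlapping frame."""
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        yield sum(s * s for s in frame) / frame_len

def worth_keeping(samples, threshold=1e6, min_loud_frames=3):
    """True if at least `min_loud_frames` frames exceed `threshold`,
    i.e. the clip plausibly contains a nearby, loud vocalization."""
    loud = sum(1 for e in frame_energies(samples) if e >= threshold)
    return loud >= min_loud_frames

# A faint clip (amplitude 10) is dropped; a loud one (amplitude 2000) kept.
assert not worth_keeping([10] * 20000)
assert worth_keeping([2000] * 20000)
```

A filter like this would cheaply shrink terabytes to the small fraction worth listening to, after which the hard annotation problems (who said what, from where) still remain.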
We also don’t have a way to see which orca said what. Also, orcas from different regions have different languages, and orcas from different pods have different dialects.
I currently think the decoding path would be slower, and yeah the decoding part would involve AI but I feel like people just try to use AI somehow without a clear plan, but perhaps not you.
What approach did you imagine? In case you’re interested in a small amount of high-quality data (but still without annotations): https://orcasound.net/data/product/biophony/SRKW/bouts/
Thanks.
I think LTFF would take way too long to get back to me though. (Also they might be too busy to engage deeply enough to get past the “seems crazy” barrier and see it’s at least worth trying.)
Also, btw, I mostly included this in case someone with significant amounts of money reads this, not because I want to scrape it together from small donations. I expect the better chances of getting funding come from reaching out to 2-3 people I know (after I know more about how much money I need), but this is also decently likely to fail. If it fails I’ll maybe try Manifund, though I’d guess I don’t have good chances there either, but idk.
Actually out of curiosity, why 4x? (And what exactly do you mean by “2x larger”?) (And is this for a naive algorithm which can be improved upon or a tight constraint?)
Thanks for pointing that out! I will tell my friends to make sure they actually get good data for the metabolic cost and not just use cortical neuron count as proxy if they cannot find something good.
(Or is there also another point you wanted to make?) And yeah, it’s actually also an argument for why orcas might be less intelligent (if they sorta use their neurons less often). Thanks.
My guess is that there probably aren’t a lot of simple mutations which just increase intelligence without increasing cortical neuron count. (Though probably simple mutations can shift the balance between different sub-dimensions of intelligence as constrained through cortical neuron count.) (Also of course any particular species has a lot of deleterious mutations going around and getting rid of those may often just increase intelligence, but I’m talking about intelligence-increasing changes to the base genome.)
But there could be complex adaptations that are very important for abstract reasoning. Metacognition and language are the main ones that come to mind.
So even if the experiment my friends will do shows that the number of cortical neurons is a strong indicator, it could still be that humans were just one of the rare cases that evolved a relevant complex adaptation. But it would be significant evidence for orcas being smarter.
Wait so do we get a refund if we decide we don’t want to do the course, or if we manage to complete the course?
Like is it a refund in the “get your money back if you don’t like it” sense, or is it incentive to not sign up and then not complete the course?