Orca communication project—seeking feedback (and collaborators)
TLDR
It is currently plausible (35%) to me that average orcas have at least as high potential for being great scientists as the greatest human scientists, modulo their motivation for doing science[1]. To see why I think this, see my LW question (including the considerations in my answer).
I now want to test how intelligent orcas are as soon as possible. For this, I am creating a very easily-learnable language, which is easy for orcas to vocalize and ideally also for humans (after some practice), along with a plan for how to teach the language to orcas.
I hope to start teaching orcas the language around the start of January, but I still need to plan where to best do this, buy equipment, and find at least one person who can operate a boat and who is willing to do those experiments with me. (EDIT: I have now noticed that interacting with orcas without a permit is illegal basically everywhere, and fairly strictly so. I don’t think my project has great chances of getting a permit in most places, so I need to reconsider how best to proceed, and it might take longer because I need a permit etc.)
Project plan summary
For a rough summary of how I’d go about teaching orcas language, you can either watch the overview video or read the text summary in the appendix below.
I expect we will already get quite useful evidence on orca intelligence from seeing how fast and accurately they can learn the language and the abstract concepts we’re trying to convey. I will probably also try to create some other tests of the orcas’ intelligence, but I haven’t thought about that yet.
Video series showing how one might get started teaching a language to orcas
To get a better impression of what the project might look like, see the other videos in my video series.
Seeking feedback and collaborators
I think other smart people could probably contribute to this project, and in particular I think the experience of people who know a lot about linguistics could be useful.
If you have feedback or ideas for how you would design some part of the language, please comment. Also lmk of resources (e.g. useful Wikipedia pages...) that might be useful to know for my project. (Also feel free to book a call if you think you have useful experience to share.)
If you are maybe interested in working with me on this, please PM me or book a call. (I currently don’t have funding but haven’t tried getting some yet so please lmk in either case whether you’d contribute for fun/impact or if you got paid.)
I’m interested in learning about how different languages are structured, especially Esperanto/Ido and Lojban[2], but also other languages that are significantly different from Central European languages[3]. If you know the grammar of such a language well, it would be very interesting for me if you would book a call to tell me about it (or explain it in the comments or link a good resource).
Seeking funding
I will probably need at least $2.5k for a waterproof monitor, $1k-$5k for flights back and forth to where I can study orcas, and perhaps money for renting a boat and living somewhere[4].
On top of that, ~$2.5k for delegating the creation of the slides for teaching orcas, once it’s clear how we want them to look (aka for transforming sketches into something usable).
And then possibly more funding for paying researchers (maybe including myself, because I currently just work for free and don’t have a super long runway) for figuring out how to best create the language and how to teach concepts to orcas. But I’m not yet sure whether I can find competent people who want to work on this over the next weeks.
So I don’t quite know yet how much money I will need, but if you’d be interested in funding this then please lmk already.
Appendix: Text summary of project
The plan is to design a (vocal) language that’s easy for orcas to learn (only using sounds that orcas can easily imitate), and then to teach orcas that language by conveying the meaning of the words/sentences through showing them pictures and videos (on a waterproof screen which we put under water, but in such a way that it’s still mounted to the boat).
The language won’t be super similar to a human language, maybe more like halfway between human language and logic programming.
I’m going to try to have a clear sentence parsing structure (where no parsing order needs to be inferred from semantic context), and a unique grammatical structure for expressing a sentence[5]. I also want to map semantic similarities to phonetic similarities to make words less complex to learn, e.g. by having the same prefix for words that describe body parts. Obviously the language should have no synonyms, and no word should label more than one concept.
As an example of how one could start teaching the language:
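The two lexicon constraints above (one word per concept, and a shared prefix for related concepts) can be checked mechanically. Here is a minimal sketch in Python; the syllables, the "bi-" prefix and the function names are all invented placeholders for illustration, not sounds or words actually chosen for the language.

```python
from collections import Counter

# Hypothetical lexicon mapping concept -> word. Using a dict keyed by concept
# means each concept has exactly one word, so synonyms cannot occur.
# All syllables are made-up placeholders, not sounds chosen for orcas.
LEXICON = {
    "orca": "ka",
    "human": "ku",
    "fin": "bi-ta",   # body parts share the made-up prefix "bi-"
    "tail": "bi-to",
    "eye": "bi-tu",
}


def duplicate_words(lexicon):
    """Return, sorted, every word that is assigned to more than one concept."""
    counts = Counter(lexicon.values())
    return sorted(word for word, n in counts.items() if n > 1)


def shares_prefix(lexicon, concepts, prefix):
    """Check that the words for all given concepts start with the same prefix."""
    return all(lexicon[c].startswith(prefix) for c in concepts)


if __name__ == "__main__":
    print("words used for several concepts:", duplicate_words(LEXICON))
    print("body parts share prefix:",
          shares_prefix(LEXICON, ["fin", "tail", "eye"], "bi-"))
```

A check like this only covers the word list; whether a prefix actually makes words easier for orcas to learn is an empirical question.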
Show images of orcas along with the word for “orca” to teach the word for “orca”. Do the same for “human”, “fish”, “shark”, “seal”.
Teach the word “hunt” by showing multiple things like:
Show video of orcas hunting fish along with sentence for “orca hunt fish”
Analogously for orcas hunting a seal.
Perhaps also show humans fishing and describe it as “human hunt fish”.
Show another video, e.g. a shark hunting a seal, and make some tone that’s supposed to indicate that the orcas should answer, and when they correctly combine the words “shark hunt seal” make some tone indicating that they got it correct (or give them fish initially).
We could teach numbers like “three” by showing pictures of 3 orcas/fish/humans/stones/whatever. Then we can teach addition by inventing words for “plus” and “equals” and then showing/saying lots of true equations like “2+3=5”, and then test whether orcas can notice and complete the pattern “3+1=<Tone indicating expected orca response>”.
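To make the addition test concrete: one would want the completion prompts to be equations the orcas have not already been shown, so that a correct answer indicates they picked up the pattern and did not just memorize examples. Below is a rough Python sketch of generating true equations and holding some out for testing. The function names, the held-out split, and the English words "plus"/"equals" are assumptions for illustration; the actual words and tones are not designed yet.

```python
import itertools
import random


def addition_facts(max_sum):
    """All true equations a + b = c with a, b >= 1 and c <= max_sum."""
    return [
        (a, b, a + b)
        for a, b in itertools.product(range(1, max_sum), repeat=2)
        if a + b <= max_sum
    ]


def split_facts(facts, n_held_out, seed=0):
    """Shuffle reproducibly, then return (shown_examples, held_out_tests)."""
    rng = random.Random(seed)
    shuffled = list(facts)
    rng.shuffle(shuffled)
    return shuffled[n_held_out:], shuffled[:n_held_out]


def as_prompt(fact, hide_answer):
    """Render a fact as a full equation, or as a prompt ending in the response tone."""
    a, b, c = fact
    answer = "<response tone>" if hide_answer else str(c)
    return f"{a} plus {b} equals {answer}"


if __name__ == "__main__":
    shown, held_out = split_facts(addition_facts(5), n_held_out=3)
    for fact in shown:
        print("teach:", as_prompt(fact, hide_answer=False))
    for fact in held_out:
        print("test: ", as_prompt(fact, hide_answer=True))
```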
[1] Aka if orcas had similar access to education and information as great human scientists, 35% that most orcas could be superscientists if they also were as obsessed with scientific problems as e.g. Einstein (though most are likely not that obsessed (though they might still try hard for instrumental reasons like preventing extinction)).
[2] In case you were going to comment, yep I also know about Toki Pona and I’m going to learn that one.
[3] E.g. I heard Chinese/Mandarin doesn’t have articles, and I’m generally interested in such grammatical differences.
[4] Perhaps also for paying someone (who can drive a boat) to do the experiments with me, though if I don’t get funding I will try to find someone who does it for free, but not sure if that will succeed.
[5] Aka I don’t want there to be multiple options for how to assemble some words to express a particular idea, e.g. not the option of using either active or passive voice, or e.g. no two options like “Simon’s phone” or “the phone of Simon”.
I don’t think it will help you to communicate with orcas, but okay.
Esperanto/Ido are more regular than natural languages, simply because languages gradually collect things that are not strictly necessary, such as synonyms, different declensions for different classes of words, or taking one word from one language and then another related word from a different language. For example, in English, compare the etymologies of “see” and “visible”. But the concepts are related, so wouldn’t it be easier to just say “see-able” instead? If you remove these kinds of irregularities (each of them sounds like not a big deal, but those “not big deals” accumulate)… you end up with a language that is 10x easier to learn and remember, while being able to express the same concepts. But it’s essentially still the same thing, only simpler.
I am less familiar with Lojban, but I think the original idea was to make it more precise, kinda like a computer language. But the actual design decisions seem to me more like “Hollywood rationality” or “cargo cult”; making yourself superficially sound more like a computer does not necessarily give you the computer-like clarity or efficiency. For example, all nouns have to be exactly 5 letters long. Uhm, interesting, but what improvement exactly do you think is achieved by that? Or, the original version required you to specify all parameters for words, for example you couldn’t say “go” without specifying where from, where to, by what means, through what, and when (or something like that, maybe I got some of the parameters wrong), in a given order. Uhm, interesting, but what if you do not want to specify some of those things; like, if you translate from English to Lojban, and the original text did not contain that information? So you say things like “I am going from unspecified to school through unspecified by unspecified at unspecified”. I guess it is nice to be reminded what exactly is unspecified, but if you talk like this all the time, it becomes pretty annoying. So the language was updated to contain something like prepositions, but reinvented badly: instead of specifying the relation, they specify the numeric order of the parameter, so the sentence now sounds like “I am going #2 school”, because the #2 parameter for “to go” is “where do you go to” (but for a different verb, the destination could be the #1 or #3 parameter, so you need to remember the exact order of the parameters for each verb separately). Ironically, if we follow the analogy with programming languages, what we would need here is the named parameters from Python. (But that is basically reinventing prepositions.)
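To illustrate the analogy, here is a small Python sketch contrasting positional and named (keyword) parameters. The `go` function and its slot order (origin, destination, route, means) are a toy invented for this example and are not claimed to match Lojban's actual place structure.

```python
def go(agent, origin=None, destination=None, route=None, means=None):
    """Toy 'go' predicate: build a sentence, skipping unspecified slots."""
    parts = [f"{agent} goes"]
    if origin is not None:
        parts.append(f"from {origin}")
    if destination is not None:
        parts.append(f"to {destination}")
    if route is not None:
        parts.append(f"through {route}")
    if means is not None:
        parts.append(f"by {means}")
    return " ".join(parts)


# Positional style: to reach the destination slot, the origin slot has to be
# filled explicitly (here with None), and the speaker must remember slot order.
positional = go("Simon", None, "school")

# Named style: the keyword says which relation is meant, in any order,
# which is essentially what a preposition does.
named = go("Simon", destination="school")
reordered = go("Simon", means="bus", destination="school")

if __name__ == "__main__":
    print(positional)
    print(named)
    print(reordered)
```

With keywords, nobody has to memorize that the destination is the second slot of this particular verb; the cost is one extra short word per specified slot.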
And so on; it seems to me that the language design contains many ideas that sound impressive, but the actual use… uhm, I am not sure whether anyone actually uses the language, so you would have to ask them.
But most likely, this will all be irrelevant for orcas. Their languages may be regular or irregular, with fixed or random word order, or maybe with some categories that do not exist in human languages.
(Off topic, but I like your critique here and want to point you at https://www.lesswrong.com/posts/7RFC74otGcZifXpec/the-possible-shared-craft-of-deliberate-lexicogenesis just in case you’re interested.)
Thanks!
Yeah, I was not asking because of decoding orca language but because I want inspiration for how to create the grammar for the language I’ll construct. Esperanto/Ido also because I’m interested in how well word compositionality is structured there and whether it is a decent attempt at outlining the basic concepts that other concepts are composed of.
I think the hypothesis of high orca intelligence is interesting and plausible. I am all in favor of low-cost low-risk long shot experiments to learn more about other forms of intelligence, whether they end up instrumentally useful to humans or not. Personally I’d be curious just to know whether this kind of attempt is even able to catch the interest of orcas, and if so, to what degree. Even if you just stuck to simple two and three word declarative sentences, it would be interesting to see if they intuit the categories of noun and verb, if they can learn the referent of a word from a picture or video without acoustic or olfactory or stereo visual data.
You mention languages that don’t have articles, but keep in mind this is not rare. Most East Asian, Slavic, and Bantu languages don’t; Latin and Sanskrit don’t/didn’t. Though, Mandarin (the only one of these I know even a little) does use demonstrative adjectives like “this one” and “that one,” and makes extensive use of measure words for some of the same purposes as English uses articles. If it’s not necessary among humans, it shouldn’t be necessary for other mind types in general.
Another factor to consider is that some languages, like Mandarin, have no word modifications for case/number/gender/tense/perspective markers; these are done with extra words where necessary and not at all otherwise. Probably a feature you want to have when trying to teach with few examples.
When looking at where to do this, consider that in many/most places, orcas are protected species with rules about approaching them and interfering with their natural behavior/environment.
Edit to add: I think if any project like this succeeds, what you’ll be teaching is not so much a language as a pidgin. Whether or not it can become a creole, whether we can learn to incorporate orca vocabulary and grammar to find a medium of communication both species can reproduce and interpret, those are much bigger questions and longer term projects.
Aside: Do we have enough recordings of orca calls to train an LLM on them?
Thanks for your thoughts!
I don’t know what you’d consider enough recordings, and I don’t know how much decent data we have.
I think the biggest datasets for orca vocalizations are the Orchive and the Orcasound archive. I think they are each multiple terabytes big (from audio recordings), but I think most of it (80-99.9% (?)) is probably crap where there might just be a brief very faint mammal vocalization in the distance.
We also don’t have a way to see which orca said what.
Also orcas from different regions have different languages, and orcas from different pods different dialects.
I currently think the decoding path would be slower. And yeah, the decoding part would involve AI, but I feel like people often just try to use AI somehow without a clear plan (though perhaps not you).
What approach did you imagine?
In case you’re interested in a small amount of high-quality data (but still without annotations): https://orcasound.net/data/product/biophony/SRKW/bouts/
I have nowhere near the technical capability to have anything like a clear plan, and your response is basically what I expected. I was just curious. Seems like it could be another cheap “Who knows? Let’s see what happens” thing to try, with little to lose when it doesn’t help anyone with anything. Still, can we distinguish individuals in unlabeled recordings? Can we learn about meaning and grammar (or its equivalent) based in part on differences between languages and dialects?
At root my thought process amounted to: we have a technology that learns complex structures including languages from data without the benefit of the structural predispositions of human brains. If we could get a good enough corpus of data, it can also learn things other than human languages, and find approximate mappings to human languages. I assumed we wouldn’t have such data in this case. That’s as far as I got before I posted.
Currently we basically don’t have any datasets where it’s labelled which orca says what. When I listen to recordings, I cannot distinguish voices, though idk it’s possible that people who have listened a lot more can. I think just unsupervised voice clustering would probably not work very accurately. I’d guess it’s probably possible to get data on who said what by using an array of hydrophones to infer the location of the sound, but we need very accurate position inference because different orcas are often just 1-10m apart, and for this we might need to get/infer decent estimates of how water temperature varies by depth, and generally there have not yet been attempts to get high precision through this method. (It’s definitely harder in water than in air.)
Yeah basically I initially also had rough thoughts in this direction, but I think the create-and-teach-a-language approach is probably a lot faster.
I think the Earth Species Project is trying to use AI to decode animal communication, though they don’t focus on orcas in particular, but many species including e.g. beluga whales. Didn’t look into it a lot but seems possible I could do sth like this in a smarter and more promising way, but probably still would take long.
I’m less confident than you are about your opening claim, but I do think it’s quite likely that we can figure out how to communicate with orcas. Kudos for just doing things.
I’m not sure how it would fit with their mission, but maybe there’s a way you could get funding from EA Funds. It doesn’t sound like you need a lot of money.
Thanks.
I think LTFF would take way too long to get back to me though. (Also they might be too busy to engage deeply enough to get past the “seems crazy” barrier and see it’s at least worth trying.)
Also btw I mostly included this in case someone with significant amounts of money reads this, not because I want to scrape it together from small donations. I expect my best chances of getting funding come from reaching out to 2-3 people I know (after I know more about how much money I need), but this is also decently likely to fail. If this fails I’ll maybe try Manifund, but would guess I don’t have good chances there either, but idk.