Written language is a linear progression of symbols—in other words, a function from some “time” to a finite alphabet. This fact is a direct result of prehistoric humans primarily communicating by fluctuating air pressure, which one can model as a real-valued function of time.
So suppose it wasn’t that way, i.e. imagine aliens communicating by displaying pictures on their chests or projecting holograms. What might their “language” look like?
I imagine two things (and ask for more ideas in the comments):
In a 2D (or 3D) structure, it will be much easier to refer to previously introduced concepts by arranging them closer to each other—the language would look more like a directed acyclic graph than a linear progression of symbols.
It is easier to refer to quantities visually (human languages typically have lots of imprecise words for quantities).
Math already uses multidimensional “languages” in some places, e.g. categorical diagrams or tensor networks. Of course, engineers and architects often display their thoughts graphically as well. But as of now, all of these systems are special-purpose. So what might a general artificial language look like if it were not constrained to be a one-dimensional function?
I think AI/ML systems might have a much easier time processing such a language (which might be an internal representation of knowledge) because coreference resolution can become much easier or trivial. To elaborate:
For many years, AI researchers have devoted much attention to processing and generating natural language—partly because that’s how an AI can be useful to humans, partly in the hope that a sufficiently advanced language model itself becomes indistinguishable from intelligence.
A basic problem in natural language processing is known as coreference resolution: understanding which expressions in a text refer to the same concepts, and having the NLP system pay “attention” to them at the right times. This task is nontrivial in natural languages because a word has only two direct neighbours. To connect more than two concepts semantically, one needs nontrivial grammar rules, which largely stumped AIs until a few years ago and which AIs still often get wrong (consider e.g. Winograd schemas).
Starting with the original attention mechanisms in NLP (see e.g. here), AI researchers have developed a plethora of tricks to increase the timeframe and accuracy over which models can resolve coreferences (Longformer, Compressive Transformer, Reformer...).
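The mechanism these tricks build on is (scaled dot-product) attention: each position compares a query vector with the key vectors of the other positions and takes a softmax-weighted average of their value vectors, which is how a model can “look back” at an antecedent. Below is a minimal single-query sketch in pure Python; the 2-d embeddings are made-up toy numbers, whereas real models learn queries, keys and values and use many heads and layers.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum before exponentiating, for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # Similarity between the query and every key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the attention-weighted average of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


# Toy, made-up 2-d embeddings for two earlier tokens, "bear" and "tree".
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
# A query (e.g. for a later pronoun) that happens to resemble "bear":
# its output is pulled mostly towards the value of "bear".
out = attend([4.0, 0.0], keys, values)
```

The longer the text, the more keys each query has to be compared against, which is exactly the cost that Longformer, Reformer and friends try to reduce.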
But now imagine an AI architecture using an internal “language” to generate and analyze “intermediate” thoughts—possibly involving a population of agents co-evolving the language. Then the individual neural networks might be substantially unburdened by allowing the language to evolve in a medium that is not just one-dimensional (as “time” or “position” would be). In the extreme case, allowing arbitrary connections between semantic units would make coreference resolution trivial.
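To make the “extreme case” concrete, here is a hypothetical sketch of a “language” whose statements point at entity ids instead of using pronouns (the `Thought` and `Statement` classes are invented for illustration and do not come from any existing system). In the classic Winograd schema “The trophy doesn’t fit into the brown suitcase because it’s too large”, the speaker would simply attach “too large” to the trophy’s node, so there is nothing left to resolve.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Statement:
    relation: str
    args: Tuple[str, ...]  # entity ids, never pronouns


@dataclass
class Thought:
    """Hypothetical 'graph language': entities are nodes with unique ids,
    and statements are edges that refer to those ids directly."""
    entities: Dict[str, str] = field(default_factory=dict)  # id -> description
    statements: List[Statement] = field(default_factory=list)

    def add_entity(self, ident: str, description: str) -> str:
        self.entities[ident] = description
        return ident

    def say(self, relation: str, *args: str) -> None:
        # Every argument must already be a known entity, so a reference can
        # never be dangling or ambiguous.
        for a in args:
            if a not in self.entities:
                raise KeyError(f"unknown entity {a!r}")
        self.statements.append(Statement(relation, args))

    def mentions(self, ident: str) -> List[Statement]:
        # "Coreference resolution" reduces to filtering by identity.
        return [s for s in self.statements if ident in s.args]


t = Thought()
trophy = t.add_entity("e1", "trophy")
suitcase = t.add_entity("e2", "brown suitcase")
t.say("does_not_fit_into", trophy, suitcase)
t.say("too_large", trophy)  # attached to the trophy directly; no "it"
```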
Whoa. OP is one of today’s lucky 10,000 (h/t XKCD). Let us introduce you to sign languages: natural languages that evolved without a single sound. There are hundreds of them around the world, in daily use by many deaf communities and studied by academic researchers, many of whom come from these same communities or are closely allied with them. Lovely convergence of ideas: these languages indeed make ample use of the 3D affordances of the visual-spatial modality. And they use these affordances in exactly the kind of flexible ways you would expect from a complex linguistic communication system culturally evolved in the visual-spatial modality. For instance, they use something linguists call buoys, where one sign is held with the non-dominant hand while the dominant hand produces a further sequence of signs (hard to do in speech!). They use complex ways of modifying spatial verbs to precisely indicate location in space. And they make ample use of indexical forms (like pointing gestures, but more grammaticalized) to achieve person reference. There is loads more, but we’d soon get into very technical issues, reflecting the technical and bodily complexity these linguistic systems have achieved, which is considered on a par with the most complex grammatical systems of spoken+gestured languages. In short: great question, and it happens to have an actual answer from which we can learn deep things about the nature of language and the degree to which it does (or does not) depend on communicative modalities. Check out this work by Prof. Carol Padden and colleagues, for instance:
Padden, Carol & Meir, Irit & Aronoff, Mark & Sandler, Wendy. 2010. The grammar of space in two new sign languages. Sign Languages: A Cambridge Survey, 570–592. New York: Cambridge University Press.
Our languages are symbolic: the sound of a word isn’t related to its meaning. A visual language could instead be literal. You don’t need to invent or learn a word for “bear” if you can show the image of a bear instead.
A simple visual language (no abstract concepts, no tenses, no nesting) probably doesn’t require human intelligence. If an animal can recognize something in reality, then it can recognize that same thing in an image. Thus animals could tell each other about food, predators, locations, and events, and they could coordinate much better, and also try to deceive each other. This language would work across species, since images are universal.
The visual cortex can already “visualize” mental images, so it’s not implausible that it could “project” them externally if it had a projector attached.
A human-intelligence-level language gradually evolved from such beginnings might not use abstract concepts the same way we do. For example, cats and dogs exist and are easy to picture, but the general concept of an “animal” doesn’t have a natural visual representation. Our solution of introducing an arbitrary memorized symbol or word is not an obvious or forced one. And gradual language evolution, keeping some mutual intelligibility with other species, would probably have different constraints and a different result than the rapid evolution of a novel concept that your own sister species cannot understand.
Worth noting that the visual cortex already does project mental images externally using, for instance, the limbs. Human languages around the world make constant use of this, combining speech and other conventionalised modes of expression with depictions like manual gestures. The keyword here is iconicity: the form of an expression resembles its meaning (and this is why “our languages are symbolic” is only a very rough approximation of the truth; in actual fact, our languages are indexical, iconic and symbolic, and each of these offers its own constraints and affordances). There is a large literature in linguistics and cognitive science on the forms and functions of iconicity in human communication. And there is good evidence (from archaeology to comparative visual anthropology to linguistics) to think that human-intelligence-level language evolved exactly from such beginnings, featuring a combination of indexical, iconic, and symbolic signs.
My thoughts immediately went to various programming languages, file formats, protocols and DSLs which, while created by pressure-changing apes, are at least optimized for something different. Here are my thoughts:
Assembly language: used to tell the CPU what to do, it seems very linear, imperatively spelling out what to do step by step with a very simple vocabulary (“Up-Goer Five”/“explain like I’m five”). At least, this is how the CPU reads it. But if you think about how it is written, you see it has a higher-order form: smaller concepts are used to build larger ones, like blocks, functions, libraries etc. So it’s less like giving someone a long lecture, and more like giving someone a Wikipedia full of hyperlinks, where they can jump between definitions, perhaps recursively.
But the linearity, even with branching jumps and trees of function definitions, is still there from the perspective of the CPU, and seems to be rooted in the axis of time: the time needed to perform computation, the time which orders what has already been computed before what is yet to be computed. So, to break this constraint, my thoughts immediately jumped to multithreading. How do people create languages for multithreading?
Erlang: if you care less about the order of execution and more about conveying its goal, then a declarative language like Prolog or Erlang seems a nice idea. You just define what each concept means in terms of others, without necessarily explaining step by step how to achieve the goal; you focus on (de)composition and hope that the listener will figure out coordination and execution. This is even more like “here, this is Wikipedia, just make me a sandwich”-style communication.
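In the spirit of Prolog or Datalog, here is a minimal Python sketch of defining a concept declaratively (two rules saying what an ancestor is) and letting a generic “listener” work out how to derive the facts. The naive bottom-up fixpoint loop is just one possible evaluation strategy, chosen for brevity; Prolog itself answers queries top-down via SLD resolution.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def ancestors(parent: Set[Pair]) -> Set[Pair]:
    """Naive bottom-up evaluation of two declarative rules:
        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    The rules say *what* an ancestor is; this loop is one possible
    'listener' that figures out *how* to derive every such fact."""
    result = set(parent)
    while True:
        # Apply the second rule to everything known so far.
        derived = {(x, z) for (x, y) in parent for (y2, z) in result if y == y2}
        new = derived - result
        if not new:
            return result  # fixpoint reached: nothing new can be derived
        result |= new


family = ancestors({("alice", "bob"), ("bob", "carol"), ("carol", "dave")})
```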
LZ77, LZW and similar compression methods: while thinking about the “time ordering”, “directed acyclic graph” and “dereferencing” you’ve mentioned, I recalled that a common way to compress a stream is to use phrases like “here, copy-paste the 33 bytes I told you 127 bytes ago” (that is literally LZ77’s (distance, length) back-reference; LZW instead refers to numbered phrases from a dictionary it builds as it goes). We sometimes do that (see paragraph 3 on page 4), but it’s not the building block of our language as it is in these compressors.
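The “copy the 33 bytes I told you 127 bytes ago” idea is exactly an LZ77-style (distance, length) back-reference (LZW works differently, with an implicitly built phrase dictionary). A minimal decoder sketch, assuming a token stream that mixes literal characters with (distance, length) pairs; real formats such as DEFLATE additionally pack these tokens with Huffman coding.

```python
from typing import List, Tuple, Union

Token = Union[str, Tuple[int, int]]  # a literal character, or (distance, length)


def lz77_decode(tokens: List[Token]) -> str:
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)  # literal: "here is a new symbol"
        else:
            distance, length = tok
            start = len(out) - distance
            # Copy one character at a time so that overlapping references
            # (length > distance) work, e.g. (1, 4) repeats the last char.
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


text = lz77_decode(["b", "l", "a", (3, 3), "h"])  # "bla" + "bla" + "h"
```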
Variables: “dereferencing” is also something nicely solved by using named, scoped variables instead of a fixed set of pronouns like “this” and “it”. We do that to some degree, but our language is not built around defining lots and lots of local references to other stuff, as in C++ or JavaScript.
Ok, but programming languages will always be somewhat constrained to be “linear”, because their goal is to describe some action, and actions are performed over time, so some form of “happens-before” will have to slip into them. So I thought about data file formats, which are more time-invariant.
PNG: also related to compression, and to the 2D structure you’ve mentioned, it’s common to make some kinds of references cheaper to express. In a 1D language it’s cheap (2 bytes) to use the word “it” as shorthand for the most recent noun; in 2D image compression it’s natural to refer to the color of the pixel above you or the one just before you. So we might see more kinds of pronouns in a 2D or 3D language, corresponding to other directions.
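PNG’s per-scanline filters are precisely such “directional pronouns”: filter type 1 (Sub) predicts each byte from the byte to its left, and type 2 (Up) from the byte directly above, so only the (often small) residuals need to be compressed. A sketch assuming 8-bit grayscale, i.e. one byte per pixel (for other pixel formats, “left” means one whole pixel back); the other PNG filter types (Average, Paeth) are omitted.

```python
from typing import List

Row = List[int]


def filter_sub(row: Row) -> Row:
    # Filter type 1 ("Sub"): residual relative to the byte on the left
    # (treated as 0 for the first byte), modulo 256.
    return [(x - (row[i - 1] if i > 0 else 0)) % 256 for i, x in enumerate(row)]


def filter_up(row: Row, above: Row) -> Row:
    # Filter type 2 ("Up"): residual relative to the byte directly above.
    return [(x - a) % 256 for x, a in zip(row, above)]


def unfilter_up(residuals: Row, above: Row) -> Row:
    # The decoder undoes the prediction using the already-decoded row above.
    return [(r + a) % 256 for r, a in zip(residuals, above)]


# Vertical stripes: every row equals the one above it.
image = [[10, 200, 10, 200] for _ in range(3)]
# The row "above" the first scanline is treated as all zeros.
aboves = [[0] * 4] + image[:-1]
up_rows = [filter_up(row, above) for row, above in zip(image, aboves)]
# up_rows[1:] are all zeros, which compress extremely well.
```

An encoder may pick a different filter for every scanline, choosing whichever “pronoun” makes that row cheapest.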
3DS meshes: AFAIR you first specify a list of vertex coordinates, then you specify how to connect them into triangles. It’s somewhat like: here’s a list of nouns, and here are the verbs connecting them. Or maybe: Chapter 1. Introduction of the Heroes. Chapter 2. The Interactions between the Heroes. But this linearity between Chapter 1 and Chapter 2 is not really forced on you: you could read it in a different order if you just want to render a fragment of the mesh, right?
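A generic indexed triangle mesh illustrates the “nouns, then verbs” layout (this is a sketch of the general idea, not the actual 3DS chunk format): vertex coordinates first, then faces that refer to vertices by index. Because faces only point at vertices by index, they can be processed in any order, or only a subset of them.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Triangle = Tuple[int, int, int]  # indices into the vertex list

# "Chapter 1": the nouns (vertex coordinates), here a unit square at z = 0.
vertices: List[Vertex] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
# "Chapter 2": the verbs (faces connecting vertices by index).
triangles: List[Triangle] = [(0, 1, 2), (0, 2, 3)]


def triangle_area(t: Triangle) -> float:
    # Half the length of the cross product of two edge vectors.
    a, b, c = (vertices[i] for i in t)
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * sum(x * x for x in cross) ** 0.5


# Faces can be visited in any order, here in reverse.
total = sum(triangle_area(t) for t in reversed(triangles))
```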
Progressively compressed images: you first get a low-resolution sketch of the concept, and then it gets refined again and again until you are satisfied. Like in Scrum! The unrealistic constraint here is that there is no back-and-forth between sender and recipient, so you just get all the details of everything, not only what you care about or don’t know already. Well, you can stop listening at some point (or set the video stream bandwidth), but you can’t selectively focus on one part of the image. This could surely be fixed with some interactive protocol. And again: you don’t have to read the file in the same order it was written, right?
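A minimal sketch of the coarse-to-fine idea, assuming a square grayscale image whose side is a power of two: build a resolution pyramid by averaging 2x2 blocks and transmit it coarsest level first, so the recipient can stop after any level. Real progressive formats (interlaced PNG, progressive JPEG) are more economical than this, since they send refinements rather than whole, partly redundant levels.

```python
from typing import List

Image = List[List[float]]


def downsample(img: Image) -> Image:
    # Average each 2x2 block into one pixel (assumes even dimensions).
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def pyramid(img: Image) -> List[Image]:
    """Resolution levels ordered coarsest-first, i.e. the order in which a
    progressive sender would transmit them."""
    levels = [img]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return list(reversed(levels))


img = [[0, 0, 8, 8],
       [0, 0, 8, 8],
       [4, 4, 12, 12],
       [4, 4, 12, 12]]
levels = pyramid(img)  # 1x1 sketch, then 2x2, then the full 4x4 image
```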
Excel: perhaps one of the most interesting pieces of psycho-technology. You have a grid of concepts (it actually doesn’t seem so important that it’s 2D), and you can make them depend on each other, showing how information flows between them. This concept can actually describe the world quite nicely. There’s https://www.getguesstimate.com/ which lets you do something similar, but with probability distributions of random variables, so you can show your mental model of reality.
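The core of the spreadsheet idea is a dependency graph that is recalculated in dependency order. A minimal sketch, with a hypothetical `evaluate` function and cells given either as constants or as formulas (Python callables) that read other cells by name; cells can be listed in any order, and circular references are rejected.

```python
from typing import Callable, Dict, Set, Union

Formula = Callable[[Callable[[str], float]], float]
Cell = Union[float, Formula]


def evaluate(sheet: Dict[str, Cell]) -> Dict[str, float]:
    """Evaluate every cell, computing its dependencies first (a depth-first
    topological order), like a spreadsheet recalculation."""
    values: Dict[str, float] = {}
    in_progress: Set[str] = set()

    def get(name: str) -> float:
        if name in values:
            return values[name]
        if name in in_progress:
            raise ValueError(f"circular reference at {name}")
        in_progress.add(name)
        cell = sheet[name]
        values[name] = cell(get) if callable(cell) else float(cell)
        in_progress.discard(name)
        return values[name]

    for name in sheet:
        get(name)
    return values


# "total" is listed first but is computed after the cells it depends on.
sheet = {
    "total": lambda get: get("subtotal") * 1.25,
    "subtotal": lambda get: get("price") * get("quantity"),
    "price": 3.0,
    "quantity": 4.0,
}
result = evaluate(sheet)
```

A Guesstimate-style variant would store samples from a distribution in each cell instead of a single number and push them through the same formulas.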
Fractally compressed images: something like progressive compression coupled with a strong prior that things at smaller scales are really similar to things you’ve already seen at larger scales. How would that look in language? Perhaps like the metaphors we already use in our languages. Like the one in the previous sentence? So what new could it bring? Perhaps more precision: our metaphors seem to leave a lot of wiggle room for interpretation to the listener. It’s not like “listen, you are really supposed to model this part of reality as a clone of the structure I’ve already told you happens in that portion of reality”, but more like “light is like waves”; now go figure whether it’s wet, or uses a cosine function somewhere, or what.
JPEG: a language in which you describe the world not necessarily the way you see it, but from a different perspective that captures the most important details, which are the only details you could perceive anyway. I mean: in some sense it feels quite natural to talk about RGB values of pixels aligned in a grid, or about the Y-axis value of a wave at moment t. But once you realize your brain prefers to process sound in terms of frequencies, MIDI or MP3 seem more “natural”; likewise, once you realize that convolutional neural networks for image processing care about some ‘2D-wavy’ aspects of the picture [perhaps because convolution itself can be implemented using the Fourier transform?], JPEG with its transforms seems “natural”. I think MP3 and JPEG are like brain-to-brain protocols for telepathy, where we care more about the representation of a concept in an actual brain than about “being able to look at the words and analyze the words themselves”. MIDI seems to strike a nice balance, as it talks about notes (frequencies and durations) without going too far. (I mean: it’s a language you can still talk about, while I find it difficult to talk about entries in a DFT matrix etc.)
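To see what “describing the world from a different perspective” means concretely: JPEG applies a discrete cosine transform (DCT-II, a close relative of the DFT) to 8x8 blocks, re-expressing pixel values as amplitudes of cosine “waves”. A 1-D sketch on a made-up 8-sample ramp shows why this helps: for smooth content almost all of the energy lands in a few low-frequency coefficients, so the rest can be quantized coarsely. JPEG’s quantization and entropy-coding steps are omitted here.

```python
import math
from typing import List


def dct_ii(x: List[float]) -> List[float]:
    """Orthonormal 1-D DCT-II; JPEG's 8x8 transform is this applied along
    the rows and then the columns of each block."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n)))
    return out


# A smooth 8-sample ramp, like a gentle brightness gradient.
ramp = [float(i) for i in range(8)]
coeffs = dct_ii(ramp)
energy = sum(c * c for c in coeffs)
# The orthonormal DCT preserves energy, and for smooth signals almost all of
# it ends up in the first couple of (low-frequency) coefficients.
low = coeffs[0] ** 2 + coeffs[1] ** 2
```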
HTML: I’ve already mentioned Wikipedia, and what I really cared about there was hyperlinking, which gives the recipient more power to navigate the discussion as they find most useful. Perhaps it would be interesting to figure out what a symmetric protocol for that would look like: one in which the sender and recipient have more equal power to steer the conversation. I guess this is REST, or the Web as we know it.
SQL / GraphQL: here I don’t mean just the language for querying data, but rather the fact that the owner of the data in some sense wants to communicate it to you, but instead of flooding you with a copy of everything they know, they give you an opportunity to ask a very precise question, so you get exactly what you need to know. People in some sense try to do that, but first, they don’t ask very precise questions, and second, they don’t “run the question inside my head” in the sense that a server runs the query. I could imagine some alien brains communicating this way: they send a small creature into the brain of the other alien to gather the info, and some rules govern what this creature is allowed to do while inside the host. This is quite a different way of communicating from just exchanging words, I think, because for one thing, it lets you ask many questions in parallel, and questions can “mutate their internal state” while “being asked”.
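A minimal sketch with Python’s built-in `sqlite3` module and made-up toy data: the knowledge owner keeps everything it knows in a database, and the asker ships a precise question that is run on the owner’s side, getting back only the answer rather than a copy of all the data.

```python
import sqlite3

# The "knowledge owner" keeps what it knows in a (toy, in-memory) database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, habitat TEXT, legs INTEGER)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [("bear", "forest", 4), ("eagle", "mountains", 2), ("trout", "river", 0)],
)

# The asker sends a precise question, which runs inside the owner's "head"
# (the database engine); only the matching names come back.
question = "SELECT name FROM animals WHERE legs > ? ORDER BY name"
answer = [row[0] for row in conn.execute(question, (1,))]
conn.close()
```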
Rsync: when trying to think about network protocols, all of them seemed “linear”, in that you have to match each response to its question and patiently read/write the characters of the stream to understand their meaning. The nice exception (at some level) is rsync-style synchronization, where one computer tries to learn what’s new from another by comparing checksums rather than the data itself; Merkle-tree-based sync goes further, probing recursively smaller and smaller ranges in search of disagreement. It’s like a conversation in which we first try to establish whether we agree about everything, and if not, try to find a crux and drill down, etc. This could be parallelized, and perhaps made more passive for the server, if it just made “the Merkle tree of my knowledge about the world” publicly available, so that anyone could navigate it top-down to learn something new. In some sense Wikipedia is such a tree, but first, it’s not a tree (it’s not clear what the starting point, the root, is), and second, it tries to describe the one and only territory, not one of the many subjective maps of a particular alien.
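A sketch of the recursive “find the crux” idea using Merkle-style range hashing (note that this is not rsync’s actual algorithm, which matches fixed-size blocks via rolling checksums). For brevity both lists are local and hashes are recomputed on every call; in a real protocol each side would precompute a tree of hashes, and only those hashes would cross the network.

```python
import hashlib
from typing import List, Optional


def digest(items: List[str]) -> bytes:
    # Hash of a range: a hash over the hashes of its items.
    h = hashlib.sha256()
    for item in items:
        h.update(hashlib.sha256(item.encode()).digest())
    return h.digest()


def find_differences(mine: List[str], theirs: List[str],
                     lo: int = 0, hi: Optional[int] = None) -> List[int]:
    """Indices where two equally long lists differ, found by comparing range
    hashes and recursing only into halves that disagree."""
    if hi is None:
        hi = len(mine)
    if digest(mine[lo:hi]) == digest(theirs[lo:hi]):
        return []  # we agree on this whole range, so stop here
    if hi - lo == 1:
        return [lo]  # found a crux
    mid = (lo + hi) // 2
    return (find_differences(mine, theirs, lo, mid)
            + find_differences(mine, theirs, mid, hi))


my_beliefs = [f"fact {i}" for i in range(8)]
your_beliefs = list(my_beliefs)
your_beliefs[1] = "a different fact"
your_beliefs[6] = "another different fact"
cruxes = find_differences(my_beliefs, your_beliefs)
```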
BitTorrent: the strange language in which, to learn the answer, you must know exactly the right question. The hash of what you need is, literally, the key, and then you can get it from me. So I’m willing to tell you anything, but only if you prove you would know it anyway, from someone else; and in fact, yes, you can ask the same question to several aliens at once, stitch together pieces of their answers, and it just matches. This could be combined with the Merkle tree idea above. Well, actually, what you get then looks like… Git: a distributed repository of knowledge (and random stupid thoughts).
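A minimal sketch of content addressing, where the key is the hash of the value itself. It uses SHA-256 over raw bytes, and the class name `ContentStore` is hypothetical; BitTorrent v1 piece hashes are SHA-1, and Git hashes a typed header plus the content, so this reproduces neither format exactly.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Content-addressed storage: the key is the SHA-256 hash of the value."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # The asker can verify the answer without trusting whoever sent it.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("corrupted or forged data")
        return data


# Two "aliens" who know the same piece of data end up with the same key,
# so whoever knows the key can ask either of them (or both) for it.
alice, bob = ContentStore(), ContentStore()
key = alice.put(b"piece 1")
bob_key = bob.put(b"piece 1")
```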
Lisp: a language in which you can introduce not only new nouns, verbs, and adjectives on the fly, as humans sometimes do, but also whole new grammar rules, together with their semantics. I’m not entirely sure what it would even mean or feel like to talk this way, perhaps because I’m a time-oriented ape.
Thanks for the very interesting question! :)