There is a globe in your LLM

Gurnee & Tegmark (2023) trained linear probes that take an LLM’s internal activation on a landmark’s name (e.g. “The London Eye”) and predict the landmark’s latitude and longitude. The results look like this:[1]

Two views of the true world atlas, with the predicted atlas hovering above. True locations are red points; predicted locations are blue points in a slightly raised plane, each linked to its true location by a grey line.

So LLMs (or at least, Llama 2, which they used for this experiment) contain a pretty good linear representation of an atlas.
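A linear probe of this kind is just a linear regression from activations to coordinates. Here is a minimal sketch on synthetic data (the activation dimension, dataset size, and planted linear structure are all stand-ins, not Gurnee & Tegmark’s actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for cached residual-stream activations: (n_landmarks, d_model).
d_model, n = 64, 500
acts = rng.normal(size=(n, d_model))

# Stand-in targets: (latitude, longitude) linearly encoded in the
# activations, plus a little noise.
W_true = rng.normal(size=(d_model, 2))
coords = acts @ W_true + 0.1 * rng.normal(size=(n, 2))

# Ridge-regression probe: W = (X^T X + lam*I)^{-1} X^T Y.
lam = 1.0
W = np.linalg.solve(acts.T @ acts + lam * np.eye(d_model), acts.T @ coords)
pred = acts @ W

print(np.mean((pred - coords) ** 2))  # small iff a good linear fit exists
```

If the coordinates really are linearly decodable from the activations, the probe’s error is small; if not, no linear probe can do well, which is what makes this a test of *linear* representation.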

Sometimes, like when thinking about distances, a globe is more useful than an atlas. Do models use the globe representation? To find out, we can train probes to predict the (x,y,z) coordinates of landmarks, viewed as living in 3D space. Here are the results:

Left: Europe faces us. Middle: The Pacific faces us. Right: India faces us. Predicted points are scaled up by 1.5x for visualization purposes.

You can rotate the plot yourself (and see the code used to generate it) in this notebook.
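The (x, y, z) targets come from the standard spherical-to-Cartesian conversion of latitude and longitude. A unit-sphere sketch (Earth’s actual radius just scales everything):

```python
import math

def latlon_to_xyz(lat_deg, lon_deg, radius=1.0):
    """Convert latitude/longitude in degrees to Cartesian (x, y, z)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)
    return x, y, z

print(latlon_to_xyz(90, 0))  # North Pole -> (0, 0, 1), up to float error
print(latlon_to_xyz(0, 0))   # equator at prime meridian -> (1, 0, 0)
```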

The average Euclidean (i.e. “through-the-Earth”) distance between the true and predicted locations is 535 miles: roughly the distance from Paris to Berlin, from LA to Portland, or from Mumbai to New Delhi.
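Since the points live in 3D, this “through-the-Earth” distance is just the chord length between them: the norm of the difference of the two Cartesian points. A sketch using Earth’s mean radius of roughly 3,959 miles (the city coordinates below are approximate):

```python
import math

EARTH_RADIUS_MILES = 3959  # mean radius, approximate

def latlon_to_xyz(lat_deg, lon_deg, r=EARTH_RADIUS_MILES):
    """Latitude/longitude in degrees to Cartesian (x, y, z) in miles."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))

def chord_distance(p, q):
    """Straight-line ('through-the-Earth') distance between 3D points."""
    return math.dist(p, q)

# Paris vs. Berlin: in the same ballpark as the 535-mile average error.
paris = latlon_to_xyz(48.86, 2.35)
berlin = latlon_to_xyz(52.52, 13.40)
print(chord_distance(paris, berlin))
```

For errors this small relative to Earth’s radius, the chord distance is nearly identical to the surface (great-circle) distance, so the two comparisons are interchangeable here.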

So LLMs contain a pretty good linear representation of a globe!

I don’t know if there’s any useful takeaway from this. I just thought it was cool.

  1. ^

    These plots are my reproduction. I used Llama-3-8B-Instruct, and Gurnee & Tegmark’s dataset. For ~7,000 of the 40,000 landmarks, the model did not correctly answer the question “which country is {place} in?”, so I removed these from the dataset. I fed landmark names into the model with no prefix, and cached final-token residual-stream activations 70% of the way through the model. All plots and values in this post were generated using a held-out 20% test split.