An eccentric dreamer in search of truth and happiness for all. Formerly posted on Felicifia back in the day under the same name. Been a member of Less Wrong and involved in Effective Altruism since roughly 2013.
Darklight
Another thought I just had was, could it be that ChatGPT, because it’s trained to be such a people pleaser, is losing intentionally to make the user happy?
Have you tried telling it to actually try to win? Probably won’t make a difference, but it seems like a really easy thing to rule out.
Also, quickly looking into how LLM token sampling works nowadays, you may also need to set the parameters top_p to 0, and top_k to 1 to get it to actually function like argmax. Looks like these can only be set through the API if you’re using ChatGPT or similar proprietary LLMs. Maybe I’ll try experimenting with this when I find the time, if nothing else to rule out the possibility of such a seemingly obvious thing being missed.
I’ve always wondered, with these kinds of weird, apparently trivial flaws in LLM behaviour, whether they have something to do with the way the next token is usually randomly sampled from the softmax multinomial distribution rather than taking the argmax (most likely) of the probabilities. Does anyone know if reducing the temperature parameter to zero, so that it’s effectively the argmax, changes things like this at all?
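To make the sampling vs argmax distinction concrete, here is a minimal sketch in plain Python. The logits are made-up numbers for three candidate tokens, and `softmax` and `sample_token` are hypothetical helper names, not any vendor’s API. Treating temperature 0 as a plain argmax is a convention assumed here, since dividing by zero is undefined.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Softmax over logits scaled by 1/temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Pick a token index; temperature 0 is treated as plain argmax (assumption)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.5, 0.1]  # made-up scores for three candidate tokens
rng = random.Random(0)    # seeded so the run is repeatable
greedy = [sample_token(logits, 0, rng) for _ in range(1000)]
sampled = [sample_token(logits, 1.0, rng) for _ in range(1000)]

print("distinct tokens at temperature 0:", set(greedy))
print("distinct tokens at temperature 1:", set(sampled))
```

At temperature 0 the most likely token is chosen every time, while at temperature 1 the second-best token still comes up a substantial fraction of the time, which is the kind of randomness the question is about.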
p = (n^c * (c + 1)) / (2^c * n)
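A quick way to sanity check a formula like this is to evaluate it at the endpoints. The sketch below assumes c is the correlation in [−1, 1] and n is the number of classes; `corr_to_prob` is just a hypothetical name for the expression above.

```python
def corr_to_prob(c, n):
    """Map a correlation c in [-1, 1] to a probability, given n classes."""
    return (n ** c * (c + 1)) / (2 ** c * n)

for n in (2, 3, 10):
    print(
        n,
        corr_to_prob(-1, n),  # definitely false
        corr_to_prob(0, n),   # maximum uncertainty
        corr_to_prob(1, n),   # definitely true
    )
```

For every n this gives 0 at c = −1, 1/n at c = 0, and 1 at c = 1, and for n = 2 it reduces to the familiar linear mapping p = (c + 1) / 2.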
As far as I know, this is unpublished in the literature. It’s a pretty obscure use case, so that’s not surprising. I have doubts I’ll ever get around to publishing the paper I wanted to write that uses this in an activation function to replace softmax in neural nets, so it probably doesn’t matter much if I show it here.
So, my main idea is that the principle of maximum entropy, aka the principle of indifference, suggests a prior of 1/n, where n is the number of possibilities or classes. The usual linear mapping c = 2p − 1 leads to p = 0.5 for c = 0. What I want is for c = 0 to lead to p = 1/n rather than 0.5, so that it works in the multiclass cases where n is greater than 2.
Correlation space is between −1 and 1, with 1 being the same (definitely true), −1 being the opposite (definitely false), and 0 being orthogonal (very uncertain). I had the idea that you could assume maximum uncertainty to be 0 in correlation space, and 1/n (the uniform distribution) in probability space.
I tried asking ChatGPT, Gemini, and Claude to come up with a formula that converts from correlation space to probability space while preserving the relationship that 0 maps to 1/n. I came up with such a formula a while back, so I figured it shouldn’t be hard. They all offered formulas, all of which turned out to be very much wrong when I actually graphed them to check.
I was not aware of these. Thanks!
Thanks for the clarifications. My naive estimate is obviously just a simplistic ballpark figure using some rough approximations, so I appreciate adding some precision.
Also, even if we can train and run a model the size of the human brain, it would still be many orders of magnitude less energy efficient than an actual brain. Human brains use barely 20 watts. This hypothetical GPU brain would require enormous data centres’ worth of power, and each H100 GPU alone uses up to 700 watts.
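As a rough illustration of the gap, the sketch below multiplies out the figures mentioned here: 20 watts for a brain, 700 watts per H100, and the 10,000 to 100,000 GPU range from the memory estimate. It counts GPU power only and ignores cooling, networking, and host CPUs, so it is an assumption-laden lower bound rather than a real data centre figure.

```python
import math

BRAIN_WATTS = 20   # rough figure for a human brain
H100_WATTS = 700   # per-GPU power figure quoted above

def cluster_power_watts(num_gpus, watts_per_gpu=H100_WATTS):
    """GPU power only; cooling, networking and CPUs are ignored (assumption)."""
    return num_gpus * watts_per_gpu

def orders_of_magnitude_vs_brain(num_gpus):
    """log10 of the ratio between cluster power and brain power."""
    return math.log10(cluster_power_watts(num_gpus) / BRAIN_WATTS)

for gpus in (10_000, 100_000):
    megawatts = cluster_power_watts(gpus) / 1e6
    print(gpus, "GPUs:", megawatts, "MW,",
          round(orders_of_magnitude_vs_brain(gpus), 2), "orders of magnitude")
```

Under these assumptions the cluster draws between 7 and 70 megawatts, which is roughly five and a half to six and a half orders of magnitude more than a brain.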
I’ve been looking at the numbers with regard to how many GPUs it would take to train a model with as many parameters as the human brain has synapses. The human brain has roughly 100 trillion synapses, and they are sparse and very efficiently connected. A regular AI model fully connects every neuron in a given layer to every neuron in the previous layer, so that would be less efficient.
The typical H100 has 80 GB of VRAM, so assuming that each parameter is 32 bits (4 bytes), you can fit about 20 billion parameters per GPU. So, you’d need about 5,000 GPUs just to hold the weights, or on the order of 10,000 as a round figure, to fit a single instance of a human brain in RAM, maybe. If you assume inefficiencies and the need to have data in memory as well, you could ballpark another order of magnitude, so somewhere in the 50,000 to 100,000 range might be needed.
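The arithmetic above is easy to check with a few lines of Python. This is only a sketch using the same rough inputs (100 trillion parameters, 4 bytes each, 80 GB per GPU); the 10x overhead factor stands in for the extra order of magnitude and is a guess rather than a measured number.

```python
SYNAPSES = 100 * 10**12        # parameters, one per synapse (rough figure)
BYTES_PER_PARAM = 4            # 32-bit floats
H100_VRAM_BYTES = 80 * 10**9   # 80 GB per GPU

def params_per_gpu(vram_bytes=H100_VRAM_BYTES, bytes_per_param=BYTES_PER_PARAM):
    """How many parameters fit in one GPU's memory, ignoring everything else."""
    return vram_bytes // bytes_per_param

def gpus_needed(params=SYNAPSES, overhead=1):
    """GPUs required to hold params times an integer overhead factor (rounded up)."""
    per_gpu = params_per_gpu()
    return -(-params * overhead // per_gpu)  # ceiling division on integers

print("parameters per GPU:", params_per_gpu())
print("GPUs for weights only:", gpus_needed())
print("GPUs with 10x overhead:", gpus_needed(overhead=10))
```

With these inputs the weights alone come to about 5,000 GPUs, the same order of magnitude as the 10,000 ballpark, and the 10x overhead case lands at 50,000.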
For comparison, it’s widely believed that OpenAI trained GPT-4 on about 10,000 A100s that Microsoft let them use from their Azure supercomputer, most likely the one listed as third most powerful in the world by the Top500 list.
Recently though, Microsoft and Meta have both moved to acquire enough GPUs to put them in the 100,000 range, and Elon Musk’s xAI recently managed to get a 100,000 H100 GPU supercomputer online in Memphis.
So, in theory at least, we are nearly at the point where they can train a human brain sized model in terms of memory. However, keep in mind that training such a model would take a ton of compute time. I haven’t done the calculations yet for FLOPS, so I don’t know if it’s feasible yet.
Just some quick back of the envelope analysis.
I hit the usage limit for GPT-4o (seems to just be 10 prompts every 5 hours) and it switched to GPT-4o-mini. I tried asking it the Alpha Omega question and it made some math nonsense up, so it seems like the model matters for this for some reason.
So, a while back I came up with an obscure idea I called the Alpha Omega Theorem and posted it on the Less Wrong forums. Given how there’s only one post about it, it shouldn’t be something that LLMs would know about. So in the past, I’d ask them “What is the Alpha Omega Theorem?”, and they’d always make up some nonsense about a mathematical theory that doesn’t actually exist. More recently, Google Gemini and Microsoft Bing Chat would use search to find my post and use that as the basis for their explanation. However, I only have the free version of ChatGPT and Claude, so they don’t have access to the Internet and would make stuff up.
A couple days ago I tried the question on ChatGPT again, and GPT-4o managed to correctly say that there isn’t a widely known concept of that name in math or science, and basically said it didn’t know. Claude still makes up a nonsensical math theory. I also today tried telling Google Gemini not to use search, and it also said it did not know rather than making stuff up.
I’m actually pretty surprised by this. Looks like OpenAI and Google figured out how to reduce hallucinations somehow.
I’m wondering what people’s opinions are on how urgent alignment work is. I’m a former ML scientist who previously worked at Maluuba and Huawei Canada, but switched industries into game development, at least in part to avoid contributing to AI capabilities research. I tried earlier to interview with FAR and Generally Intelligent, but didn’t get in. I’ve also done some cursory independent AI safety research in interpretability and game theoretic ideas in my spare time, though nothing interesting enough to publish yet.
My wife also recently had a baby, and caring for him is a substantial time sink, especially for the next year until daycare starts. Is it worth considering things like hiring a nanny, if it’ll free me up to actually do more AI safety research? I’m uncertain if I can realistically contribute to the field, but I also feel like AGI could potentially be coming very soon, and maybe I should make the effort just in case it makes some meaningful difference.
Thanks for the reply!
So, the main issue I’m finding with putting them all into one proposal is that there’s a 1000 character limit on the main summary section where you describe the project, and I cannot figure out how to cram multiple ideas into those 1000 characters without seriously compromising the quality of my explanations for each.
I’m not sure if exceeding that character limit will get my proposal thrown out without being looked at though, so I hesitate to try that. Any thoughts?
I already tried discussing a very similar concept I call Superrational Signalling in this post. It got almost no attention, and I have doubts that Less Wrong is receptive to such ideas.
I also tried actually programming a Game Theoretic simulation to try to test the idea, which you can find here, along with code and explanation. Haven’t gotten around to making a full post about it though (just a shortform).
So, I have three very distinct ideas for projects that I’m thinking about applying to the Long Term Future Fund for. Does anyone happen to know if it’s better to try to fit them all into one application, or split them into three separate applications?
Recently I tried out an experiment using the code from the Geometry of Truth paper to try to see if using simple label words like “true” and “false” could substitute for the datasets used to create truth probes. I also tried out a truth probe algorithm based on classifying with the higher cosine similarity to the mean vectors.
Initial results seemed to suggest that the label word vectors were sorta acceptable, albeit not nearly as good (around 70% accurate rather than 95%+ like with the datasets). However, testing on harder test sets showed much worse accuracy (sometimes below chance, somehow). So I can probably conclude that the label word vectors alone aren’t sufficient for a good truth probe.
Interestingly, the cosine similarity approach worked almost exactly as well as the mass mean (aka difference in means) approach used in the paper. Unlike the mass mean approach though, the cosine similarity approach can be extended to a multi-class situation. Though, logistic regression can also be extended similarly, so it may not be particularly useful either, and I’m not sure there’s even a use case for a multi-class probe.
Anyways, I just thought I’d write up the results here in the unlikely event someone finds this kind of negative result useful.
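For anyone unfamiliar with the two probes being compared, here is a simplified sketch on toy 2D vectors. It is not the code from the Geometry of Truth paper: the mass mean version here just projects onto the difference of class means and thresholds at the midpoint, which is an assumption, and the data and function names are made up for illustration.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def mass_mean_predict(x, mu_true, mu_false):
    """Project onto (mu_true - mu_false), thresholding at the midpoint (assumption)."""
    direction = [t - f for t, f in zip(mu_true, mu_false)]
    midpoint = [(t + f) / 2 for t, f in zip(mu_true, mu_false)]
    centered = [xi - mi for xi, mi in zip(x, midpoint)]
    return dot(direction, centered) > 0

def cosine_predict(x, class_means):
    """Return the label whose mean vector has the highest cosine similarity to x."""
    return max(class_means, key=lambda label: cosine(x, class_means[label]))

# Toy stand-ins for model activations on true and false statements.
true_acts = [[1.0, 0.2], [0.9, 0.1], [1.1, 0.3]]
false_acts = [[-1.0, 0.1], [-0.8, 0.3], [-1.2, 0.2]]
means = {"true": mean_vector(true_acts), "false": mean_vector(false_acts)}

x = [0.7, 0.25]
print(mass_mean_predict(x, means["true"], means["false"]))
print(cosine_predict(x, means))
```

Because `cosine_predict` takes a dictionary of class means, adding a third or fourth label only requires adding another entry, which is the multi-class extension mentioned above.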
Update: I made an interactive webpage where you can run the simulation and experiment with a different payoff matrix and changes to various other parameters.
This sounds rather like the competing political economic theories of classical liberalism and Marxism to me. Both of these intellectual traditions carry a lot of complicated baggage that can be hard to disentangle from the underlying principles, but you seem to have done a pretty good job of distilling the relevant ideas in a relatively apolitical manner.
That being said, I don’t think it’s necessary for these two explanations for wealth inequality to be mutually exclusive. Some wealth could be accumulated through “the means of production” as you call it, or (as I’d rather describe it to avoid confusing it with the classical economic and Marxist meaning) “making useful things for others and getting fair value in exchange”.
Other wealth could also, at the same time, be accumulated through exploitation, such as taking advantage of differing degrees of bargaining power to extract value from the worker for less than it would be worth under a fair arrangement, such as paying people with something like labour vouchers or a similar time-based accounting. Or stealing through fraudulent financial transactions, or charging rents for things that you just happen to own because your ancestors conquered the land centuries ago with swords.
Both of these things can be true at the same time within an economy. For that matter, the same individual could be doing both in various ways. They could be ostensibly investing and building companies that make valuable things for people, while at the same time exploiting their workers and taking advantage of their historical position as the descendant of landed aristocracy. They could, at the same time, also be scamming their venture capitalists by wildly exaggerating what their company can do. All while still providing goods and services that meet many people’s needs in ways that are more efficient than most possible alternatives, and perhaps the best way possible given the incentives that currently exist.
Things like this tend to be multifaceted and complex. People in general can have competing motivations within themselves, so it would not be strange to expect that in something as convoluted as a society’s economy, there could be many reasons for many things. Trying to decide between two possible theories of why misses the possibility that both theories contain their own grain of truth, and that each, by itself, is an incomplete understanding and world model. The world is not just black or white. It’s many shades of grey, and also, to push the metaphor further, a myriad of colours that can’t accurately be described in greyscale.