Johannes C. Mayer
I ate tons of beluga lentils, sometimes 1 kg (cooked) a day. That wasn't enough. However, I have now switched to eating 600 g (cooked) of soybeans every day, and that was a very significant improvement (it solved maybe 75% of the problem). Soy is a complete protein, and soybeans are also very cheap.
- Dec 23, 2024, 12:57 PM; 7 points; comment on "Vegans need to eat just enough Meat—empirically evaluate the minimum amount of meat that maximizes utility"
Vegans need to eat just enough Meat—empirically evaluate the minimum amount of meat that maximizes utility
Doing Sport Reliably via Dancing
Note this 50% likely only holds if you are using a mainstream language. For some non-mainstream languages I have gotten responses that were really unbelievably bad. Things like "the name of this variable is wrong", which literally could never be the problem (it was a valid identifier).
And similarly, if you are trying to encode novel concepts, it's very different from gluing together libraries or implementing standard well-known tasks, which I would guess is what habryka is mostly doing (not that that is a bad thing to do).
Maybe you include this in "stack overflow substitute", but the main thing I use LLMs for is to understand well-known technical things. The workflow is: 1) I am interested in understanding something, e.g. how a multiplexed barrel bit shifter works. 2) I ask the LLM to explain the concept. 3) Based on the initial response I create separate conversation branches for the questions I have (to save money and keep the context closer; I didn't evaluate whether this actually makes the LLM better). 4) Once I think I have understood the concept, or part of it, I explain it back to GPT. (Really I do this all the time during the entire process.) 5) The LLM (hopefully) corrects me if I am wrong (it seems to detect mistakes more often than not).
The last part of the conversation can then look like this:
User: Ah, now I think I understand. You have mechanisms that shift by a fixed amount (powers of 2). Each of these multiplexers can be set up such that the data flows through it, or through an identity wire, into the next multiplexer. Now, by controlling whether the value flows through the multiplexer or the identity wire, we can "sum up the shifts". Specifically, if we have shifters of 1, 2, 4, …, 32, then we can achieve any shift (for 64-bit numbers) by activating the appropriate multiplexers. This design needs not 64×64 gates, but only 64×6 gates: 64 gates for each multiplexer stage.
AI: Exactly, you’ve got the concept! The design you described efficiently uses a series of multiplexers to perform shifts by fixed amounts that are powers of two (1, 2, 4, …, 32 for a 64-bit number). By selecting which multiplexers to activate, you can achieve any desired shift amount.
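The shifter discussed above can be sketched in a few lines of Python, with each loop iteration standing in for one multiplexer stage (the function name and structure are my own illustration, not anything from the conversation):

```python
def barrel_shift_left(value, shift, width=64):
    """Logarithmic barrel shifter sketch.

    Six fixed stages shift by 1, 2, 4, ..., 32. Each bit of `shift`
    decides whether its stage shifts the value or passes it through
    unchanged (the "identity wire"), so the fixed shifts sum up.
    """
    mask = (1 << width) - 1
    for stage in range(6):          # log2(64) = 6 stages
        if (shift >> stage) & 1:    # one control bit enables each stage
            value = (value << (1 << stage)) & mask
    return value
```

Any shift from 0 to 63 is reached by combining at most six fixed shifts, which is why the hardware cost scales like 64×6 rather than 64×64.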
I had probably ~200,000 words worth of conversation with LLMs, mainly in this format.
I am not sure which next leap you are talking about. But based on some observations, I intuit that GPT-4o is much better for this than GPT-3 (you might be talking about more recent "leaps"). (I didn't test o1 extensively because it's so expensive.)
I totally agree with this. I expect the majority of early AI researchers were falling into this trap. The main problem I am focusing on is how a mind can construct a model of the world in the first place.
The goal is to have a system where, ideally, there are no unlabeled parameters. That would be the world-modeling system. It would then build a world model that would have many unlabeled parameters. By understanding the world-modeler system you can ensure that the world model has certain properties. E.g. there is some property (which I don't know) of how to make the world model not contain dangerous minds.
E.g. imagine the AI is really good at world modeling, and now it models you (you are part of the world) so accurately that you are basically copied into the AI. Now you might try to escape the AI, which would actually be really good, because then you could save the world as a speed intelligence (assuming the model of you were really accurate, which it probably wouldn't be). But if it models another mind (maybe it considers dangerous adversaries), then maybe that mind could also escape, and it would not be aligned.
By understanding the system you could put constraints on what world models can be generated, such that all generated world models can’t contain such dangerous minds, or at least make such minds much less likely.
I propose that a more realistic example would be “classifying images via a ConvNet with 100,000,000 weights” versus “classifying images via 5,000,000 lines of Python code involving 1,000,000 nonsense variable names”. The latter is obviously less inscrutable on the margin but it’s not a huge difference.
Python code is a discrete structure. You can do proofs on it more easily than on a NN. You could try to apply program transformations to it that preserve functional equality, optimizing for some measure of "human-understandable structure". IIRC there are image classification algorithms that perform worse than NNs but are much more interpretable, and I'd guess these algorithms would be at most hundreds of lines of code (I haven't really looked a lot at them).
Anyway, it’s fine to brainstorm on things like this, but I claim that you can do that brainstorming perfectly well by assuming that the world model is a Bayes net (or use OpenCog AtomSpace, or Soar, or whatever), or even just talk about it generically.
You give examples of recognizing problems. I tried to give examples of how you can solve these problems. I'm not brainstorming on "how could this system fail". Instead I understand something, and then I notice, without really trying, that now I can do a thing that seems very useful, like making the system not think about human psychology given certain constraints.
Probably I completely failed at making clear why I think that, because my explanation was terrible. In any case, I think the brainstorming you suggest is completely different from the thing that I am actually doing.
To me it just seems that limiting the depth of a tree search is better than limiting the compute of a black-box neural network. It seems like you can get a much better grip on what it means to limit the depth, and on what this implies about the system's behavior, when you actually understand how tree search works. Of course, tree search here is only an example.
John's post is quite weird, because it only says true things, yet implicitly implies a conclusion, namely that NNs are not less interpretable than some other thing, which is totally wrong.
Example: a neural network implements modular arithmetic with Fourier transforms. If you implement that Fourier algorithm in Python, it's harder for a human to understand than the obvious modular-arithmetic implementation in Python.
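As a toy illustration of that contrast (my own sketch, not an algorithm recovered from any actual network): both functions below compute addition mod n, but the second routes it through phases on the unit circle, loosely analogous to the Fourier structure such networks learn, and is far harder to read:

```python
import cmath

def add_mod_obvious(a, b, n=97):
    return (a + b) % n

def add_mod_fourier(a, b, n=97):
    # Encode the sum as a point on the unit circle: angles add mod n.
    point = cmath.exp(2j * cmath.pi * (a + b) / n)
    # Decode by finding the residue whose angle is closest.
    return min(range(n),
               key=lambda k: abs(point - cmath.exp(2j * cmath.pi * k / n)))
```

The two are functionally equal, so a transformation from the second form to the first would be exactly the kind of structure-recovering rewrite meant above.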
It doesn't matter if the world model is inscrutable when looking at it directly, if you can change the generating code such that certain properties must hold. Figuring out what these properties are is of course not directly solved by understanding intelligence.
This is bad because, if AGI is very compute-efficient, then when we have AGI at all, we will have AGI that a great many actors around the world will be able to program and run, and that makes governance very much harder.
Totally agree, so obviously try super hard to not leak the working AGI code if you had it.
But you won’t get insight into those distinctions, or how to ensure them in an AGI, by thinking about whether world-model stuff is stored as connections on graphs versus induction heads or whatever.
No, you can. E.g. I could theoretically define a general algorithm that identifies the minimum concepts necessary for solving a task, if I know enough about the structure of the system, specifically how concepts are stored. That's of course not perfect, but it would seem that for very many problems it would make the AI unable to think about things like human manipulation, or about the fact that it is a constrained AI, even if that knowledge was somewhere in a learned black-box world model. This is just an example of something you can do by knowing the structure of a system.
If your system is some plain code with for loops, just reduce the number of iterations the for loops of search processes do. Now decreasing/increasing the iterations somewhat will correspond to making the system dumber/smarter. Again, this obviously doesn't solve the problem completely, but it is clearly a powerful thing to be able to do.
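A minimal sketch of what I mean (hypothetical names, a trivial search problem): capability here is controlled by a plain, legible loop bound rather than an opaque compute budget:

```python
def hill_climb(score, neighbors, start, max_iters):
    """Greedy search whose power is set by an explicit iteration budget."""
    best = start
    for _ in range(max_iters):              # the legible "compute knob"
        candidate = max(neighbors(best), key=score, default=best)
        if score(candidate) <= score(best):
            break                           # local optimum reached
        best = candidate
    return best
```

With `score = lambda x: -(x - 7) ** 2` and neighbors `x ± 1`, a budget of 3 iterations only climbs from 0 to 3, while 20 iterations reach the optimum 7: the loop bound directly and predictably bounds how smart the searcher is.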
Of course many low-level details do not matter. Often you'd only care that something is a sequence, or a set. I am talking about higher-level program structure.
It feels like you are somewhat missing the point. The goal is to understand how intelligence works. Clearly that would be very useful for alignment? Even if you got a black-box world model. But of course it would also enable you to think about how to make such a world model more interpretable. I think that is possible; it's just not what I am focusing on now.
I specifically am talking about solving problems that nobody knows the answer to, where you are probably even wrong about what the problem even is. I am not talking about taking notes on existing material. I am talking about documenting the process of generating knowledge.
I am saying that I forget important ideas that I generated in the past; probably they are not yet so refined that they are impossible to forget.
A robust alignment scheme would likely be trivial to transform into an AGI recipe.
Perhaps if you did have the full solution, but it feels like there are some parts of a solution that you could figure out such that those parts don't tell you as much about the other parts of the solution.
And it also feels like there could be a book such that if you read it you would gain a lot of knowledge about how to align AIs without knowing that much more about how to build one. E.g. a theoretical solution to the stop-button problem seems like it would not tell you that much about how to build an AGI, compared to figuring out how to properly learn a world model of Minecraft. And knowing how to build a world model of Minecraft probably helps a lot with solving the stop-button problem, but it doesn't trivially yield a solution.
If you had a system with "ENTITY 92852384 implies ENTITY 8593483", it would be a lot of progress, as currently in neural networks we don't even understand the internal structures.
I want to have an algorithm that creates a world model. The world is large. A world model is uninterpretable by default through its sheer size, even if you had interpretable low-level labels. And by default we don't get any interpretable labels at all. I think there are ways to have generic data-processing procedures, which don't talk about the human mind at all, that would yield a more interpretable world model. Similarly, you could probably specify some very general property of Python programs such that a program with that property becomes easier for humans to understand. E.g. a formalization of what it means for the control flow to be straightforward, analogous to "don't use goto in C".
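A sketch of such a generic, human-independent property check (the specific rule is just an illustrative stand-in, since Python has no goto): flag code whose control flow uses unstructured jumps or unbounded loops:

```python
import ast

def straightforward_control_flow(source):
    """Illustrative formal property: no break/continue, no `while True`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Break, ast.Continue)):
            return False
        if (isinstance(node, ast.While)
                and isinstance(node.test, ast.Constant)
                and node.test.value is True):
            return False
    return True
```

The check never mentions anything about human psychology, yet programs satisfying it are plausibly easier for humans to follow; that is the flavor of property I have in mind for world models.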
But even if you didn't have this, understanding the system still allows you to understand what the structure of the knowledge would be. It seems plausible that, simply by understanding the system very well, one could make it such that the learned data structures need to take particular shapes, such that these shapes correspond to some relevant alignment properties.
In any case, it seems that this is a problem that any possible way of building an intelligence runs into? So I don't think it is a case against the project. When building an AI with NNs you might not even consider that the internal representations might be weird and alien (even for an LLM trained on human text)[1], but the same problem persists.
[1] I haven't looked into this, or thought about it at all; that's just what I expect.
You Need a Research Log
I definitely very often run into the problem that I forget why something was good to do in the first place. What are the important bits? Often I get sidetracked, and then the thing that I am doing seems not so good, so I stop and do something completely different. But then later on I realize that the original reason that led me down the path was actually good, and that it would have been better to only backtrack a bit, to the important piece. But often I just don't remember the important piece in the moment.
E.g. I think that having some kind of linking structure in your world model, linking objects in the model to the real world, is important, such that you can travel backward along the links to identify where exactly in your world model an error is. Then I go off and construct some formalism for a bit, but before I get to the point of adding the links I forget that that was the original motivation, and so I analyze the model for a couple of hours before realizing that I still haven't added the linking structure. So this happens even within the same research session if I am not careful. And if you want to continue the next day, or a week later, it is extremely helpful to have organized your thoughts in a way that isn't so painful to go through that you won't do it.
I have recognized a couple of things as important so far for being able to do this correctly:
Make taking the notes fun. If you can't make this information-processing activity fun, you basically can't do it.
My brain somehow seems to like doing it much more when I put all the notes on a website.
Also taking lots of ADHD medication helps.
Make the notes high-quality enough that they are readable, instead of a wall of garbage text.
Writing thoughts mainly on a whiteboard and in analog journals (including reflection) seems to help a lot (in general, actually).
Integrate note-taking tightly into your research workflow.
Don't rely on postprocessing, i.e. a separate step of producing research notes afterward. At least I haven't managed to get this to work at all so far. As much as possible, make the content you produce in the first place as good as possible (analog tools help a lot with this). That means writing up notes and reflections as you are working, not at some later time (which never actually comes).
I'd think you can define a tetrahedron for non-Euclidean space. And you can talk and reason about the set of polyhedra with 10 vertices as an abstract object without defining any specific such polyhedron.
Just consider taking the assumption that the system would not change in arbitrary ways in response to its environment. There might be certain constraints. You can think about what the constraints need to be such that, e.g., a self-modifying agent would never change itself such that it would expect to get less future utility than if it did not self-modify.
And that is just a random thing that came to mind without me trying. I would expect that you can learn useful things about alignment by thinking about such things. In fact, I think the line between understanding intelligence and figuring out alignment in advance doesn't really exist. Clearly, understanding something about alignment is understanding something about intelligence.
When people say to only figure out alignment things, maybe what they mean is to figure out things about intelligence that won't actually get you much closer to being able to build a dangerous intelligence. And there do seem to be such things. It is just that I expect that trying to work only on these will not make you generate the most useful models of intelligence in your mind, making you worse/slower at thinking on average per unit of time worked.
And that's of course not a law. Probably there are some things that you want to understand through an abstract theoretical lens at certain points in time. Do whatever works best.
The way I would approach this problem (after not much thought): come up with a concrete system architecture A of a maximizing computer program that has an explicit utility function and is known to behave optimally. E.g. maybe it plays tic-tac-toe or four-in-a-row optimally.
Now mutate the source code of A slightly, such that it is no longer optimal, to get a system B. The objective is not modified. Now B still "wants" to basically be A, in the sense that if it is a general enough optimizer and has access to self-modification facilities, it would try to make itself into A, because A is better at optimizing the objective.
I predict that by creating a setup where the delta between B and A is small, you can create a tractable problem without sidestepping the core bottlenecks; i.e. solving "correct self-modification" for a small delta between A and B seems to require solving some hard part of the problem. Once you have solved it, increase the delta and solve it again.
I am unsure about the exact setup for giving the systems the ability to self-modify. I intuit that one can construct a toy setup that generates good insight, such that B doesn't actually need to be that powerful, or that general an optimizer.
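The smallest version of this setup I can think of looks like the sketch below (everything here, the subtraction game standing in for tic-tac-toe and the particular mutation, is my own hypothetical illustration): A plays a take-1-to-3-stones game optimally, B is A with a one-line defect, and the delta between them is directly measurable:

```python
def optimal_move(n):
    """System A: optimal play in the subtraction game (take 1-3 stones,
    taking the last stone wins). The winning move leaves a multiple of 4."""
    return n % 4 or 1  # from a losing position (n % 4 == 0), take 1

def mutated_move(n):
    """System B: A with a small source-level defect on one position class."""
    return 1 if n % 4 == 2 else optimal_move(n)  # the introduced mutation

def policy_delta(a, b, positions):
    """Fraction of positions where B deviates from A."""
    return sum(a(n) != b(n) for n in positions) / len(positions)
```

Here B "wants" to be A: patching the one mutated branch strictly improves its play against the unchanged objective, and the size of the delta (5 of the 21 positions in a 21-stone game) is the knob you turn up once the small-delta case is solved.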
To me it seems that understanding how a system you are building actually works (i.e. having good models of its internals) is the most basic requirement for being able to reason about the system coherently at all.
Yes, if you actually understood how intelligence works in a deep way, you wouldn't automatically solve alignment. But it sure would make alignment a lot more tractable in many ways, especially when only aiming for a pivotal act.
I am pretty sure you can figure out alignment in advance as you suggest. That might be the safer route overall… if we didn't have coordination problems. But it seems slower, and we don't have time.
Obviously, if you figure out the intelligence algorithm before you know how to steer it, don’t put it on GitHub or the universe’s fate will be sealed momentarily. Ideally don’t even run it at all.
So far, working on this project seems to have created ontologies in my brain that are good for thinking about alignment. There are a couple of approaches that now seem obvious which I think wouldn't have seemed obvious before. Again, having good models of intelligence (which is really what this is about) is actually useful for thinking about intelligence. And alignment research is mainly thinking about intelligence.
The approach many people take, of trying to pick some alignment problem to work on, seems somewhat backward to me. E.g. embedded agency is a very important problem, and you need to solve it at some point. But it doesn't feel like the kind of problem where, by working on it, you build up the most useful models of intelligence in your brain.
As an imperfect analogy, consider trying to understand how a computer works by first understanding how modern DRAM works. To build a practical computer you might need to use DRAM. But in principle, you could build a computer with only S-R latch memory. So, while important, DRAM is clearly not at the very core. First you understand how NAND gates work, then the ALU, and so on. Once you have a good understanding of the fundamentals, DRAM will be much easier to understand. It becomes obvious how it needs to work at a high level: you can write and read bits. If you don't understand how a computer works, you might not even know why storing a bit is an important thing to be able to do.
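The S-R latch point can be made concrete with a few lines of simulation (a sketch; gate-level timing is idealized away): two cross-coupled NAND gates already form a 1-bit memory:

```python
def nand(a, b):
    return 0 if (a and b) else 1

def sr_latch(state, s_bar, r_bar):
    """One settled update of a NAND S-R latch (active-low set/reset).

    state is (q, q_bar); a few sweeps let the cross-coupled gates settle.
    """
    q, qb = state
    for _ in range(4):
        q, qb = nand(s_bar, qb), nand(r_bar, q)
    return q, qb
```

Pulsing set (`s_bar=0`) drives the state to (1, 0), pulsing reset drives it to (0, 1), and with both inputs high the latch holds its bit: exactly the write-and-read-bits behavior that DRAM also has to provide, just without the fundamentals-obscuring refresh machinery.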
Goal: Understand Intelligence
It becomes more interesting when people constrain their output based on what they expect is true information that the other person does not yet know. It's useful to talk to an expert who tells you a bunch of random stuff they know that you don't.
Often some of it will be useful. This only works if they understand what you have said, though (which presumably concerns something you are interested in). And often the problem is that people's models of what is useful are wrong. This is especially likely if you are an expert in something: then what most people say will be worse than what you would think of on the topic yourself. This is especially bad when people can't even immediately see why what you are saying is right.
The best strategy around this I have found so far is just to switch the topic to the actually interesting/important things. Surprisingly, people usually go along with it.
2024-10-14 Added the “FUUU 754 extensions M and S” section.
The reason I mention chicken is that the last time I ran this experiment with beef, my body started to hurt so badly that I woke up in the middle of the night. I am pretty sure the beef was the reason, though maybe something weird was going on in my body at the same time. However, when I tried the same thing one week later with chicken, I didn't have this issue.