Discord: LemonUniverse (lemonuniverse). Reddit: u/Smack-works. Substack: The Lost Jockey. About my situation: here.
I wrote some worse posts before 2024 because I was very uncertain about how events might develop.
Sorry if this isn’t appropriate for this site, but is anybody interested in chess research? I’ve seen signs that people here might be: for example, here’s a chess post barely related to AI.
In chess, what positions have the longest forced wins? “Mate in N” positions can be split into 3 types:
Positions which use “tricks”, such as cycles of repeating moves, to stretch out the number of moves before checkmate. For example, this man-made mate in 415 (see the last position) uses obvious cycles. Not to mention mates in omega.
Tablebase checkmates, discovered by brute force, showing absolutely incomprehensible play with no discernible logic. See this mate in 549 moves. One has to assume it’s based on hidden cycles of some kind.
Positions which are similar to immortal games, where the winning variation requires a combination without any cycles. For example: Kasparov’s Immortal (a 14-move-long combination), Stoofvlees vs. Igel (down a rook for 21 moves), though these examples lack optimal play.
Surprisingly, nobody seems to look for the longest mates of Type 3. Well, I did look for them and discovered some. Down below I’ll explain multiple ways to define what exactly I did, without going into too much detail. If you want more detail, see Research idea: the longest non-trivial middlegames. There you can also see the puzzles I’ve created.
My longest puzzle is 42 moves: https://lichess.org/study/sTon08Mb/JG4YGbcP Overall, I’ve created 7 unique puzzles. Worked a lot on 1 more (mate in 52 moves), but couldn’t make it work.
Among other things, I made this absurd mate in 34 puzzle. Almost the entire board is filled with pieces (62 pieces on the board!), only two squares are empty. And despite that, the position has deep content. It’s kind of a miracle. I think it deserves recognition.
Unlike Type 1 and Type 2 mates, my mates involve many sacrifices of material. So my mates can be defined as “the longest sacrificial combinations”.
We can come up with metrics which make a long mate more special, harder to find, rarer: material imbalance, the number of non-check moves, the degree of freedom of the pieces, etc. Then we can search for the longest mates compatible with high enough values of those metrics.
Well, that’s what I did.
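As a rough illustration (this is not my actual tooling, and the move list and piece values below are entirely invented), the filtering metrics can be computed from an annotated solution line:

```python
# Hypothetical annotated solution line of a forced mate: each attacker move
# records whether it gives check and how much material it sacrifices
# (in pawn units). All values here are made up for illustration.
line = [
    {"check": False, "sacrificed": 0},
    {"check": True,  "sacrificed": 5},   # rook sacrifice
    {"check": False, "sacrificed": 0},
    {"check": True,  "sacrificed": 3},   # bishop sacrifice
    {"check": False, "sacrificed": 0},
    {"check": False, "sacrificed": 9},   # quiet queen sacrifice
]

def metrics(line):
    """Metrics that make a long mate 'harder to find': total sacrificed
    material and the share of quiet (non-check) attacking moves."""
    total_sacrificed = sum(m["sacrificed"] for m in line)
    quiet_share = sum(not m["check"] for m in line) / len(line)
    return {"sacrificed": total_sacrificed, "quiet_share": quiet_share}

print(metrics(line))
```

A search would then keep only the longest forced mates whose lines score above chosen thresholds on both metrics.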
This is an idea of a definition rather than a definition. But it might be important.
Take a sequential game with perfect information.
Take positions with the longest forced wins.
Out of those positions, choose positions where the defending side has the greatest control over the attacking side’s optimal strategy.
My mates are an example of positions where the defending side has especially great control over the flow of the game.
Can there be any deep meaning behind researching my type of mates? I think yes. There are two relevant things.
The first thing is hard to explain, because I’m not a mathematician, but I’ll try. Math can often be seen as skipping the stuff that is most interesting to humans. For example, math can prove theorems about games in general, without explaining why a specific game is interesting or why a specific position is interesting. However, here it seems like we can define something very closely related to subjective “interestingness”.
The second thing: the hardness of defining valuable things is relevant to Alignment. The definitions above suggest that valuable things are sometimes easier to define than it seems.
How did the chess community receive my work?
On Reddit, some posts got a moderate amount of upvotes (enough to get into daily top). A silly middlegame position. With checkmate in 50-80 moves? (110+); Does this position set any record? (60+). Sadly the pattern didn’t continue: New long non-trivial middlegame mate found. Nobody asked for this. (1).
On a computer chess forum, people mostly ignored it. I hoped they could help me find the longest attacks in computer games.
On the Discord of chess composers, a bunch of people complimented my project. But nobody showed any proactive interest (e.g. “hey, I’d like to preserve your work”). One person reacted along the lines of “I’m not a specialist in that type of thing; I don’t know whom you could talk to about it.”
On Reddit communities where you can ask mathematicians things, people said that game theory is too abstract to tackle such things.
Agree that neopronouns are dumb. Wikipedia says they’re used by 4% of LGBTQ people and are criticized both within and outside the community.
But for people struggling with normal pronouns (he/she/they), I have the following thoughts:
Contorting language to avoid words associated with beliefs… is not easier than using the words. Don’t project beliefs onto words too hard.
Contorting language to avoid words associated with beliefs… is still a violation of free speech (if we have such a strong notion of free speech). So what is the motivation to propose that? It’s a bit like a dog in the manger. “I’d rather cripple myself than help you, let’s suffer together”.
Don’t maximize free speech (in a negligible way) while ignoring every other human value.
In an imperfect society, truly passive tolerance (tolerance which doesn’t require any words/actions) is impossible. For example, in a perfect society, if my school has bigoted teachers, it immediately gets outcompeted by a non-bigoted school. In an imperfect society this might not happen. So we end up with enforceable norms.
Employees get paid, which kinda automatically reduces their free speech, because saying the wrong words can make them stop getting paid. (...) Employment is really a different situation. You get laws, and recommendations of your legal department; there is not much anyone can do about that.
I’m not familiar with your model of free speech (i.e. how you imagine free speech working if laws and power balances were optimal). People who value free speech usually believe that free speech should have power above money and property, to a reasonable degree. What’s “reasonable” is the crux.
I think in situations where people work together on something unrelated to their beliefs, prohibiting the enforcement of a code of conduct is unreasonable, because respect is crucial for the work environment and for protecting marginalized groups. I assume people who propose to “call everyone they” or “call everyone by proper name” realize some of that.
If I let people use my house as a school, but find out that a teacher openly doesn’t respect minority students (by refusing to do the smallest thing for them), I’m justified in not letting the teacher into my house.
I do not talk about people’s past for no good reason, and definitely not just to annoy someone else. But if I have a good reason to point out that someone did something in the past, and the only way to do that is to reveal their previous name, then I don’t care about the taboo.
I just think “disliking deadnaming under most circumstances = magical thinking, like calling Italy Rome” was a very strong, barely argued/explained opinion. In tandem with mentioning delusion (Napoleon) and hysteria. If you want to write something insulting, maybe bother to clarify your opinions a little bit more? Like you did in our conversation.
I think there should be more spaces where controversial ideas can be debated. I’m not against spaces without pronoun rules, just don’t think every place should be like this. Also, if we create a space for political debate, we need to really make sure that the norms don’t punish everyone who opposes centrism & the right. (Over-sensitive norms like “if you said that some opinion is transphobic you’re uncivil/shaming/manipulative and should get banned” might do this.) Otherwise it’s not free speech either. Will just produce another Grey or Red Tribe instead of Red/Blue/Grey debate platform.
I do think progressives underestimate free speech damage. To me it’s the biggest issue with the Left. Though I don’t think they’re entirely wrong about free speech.
For example, imagine I have trans employees. Another employee (X) refuses to use pronouns, in principle (using pronouns is not the same as accepting progressive gender theories). Why? Maybe X thinks my trans employees live such a great lie that using pronouns is already an unacceptable concession. Or maybe X thinks that even trying to switch “he” & “she” is too much work, and I’m not justified in asking to do that work because of absolute free speech. Those opinions seem unnecessarily strong and they’re at odds with the well-being of my employees, my work environment. So what now? Also, if pronouns are an unacceptable concession, why isn’t calling a trans woman by her female name an unacceptable concession?
Imagine I don’t believe something about a minority, so I start avoiding words which might suggest otherwise. If I don’t believe that gay love can be as true as straight love, I avoid the word “love” (in reference to gay people or to anybody) at work. If I don’t believe that women are as smart as men, I avoid the word “master” / “genius” (in reference to women or anybody) at work. It can get pretty silly. Will predictably cost me certain jobs.
I’ll describe my general thoughts, like you did.
I think about transness in a similar way to how I think about homo/bisexuality.
If homo/bisexuality is outlawed, people are gonna suffer. Bad.
If I could erase homo/bisexuality from existence without creating suffering, I wouldn’t anyway. Would be a big violation of people’s freedom to choose their identity and actions (even if in practice most people don’t actually “choose” to be homo/bisexual).
Different people have homo/bisexuality of different “strength” and form. One man might fall in love with another man, but dislike sex or even kissing. Maybe he isn’t a real homosexual, if he doesn’t need to prove it physically? Another man might identify as a bisexual, but be in a relationship with a woman… he doesn’t get to prove his bisexuality (sexually or romantically). Maybe we shouldn’t trust him unless he walks the talk? As a result of all such situations, we might have certain “inconsistencies”: some people identifying as straight have done more “gay” things than people identifying as gay. My opinion on this? I think all of this is OK. Pushing for an “objective gay test” would be dystopian and suffering-inducing. I don’t think it’s an empirical matter (unless we choose it to be, which is a value-laden choice). Even if it was, we might be very far away from resolving it. So just respecting people’s self-identification in the meantime is best, I believe. Moreover, a lot of this is very private information anyway. Less reason to try measuring it “objectively”.
My thoughts about transness specifically:
We strive for gender equality (I hope). Which makes the concept of gender less important for society as a whole.
The concept of gender is additionally damaged by all the things a person can decide to do in their social/sexual life. For example, take an “assigned male at birth” (AMAB) person. AMAB can appear and behave very feminine without taking hormones. Or vice-versa (take hormones, get a pair of boobs, but present masculine). Additionally there are different degrees of medical transition and different types of sexual preferences.
A lot of things which make someone more or less similar to a man/woman (behavior with friends, behavior with romantic partners, behavior with sexual partners, thoughts) are private. Less reason to try measuring those “objectively”.
I have a choice to respect people’s self-identified genders or not. I decide to respect them. Not just because I care about people’s feelings, but also because of points 1 & 2 & 3 and because of my general values (I show similar respect to homo/bisexuals). So I respect pronouns, but on top of that I also respect if someone identifies as a man/woman/nonbinary. I believe respect is optimal in terms of reducing suffering and adhering to human values.
When I compare your opinion to mine, most of my confusion is about two things: what exactly you see as an empirical question, and how the answer (or its absence) affects our actions.
Zack insists that Blanchard is right, and that I fail at rationality if I disagree with him. People on Twitter and Reddit insist that Blanchard is wrong, and that I fail at being a decent human if I disagree with them. My opinion is that I have no comparative advantage at figuring out who is right and who is wrong on this topic, or maybe everyone is wrong, anyway it is an empirical question and I don’t have the data. I hope that people who have more data and better education will one day sort it out, but until that happens, my position firmly remains “I don’t know (and most likely neither do you), stop bothering me”.
I think we need to be careful to not make a false equivalence here:
Trans people want us to respect their pronouns and genders.
I’m not very familiar with Blanchard; so far it seems to me like Blanchard’s work is (a) just a typology for predicting certain correlations, which (b) is sometimes used to argue that trans people are mistaken about their identities/motivations.
2A is kinda tangential to 1. So is this really a case of competing theories? I think uncertainty should make one skeptical of the implications of Blanchard’s work rather than skeptical about respecting trans people.
(Note that this is about the representatives, not the people being represented. Two trans people can have different opinions, but you are required to believe the woke one and oppose the non-woke one.) Otherwise, you are transphobic. I completely reject that.
Two homo/bisexual people can have different opinions on what “true homo/bisexuality” is, too. Some opinions can be pretty negative. Yes, that’s inconvenient, but that’s just an expected course of events.
Shortly: disagreement is not hate. But it often gets conflated, especially in environments that overwhelmingly contain people of one political tribe.
I feel it’s just the nature of some political questions. Not in all questions, and not in all spaces, can you treat disagreement as something benign.
But if there is a person who actually feels dysphoria from not being addressed as “ve” (someone who would be triggered by calling them any of: “he”, “she”, or “they”), then I believe that this is between them and their psychiatrist, and I want to be left out of this game.
Agree. Also agree that lynching for accidental misgendering is bad.
(That’s when you get the “attack helicopters” as an attempt to point out the absurdity of the system.)
I’m pretty sure the helicopter argument began as an argument against trans people, not as an argument against weird-ass novel pronouns.
Draft of a future post, any feedback is welcome. Continuation of a thought from this shortform post.
(picture: https://en.wikipedia.org/wiki/Drawing_Hands)
There’s an alignment-related problem: how do we make an AI care about causes of a particular sensory pattern? What are “causes” of a particular sensory pattern in the first place? You want the AI to differentiate between “putting a real strawberry on a plate” and “creating a perfect illusion of a strawberry on a plate”, but what’s the difference between doing real things and creating perfect illusions, in general?
(Relevant topics: environmental goals; identifying causal goal concepts from sensory data; “look where I’m pointing, not at my finger”; Pointers Problem; Eliciting Latent Knowledge; symbol grounding problem; ontology identification problem.)
I have a general answer to those questions. My answer is very unfinished. Also, it isn’t mathematical; it’s philosophical in nature. But I believe it’s important anyway, because there aren’t many ideas, philosophical or otherwise, about the questions above. With questions like these you don’t know where to even start thinking, so it’s hard to imagine even a bad answer.
Observation 1. Imagine you come up with a model which perfectly predicts your sensory experience (Predictor). Just having this model is not enough to understand causes of a particular sensory pattern, i.e. differentiate between stuff like “putting a real strawberry on a plate” and “creating a perfect illusion of a strawberry on a plate”.
Observation 2. Not every Predictor has variables which correspond to causes of a particular sensory pattern. Not every Predictor can be used to easily derive something corresponding to causes of a particular sensory pattern. For example, some Predictors might make predictions by simulating a large universe with a superintelligent civilization inside which predicts your sensory experiences. See “Transparent priors”.
So, what are causes of a particular sensory pattern?
“Recursive Sensory Models” (RSMs).
I’ll explain what an RSM is and provide various examples.
An RSM is a sequence of N models (Model 1, Model 2, …, Model N) for which the following two conditions hold true:
Model (K + 1) is good at predicting more aspects of sensory experience than Model (K). Model (K + 2) is good at predicting more aspects than Model (K + 1). And so on.
Model 1 can be transformed into any of the other models according to special transformation rules. Those rules are supposed to be simple. But I can’t give a fully general description of those rules. That’s one of the biggest unfinished parts of my idea.
The second bullet point is kinda the most important one, but it’s very underspecified. So you can only get a feel for it through looking at specific examples.
Core claim: when the two conditions hold true, the RSM contains easily identifiable “causes” of particular sensory patterns. The two conditions are necessary and sufficient for the existence of such “causes”. The universe contains “causes” of particular sensory patterns to the extent to which statistical laws describing the patterns also describe deeper laws of the universe.
Imagine you’re looking at a landscape with trees, lakes and mountains. You notice that none of those objects disappear.
It seems like a good model: “most objects in the 2D space of my vision don’t disappear”. (Model 1)
But it’s not perfect. When you close your eyes, the landscape does disappear. When you look at your feet, the landscape does disappear.
So you come up with a new model: “there is some 3D space with objects; the space and the objects are independent from my sensory experience; most of the objects don’t disappear”. (Model 2)
Model 2 is better at predicting the whole of your sensory experience.
However, note that the “mathematical ontology” of both models is almost identical. (Both models describe spaces whose points can be occupied by something.) They’re just applied to slightly different things. That’s why “recursion” is in the name of Recursive Sensory Models: an RSM reveals similarities between different layers of reality. As if reality is a fractal.
Intuitively, Model 2 describes “causes” (real trees, lakes and mountains) of sensory patterns (visions of trees, lakes and mountains).
You notice that most visible objects move smoothly (don’t disappear, don’t teleport).
“Most visible objects move smoothly in a 2D/3D space” is a good model for predicting sensory experience. (Model 1)
But there’s a model which is even better: “visible objects consist of smaller and invisible/less visible objects (cells, molecules, atoms) which move smoothly in a 2D/3D space”. (Model 2)
However, note that the mathematical ontology of both models is almost identical.
Intuitively, Model 2 describes “causes” (atoms) of sensory patterns (visible objects).
Imagine you’re alone in a field with rocks of different size and a scale model of the whole environment. You’ve already learned object permanence.
“Objects don’t move in space unless I push them” is a good model for predicting sensory experience. (Model 1)
But it has a little flaw. When you push a rock, the corresponding rock in the scale model moves too. And vice-versa.
“Objects don’t move in space unless I push them; there’s a simple correspondence between objects in the field and objects in the scale model” is a better model for predicting sensory experience. (Model 2)
However, note that the mathematical ontology of both models is identical.
Intuitively, Model 2 describes a “cause” (the scale model) of sensory patterns (rocks of different size being at certain positions). Though you can reverse the cause and effect here.
If you put your hand on a hot stove, you quickly move the hand away. Because it’s painful and you don’t like pain. This is a great model (Model 1) for predicting your own movements near a hot stove.
But why do other people avoid hot stoves? If another person touches a hot stove, pain isn’t instantiated in your sensory experience.
Behavior of other people can be predicted with this model: “people have similar sensory experience and preferences, inaccessible to each other”. (Model 2)
However, note that the mathematical ontology of both models is identical.
Intuitively, Model 2 describes a “cause” (inaccessible sensory experience) of sensory patterns (other people avoiding hot stoves).
Imagine yourself in a universe where your sensory experience is produced by very simple, but very chaotic laws. Despite the chaos, your sensory experience contains some simple, relatively stable patterns. Purely by accident.
In such a universe, RSMs might not find any “causes” underlying particular sensory patterns (except the simple chaotic laws).
But in that case there are probably no “causes”.
Napoleon is merely an argument for “just because you strongly believe it, even if it is a statement about you, does not necessarily make it true”.
When people make arguments, they often don’t list all of the premises. That’s not unique to trans discourse. Informal reasoning is hard to make fully explicit. “Your argument doesn’t explicitly exclude every counterexample” is a pretty cheap counter-argument. What people experience is important evidence and an important factor, it’s rational to bring up instead of stopping yourself with “wait, I’m not allowed to bring that up unless I make an analytically bulletproof argument”. For example, if you trust someone that they feel strongly about being a woman, there’s no reason to suspect them of being a cosplayer who chases Twitter popularity.
I expect that you will disagree with a lot of this, and that’s okay; I am not trying to convince you, just explaining my position.
I think I still don’t understand the main conflict which bothers you. I thought it was “I’m not sure if trans people are deluded in some way (like Napoleons, but milder) or not”. But now it seems like “I think some people really suffer and others just cosplay, the cosplayers take something away from true sufferers”. What is taken away?
Even if we assume that there should be a crisp physical cause of “transness” (which is already a value-laden choice), we need to make a couple of value-laden choices before concluding if “being trans” is similar to “believing you’re Napoleon” or not. Without more context it’s not clear why you bring up Napoleon. I assume the idea is “if gender = hormones (gender essentialism), and trans people have the right hormones, then they’re not deluded”. But you can arrive at the same conclusion (“trans people are not deluded”) by means other than gender essentialism.
I assume that for trans people being trans is something more than mere “choice”
There doesn’t need to be a crisp physical cause of “transness” for “transness” to be more than mere choice. There’s a big spectrum between “immutable physical features” and “things which can be decided on a whim”.
If you introduce yourself as “Jane” today, I will refer to you as “Jane”. But if 50 years ago you introduced yourself as “John”, that is a fact about the past. I am not saying that “you were John” as some kind of metaphysical statement, but that “everyone, including you, referred to you as John” 50 years ago, which is a statement of fact.
This just explains your word usage, but doesn’t make a case that disliking deadnaming is magical thinking.
I’ve decided to comment because bringing up Napoleon, hysteria and magical thinking all at once is egregiously bad faith. I think it’s not a good epistemic norm to imply something like “the arguments of the outgroup are completely inconsistent trash” without elaborating.
There are people who feel strongly that they are Napoleon. If you want to convince me, you need to make a stronger case than that.
It’s confusing to me that you go to “I identify as an attack helicopter” argument after treating biological sex as private information & respecting pronouns out of politeness. I thought you already realize that “choosing your gender identity” and “being deluded you’re another person” are different categories.
If someone presented as male for 50 years, then changed to female, it makes sense to use “he” to refer to their first 50 years, especially if this is the pronoun everyone used at that time. Also, I will refer to them using the name they actually used at that time. (If I talk about the Ancient Rome, I don’t call it Italian Republic either.) Anything else feels like magical thinking to me.
The alternative (using new pronouns / name) makes perfect sense too, due to trivial reasons, such as respecting a person’s wishes. You went too far calling it magical thinking. A piece of land is different from a person in two important ways: (1) it doesn’t feel anything no matter how you call it, (2) there’s less strong reasons to treat it as a single entity across time.
Meta-level comment: I don’t think it’s good to dismiss original arguments immediately and completely.
Object-level comment:
Neither of those claims has anything to do with humans being the “winners” of evolution.
I think it might be more complicated than that:
We need to define what “a model produced by a reward function” means, otherwise the claims are meaningless. If you made just a single update to the model (based on the reward function), calling it “a model produced by the reward function” is meaningless (because no real optimization pressure was applied). So we do need to define some goal of optimization (which determines who’s a winner and who’s a loser).
We need to argue that the goal is sensible. I.e. somewhat similar to a goal we might use while training our AIs.
Here’s some things we can try:
We can try defining all currently living species as winners. But is it sensible? Is it similar to a goal we would use while training our AIs? “Let’s optimize our models for N timesteps and then use all surviving models regardless of any other metrics” ← I think that’s not sensible, especially if you use an algorithm which can introduce random mutations into the model.
We can try defining species which avoided substantial changes for the longest time as winners. This seems somewhat sensible, because those species experienced the longest optimization pressure. But then humans are not the winners.
We can define any species which gained general intelligence as winners. Then humans are the only winners. This is sensible because of two reasons. First, with general intelligence deceptive alignment is possible: if humans knew that Simulation Gods optimize organisms for some goal, humans could focus on that goal or kill all competing organisms. Second, many humans (in our reality) value creating AGI more than solving any particular problem.
I think the latter is the strongest counter-argument to “humans are not the winners”.
My point is that chairs and humans can be considered in a similar way.
Please explain how your point connects to my original message: are you arguing with it or supporting it or want to learn how my idea applies to something?
I see. But I’m not talking about figuring out human preferences, I’m talking about finding world-models in which real objects (such as “strawberries” or “chairs”) can be identified. Sorry if it wasn’t clear in my original message because I mentioned “caring”.
Models or real objects or things capture something that is not literally present in the world. The world contains shadows of these things, and the most straightforward way of finding models is by looking at the shadows and learning from them.
You might need to specify what you mean a little bit.
The most straightforward way of finding a world-model is just predicting your sensory input. But then you’re not guaranteed to get a model in which something corresponding to “real objects” can be easily identified. That’s one of the main reasons why ELK is hard, I believe: in an arbitrary world-model, “Human Simulator” can be much simpler than “Direct Translator”.
So how do humans get world-models in which something corresponding to “real objects” can be easily identified? My theory is in the original message. Note that the idea is not just “predict sensory input”, it has an additional twist.
Creating an inhumanly good model of a human is related to formulating their preferences.
How does this relate to my idea? I’m not talking about figuring out human preferences.
Thus it’s a step towards eliminating path-dependence of particular life stories
What is “path-dependence of particular life stories”?
I think things (minds, physical objects, social phenomena) should be characterized by computations that they could simulate/incarnate.
Are there other ways to characterize objects? Feels like a very general (or even fully general) framework. I believe my idea can be framed like this, too.
There’s an alignment-related problem, the problem of defining real objects. Relevant topics: environmental goals; task identification problem; “look where I’m pointing, not at my finger”; The Pointers Problem; Eliciting Latent Knowledge.
I think I realized how people go from caring about sensory data to caring about real objects. But I need help with figuring out how to capitalize on the idea.
So… how do humans do it?
Humans create very small models for predicting very small/basic aspects of sensory input (mini-models).
Humans use mini-models as puzzle pieces for building models for predicting ALL of sensory input.
As a result, humans get models in which it’s easy to identify “real objects” corresponding to sensory input.
For example, imagine you’re just looking at ducks swimming in a lake. You notice that ducks don’t suddenly disappear from your vision (permanence), their movement is continuous (continuity) and they seem to move in a 3D space (3D space). All those patterns (“permanence”, “continuity” and “3D space”) are useful for predicting aspects of immediate sensory input. But all those patterns are also useful for developing deeper theories of reality, such as the atomic theory of matter, because you can imagine that atoms are small things which continuously move in 3D space, similar to ducks. (This image stops working as well when you get to Quantum Mechanics, but then aspects of QM feel less “real” and less relevant for defining objects.) As a result, it’s easy to see how the deeper model relates to surface-level patterns.
In other words: reality contains “real objects” to the extent to which deep models of reality are similar to (models of) basic patterns in our sensory input.
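A minimal sketch of the “mini-model as puzzle piece” idea (everything here is invented for illustration): one mini-model, “smooth motion in space”, is written once and then applied unchanged at two layers, first to a visible object and then to a postulated hidden one:

```python
# Toy sketch: one "mini-model" (smooth motion) reused at two layers of reality.
# All trajectories are invented for illustration.

def smooth(traj, max_step=1):
    """Mini-model: positions change by at most max_step per tick."""
    return all(abs(b - a) <= max_step for a, b in zip(traj, traj[1:]))

# Surface layer: a duck's visible position on the lake.
duck = [0, 1, 1, 2, 3, 3, 4]
assert smooth(duck)

# Deeper layer: a postulated invisible particle. The same predicate
# (the same "mathematical ontology") applies without modification.
particle = [10, 10, 9, 9, 8, 7, 7]
assert smooth(particle)

# A hypothesis that violates the shared ontology (a teleporting object)
# is rejected by the same mini-model:
ghost = [0, 5, 0, 5]
assert not smooth(ghost)
```

The point is that the deep model is easy to relate to the surface patterns precisely because it is built out of the same reusable pieces.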
I don’t understand the Model-Utility Learning (MUL) section; what pathological behavior does the AI exhibit?
Since humans (or something) must be labeling the original training examples, the hypothesis that building bridges means “what humans label as building bridges” will always be at least as accurate as the intended classifier. I don’t mean “whatever humans would label”. I mean the hypothesis that “build a bridge” means specifically the physical situations which were recorded as training examples for this system in particular, and labeled by humans as such.
So it’s like overfitting? If I train MUL AI to play piano in a green room, MUL AI learns that “playing piano” means “playing piano in a green room” or “playing piano in a room which would have been chosen for training me in the past”?
Now, we might reasonably expect that if the AI considers a novel way of “fooling itself” which hasn’t been given in a training example, it will reject such things for the right reasons: the plan does not involve physically building a bridge.
But “sensory data being a certain way” is a physical event which happens in reality, so MUL AI might still learn to be a solipsist? So MUL doesn’t guarantee solving misgeneralization in any way?
If the answer to my questions is “yes”, what did we even hope for with MUL?
I’m noticing two things:
It’s suspicious to me that the values of humans-who-like-paperclips are inherently tied to acquiring an unlimited amount of resources (no matter how). So maybe I don’t treat such values as 100% innocent, and I’m OK with keeping them in check. Though we can come up with thought experiments where the urge to get more resources is justified by something. Like, maybe instead of producing paperclips those people want to calculate Busy Beaver numbers, so they want more and more computronium for that.
How consensual were the trades if their outcome is predictable and other groups of people don’t agree with the outcome? Looks like coercion.
Often I see people dismiss the things the Epicureans got right with an appeal to their lack of the scientific method, which has always seemed a bit backwards to me.
The most important thing, I think, is not even hitting the nail on the head, but knowing (i.e. really acknowledging) that a nail can be hit in multiple places. If you know that, the rest is just a matter of testing.
But avoidance of value drift or of unendorsed long term instability of one’s personality is less obvious.
What if endorsed long term instability leads to negation of personal identity too? (That’s something I thought about.)
I think corrigibility is the ability to change a value/goal system. That’s the literal meaning of the term: “correctable”. If an AI were fully aligned, there would be no need to correct it.
Perhaps I should make a better argument:
It’s possible that AGI is correctable, but (a) we don’t know what needs to be corrected or (b) we cause new, less noticeable problems, while correcting AGI.
So, I think there’s not two assumptions “alignment/interpretability is not solved + AGI is incorrigible”, but only one — “alignment/interpretability is not solved”. (A strong version of corrigibility counts as alignment/interpretability being solved.)
Yes, and that’s the specific argument I am addressing, not AI risk in general. Except that if it’s many many times smarter, it’s ASI, not AGI.
I disagree that “doom” and “AGI going ASI very fast” are certain (> 90%) too.
Epistemic status: Draft of a post. I want to propose a method of learning environmental goals (a super big, super important subproblem in Alignment). It’s informal, so it has a lot of gaps. I worry I missed something obvious, rendering my argument completely meaningless. I asked the LessWrong feedback team, but they couldn’t find someone knowledgeable enough to take a look.
Can you tell me the biggest conceptual problems of my method? Can you tell me if agent foundations researchers are aware of this method or not?
If you’re not familiar with the problem, here’s the context: Environmental goals; identifying causal goal concepts from sensory data; ontology identification problem; Pointers Problem; Eliciting Latent Knowledge.
Explanation 1
One naive solution
Imagine we have a room full of animals. AI sees the room through a camera. How can AI learn to care about the real animals in the room rather than their images on the camera?
Assumption 1. Let’s assume AI models the world as a bunch of objects interacting in space and time. I don’t know how critical or problematic this assumption is.
Idea 1. Animals in the video are objects with certain properties (they move continuously, they move with certain relative speeds, they have certain sizes, etc). Let’s make the AI search for the best world-model which contains objects with similar properties (P properties).
Problem 1. Ideally, AI will find clouds of atoms which move similarly to the animals on the video. However, AI might just find a world-model (X) which contains the screen of the camera. So it’ll end up caring about “movement” of the pixels on the screen. Fail.
Observation 1. Our world contains many objects with P properties which don’t show up on the camera. So, X is not the best world-model containing the biggest number of objects with P properties.
Idea 2. Let’s make the AI search for the best world-model containing the biggest number of objects with P properties.
Question 1. For “Idea 2” to make practical sense, we need to find a smart way to limit the complexity of the models. Otherwise AI might just make any model contain arbitrary numbers of arbitrary objects. Can we find the right complexity prior?
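As a toy sketch of the kind of complexity-penalized search that “Idea 2” and Question 1 gesture at (the scoring function, the λ penalty, and the model representations are all my illustrative assumptions, not a concrete proposal):

```python
# Toy sketch: score candidate world-models by how many objects with
# P properties they contain, minus a complexity penalty. Without the
# penalty, a model could inflate its score by postulating arbitrarily
# many P-objects (the concern raised in Question 1).

def has_p_properties(obj):
    # Hypothetical check: does the object move continuously, persist
    # over time, and live in 3D space?
    return obj.get("continuous") and obj.get("persistent") and obj.get("dim") == 3

def score(model, lam=1.0):
    p_count = sum(1 for obj in model["objects"] if has_p_properties(obj))
    return p_count - lam * model["complexity"]

# A camera-screen model: pixels flicker and live in 2D, so it has few
# P-objects, even though it's simple.
screen_model = {"objects": [{"continuous": False, "persistent": True, "dim": 2}],
                "complexity": 1.0}
# An atomic model: many P-objects, at the cost of higher complexity.
atom_model = {"objects": [{"continuous": True, "persistent": True, "dim": 3}] * 10,
              "complexity": 4.0}

best = max([screen_model, atom_model], key=score)
```

Under this (made-up) scoring, the atomic model wins despite being more complex, because it contains far more objects with P properties; that is the intuition behind preferring clouds of atoms over the camera screen.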
Question 2. Assume we resolved the previous question positively. What if “Idea 2” still produces an alien ontology humans don’t care about? Can it happen?
Question 3. Assume everything works out. How do we know that this is a general method of solving the problem? We have an object in sense data (A), we care about the physical thing corresponding to it (B): how do we know B always behaves similarly to A and there are always more instances of B than of A?
One philosophical argument
I think there’s a philosophical argument which allows us to resolve Questions 2 & 3 (and gives evidence that Question 1 should be resolvable too).
By default, we only care about objects with which we can “meaningfully” interact in our daily life. This guarantees that B always has to behave similarly to A, in some technical sense (otherwise we wouldn’t be able to meaningfully interact with B). Also, sense data is a part of reality, so B includes A, therefore there are always more instances of B than of A, in some technical sense. This resolves Question 3.
By default, we only care about objects with which we can “meaningfully” interact in our daily life. This guarantees that models of the world based on such objects are interpretable. This resolves Question 2.
Can we define what “meaningfully” means? I think that should be relatively easy, at least in theory. There doesn’t have to be One True Definition Which Covers All Cases.
If the argument is true, the pointers problem should be solvable without Natural Abstraction hypothesis being true.
Anyway, I’ll add a toy example which hopefully makes it clearer what this is all about.
One toy example
You’re inside a 3D video game. 1st person view. The game contains landscapes and objects, both made of small balls (the size of tennis balls) of different colors. Also a character you control.
The character can push objects. Objects can break into pieces. Physics is Newtonian. Balls are held together by some force. Balls can have dramatically different weights.
Light is modeled by particles. The sun emits particles, and they bounce off surfaces.
The most unusual thing: as you move, your coordinates are fed into a pseudorandom number generator. The numbers from the generator are then used to swap places of arbitrary balls.
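The coordinate-seeded swapping rule can be sketched like this (a toy illustration; the grid representation and function names are made up):

```python
import random

def swap_step(cells, player_xyz):
    # Seed a PRNG with the player's coordinates, then use it to pick
    # two arbitrary grid cells and swap their contents. Each cell
    # holds the color of the ball currently occupying it.
    rng = random.Random(hash(player_xyz))
    i, j = rng.randrange(len(cells)), rng.randrange(len(cells))
    cells[i], cells[j] = cells[j], cells[i]
    return cells

# Four cells of a (hypothetical) 1D world, each holding one ball.
cells = ["red", "blue", "green", "red"]
swap_step(cells, player_xyz=(5, 2, 7))
```

The point of the mechanic: under normal play, the identity of any individual ball (level 2B below) is unstable, while the coarser ball structures are much more predictable, which is why the agent should locate its goals at the structural level.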
You care about pushing boxes (as everything, they’re made of balls too) into a certain location.
...
So, the reality of the game has roughly 5 levels:
1. The level of sense data (the 2D screen of the 1st-person view).
2A. The level of ball structures. 2B. The level of individual balls.
3A. The level of waves of light particles. 3B. The level of individual light particles.
I think AI should be able to figure out that it needs to care about the 2A level of reality. Because ball structures are much easier to control (by doing normal activities with the game’s character) than individual balls. And light particles are harder to interact with than ball structures, due to their speed and nature.
Explanation 2
An alternative explanation of my argument:
Imagine activities which are crucial for a normal human life. For example: moving yourself in space (in a certain speed range); moving other things in space (in a certain speed range); staying in a single spot (for a certain time range); moving in a single direction (for a certain time range); having varied visual experiences (changing in a certain frequency range); etc. Those activities can be abstracted into mathematical properties of certain variables (speed of movement, continuity of movement, etc). Let’s call them “fundamental variables”. Fundamental variables are defined using sensory data or abstractions over sensory data.
Some variables can be optimized (for a long enough period of time) by fundamental variables. Other variables can’t be optimized (for a long enough period of time) by fundamental variables. For example: proximity of my body to my bed is an optimizable variable (I can walk towards the bed — walking is a normal activity); the amount of things I see is an optimizable variable (I can close my eyes or hide some things — both actions are normal activities); closeness of two particular oxygen molecules might be a non-optimizable variable (it might be impossible to control their positions without doing something weird).
By default, people only care about optimizable variables. Unless there are special philosophical reasons to care about some obscure non-optimizable variable which doesn’t have any significant effect on optimizable variables.
You can have a model which describes typical changes of an optimizable variable. Models of different optimizable variables have different predictive power. For example, “positions & shapes of chairs” and “positions & shapes of clouds of atoms” are both optimizable variables, but models of the latter have much greater predictive power. Complexity of the models needs to be limited, by the way, otherwise all models will have the same predictive power.
Collateral conclusions: typical changes of any optimizable variable are easily understandable by a human (since it can be optimized by fundamental variables, based on typical human activities); all optimizable variables are “similar” to each other, in some sense (since they all can be optimized by the same fundamental variables); there’s a natural hierarchy of optimizable variables (based on predictive power). Main conclusion: while the true model of the world might be infinitely complex, physical things which ground humans’ high-level concepts (such as “chairs”, “cars”, “trees”, etc.) always have to have a simple model (which works most of the time, where “most” has a technical meaning determined by fundamental variables).
Formalization
So, the core of my idea is this:
AI is given “P properties” which a variable of its world-model might have. (Let’s call a variable with P properties P-variable.)
AI searches for a world-model with the largest number of P-variables. AI makes sure it doesn’t introduce useless P-variables. We also need to be careful about how we measure the “number” of P-variables: we need to measure something like “density” rather than a raw count (i.e. the number of P-variables contributing to a particular relevant situation, rather than the number of P-variables overall?).
AI gets an interpretable world-model (because P-variables are highly interpretable), adequate for defining what we care about (because by default, humans only care about P-variables).
How far are we from being able to do something like this? Are agent foundations researchers pursuing this or something else?