I find this project very interesting and have thought a lot about it over the last two weeks. The way I understand it, the main goal of the project is the following:
providing us (AI researchers) with a model that has an additional output dimension (the “thoughts”)
training the model in such a way that this new dimension is semantically linked directly to the primary output dimension (the “prompt”)
especially linked in some kind of temporal causality (“early” thoughts producing the prompt): not too close to the primary output (so that it contains semantic meaning that cannot be induced by interpreting the prompt alone), but not too far away either (so that it actually “causes” the prompt, as accurately as we can get it; hence @Eliezer’s authoring technique of one person writing the thoughts and another writing the prompt)
such a model could then be analyzed and experimented with in several ways. One obvious study: intervention on the thought-level and observation of the effect on the prompt level (a minimal sketch of such an experiment follows below). With the big alignment goal in mind: if we can put safeguards on an AI’s thoughts, before they lead to action, we are safer than if we put guards on the actions only.
I understand that the project does not aim at creating a “true” interpretation of the inner workings of a neural network (“true” in the sense of a map reflecting the territory in a way that actually helps navigate the world / predict the behavior of the model / create an AI from scratch without training).
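To make the intervention study mentioned above a bit more concrete, here is a minimal sketch of how such an experiment could look, assuming the trained model exposes a two-stage sampling interface (first the thoughts, then the prompt conditioned on those thoughts). The `model.sample` interface, the channel names, and the helper functions are all assumptions of mine, not an existing API.

```python
# Hypothetical sketch of a thought-level intervention experiment.
# The two-stage sampling interface (first thoughts, then prompt) is an
# assumption about how the trained model might be exposed.

def generate_thoughts(model, story_so_far):
    """Sample the 'thoughts' channel for the next step."""
    return model.sample(story_so_far, channel="thoughts")

def generate_prompt(model, story_so_far, thoughts):
    """Sample the next prompt, conditioned on (possibly edited) thoughts."""
    return model.sample(story_so_far + thoughts, channel="prompt")

def intervention_study(model, story_so_far, edit_fn):
    """Compare the prompt the model produces from its own thoughts with
    the prompt it produces after an intervention on those thoughts."""
    thoughts = generate_thoughts(model, story_so_far)
    baseline_prompt = generate_prompt(model, story_so_far, thoughts)

    edited_thoughts = edit_fn(thoughts)  # e.g. remove a planned ambush
    edited_prompt = generate_prompt(model, story_so_far, edited_thoughts)

    return baseline_prompt, edited_prompt
```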
Upon reflecting on this goal, I noticed myself coming back to several points that I identified as the following three somewhat distinct topics:
(1) internal thought structure
I believe that—because it is stated clearly that you want the actual thoughts of a human DM to be recorded—you wish the new model to provide “thoughts” that are similar to how “we would think” (probably so that we can later interpret these thoughts more easily). From this I draw the following conclusion:
We should record the “thoughts” in such a way that the structure of the recording (= the written thoughts = the training data) matches our actual thoughts as closely as possible.
When I try to introspect my thoughts when I play the role of a DM in a classical PnP RPG, though, I find that I do not think in a “bullet list with bracket annotations”. Actual thoughts are often non-verbal in nature. Of course we need to press them into English sentences (to be able to process them with our current neural networks, and to be able to record them in the first place). But besides that, I think that there is more structure in the typical human DM’s thoughts than this project tries to make use of. The different types of brackets and parentheses capture quite a bit already, but not all of the structure that I believe I observe.
I elaborate:
object-level thoughts:
I keep some kind of “world model” alive in my RAM (the [square brackets] try to implement this structure a little)
I update this model “over in-game time”—and not only in reaction to player actions
My world model is not complete. There are explicit and implicit “white spots”. When a player action or a plot twist leads into one of these blank areas, I consciously generate new world data.
The world model especially contains NPCs, which are models of other agents in the world that have their own agenda and may also act on their own.
plot-level thoughts:
Each RPG starts with a lot of plot ideas in the DM’s mind. This is different for dungeon runs, but in any case, there is this separate “storage” for the currently laid-out plot ideas.
There is a constant, or at least regular, check-up thought on how far we are in the current plot arc, and whether we are drifting off and I need to either get the player back on track or adjust the plot planning.
In these cases, I make a decision (more or less consciously), which is a different kind of thought than both the object-level story-writing (which involves lots of small “creative” decisions) and the meta-level observation/reasoning about the plot and the player.
...
You see, this is just an attempt at grasping the internal thought structure; I’m nowhere near done, and it definitely does not contain a concrete proposal on how to write down the thoughts instead.
My question to the project team is:
Have you thought about the internal structure of a DM’s thoughts and the different possible ways of expressing these verbally? What are the reasons that you chose this bullet-list-with-bracket-annotations format over other formats?
(1b) Remark on the bracket-annotation:
When trying to write down some thought-annotated dungeon run steps, I noticed myself having to re-evaluate the written thoughts in order to determine whether I should put them in parentheses, or brackets, or both. This evaluation is a separate thought, of course, which I did not record, of course. But it slows me down and gets me out of the writing flow. Maybe this fades as you get used to writing thought-annotations; I actually believe it does at some point. But if it doesn’t, or it does too late: maybe it’s not worth it to have these brackets? Maybe rather generate 1.5 times the training data and let the model figure out which thought belongs to which level? Unless, of course, we need the brackets in the output for further research (only intervene on “(” and “{” thoughts, directly change the content of the “[” long-term memory, …).
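If the brackets are kept, one thing they buy us is that downstream experiments can address the levels separately. Here is a purely illustrative sketch of how the written annotations could be split by bracket type; the mapping of bracket type to meaning is only my reading of the guidelines, and the level names are made up by me.

```python
import re

# Purely illustrative: split a written thought-annotation into levels by
# bracket type, so they can be addressed separately in later experiments
# (e.g. intervene only on "(" / "{" thoughts, edit "[" world-model notes).
# The level names and the bracket-to-meaning mapping are assumptions.

BRACKET_LEVELS = [
    ("(", ")", "step_thought"),   # thoughts about the current step
    ("[", "]", "world_model"),    # longer-term world-model / memory notes
    ("{", "}", "meta_thought"),   # assumed third level, e.g. plot/meta notes
]

def split_thought_levels(annotated_text):
    """Return a dict mapping each level name to the annotations found."""
    levels = {name: [] for _, _, name in BRACKET_LEVELS}
    for open_ch, close_ch, name in BRACKET_LEVELS:
        pattern = re.escape(open_ch) + r"(.*?)" + re.escape(close_ch)
        matches = re.findall(pattern, annotated_text, flags=re.DOTALL)
        levels[name].extend(m.strip() for m in matches)
    return levels
```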
(2) human creativity does not hover in space
Every author draws on the entirety of their experiences when writing stories. These include personal experiences as well as written works (both fact and fiction) that they’ve read. Human DMs similarly will often think of pre-existing content, both on the object-level and the plot-level. And this content is not included in the dungeon run itself.
I believe that the current AI dungeon models do the same, in some way. Their pool of experience is their training data set.
My question is:
Can we capture references to pre-existing content in such a way that the new model will learn from it to explicitly reference its training data?
Or must (and can) we prevent the authors who generate the prized training set of 110 thought-annotated dungeon runs from drawing on pre-existing content that is not also implicitly available to the current models (which would then be re-trained with the new thought-annotations)?
(3) AI-DM vs AI-player
Currently, when playing an AI dungeon, the human always takes the role of the player and the AI is the DM.
Does it have to be like this? Could we train a model to perform as a player in a dungeon run with a human DM? (Or are there such models already that I don’t know of?)
If yes, maybe we should ask the authors who contribute to this project to provide thought-annotations for the player as well?
I see 3 advantages here:
This is probably done much faster now, in one go, than by asking for another batch of 110 dungeon runs with thought-annotations for the player inputs in a later stage of the research. Especially when authors team up and each take a different role, so that both authors share the “workload” of generating the thought-annotations.
Thinking far ahead into the future, a smarter-than-human AI would rather be a player (agent) in a dungeon run (the real world) than the other way round. It thus might be especially fruitful to investigate how intervention on the thought-level of the player affects the player’s actions.
Having AIs for both roles lets us play them “against” each other. This could speed up the process of generating more training data for even better models (probably with a human reviewing the AI-vs-AI dungeon runs); see the sketch below.
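To illustrate that last point, here is a minimal sketch of what an AI-vs-AI data-generation loop with a human reviewer could look like. The `dm_model`, `player_model`, and `human_review` interfaces are placeholders of mine for components that do not exist yet.

```python
# Hypothetical sketch of an AI-vs-AI data-generation loop with a human
# reviewer. All interfaces here are placeholders, not an existing API.

def self_play_run(dm_model, player_model, opening_prompt, max_steps=50):
    """Let a DM model and a player model generate a dungeon run together."""
    run = [("dm", opening_prompt)]
    for _ in range(max_steps):
        player_step = player_model.act(run)   # player thoughts + action
        run.append(("player", player_step))
        dm_step = dm_model.respond(run)       # DM thoughts + prompt
        run.append(("dm", dm_step))
    return run

def generate_training_data(dm_model, player_model, openings, human_review):
    """Keep only the runs that a human reviewer accepts as training data."""
    accepted = []
    for opening in openings:
        run = self_play_run(dm_model, player_model, opening)
        if human_review(run):
            accepted.append(run)
    return accepted
```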
I was “archiving” the link to this page and thought I’d see what’s been going on. Updates seem to be only on the Discord. Anyway, since they allowed me to post longer thoughts there, I figured it would be fine for me to drop it here as well. https://sd-marlow.medium.com/slaying-the-ml-dragon-7ce0a2e4e3a6
From your post, you’re looking at this in much the same way I was when I attempted to do a short run (to work the bugs out and really understand what’s involved). However, “actual thoughts of the DM” is the wrong explanation for what they want. The examples of what they are accepting look to be nothing more than the “common sense” stuff current ML models fail to capture (and thus has to be explicitly stated in the runs). Also, from comments in the Discord, it seems like the info captured is post-process, despite the desire for pre-prompt thoughts. Not trying to discourage; just showing my thinking on the process, and that it wasn’t what they wanted.